Introduction To The Analysis of Spatial Data Using R
Ana I. Moreno-Monroy
13-16 June 2017
Chapter 1: Introduction
• The Revolution
• R and R Studio
• Getting familiar with R Studio
• Basic inspection and statistics
• Subsetting, selecting and creating variables
The Revolution
The R project was conceived in 1992 by Ross Ihaka and Robert Gentleman at the University of Auckland,
New Zealand. Since the release of its first stable version (1.0.0) in 2000, its use has grown exponentially.
Figure 2: Google trends comparison of different statistical software
The Revolution: Some possibilities
Software
• R is a “free software environment for statistical computing and graphics”. You can get the latest version of R
(3.4.0, “You Stupid Darkness”) from CRAN
• As R works through the command line, it can be a bit intimidating
• R Studio is an Integrated Development Environment (IDE) that makes R easier to use by adding a script
editor, a workspace viewer, a plot window and other tools
• Download the latest version (RStudio Desktop 1.0.143) from the RStudio website. Note: check that R is
installed (and running properly) before installing R Studio
• R makes it easy to write new functions that build on existing ones
• Packages: An R package bundles functions, compiled code, data and documentation. R comes with a set of
base packages, which can be complemented with contributed packages. Contributed packages are distributed
mainly through the Comprehensive R Archive Network (CRAN)
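As a small illustration of building a new function from existing ones, the sketch below defines a helper `cv()` (coefficient of variation) on top of the built-in `sd()` and `mean()`. The name `cv` and its definition are our own illustrative assumption, not part of base R.

```r
# Hypothetical helper: coefficient of variation = standard deviation / mean,
# built entirely from the existing base functions sd() and mean()
cv <- function(x, na.rm = FALSE) {
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}

# Apply it to a variable of the built-in USArrests data frame
cv(USArrests$UrbanPop)

# The na.rm argument is passed on to sd() and mean()
cv(c(1, 2, NA, 4), na.rm = TRUE)
```

Passing `na.rm` through to the underlying functions keeps the new function consistent with how `sd()` and `mean()` already behave.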
R Studio
• 4 panels: source editor and data viewer; command history and workspace browser; R Console; and
file/help/package/plots
• Types of files: .R (R script, similar to a Stata .do file); .RData (workspace file that stores all objects in
the Global Environment, e.g. written by save.image()); .Rmd (R Markdown, the format of this presentation)
• Load projects, data or code using the File menu
• Other useful menus include Tools (e.g. Tools/Install Packages) and Debug
Getting help
• In the command line of the console, type ? followed by your query, e.g. ?mean (similar to help in Stata)
• Searching your question on Google usually works
• Useful online resources: (GIS) Stack Exchange, Stack Overflow and R-bloggers
• Check the R Journal for new developments
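Besides ?, base R offers a few related ways to find help. A minimal sketch: the help-page commands are left as comments because they open the help viewer rather than return a value, while apropos() returns the names of available objects that match a pattern.

```r
# Open the help page of a specific function (both forms are equivalent)
# ?mean
# help("mean")

# Search the help pages of installed packages for a keyword
# ??"standard deviation"

# List functions and objects on the search path whose names contain "mean"
apropos("mean")
```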
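# Install a contributed package from CRAN (needed only once per computer)
pkgs <- c("ggmap")
install.packages(pkgs)
library(ggmap)   # load it in each new session before using it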
# Create a sine wave and plot it
x <- 1:400
y <- sin(x/10)
plot(x, y)
[Figure: plot of y = sin(x/10) against x; y ranges from -1.0 to 1.0]
• Create a new R Script (File/New File/R Script). Write a short description of the script as a comment,
i.e. after #
• Write a line of code that assigns to an object called “data” the built-in data frame USArrests, and verify
its class
data = USArrests
class(data)
## [1] "data.frame"
• In the .R script, place the cursor on a line and press Ctrl + Enter to execute it. To execute all lines, select
them all and press the “Run” button
• Type “USArrests” in the command line and press Enter. What happens?
• Try using <- instead of =
• Type “data = US” and immediately press the tab key after the “S”. What happens?
• Get help: Type “?USArrests” in the command line
Structure of object
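The structure of a data frame (its dimensions, variable names and variable types) can be inspected with base R's str(). A minimal sketch, re-creating data from USArrests so that it runs on its own:

```r
data <- USArrests

# Compact overview: number of observations and variables, plus each column's type
str(data)

# Dimensions, variable names and the first observations on their own
dim(data)       # rows, columns
names(data)     # variable names
head(data, 3)   # first three observations
```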
Summary statistics
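A minimal sketch of base R summary statistics with summary(), which reports the minimum, quartiles, median, mean and maximum of each numeric variable:

```r
data <- USArrests

# Summary table for every variable in the data frame
summary(data)

# Summary of a single variable
summary(data$UrbanPop)
```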
mean(data$UrbanPop)
## [1] 65.54
• Type ?sd in the command line. Calculate the standard deviation of the first variable and compare it with
the corresponding value in the summary statistics table above
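One way to do the exercise, as a sketch: sd() returns the sample standard deviation (denominator n - 1), which can be checked against its definition; sapply() applies it to every column at once. Note that base R's summary() does not itself report the standard deviation.

```r
data <- USArrests

# Standard deviation of the first variable (Murder)
sd(data$Murder)

# The same value from the definition: square root of the sample variance
x <- data$Murder
sqrt(sum((x - mean(x))^2) / (length(x) - 1))

# Standard deviation of every variable at once
sapply(data, sd)
```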
Simple plots
plot(data$UrbanPop, data$Murder)
[Figure: scatter plot of data$Murder (y axis) against data$UrbanPop (x axis)]
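The default axis labels are just the variable expressions. A sketch of the same scatter plot with descriptive labels, where the label wording is our own (based on the USArrests help page: arrests per 100,000 residents in 1973, percent urban population), plus an optional least-squares line from lm():

```r
data <- USArrests

# Scatter plot with descriptive axis labels, a title and filled points
plot(data$UrbanPop, data$Murder,
     xlab = "Urban population (%)",
     ylab = "Murder arrests (per 100,000)",
     main = "USArrests, 1973",
     pch = 19)

# Optional: add a least-squares regression line
fit <- lm(Murder ~ UrbanPop, data = data)
abline(fit, col = "red")
```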
Subset data
• By rows (observations 1 to 20)
data1 <- data[1:20, ]
• By columns (variables 3 and 4)
data2 <- data[, 3:4]
• By element (row 1, column 1)
data2 <- data[1, 1]
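Rows and columns can also be selected by name rather than by position; in USArrests the row names are the state names. A minimal sketch (the object name data_no_first is illustrative):

```r
data <- USArrests

# Select by row and column names
data["Texas", "Murder"]             # a single element
data[c("Ohio", "Iowa"), ]           # two observations, all variables
data[, c("UrbanPop", "Murder")]     # two variables, all observations

# Negative indices drop rows or columns instead of keeping them
data_no_first <- data[-1, ]         # all observations except the first
```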
• Copy an object, then remove objects from the workspace with rm()
data2 <- data1
rm(data1, data2)
• To extract a single variable, start with the data frame name in the console, add $ and press the Tab key
to see the available variables
var <- data$Murder
• To create a character vector, name the object and list the words, in quotation marks, inside c()
answer <- c("FALSE", "FALSE", "TRUE")
names<-c("UrbanPop", "Murder")
## [1] 2
• Selecting observations above or below a value: first create a logical (TRUE/FALSE) vector from the
condition, then keep only the observations where it is TRUE using brackets []
sel <- data$UrbanPop > mean(data$UrbanPop)
table(sel)
## sel
## FALSE  TRUE
##    22    28
data1 <- data[sel, ]
• Another way: use the base function subset(), which does the same in one line
data2 <- subset(data, UrbanPop > mean(UrbanPop))
dim(data2)
## [1] 28 5
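The same logic extends to values below a threshold, to combined conditions joined with & (and) or | (or), and to selecting columns with a character vector of variable names. A sketch; the object names below, urban_violent, vars and data3 are illustrative:

```r
data <- USArrests

# Observations below the mean: reverse the comparison
below <- data[data$UrbanPop < mean(data$UrbanPop), ]

# Combine two conditions with & (both must be TRUE)
urban_violent <- subset(data, UrbanPop > mean(UrbanPop) & Murder > mean(Murder))

# Select columns using a character vector of variable names
vars <- c("UrbanPop", "Murder")
data3 <- data[, vars]
ncol(data3)
```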
• Drop variables by name: keep only the columns whose names are not in the drop vector
drop <- c("Rape", "Assault")
data4 <- data[, !colnames(data) %in% drop]
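New variables can be created in base R by assigning to a new column name with $. A sketch that builds the same all_crimes total as the dplyr mutate() call (the variable name all_crimes is taken from that call):

```r
data <- USArrests

# Create a new variable by assigning to a new column
data$all_crimes <- data$Murder + data$Assault + data$Rape

# Equivalent, without repeating "data$" for every variable
data$all_crimes <- with(data, Murder + Assault + Rape)

dim(data)
```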
dplyr is a popular package for manipulating tabular data. We can do the same operations as above using its
mutate(), filter() and select() functions:
library(dplyr)
data <- USArrests
# Create a new variable
data <- mutate(data, all_crimes = Murder + Assault + Rape)
# Keep observations with above-average urban population
data_2 <- filter(data, UrbanPop > mean(UrbanPop))
dim(data_2)
## [1] 28 5
# Keep only two variables
data_3 <- select(data, UrbanPop, Murder)
dim(data_3)
## [1] 50 2
# Drop two variables
data_4 <- select(data, -Rape, -Assault)
dim(data_4)
## [1] 50 3
Files can be downloaded from the internet into a local folder using the download.file and unzip functions
of the utils package.
The first step is to check the current working directory, set a new working directory where the files will be
saved (if necessary), and create a new folder for the files:
getwd()                                          # show the current working directory
setwd("C:/Your_local_path")                      # replace with your own folder
if (!dir.exists("data")) { dir.create("data") }  # create "data" only if it is missing
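The same steps can be tried safely without touching a real folder. A sketch that uses a hypothetical project folder inside R's temporary directory (tempdir()), builds paths with file.path(), and restores the original working directory at the end:

```r
# Remember the starting directory so it can be restored afterwards
old_wd <- getwd()

# Hypothetical project folder inside the session's temporary directory
project_dir <- file.path(tempdir(), "spatial_course")
if (!dir.exists(project_dir)) dir.create(project_dir, recursive = TRUE)
setwd(project_dir)

# Create the "data" subfolder only if it does not exist yet
if (!dir.exists("data")) dir.create("data")
list.files()

# Build file paths with file.path() instead of pasting separators by hand
zip_path <- file.path("data", "muni97d.zip")
zip_path

# Go back to where we started
setwd(old_wd)
```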
Once we have identified the URL of the “.zip” file, we can download it to our local directory and unzip it
with the following lines:
library(utils)   # attached by default; loaded here for clarity
# mode = "wb" writes in binary mode, which prevents .zip files from being corrupted on Windows
download.file(url = "http://www.rigis.org/geodata/bnd/muni97d.zip",
              destfile = "data/muni97d.zip", mode = "wb")
unzip(zipfile = "data/muni97d.zip", exdir = "data")