R dataStructures

R data structures
Bonie Thiel, Ph.D. and Gürkan Bebek, Ph.D., M.S.
Systems Biology and Bioinformatics Graduate Program
Different data structures
Zimmermann, Niklaus & Steinmann, Katharina (2021). A short Introduction to statistics using R.
Getting information about an object
x <- c('one'=1,'two'=2,'three'=3,'four'=4,'five'=5) #this is a named vector
typeof(x)
[1] "double"
y <- c('one'=1L,'two'=2L,'three'=3L,'four'=4L,'five'=5L)
typeof(y)
[1] "integer"
length(y)
[1] 5
attributes(y) #this shows the metadata
$names
[1] "one" "two" "three" "four" "five"
x.y <- cbind(x,y) #building more complex structures
str(x.y)
num [1:5, 1:2] 1 2 3 4 5 1 2 3 4 5
- attr(*, "dimnames")=List of 2
..$ : chr [1:5] "one" "two" "three" "four" ...
..$ : chr [1:2] "x" "y"
class(x.y)
[1] "matrix" "array"
typeof(x.y)
[1] "double"
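Why typeof(x.y) is "double" even though y was an integer vector can be shown with a short sketch. It uses shortened versions of x and y (three elements instead of five, an assumption made only to keep the example small):

```r
# Shortened stand-ins for the slide's vectors: x is double, y is integer
x <- c(one = 1, two = 2, three = 3)
y <- c(one = 1L, two = 2L, three = 3L)

# A matrix holds a single type, so cbind() promotes the integer column to double
x.y <- cbind(x, y)
typeof(x.y)               # "double"

# The original y is untouched; only the copy inside the matrix was coerced
is.integer(y)             # TRUE
is.integer(x.y[, "y"])    # FALSE
```

typeof() reports how the values are stored, while class() reports what kind of object R treats it as.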
Useful functions:
str()
typeof()
class()
length()
attributes()
rownames()
colnames()
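A minimal sketch of these functions applied to one object. The matrix m and its dimension names are made up for illustration:

```r
# Hypothetical 2 x 3 integer matrix with row and column names
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))

str(m)          # compact summary of type, dimensions and dimnames
typeof(m)       # "integer"
class(m)        # "matrix" "array" in R >= 4.0
length(m)       # 6: the number of cells, not rows
attributes(m)   # $dim and $dimnames
rownames(m)     # "r1" "r2"
colnames(m)     # "a" "b" "c"
```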
Reading Data Into R
> Biomarkers <- read.csv("bio.csv", header=TRUE, sep=",")
> library(RODBC) # odbcConnectExcel() is available on Windows only
> sheet <- "c:\\Documents and Settings\\user\\sheet.xls"
> con <- odbcConnectExcel(sheet)
Can also use library(XML) to read newer MS Office documents (.docx, .xlsx), since they are zipped XML files.
> library(RMySQL); drv <- dbDriver("MySQL")
> con <- dbConnect(drv, dbname=, user=, password=, host=)
> mydata <- dbGetQuery(con, "SELECT * FROM mydata")
> wpage <- readLines("http://www.programr.com/list.html")
> author_lines <- wpage[grep("<I>", wpage)]
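The calls above need a file, a database, or a web page to be present. A self-contained sketch of the same read.csv() and readLines()/grep() pattern follows. The file contents, column names and HTML lines are invented stand-ins, not the course data:

```r
# Hypothetical stand-in for bio.csv, written to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,marker,level", "1,CRP,3.2", "2,IL6,1.7"), tmp)

biomarkers <- read.csv(tmp, header = TRUE, sep = ",")
str(biomarkers)   # one row per data line, one column per header field

# Hypothetical stand-in for the lines returned by readLines(url)
wpage <- c("<I>Author One</I>", "<B>Title</B>", "<I>Author Two</I>")

# grep() returns the indices of matching lines; use them to subset
author_lines <- wpage[grep("<I>", wpage, fixed = TRUE)]
length(author_lines)   # 2

unlink(tmp)   # remove the temporary file
```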
Concepts for Using R - Data Frames
# Create a data frame:
> info <- data.frame(gender = c("M", "M", "F"), ht = c(172, 186.5, 165),
wt = c(91, 99, 74))
> info
  gender    ht wt
1      M 172.0 91
2      M 186.5 99
3      F 165.0 74
> info[1,2]
[1] 172
> names(info)
[1] "gender" "ht"     "wt"
> info$ht
[1] 172.0 186.5 165.0
> row.names(info) <- c("S1","S2","S3")
> info
   gender    ht wt
S1      M 172.0 91
S2      M 186.5 99
S3      F 165.0 74
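Once rows have names, cells can be selected by name as well as by position. A short sketch using the same info data frame:

```r
info <- data.frame(gender = c("M", "M", "F"), ht = c(172, 186.5, 165),
                   wt = c(91, 99, 74))
row.names(info) <- c("S1", "S2", "S3")

info["S2", "ht"]         # 186.5, the same cell as info[2, 2]
info[c("S1", "S3"), ]    # two rows, all columns
info[, c("ht", "wt")]    # all rows, two columns
```

Selecting by name keeps working if rows or columns are later reordered, which positional indexing does not.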
Concepts for Using R - Data Frames
> height <- info$ht
> height
[1] 172.0 186.5 165.0
> info$age = c(28,55,43)
> info
   gender    ht wt age
S1      M 172.0 91  28
S2      M 186.5 99  55
S3      F 165.0 74  43
> subset(info,age > 50 )
   gender    ht wt age
S2      M 186.5 99  55
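subset() is a convenience wrapper around logical indexing. A sketch of both forms on the same data. The second subset() call, with its combined condition and select argument, is an added illustration rather than something from the slides:

```r
info <- data.frame(gender = c("M", "M", "F"), ht = c(172, 186.5, 165),
                   wt = c(91, 99, 74), age = c(28, 55, 43),
                   row.names = c("S1", "S2", "S3"))

# Logical indexing: keep the rows where the condition is TRUE
older <- info[info$age > 50, ]
rownames(older)   # "S2"

# subset() can combine conditions and pick columns in one call
tall_men <- subset(info, gender == "M" & ht > 180, select = c(ht, wt))
tall_men
```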
Concepts for Using R - Data Frames
A data frame is an object that holds data in a tabular format, which makes
manipulation, reshaping, and open-ended analysis easier.
A data frame is a tightly coupled collection of variables: each column is one variable, and all columns have the same length.
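Internally, a data frame is a list of equal-length columns. A small sketch that makes this visible, using a two-column version of the earlier info example:

```r
info <- data.frame(gender = c("M", "M", "F"), ht = c(172, 186.5, 165))

typeof(info)           # "list": the underlying storage
class(info)            # "data.frame"
length(info)           # 2: the number of columns
nrow(info)             # 3: the shared length of every column
sapply(info, typeof)   # each column keeps its own type
```

Because the columns are separate vectors, one data frame can mix character, numeric and other types, which a matrix cannot.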
# read clinical_trial.txt into R
# name it clinical.trial
clinical.trial <- read.delim("clinical_trial.txt")
Download from Canvas to test.
NOTE: when copying a path from Windows, use forward slashes (or doubled backslashes), because R treats a single backslash in a string as an escape character: change
'C:\Desktop\SYBB402\example.csv' to 'C:/Desktop/SYBB402/example.csv'
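The conversion can also be done in code. A sketch using the example path from the note:

```r
# Doubled backslashes are how a literal backslash is typed in an R string
win_path <- "C:\\Desktop\\SYBB402\\example.csv"

# Replace every backslash with a forward slash
r_path <- gsub("\\", "/", win_path, fixed = TRUE)
r_path   # "C:/Desktop/SYBB402/example.csv"

# file.path() joins path components with forward slashes on every platform
file.path("C:", "Desktop", "SYBB402", "example.csv")
```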
# use head() and str() functions to investigate clinical.trial
> head(clinical.trial)
> str(clinical.trial)
#select age variable
> clinical.trial$age
> clinical.trial["age"]
> clinical.trial[2]
> clinical.trial[[2]]
#what is the difference between using [] vs [[]]?
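One way to explore the question is to compare the class of each result. The sketch below uses a small made-up data frame, trial, as a stand-in for clinical.trial. It assumes age is the second column, as the calls above suggest:

```r
# Hypothetical stand-in for clinical.trial
trial <- data.frame(id = 1:3, age = c(34, 51, 47))

class(trial["age"])     # "data.frame": [ ] keeps the container
class(trial[2])         # "data.frame"
class(trial[["age"]])   # "numeric": [[ ]] extracts the column itself

# [[ ]] and $ return the same vector
identical(trial[[2]], trial$age)   # TRUE
```

Single brackets return a smaller data frame. Double brackets and $ return the column vector, which is what functions such as mean() expect.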