Attributes are arbitrary labels attached to R objects.
Can be set by hand, but should rarely be done by hand.
Example of attributes:
x <- runif(10)
names(x) <- LETTERS[1:10] # or letters for small letters
x
mode = simplified version of type. -> everything that is a number (e.g. integer, double, etc.) is called numeric, everything that is a function (e.g. builtin, closure, special) a function -> can be useful to perform check (e.g. to test if the variable is a number and you don't care if it is integer or double) closure = function in R
# mode as simplified version of type (upper layer of type)
mode
vectors can only take values from the same type, will convert data from different types to the upper type (e.g. integer converted to numeric)
# numeric and sttring
c(1,"a",2)
#=> convert to character
# numeric, string and function
c(1, "a", c)
# => convert to list, as it allows elements of different types
data <- rnorm(1000000)
# how many are about 0 ?
sum(data > 0)
sum(data)/length(data)
# but can also be done with logical
# => for counting the proportion -> mean of logical
mean(data > 0)
In contrast to other languages, logical and numeric cannot be freely exchanged
vect <- 1:10
vect[c(0,1)]
# => with numeric, just retrieves element of corresponding index
vect[c(TRUE, FALSE)]
# => with logical, recycling
Main data type = vector, one of the only data types
Matrices and arrays are extension of vectors, where 2 or more dimension are specified
m <- matrix(1:30, ncol=6)
# the square brackets are function => we can add arguments after the coma
m[11]
m[1,3]
dim(m)
length(m)
a <- matrix(1:30)
# stores it as a vector
# class and dimension that differ
mylist[1]
# => return a sublist, is still a list; returns part of the list
mylist[[1]]
# => get what is inside the list; returns what is inside this element
More flexible than matrices, can have different types in it the class is data frame, but typeof (=> low-level of the data) is list
As long as unambiguous, can use name shortcut, but don't do that !
summary() => summary of what the object contains; but the function str() is in most cases more useful
anova with numeric -> will do a linear regression instead of an ANOVA output with only 1 row => just 2 groups or linear regression
factors = vectors but constraints on the values they can contain
internally: integer numbers, 1st category = 1, 2nd category = 2
concatenating factors will not work -> will create a vector with numberic values
hair1 <- factor(c("red", "brown", "blue"), levels = c("red", "brown", "blue"))
hair2 <- factor(c("white", "green", "grey"), levels = c("white", "grey", "green"))
c(hair1, hair2)
# for concatenating factors:
unlist(list(hair1, hair2))
# ... or:
c(as.character(hair1), as.character(hair2))
#ordered = TRUE => can be used for comparison
time <- factor(c("sometimes", "always", "never"), levels=c("never", "sometimes", "always"), ordered=T)
time[2] < time[3]
In previous versions of R: more memory efficient to use factors (e.g. store only 1, 2, etc. instead of the long string). But is not more a reason, because now stores only once each occurence of a string in a vector.
S3 => what is used in the base R packages
S4 => have to specifically say that we create a new object (everything in bioconductor uses S4)
Objects and function: example of summary(): summary() is defined as a generic function, what the function does depends on the class of the object.
R will look for a summary.
methods("summary") => list of all the summary functions implemented for the different classes
methods("summary")
# to see the body of non-visible functions:
getS3method("summary", "princomp")
getAnywhere("summary.princomp")
Another exemple of generic functions: print(): if not implemented, just the content of the object that is printed
# create a list containing all the attributes your object need
# (any R object could be used, but lists are almost always used)
mygsea2 <- function(small.list, big.list) {
#[...]
#z <- list(ks.pos = res, [...])
z <- list()
class(z) <- "gsea"
}
# lists most often used for their flexibility
# an object can have multiple classes !!!
# => test with
# if(!any(class(object) == "gsea"))
# => ... and not
# if(class(object) == "gsea"))
# create a function to print information about the object
print.gsea <- function(){}
# the dispatch is not needed if the function already exists in R (e.g. print, plot, etc.)
# for print() -> dispatch not needed
reduce.gsea <- function()
# Dispatch: tell R the function is implemented for this class
reduce <- function(object) UseMethod("reduce")
# !!! to access object elements -> use implemented functions, not directly list elements !!!
If the method we create for the class does not already exist in R, we need to do the dispatch. If the function already exists (print, plot, etc.), the dispatch is not needed.
# to get all the generic methods implemented for a class:
methods(class="lm")
We have to explicitly say what the class should contain.
Validation function that will refuse to create an object if not fullfilled. -> useful to create object to do quality control of the data we are using !
Slots = attributes of the object
showMethods() => to see the methods defined for an object
! do not access object attributes with the @ but use functions that provide access to the slots !
showMethods("show")
showMethods(class="GSEA")
S3 to interface with base functions (e.g. linear models, etc.) For bioconductor, use S4. S4 format fits with the format of C++.
names(object) => empty for S4 object issS4(object) => TRUE for S4 object
methods(class = "class") => for S3 showMethods(class="class") => for S4
method.class => for S3 getMethods("method", "class") => for S4
$ => for S3 @ => for S4
Another class: ReferenceClasses (R5)