to compute a square root, it is better to use the sqrt function: sqrt(x) is much faster than x^0.5

library(microbenchmark)
set.seed(1)
x <- runif(100)

#sqrt(x)
#x^0.5

microbenchmark(sqrt(x), x^0.5)

# the last "cld" column (a statistical ranking of the expressions) is printed only if the multcomp package is installed

another command for timing code: system.time()
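
A rough sketch of timing the same comparison with system.time(). It times a single evaluation, so a fast expression usually has to be repeated to give a measurable duration; the repetition count n_rep is an arbitrary value chosen for illustration.

```r
set.seed(1)
x <- runif(100)

# system.time() times one evaluation of its argument, so the fast
# expression is repeated many times (n_rep is an arbitrary choice)
n_rep <- 1e5
t_sqrt <- system.time(for (i in seq_len(n_rep)) sqrt(x))
t_pow  <- system.time(for (i in seq_len(n_rep)) x^0.5)

# "elapsed" is the wall-clock time in seconds
t_sqrt["elapsed"]
t_pow["elapsed"]
```

The timings depend on the machine, so only the relative order of the two results is informative.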

read matrix col then row, vs. row then col

pre-fetching: when you read the first element, it is very likely that you will read the adjacent one next, so the processor is already fetching the next one in parallel. R stores matrices column by column => if you go through the matrix by column, consecutive reads are adjacent in memory: very efficient, you benefit from the pre-fetching
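
A minimal sketch of the two traversal orders. The matrix size and the function names sum_by_col and sum_by_row are made up for illustration; both functions compute the same sum and differ only in the order in which elements are read.

```r
set.seed(1)
n <- 500
m <- matrix(runif(n * n), nrow = n)

# inner loop over rows: consecutive reads are adjacent in memory
sum_by_col <- function(m) {
  total <- 0
  for (j in seq_len(ncol(m))) {
    for (i in seq_len(nrow(m))) {
      total <- total + m[i, j]
    }
  }
  total
}

# inner loop over columns: consecutive reads are nrow(m) elements apart
sum_by_row <- function(m) {
  total <- 0
  for (i in seq_len(nrow(m))) {
    for (j in seq_len(ncol(m))) {
      total <- total + m[i, j]
    }
  }
  total
}

system.time(s_col <- sum_by_col(m))
system.time(s_row <- sum_by_row(m))
```

With explicit R loops the interpreter overhead can hide much of the effect, so the gap may only become visible for matrices too large to fit in the processor cache.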

Rprof()
#[... code ...]
Rprof(NULL)

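you get the amount of time spent in each function (summarized with summaryRprof()); the percentages of total time do not add up to 100%, because the time spent in a function is also counted for the functions that called it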
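
A small sketch of a complete profiling session, assuming R was built with profiling support. The function slow_sum, the temporary output file, and the sampling interval are illustrative choices.

```r
# hypothetical slow function, used only to give the profiler something to sample
slow_sum <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + sqrt(i)
  total
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)  # start sampling the call stack
res <- slow_sum(5e6)
Rprof(NULL)                        # stop profiling

prof <- summaryRprof(prof_file)
head(prof$by.self)   # time spent in each function itself
head(prof$by.total)  # time including the functions it calls
```

The by.total table is the one whose percentages exceed 100% in total, since a caller's time includes that of its callees.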

garbage collector

gc()
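
As an illustration, gc() can be called before and after removing a large object to see how much memory is released. The vector size is arbitrary (about 80 MB of doubles).

```r
# allocate a large vector, then drop the only reference to it
big <- numeric(1e7)
before <- gc()
rm(big)
after <- gc()  # the memory that was held by 'big' can now be reclaimed

# row "Vcells", column "used": number of vector cells currently in use
before["Vcells", "used"]
after["Vcells", "used"]
```

R runs the garbage collector automatically, so an explicit call is mostly useful for getting the usage report.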

Map a function to different groups: tapply() does not apply to matrices, it applies to 2 vectors: 1) the one to which the function is applied, 2) the grouping variable

tapply(data$height, data$sex, FUN=mean)

=> mean of height stratified by sex
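
A self-contained version of this call on a toy data frame; the values are invented for illustration.

```r
# hypothetical toy data, made up for this example
data <- data.frame(
  height = c(180, 165, 175, 160),
  sex    = c("M", "F", "M", "F")
)

# one mean per level of the grouping variable
height_by_sex <- tapply(data$height, data$sex, FUN = mean)
height_by_sex
#     F     M
# 162.5 177.5
```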

useful functions

for converting from wide to long format: stack() function
for combining data frames: merge() function
to find which elements of a list are present in a second list: match(); for each element of the first list, it returns its index in the second list (if present multiple times, it returns only the 1st; if absent, NA). If you just want to know whether they are present (not interested in the position): %in%
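
A short sketch of these four functions on small made-up inputs; all object names and values are hypothetical.

```r
# stack(): wide table (one column per group) to long format
wide <- data.frame(a = c(1, 2), b = c(3, 4))
long <- stack(wide)  # columns "values" and "ind"

# merge(): combine two data frames on a shared key
people  <- data.frame(id = c(1, 2, 3), name = c("Ann", "Bob", "Cy"))
heights <- data.frame(id = c(2, 3, 4), height = c(170, 180, 160))
both <- merge(people, heights, by = "id")  # keeps ids 2 and 3

# match() gives positions, %in% gives presence only
first  <- c("b", "z", "a")
second <- c("a", "b", "b")
pos     <- match(first, second)  # 2 NA 1 (first occurrence of "b")
present <- first %in% second     # TRUE FALSE TRUE
```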

some functions are hidden, because you are not supposed to call them directly
you can still retrieve their source with getAnywhere()

getAnywhere(format.pval)

A single object matching ‘format.pval’ was found
It was found in the following places
  package:base
  registered S3 method for format from namespace base
  namespace:base
with value

function (pv, digits = max(1L, getOption("digits") - 2L), eps = .Machine$double.eps, 
    na.form = "NA", ...) 
{
    if ((has.na <- any(ina <- is.na(pv)))) 
        pv <- pv[!ina]
    r <- character(length(is0 <- pv < eps))
    if (any(!is0)) {
        rr <- pv <- pv[!is0]
        expo <- floor(log10(ifelse(pv > 0, pv, 1e-50)))
        fixp <- expo >= -3 | (expo == -4 & digits > 1)
        if (any(fixp)) 
            rr[fixp] <- format(pv[fixp], digits = digits, ...)
        if (any(!fixp)) 
            rr[!fixp] <- format(pv[!fixp], digits = digits, ...)
        r[!is0] <- rr
    }
    if (any(is0)) {
        digits <- max(1L, digits - 2L)
        if (any(!is0)) {
            nc <- max(nchar(rr, type = "w"))
            if (digits > 1L && digits + 6L > nc) 
                digits <- max(1L, nc - 7L)
            sep <- if (digits == 1L && nc <= 6L) 
                ""
            else " "
        }
        else sep <- if (digits == 1) 
            ""
        else " "
        r[is0] <- paste("<", format(eps, digits = digits, ...), 
            sep = sep)
    }
    if (has.na) {
        rok <- r
        r <- character(length(ina))
        r[!ina] <- rok
        r[ina] <- na.form
    }
    r
}
<bytecode: 0x34daf78>
<environment: namespace:base>

<<- => double arrow, used to force the assignment to work on a variable outside the function (in an enclosing environment, ultimately the global one), not on a local variable
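
A minimal sketch contrasting the two arrows; the variable and function names are invented for illustration.

```r
counter <- 0

increment_local <- function() {
  counter <- counter + 1   # creates a local variable; the outer one is unchanged
  invisible(NULL)
}

increment_outer <- function() {
  counter <<- counter + 1  # modifies 'counter' in the enclosing environment
  invisible(NULL)
}

increment_local()
after_local <- counter  # still 0

increment_outer()
after_outer <- counter  # now 1
```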

sample(1:100, 10, replace=TRUE)

if we are really sure of the values we pass to the mean function, we can directly use .Internal(mean(x)) => this is the function called by mean.default() after it has done all its checks
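
A quick check that both calls give the same result on a plain numeric vector. This is only a sketch: .Internal() is not part of R's public interface, so it may behave differently between R versions and is not allowed in CRAN packages.

```r
set.seed(1)
x <- runif(100)

m_checked  <- mean(x)             # goes through mean.default() and its checks
m_internal <- .Internal(mean(x))  # skips the checks; x must be a plain numeric vector

all.equal(m_checked, m_internal)
```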

output of microbenchmark(sqrt(x), x^0.5): raw timings, time in nanoseconds
expr	time
sqrt(x)	3271
sqrt(x)	649
x^0.5	20648
x^0.5	8490
sqrt(x)	570
x^0.5	8289
x^0.5	8337
sqrt(x)	698
x^0.5	8309
sqrt(x)	459
x^0.5	8246
sqrt(x)	832
x^0.5	8214
sqrt(x)	562
x^0.5	8103
sqrt(x)	460
x^0.5	8259
sqrt(x)	563
x^0.5	8353
x^0.5	8214
sqrt(x)	627
sqrt(x)	537
sqrt(x)	568
sqrt(x)	533
sqrt(x)	629
x^0.5	8416
x^0.5	8272
x^0.5	8257
x^0.5	8259
x^0.5	8238
⋮	⋮
x^0.5	8073
sqrt(x)	626
x^0.5	8138
x^0.5	8209
x^0.5	8054
x^0.5	8238
sqrt(x)	448
sqrt(x)	542
x^0.5	8207
sqrt(x)	541
sqrt(x)	431
sqrt(x)	583
sqrt(x)	416
x^0.5	8232
x^0.5	8123
x^0.5	8259
x^0.5	8087
sqrt(x)	633
sqrt(x)	410
sqrt(x)	558
x^0.5	8150
sqrt(x)	564
x^0.5	8102
sqrt(x)	579
sqrt(x)	448
x^0.5	8286
x^0.5	8077
sqrt(x)	653
x^0.5	8158
sqrt(x)	570

output of gc():
	used	(Mb)	gc trigger	(Mb)	max used	(Mb)
Ncells	569890	30.5	940480	50.3	940480	50.3
Vcells	1001300	7.7	1921967	14.7	1299510	10.0