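To compute a square root, it is better to use the sqrt() function than x^0.5: in the benchmark below, sqrt(x) is more than ten times faster.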

In [4]:
library(microbenchmark)
set.seed(1)
x <- runif(100)

#sqrt(x)
#x^0.5

microbenchmark(sqrt(x), x^0.5)

# the last "cld" column of the summary (a ranking of the expressions) is shown only if the multcomp package is installed
expr time
sqrt(x) 3271
sqrt(x) 649
x^0.5 20648
x^0.5 8490
sqrt(x) 570
x^0.5 8289
x^0.5 8337
sqrt(x) 698
x^0.5 8309
sqrt(x) 459
x^0.5 8246
sqrt(x) 832
x^0.5 8214
sqrt(x) 562
x^0.5 8103
sqrt(x) 460
x^0.5 8259
sqrt(x) 563
x^0.5 8353
x^0.5 8214
sqrt(x) 627
sqrt(x) 537
sqrt(x) 568
sqrt(x) 533
sqrt(x) 629
x^0.5 8416
x^0.5 8272
x^0.5 8257
x^0.5 8259
x^0.5 8238
x^0.5 8073
sqrt(x) 626
x^0.5 8138
x^0.5 8209
x^0.5 8054
x^0.5 8238
sqrt(x) 448
sqrt(x) 542
x^0.5 8207
sqrt(x) 541
sqrt(x) 431
sqrt(x) 583
sqrt(x) 416
x^0.5 8232
x^0.5 8123
x^0.5 8259
x^0.5 8087
sqrt(x) 633
sqrt(x) 410
sqrt(x) 558
x^0.5 8150
sqrt(x) 564
x^0.5 8102
sqrt(x) 579
sqrt(x) 448
x^0.5 8286
x^0.5 8077
sqrt(x) 653
x^0.5 8158
sqrt(x) 570

Another command for timing code: system.time()
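
A rough sketch of the same comparison with system.time(). The vector size and the number of repetitions are arbitrary choices made here so that the run lasts long enough to be measured; the timings will differ from machine to machine.

```r
set.seed(1)
x <- runif(1e5)

# system.time() evaluates its expression once, so the repetition
# has to happen inside the expression
t_sqrt <- system.time(for (i in 1:200) sqrt(x))
t_pow  <- system.time(for (i in 1:200) x^0.5)

# "elapsed" is the wall-clock time in seconds
t_sqrt[["elapsed"]]
t_pow[["elapsed"]]
```

Unlike microbenchmark(), system.time() gives a single measurement rather than a distribution, so it is better suited to code that runs for a noticeable amount of time.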

Reading a matrix column by column vs. row by row.

Pre-fetching: R stores a matrix column by column, so the elements of one column sit next to each other in memory. When you read an element, it is very likely that you will read the adjacent one next, so the processor is already fetching it in parallel. If you go through the matrix by column, this is very efficient, because you benefit from that.
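
A minimal sketch of the two traversal orders, assuming a square matrix of arbitrary size. The function names sum_by_col and sum_by_row are made up for this illustration. For small matrices the gap may be hard to see, because the overhead of the R loop itself dominates.

```r
set.seed(1)
n <- 500
m <- matrix(runif(n * n), nrow = n)

# inner loop runs down a column: consecutive memory locations
sum_by_col <- function(m) {
  total <- 0
  for (j in seq_len(ncol(m))) {
    for (i in seq_len(nrow(m))) total <- total + m[i, j]
  }
  total
}

# inner loop runs along a row: each step jumps nrow(m) elements ahead
sum_by_row <- function(m) {
  total <- 0
  for (i in seq_len(nrow(m))) {
    for (j in seq_len(ncol(m))) total <- total + m[i, j]
  }
  total
}

system.time(s_col <- sum_by_col(m))
system.time(s_row <- sum_by_row(m))

# same result up to floating-point rounding, only the access order differs
all.equal(s_col, s_row)
```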

In [ ]:
Rprof()
#[... code ...]
Rprof(NULL)

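Rprof() records the amount of time spent in each function; summaryRprof() displays the result. The percentages do not add up to 100%, because the total time of a function also includes the time spent in the functions it calls.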
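
A small sketch of a complete profiling session. The functions slow_sqrt and work are hypothetical placeholders for the code to profile, and the half-second duration is an arbitrary choice that gives the sampler enough time to collect data.

```r
# hypothetical workload, only here to give the profiler something to sample
slow_sqrt <- function(v) v^0.5
work <- function(seconds = 0.5) {
  t0 <- proc.time()[["elapsed"]]
  while (proc.time()[["elapsed"]] - t0 < seconds) slow_sqrt(runif(1e4))
  invisible(NULL)
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)   # start sampling the call stack
work()
Rprof(NULL)        # stop profiling

prof <- summaryRprof(prof_file)
head(prof$by.self)    # time spent in each function itself
head(prof$by.total)   # time including called functions, so rows overlap
```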

garbage collector

In [5]:
gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  569890 30.5     940480 50.3   940480 50.3
Vcells 1001300  7.7    1921967 14.7  1299510 10.0

Map a function over groups: tapply() does not work on matrices. It is applied to 2 vectors: 1) the one to which the function is applied, 2) the grouping variable.

tapply(data$height, data$sex, FUN=mean)

=> mean height stratified by sex
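
A self-contained version of that call, as a sketch. The data frame below is made up for illustration; only the column names height and sex come from the notes.

```r
# toy data, values invented for this example
data <- data.frame(
  height = c(160, 170, 180, 175, 165, 185),
  sex    = c("F", "F", "M", "M", "F", "M")
)

# one mean per level of the grouping variable
mean_height <- tapply(data$height, data$sex, FUN = mean)
mean_height
```

The result is a named vector with one entry per group (here "F" and "M").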

useful functions

  • for converting from wide to long format: stack() function

  • for combining data frames: merge() function

  • for finding which elements of a list are present in a second list: match(). For each element of the first list, it returns its index in the second list (if present multiple times, it returns only the 1st match; if absent, NA). If you just want to know whether they are present (not interested in the position): %in%
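
A short sketch of the functions listed above, on toy data invented for this illustration (none of these object names come from the notes):

```r
# stack(): wide -> long, one row per value
wide <- data.frame(a = 1:2, b = 3:4)
long <- stack(wide)   # columns "values" and "ind"

# merge(): combine two data frames on a shared column
left  <- data.frame(id = c(1, 2, 3), height = c(160, 170, 180))
right <- data.frame(id = c(2, 3, 4), sex = c("F", "M", "F"))
both  <- merge(left, right, by = "id")   # keeps ids present in both

# match(): position of each element of `first` in `second`
first  <- c("b", "z", "a")
second <- c("a", "b", "a")
pos     <- match(first, second)   # index of the first match, NA if absent
present <- first %in% second      # presence only, no position
```

By default merge() keeps only the rows whose key appears in both data frames; the all, all.x and all.y arguments change that.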

In [ ]:
Some functions are hidden, because you are not supposed to call them directly.
You can still retrieve their source with getAnywhere().
In [1]:
getAnywhere(format.pval)
A single object matching ‘format.pval’ was found
It was found in the following places
  package:base
  registered S3 method for format from namespace base
  namespace:base
with value

function (pv, digits = max(1L, getOption("digits") - 2L), eps = .Machine$double.eps, 
    na.form = "NA", ...) 
{
    if ((has.na <- any(ina <- is.na(pv)))) 
        pv <- pv[!ina]
    r <- character(length(is0 <- pv < eps))
    if (any(!is0)) {
        rr <- pv <- pv[!is0]
        expo <- floor(log10(ifelse(pv > 0, pv, 1e-50)))
        fixp <- expo >= -3 | (expo == -4 & digits > 1)
        if (any(fixp)) 
            rr[fixp] <- format(pv[fixp], digits = digits, ...)
        if (any(!fixp)) 
            rr[!fixp] <- format(pv[!fixp], digits = digits, ...)
        r[!is0] <- rr
    }
    if (any(is0)) {
        digits <- max(1L, digits - 2L)
        if (any(!is0)) {
            nc <- max(nchar(rr, type = "w"))
            if (digits > 1L && digits + 6L > nc) 
                digits <- max(1L, nc - 7L)
            sep <- if (digits == 1L && nc <= 6L) 
                ""
            else " "
        }
        else sep <- if (digits == 1) 
            ""
        else " "
        r[is0] <- paste("<", format(eps, digits = digits, ...), 
            sep = sep)
    }
    if (has.na) {
        rok <- r
        r <- character(length(ina))
        r[!ina] <- rok
        r[ina] <- na.form
    }
    r
}
<bytecode: 0x34daf78>
<environment: namespace:base>

<<- => the double arrow forces the assignment to act on a variable outside the function (in an enclosing environment, typically the global one) instead of creating a local variable
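
A minimal sketch of the difference between the two arrows. The variable counter and both function names are made up for this example.

```r
counter <- 0

increment_local <- function() {
  counter <- counter + 1    # creates a local variable, the outer one is untouched
  counter
}

increment_outer <- function() {
  counter <<- counter + 1   # modifies the variable defined outside the function
  counter
}

increment_local()
after_local <- counter      # the outer counter has not changed

increment_outer()
after_outer <- counter      # the outer counter has been incremented
```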

In [ ]:
# draw 10 values from 1..100 with replacement
sample(1:100, 10, replace=TRUE)

If we are really sure of the values we pass to the mean function, we can call .Internal(mean(x)) directly => this is the function that mean.default() calls after it has done all its checks.
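
A quick sketch checking that both calls agree on a plain numeric vector. .Internal() is not part of the public API, so this may behave differently across R versions, and it skips the argument handling (trim, na.rm) that mean() provides.

```r
set.seed(1)
x <- runif(100)

m_checked <- mean(x)              # goes through mean.default() and its checks
m_direct  <- .Internal(mean(x))   # skips the checks, x must be a clean numeric vector

all.equal(m_checked, m_direct)
```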