Short Numpy Tutorial

This is a very short introduction to numpy, focused on its basic data structure, the ndarray. Numpy is the most important scientific package in the Python ecosystem because it provides a common data structure on which many other packages build.

[Figure: Python scientific ecosystem]

To make this tutorial work on both Python 2 and Python 3, let's import some future features into Python 2:

In [ ]:
from __future__ import print_function, division
In [ ]:
# np is the standard abbreviation for numpy in the code
# Even the numpy docs use it
import numpy as np

What is an ndarray?

The ndarray is the biggest contribution of numpy. An ndarray is

  • a regular grid of N dimensions,
  • homogeneous by default (all the elements have the same type; see the [homogeneous] footnote),
  • a contiguous block of memory with types corresponding to machine types (8-bit ints, 32-bit floats, 64-bit longs, ...).
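These properties can be checked directly on an array. The following is a minimal sketch; the name `grid` and its values are made up for illustration:

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

print(grid.ndim)      # number of dimensions of the grid
print(grid.dtype)     # the single machine type shared by every element
print(grid.itemsize)  # bytes used by each element
print(grid.nbytes)    # total size of the data block: size * itemsize
print(grid.flags['C_CONTIGUOUS'])  # True if stored as one contiguous row-major block
```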

Building an array (inline)

We can build an array from Python lists:

In [ ]:
arr = np.array([
    [1.2, 2.3, 4.0],
    [1.2, 3.4, 5.2],
    [0.0, 1.0, 1.3],
    [0.0, 1.0, 2e-1]])
print(arr)

Inspecting array properties

In [ ]:
print(arr.dtype)
print(arr.ndim)
print(arr.shape)

This array has dtype float64 (at least on my computer, probably on yours too), 2 dimensions, and shape (4, 3), i.e., 4 rows and 3 columns.

When constructing an array, we can explicitly specify the type:

In [ ]:
iarr = np.array([1, 2, 3], dtype=np.uint8)
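If no type is given, numpy infers one from the values. Here is a small sketch of inference, explicit types, and conversion; the variable names are only illustrative:

```python
import numpy as np

inferred = np.array([1, 2.5, 3])                    # ints mixed with a float: stored as float64
as_float32 = np.array([1, 2, 3], dtype=np.float32)  # explicit type via the dtype keyword
converted = as_float32.astype(np.int16)             # astype returns a converted copy

print(inferred.dtype, as_float32.dtype, converted.dtype)
```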

Arithmetic operations on the array respect the type and can include rounding and overflow!

In [ ]:
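arr *= 2.5
# A plain `iarr *= 2.5` raises a casting error in current numpy versions,
# because the float result cannot be safely stored back into a uint8 array.
# Explicitly allowing the unsafe cast shows the truncation:
np.multiply(iarr, 2.5, out=iarr, casting='unsafe')
print(arr)
print(iarr)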
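To see overflow and truncation in isolation, here is a small sketch with made-up values. Two uint8 arrays are added so that the result stays uint8, and a float array is converted with `astype`:

```python
import numpy as np

small = np.array([200, 250], dtype=np.uint8)
step = np.array([100, 10], dtype=np.uint8)

wrapped = small + step  # 300 and 260 do not fit in 8 bits, so they wrap modulo 256
print(wrapped)          # [44  4]

truncated = np.array([2.5, 7.9]).astype(np.uint8)  # the fractional part is dropped
print(truncated)        # [2 7]
```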

Boolean operations

An important subset of operations with numpy arrays concerns using logical operators to build boolean arrays. For example:

In [ ]:
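# Comparing an array with a scalar is done element by element
# and returns a boolean array of the same shape
is_greater_one = (arr > 1.)
print(is_greater_one)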
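Boolean arrays can themselves be combined. A short sketch, using a made-up `values` array:

```python
import numpy as np

values = np.array([0.5, 1.5, 2.5, 3.5])

# Use & (and), | (or), ~ (not); Python's `and`/`or` do not work element-wise.
# The parentheses are needed because & binds more tightly than the comparisons.
in_range = (values > 1.0) & (values < 3.0)
outside = ~in_range

print(in_range)  # [False  True  True False]
print(outside)   # [ True False False  True]
```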

Slicing & Dicing

We can use Python's [] operator to slice and dice the array:

In [ ]:
print(arr[0,0]) # First row, first column
print(arr[1]) # The whole second row
print(arr[:,2]) # The third column

Slices are views

Slices share memory with the original array!

In [ ]:
print("Before: {}".format(arr[1,0]))
view = arr[1]
view[0] += 100
print("After: {}".format(arr[1,0]))
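When an independent array is needed, make an explicit copy. The sketch below contrasts a view with a copy (names are illustrative); `np.shares_memory` reports whether two arrays overlap in memory:

```python
import numpy as np

original = np.zeros((2, 3))

row_view = original[0]         # basic slicing: a view on the same memory
row_copy = original[0].copy()  # explicit copy: independent memory

row_view[0] = 1.0  # visible through `original`
row_copy[1] = 2.0  # not visible through `original`

print(np.shares_memory(original, row_view))  # True
print(np.shares_memory(original, row_copy))  # False
print(original)
```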

Visual illustration of slicing

In [ ]:
a = np.array([
       [ 0,  1,  2,  3,  4,  5],
       [10, 11, 12, 13, 14, 15],
       [20, 21, 22, 23, 24, 25],
       [30, 31, 32, 33, 34, 35],
       [40, 41, 42, 43, 44, 45],
       [50, 51, 52, 53, 54, 55]])

[Figure: slicing]

This image is taken from scipy-lectures, a more complete tutorial on numpy than what we have here.
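To try this out in code, here are a few slices of `a`. The particular selections are my own picks for illustration and may differ from the ones drawn in the figure:

```python
import numpy as np

# Same 6x6 array as above: the element at row r, column c is 10*r + c
a = np.array([[10 * r + c for c in range(6)] for r in range(6)])

print(a[0, 3:5])     # first row, columns 3 and 4
print(a[4:, 4:])     # bottom-right 2x2 corner
print(a[:, 2])       # the whole third column
print(a[2::2, ::2])  # every second row starting at row 2, every second column
```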

Basic functions on arrays

In [ ]:
arr.mean()

Also available: max, min, sum, ptp (peak-to-peak, i.e., difference between maximum and minimum values; in recent numpy versions, call it as the function np.ptp(arr) rather than as a method).

These functions can also work axis-wise:

In [ ]:
arr.mean(axis=0)
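The axis argument names the dimension that gets collapsed. A small sketch with a made-up 2x3 array:

```python
import numpy as np

m = np.array([[1.0, 2.0, 3.0],
              [5.0, 6.0, 7.0]])

print(m.mean(axis=0))  # collapse the rows: one mean per column -> [3. 4. 5.]
print(m.mean(axis=1))  # collapse the columns: one mean per row -> [2. 6.]
print(m.sum(axis=0).shape, m.sum(axis=1).shape)  # (3,) (2,)
```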

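An important trick is to combine logical operations with aggregation functions. For example, the mean of a boolean array is the fraction of its elements that are True: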

In [ ]:
is_greater_one = (arr > 1)
print(is_greater_one.mean())
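The same idea works with other aggregations, because True counts as 1 and False as 0. A sketch with made-up values:

```python
import numpy as np

values = np.array([0.5, 1.5, 2.5, 3.5])
mask = values > 1

print(mask.sum())   # number of elements greater than 1 -> 3
print(mask.mean())  # fraction of elements greater than 1 -> 0.75
print(mask.any())   # True if at least one element passes the test
print(mask.all())   # True only if every element passes the test
```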

Broadcasting

You can often perform operations between arrays of different shapes: numpy "broadcasts" the smaller array across the larger one.

In [ ]:
print(arr)
print("Now adding [1,1,0] to *every row*")
print()
arr += np.array([1,1,0])
print(arr)

The exact rules of how broadcasting works are a bit complex to explain, but it generally works as expected. For example, if your data is a set of measurements for a sample, and your columns are the different types of measurements, then you can easily remove the mean like this:

In [ ]:
print(arr.mean(0))
arr -= arr.mean(0)
print(arr.mean(0))
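As a rough sketch of the rule: shapes are compared starting from the last dimension, and two dimensions are compatible when they are equal or when one of them is 1. The example below uses a made-up `data` array to show a per-column and a per-row subtraction, plus a case that fails:

```python
import numpy as np

data = np.arange(12.0).reshape(4, 3)        # shape (4, 3)

per_column = data.mean(axis=0)              # shape (3,): lines up with the last axis
per_row = data.mean(axis=1, keepdims=True)  # shape (4, 1): keepdims keeps a length-1 axis

centered_columns = data - per_column  # the (3,) array is reused for every row
centered_rows = data - per_row        # the (4, 1) array is reused for every column
print(centered_columns.shape, centered_rows.shape)  # (4, 3) (4, 3)

# A shape (4,) array does not line up with a last axis of length 3
try:
    data - data.mean(axis=1)
except ValueError as err:
    print("incompatible shapes:", err)
```

Passing keepdims=True is what makes the per-row case work: without it the row means have shape (4,), which numpy tries (and fails) to match against the 3 columns.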

Footnotes

[homogeneous]: There is a loophole to get heterogeneous arrays, namely an array of dtype object. Such an array can store any Python object. This comes at the cost of decreased computational efficiency (both in terms of processing time and memory usage).
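A minimal sketch of such an object array, with made-up contents:

```python
import numpy as np

mixed = np.array([1, "two", 3.0], dtype=object)

print(mixed.dtype)                          # object
print([type(x).__name__ for x in mixed])    # ['int', 'str', 'float']
```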