Friday, February 5, 2010

Comparing R and Python sequences

This is a post about elementary sequence operations in R and Python. It's as much for me as for you.

The most obvious difference between sequences in R and Python is Python's use of 0-based indexing (R counts from 1). Also note that Python's range() stops one short of its second argument:

R:
> 1:5
[1] 1 2 3 4 5
> A = seq(1,10,by=2)
> A
[1] 1 3 5 7 9
> A[2]
[1] 3


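Python (range() is wrapped in list() so the values print under Python 3 too; in Python 2, range() already returned a list):
>>> list(range(1,6))
[1, 2, 3, 4, 5]
>>> A = list(range(1,11,2))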
>>> A
[1, 3, 5, 7, 9]
>>> A[1]
3
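To translate an R sequence into Python by hand, it helps to keep both differences in mind at once: indices start at 0, and range() stops one short of its second argument. The helper below, r_seq, is a hypothetical name of my own (not part of Python or R). It sketches R's integer seq(from, to, by) under the assumption that all arguments are integers:

```python
def r_seq(start, end, by=1):
    """Hypothetical helper: mimic R's seq(start, end, by=by) for integer arguments.

    Python's range() excludes its stop value, so the stop is pushed one unit
    further in the direction of travel to make `end` inclusive.
    """
    if by == 0:
        raise ValueError("by must be non-zero")
    return list(range(start, end + (1 if by > 0 else -1), by))


A = r_seq(1, 10, by=2)
print(A)  # [1, 3, 5, 7, 9]

# R's A[2] (1-based) is A[1] in Python (0-based).
r_index = 2
print(A[r_index - 1])  # 3

# A negative Python index counts from the end: A[-1] is R's A[length(A)].
print(A[-1])  # 9
```

A negative index means something else in R, where it drops elements rather than counting from the end; the m[-1,] example further down shows this.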


Another difference: R, unlike Python, lets you assign to an index past the end of a vector, padding the gap with NA:

> m = 1:2
> m[6] = 35
> m
[1] 1 2 NA NA NA 35


>>> m = list(range(1,3))
>>> m[6] = 35
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list assignment index out of range
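Python lists won't grow on an out-of-range assignment, but the R behaviour is easy to emulate. Here is a minimal sketch, assuming None is an acceptable stand-in for R's NA; r_assign is a hypothetical helper, not a library function:

```python
def r_assign(values, index, value, fill=None):
    """Hypothetical helper: emulate R's x[index] <- value on a Python list.

    `index` is 1-based, as in R. If it lies past the end of the list, the
    list is first padded with `fill` (None stands in for R's NA).
    The list is modified in place and also returned for convenience.
    """
    if index < 1:
        raise IndexError("R-style indices start at 1")
    missing = index - len(values)
    if missing > 0:
        values.extend([fill] * missing)
    values[index - 1] = value
    return values


m = list(range(1, 3))
r_assign(m, 6, 35)
print(m)  # [1, 2, None, None, None, 35]
```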


R's seq() also accepts a non-integer step, whereas Python's built-in range() accepts only integers:

> A = seq(0,20,by=0.1)
> A[1]
[1] 0
> length(A)
[1] 201
> A[length(A)]
[1] 20


NumPy's arange() gets around this restriction. Like range(), it excludes the stop value, so we pass 20.1 to include 20:

>>> import numpy as np
>>> A = np.arange(0,20.1,0.1)
>>> A[0]
0.0
>>> len(A)
201
>>> A[-1]
20.0
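If NumPy isn't available, an inclusive float sequence can be built in plain Python. The sketch below (r_seq_by is a hypothetical name) counts steps with an integer and computes start + i*by rather than adding 0.1 over and over, which would accumulate rounding error. The 1e-10 tolerance is an assumption of mine, there to stop the endpoint being lost when (end - start)/by rounds to just under a whole number. For the same endpoint reason, the NumPy documentation recommends np.linspace over np.arange when the step is not an integer.

```python
import math


def r_seq_by(start, end, by):
    """Hypothetical helper: inclusive float sequence like R's seq(start, end, by=by)."""
    if by == 0 or (end - start) / by < 0:
        raise ValueError("by must be non-zero and point from start toward end")
    # Number of whole steps that fit, with a small tolerance for rounding.
    n = math.floor((end - start) / by + 1e-10)
    # Multiply instead of accumulating, so errors don't build up.
    return [start + i * by for i in range(n + 1)]


A = r_seq_by(0, 20, 0.1)
print(A[0])    # 0.0
print(len(A))  # 201
print(A[-1])
```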


It's sometimes more convenient to specify how many numbers we want to obtain (evenly spaced in some interval):

> A = seq(0,2,length=6)
> A
[1] 0.0 0.4 0.8 1.2 1.6 2.0


>>> A = np.linspace(0,2,6)
>>> A
array([ 0. , 0.4, 0.8, 1.2, 1.6, 2. ])
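Both seq(..., length=n) and np.linspace place n points from start to end inclusive, with a step of (end - start)/(n - 1). Here is a plain-Python sketch of that idea; r_seq_length is a hypothetical helper, and pinning the last value to end exactly is my own choice to hide rounding in the step:

```python
def r_seq_length(start, end, length):
    """Hypothetical helper: `length` evenly spaced values from start to end, inclusive."""
    if length < 1:
        raise ValueError("length must be at least 1")
    if length == 1:
        return [float(start)]
    step = (end - start) / (length - 1)
    # All but the last point come from start + i*step; the last is exactly `end`.
    return [start + i * step for i in range(length - 1)] + [float(end)]


A = r_seq_length(0, 2, 6)
print([round(x, 10) for x in A])  # [0.0, 0.4, 0.8, 1.2, 1.6, 2.0]
```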


Vectorized operations on a matrix built from a sequence (row, column, and overall means):

> m = 1:9
> dim(m) = c(3,3)
> m
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> m = t(m)
> m
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
> apply(m,1,mean)
[1] 2 5 8
> apply(m,2,mean)
[1] 4 5 6
> mean(m)
[1] 5


>>> m = np.arange(1,10)
>>> m.shape = (3,3)
>>> m
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
>>> np.mean(m, axis=0)
array([ 4., 5., 6.])
>>> np.mean(m, axis=1)
array([ 2., 5., 8.])
>>> np.mean(m)
5.0
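One subtlety in the transcripts above: R's dim<- fills a matrix column by column, which is why the R session needed t(m), while NumPy fills row by row. A short NumPy sketch showing that reshape with order="F" reproduces R's column-major fill, and how apply's MARGIN argument maps onto NumPy's axis argument:

```python
import numpy as np

# order="F" (Fortran, column-major) fills like R's dim(m) = c(3,3).
m_r = np.arange(1, 10).reshape((3, 3), order="F")
print(m_r)
# [[1 4 7]
#  [2 5 8]
#  [3 6 9]]

m = m_r.T  # same as R's t(m)

# apply(m, 1, mean) averages each row    -> axis=1 in NumPy
# apply(m, 2, mean) averages each column -> axis=0 in NumPy
print(m.mean(axis=1))  # [2. 5. 8.]
print(m.mean(axis=0))  # [4. 5. 6.]
```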


Here is an example of fancy indexing that rearranges rows and columns at the same time:

> m
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
> m[c(2,3,1),c(3,2,1)]
     [,1] [,2] [,3]
[1,]    6    5    4
[2,]    9    8    7
[3,]    3    2    1


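The naive translation to Python gives a different (though useful) result: NumPy pairs the two index lists element by element, returning m[1,2], m[2,1] and m[0,0]. Indexing the rows first and then the columns reproduces the R result: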

>>> m
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
>>> m[[1,2,0],[2,1,0]]
array([6, 8, 1])
>>> m[[1,2,0],:][:,[2,1,0]]
array([[6, 5, 4],
       [9, 8, 7],
       [3, 2, 1]])
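NumPy also has a one-step counterpart to R's row-and-column selection: np.ix_ builds an open mesh from the two index lists, so every selected row is combined with every selected column. A short sketch:

```python
import numpy as np

m = np.arange(1, 10).reshape(3, 3)
rows = [1, 2, 0]
cols = [2, 1, 0]

# Two plain index lists are paired element by element: (1,2), (2,1), (0,0).
print(m[rows, cols])  # [6 8 1]

# np.ix_ combines every row with every column, like R's m[c(2,3,1), c(3,2,1)].
print(m[np.ix_(rows, cols)])
# [[6 5 4]
#  [9 8 7]
#  [3 2 1]]
```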


R has a few more indexing tricks, and I'm not sure whether Python has direct equivalents:

> m = 1:9
> dim(m) = c(3,3)
> m = t(m)
> m[-1,]
     [,1] [,2] [,3]
[1,]    4    5    6
[2,]    7    8    9

> sel = m[1,] > 2
> sel
[1] FALSE FALSE TRUE
> m[sel]
[1] 7 8 9

> y = -5:5
> y
[1] -5 -4 -3 -2 -1 0 1 2 3 4 5
> y[y < 0] <- -y[y < 0]
> y
[1] 5 4 3 2 1 0 1 2 3 4 5
> y <- abs(y)
> y
[1] 5 4 3 2 1 0 1 2 3 4 5
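For what it's worth, NumPy has close counterparts to all three tricks, sketched below. One caveat on the middle R example: m[sel] with a length-3 logical vector on a 3x3 matrix recycles sel over all nine elements in column-major order, selecting elements 3, 6 and 9, which here happen to form the third row (the same values as m[sel,]). NumPy's m[sel] instead applies the boolean array to the rows directly and returns a 1x3 array:

```python
import numpy as np

m = np.arange(1, 10).reshape(3, 3)

# R's m[-1,] drops the first row. In NumPy a negative index counts from the
# end, so rows are dropped explicitly with np.delete (m[1:] also works here).
print(np.delete(m, 0, axis=0))
# [[4 5 6]
#  [7 8 9]]

# A boolean array selects along the first axis (rows) by default.
sel = m[0] > 2
print(sel)        # [False False  True]
print(m[sel])     # [[7 8 9]]
print(m[:, sel])  # third column, as a 3x1 array

# Flip the sign of negative entries in place, then the one-call version.
y = np.arange(-5, 6)
y[y < 0] = -y[y < 0]
print(y)                          # [5 4 3 2 1 0 1 2 3 4 5]
print(np.abs(np.arange(-5, 6)))   # [5 4 3 2 1 0 1 2 3 4 5]
```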


But here, finally, is an example we can do in both, using (row, column) index pairs:

> m
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
> sel = array(c(1:3,3:1), dim=c(3,2))
> sel
     [,1] [,2]
[1,]    1    3
[2,]    2    2
[3,]    3    1
> m[sel] = 0
> m
     [,1] [,2] [,3]
[1,]    1    2    0
[2,]    4    0    6
[3,]    0    8    9


>>> m
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
>>> m[[0,1,2],[2,1,0]] = 0
>>> m
array([[1, 2, 0],
       [4, 0, 6],
       [0, 8, 9]])
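The R index matrix can also be carried over to NumPy almost literally: store the (row, column) pairs as a two-column array, shift them from 1-based to 0-based, and split the columns into a tuple of row indices and column indices. A sketch:

```python
import numpy as np

m = np.arange(1, 10).reshape(3, 3)

# Same (row, column) pairs as R's two-column `sel`, shifted to 0-based.
sel = np.array([[1, 3],
                [2, 2],
                [3, 1]]) - 1

# sel.T has the row indices in its first row and column indices in its second;
# tuple() turns that into (rows, cols) for NumPy's paired indexing.
m[tuple(sel.T)] = 0
print(m)
# [[1 2 0]
#  [4 0 6]
#  [0 8 9]]
```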