Saturday, April 10, 2010

Chi-squared revisited


I've posted a few times about the chi-square (or -squared) distribution, for example its use to test the distribution of grades between males and females as described in Grinstead & Snell (here). I was also interested more generally in situations where this distribution comes up, for example, when multiplying two random variables that are normally distributed (here).

Probably the simplest example is of rolling dice. If an n-sided die is rolled many times, then the distribution of the counts for each of the n-sides should take on a chi-squared distribution (with df = n-1). I decided to simulate that in Python, with the results as shown in the graphic above. It looks pretty good to me.

Here is the code. Again, I use the Counter class from here.

import random, math
from random import choice
import numpy as np
import matplotlib.pyplot as plt
import Counter
random.seed(157)

class OneDie:
def __init__(self,nsides=6):
self.nsides = 6
self.R = range(1,nsides+1)
def nrolls(self,N=100):
return [choice(self.R) for i in range(N)]

def chisquare_stat(N,D):
stat = 0
e = 1.0 * N / len(D.keys())
for k in D:
o = D[k]
stat += (o-e)**2/e
return stat

n = 6
d = OneDie(n)
L = list()
N = 100
for i in range(10000):
C = Counter.Counter(d.nrolls(N))
s = chisquare_stat(N,C)
L.append(s)

# fractional bins for the histogram
width = 0.4
L = [np.floor(s/width) * width for s in L]
C = Counter.Counter(L)

ax = plt.axes()
maxCount = C.most_common(1)[0][1]
maxValue = np.max(C.keys())
ax.set_xlim(-1,maxValue+1)
ax.set_ylim(0,maxCount+5)

# go ahead and plot
for value in C:
x = value
r = plt.Rectangle((x,0),width=width,
height=C[value],
facecolor='green')
ax.add_patch(r)

# chi-square should approximate gamma
@np.vectorize
def gamma_pdf(x,k=5):
return x**(k/2.0-1) * math.e**(-x/2.0)
X = np.arange(0,maxValue,0.01)
Y = gamma_pdf(X,n-1)

plt.plot(X,600*Y,'--',
color='r',lw=6)
plt.savefig('example.png')