Wednesday, August 12, 2009

How fast does it grow?


Even allowing for mismatches, sequence alignment would be easy except for gaps. But it quickly becomes impossible to compare all alignments if we allow them. According to Durbin, the number of possible alignments of 2 sequences of length n with gaps corresponds to the number of ways of picking n objects from a set of 2n objects.

The number of combinations is:

(2n)! / (n!)*(n!)

Using Stirling's approximation

n! ≈ √(2πn) (n/e)n

Durbin says this reduces to:

22n / √(πn)

We can get an idea of how fast these numbers grow using Python:

def f(N): return 2.0**(2*N) / (3.1416*N)**0.5
for i in range(30,151,30): print i, round(f(i),2)


30 1.18758216929e+17
60 9.68162841804e+34
90 9.11386351844e+52
120 9.09982189426e+70
150 9.38377585615e+88


This brings up the general question of comparing how quickly different functions grow. You've probably heard of big-O notation---O(n).

Here is video of Eric Demaine's MIT open courseware lecture on this topic. And here is a nice post analyzing the video.

Finally, here is the R code for the plot above:

X = 100
Y = 1000
xlim=c(0,X)
ylim=c(0,Y)

v = c('4','5','6')

plot(1:100,xlim=xlim,
ylim=ylim,type='n')
curve(log(x),from=0,to=X,
col='blue',lwd=3,add=T)
curve(1*x,from=0,to=X,
col='magenta',lwd=3,add=T)
curve(x*log(x),from=0,to=X,
col='darkred',lwd=3,add=T)
curve(x**2,from=0,to=X,
col='black',lwd=3,add=T)
curve(x**3,from=0,to=X,
col='green',lwd=3,add=T)
curve(2**x,from=0,to=X,
col='red',lwd=3,add=T)
for (i in 4:6) {
points(i,factorial(i),
pch=16,cex=1.5) }


Let's postpone consideration of these points until later:
• how to get the result above using Stirling's approximation
• O(n) analysis
• justification for the O(n) analysis of various algorithms as shown in the figure.