Friday, March 4, 2011

Handling large sequence sets (6)

It's the end of a long day of coding, and also time to get back to real work. So I'm just going to mention (once again) that all the scripts for this project are in a zip file on Dropbox (here). You will need to download the data files and a few other things to make it all work; details are in the earlier posts and summarized in a README file.

The plots are quite striking. Below are two: one for a gene whose mutants survive selection in the lung (hemR), and one for a gene that is required for survival there (rfaF).





I computed statistics in order to recreate Fig 2 from the paper, but something seems to be wrong with my code: at least, my plot looks different, because I am not getting the same survival index values as the authors did. Rather than post something that's probably incorrect, I won't show it. I suspect the BLAST step, because a number of reads are assigned to positions that are not TA dinucleotides (the sites where the transposon is expected to insert), although the positions with larger numbers of hits do tend to be TA sites.
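One quick way to test that suspicion is to count how many of the assigned positions, and how many of the reads, actually sit on a TA dinucleotide. This is a hypothetical sketch (the function names and input format are mine, not from the project scripts); it assumes positions are 0-based indices of the T on the forward strand, and that hits are stored as a dict mapping position to read count.

```python
from typing import Dict, Tuple

# Hypothetical diagnostic, not part of the project scripts.
# Assumption: a position is the 0-based index of the first base (the T)
# of the dinucleotide on the forward strand of the reference genome.

def is_ta_site(genome: str, pos: int) -> bool:
    """True if the two bases starting at pos are 'TA' (case-insensitive)."""
    return pos >= 0 and genome[pos:pos + 2].upper() == "TA"

def ta_summary(genome: str, hits: Dict[int, int]) -> Tuple[float, float]:
    """Given {position: read_count}, return
    (fraction of positions at TA sites, fraction of reads at TA sites)."""
    if not hits:
        return 0.0, 0.0
    ta_positions = [p for p in hits if is_ta_site(genome, p)]
    total_reads = sum(hits.values())
    ta_reads = sum(hits[p] for p in ta_positions)
    return len(ta_positions) / len(hits), ta_reads / total_reads
```

Because TA is its own reverse complement, the same test applies to reads mapping to either strand. If nearly every position fails it, it is worth checking for an off-by-one in the position convention before blaming the alignments themselves.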

But I will show the results filtered for genes likely to be involved in survival or growth in the lung. If you know anything about bacterial physiology (or have read the paper), the suspects on the list will not surprise you. Note particularly dam, ftsEKX, hel, polA, recABC, ruvABC and xerD; genes needed for LPS synthesis, such as galU, galE and rfaF; genes involved in amino acid, purine and thiamine synthesis, such as aspC, guaAB, ilvE, metC, purB, serAB, thiL and trpAC; and finally, Mike Cashel's favorite, relA. But probably the really interesting ones are the genes with only "HI" locus designations (no assigned gene name).
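The filtering step itself isn't shown. Judging from the numbers, the last column of the table below equals the seventh column divided by the fifth, counting the gene name as the first (for vacJ, 15/751 ≈ 0.02), which looks like an output/input read ratio. Here is a minimal sketch of that kind of filter, assuming per-gene input and output read counts are already available; the cutoff and minimum-read values are hypothetical placeholders, and the ratio is not normalized for library size.

```python
from typing import Dict, List, Tuple

def depleted_genes(
    counts: Dict[str, Tuple[int, int]],
    cutoff: float = 0.05,   # hypothetical threshold, not from the post or paper
    min_input: int = 30,    # hypothetical minimum input reads to trust a ratio
) -> List[Tuple[str, float]]:
    """counts maps gene name -> (input_reads, output_reads).
    Return (gene, ratio) pairs, sorted by name, for genes whose raw
    output/input read ratio falls below cutoff."""
    kept = []
    for name, (n_in, n_out) in counts.items():
        # Skip genes with too few input reads (and avoid dividing by zero).
        if n_in == 0 or n_in < min_input:
            continue
        ratio = n_out / n_in
        if ratio < cutoff:
            kept.append((name, round(ratio, 2)))
    return sorted(kept)
```

For example, with the apaH and vacJ counts from the table, `depleted_genes({"apaH": (158, 1), "vacJ": (751, 15)})` returns `[("apaH", 0.01), ("vacJ", 0.02)]`.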

I still want to compare tools such as BLAST, BLAT, SOAP, uclust and bowtie for the genome position assignment step. That will have to wait for another time.
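As a baseline for that comparison, the simplest possible version of the position assignment step is exact string matching against both strands of the genome, keeping only reads that match exactly once. This is a hypothetical toy, not how any of those tools work (they tolerate mismatches and are built to scale to millions of reads). It returns the 0-based start of the match on the forward strand; converting that to an insertion coordinate depends on which end of the read abuts the transposon, which this sketch doesn't handle.

```python
from typing import List, Optional, Tuple

# Hypothetical exact-match mapper, for illustration only.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def find_all(genome: str, pattern: str) -> List[int]:
    """All 0-based start positions of pattern in genome, overlaps included."""
    hits, start = [], genome.find(pattern)
    while start != -1:
        hits.append(start)
        start = genome.find(pattern, start + 1)
    return hits

def assign_position(genome: str, read: str) -> Optional[Tuple[int, str]]:
    """Return (start, strand) if the read matches exactly once on either
    strand, else None (no match, or ambiguous multiple matches)."""
    hits = [(p, "+") for p in find_all(genome, read)]
    rc = revcomp(read)
    if rc != read:  # a palindromic read would otherwise be counted twice
        hits += [(p, "-") for p in find_all(genome, rc)]
    return hits[0] if len(hits) == 1 else None
```

Discarding multi-matching reads mirrors a common choice in read mapping, since a read from a repeated region can't be placed confidently.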


apaH        828  0.74   28  158   1  1 0.01
aspC       1191  0.60   33  282   2  2 0.01
atpD       1374  0.57   46  174   2  2 0.01
dam         861  0.53   25   98   0  0 0.00
degS       1023  0.25   14   59   1  1 0.02
dppA       1650  0.41   37  161   0  0 0.00
dppC        888  0.30   16   30   0  0 0.00
dsbB        534  0.61   17  148   1  1 0.01
ftsE        657  0.69   22  210   3  4 0.02
ftsK       2770  0.33   50  149   0  0 0.00
ftsX        933  0.66   33  496   3  5 0.01
galE       1017  0.57   36  717   4  7 0.01
galU        888  0.68   26  306   1  1 0.00
genX        972  0.54   25  101   0  0 0.00
glnB        339  0.53    8   57   0  0 0.00
gmhA        585  0.56   14   95   0  0 0.00
guaA       1572  0.29   21   62   0  0 0.00
guaB       1467  0.48   36  251   2  2 0.01
hel         825  0.66   27  469   2  4 0.01
HI0066     1299  0.45   35  226   1  1 0.00
HI0188      771  0.67   32  497   6  8 0.02
HI0261     1044  0.53   29  197   0  0 0.00
HI0286     1215  0.67   38  463   3  6 0.01
HI0290     2169  0.60   58  723   4  6 0.01
HI0310      288  0.53    8   42   0  0 0.00
HI0407      786  0.53   29   99   2  2 0.02
HI0523     1041  0.43   20  254   0  0 0.00
HI0572      726  0.45   15  127   0  0 0.00
HI0621.1    555  0.71   22  119   0  0 0.00
HI0706     1218  0.68   44  283   0  0 0.00
HI0847      336  0.43    6   37   0  0 0.00
HI0854      762  0.60   27  200   0  0 0.00
HI0857      303  0.50    7  156   0  0 0.00
HI0883     1371  0.23   18   30   0  0 0.00
HI1086      786  0.56   19   95   0  0 0.00
HI1087      795  0.63   22  165   1  1 0.01
HI1101      987  0.31   15   53   0  0 0.00
HI1146      858  0.63   31  228   2  2 0.01
HI1658      582  0.62   16  101   1  1 0.01
HI1665      123  0.83    5   67   1  1 0.01
HI1696      885  0.39   19   67   0  0 0.00
HI1699      915  0.36   24  112   0  0 0.00
HI1700     1206  0.24   21   41   0  0 0.00
ilvE       1032  0.57   27  168   2  2 0.01
kpsF       1014  0.15    7   36   0  0 0.00
metC       1191  0.59   43  437   5  7 0.02
mltC       1074  0.83   52  550   0  0 0.00
moaC        483  0.36    8   35   0  0 0.00
mraW        966  0.25   12   68   0  0 0.00
mreB       1056  0.45   26  119   1  1 0.01
mreD        489  0.58   22  123   1  1 0.01
mrp        1113  0.55   28  120   1  2 0.02
nagB        813  0.52   24  113   0  0 0.00
pbp2       1956  0.52   68  353   0  0 0.00
polA       2793  0.10   12   72   1  1 0.01
por         618  0.55   17  133   0  0 0.00
potB        861  0.33   19   44   0  0 0.00
purB       1371  0.59   40  290   4  4 0.01
recA       1065  0.41   20   59   1  1 0.02
recB       3636  0.29   55  224   0  0 0.00
recC       3366  0.48   75  469   2  3 0.01
relA       2232  0.56   62  497   4  5 0.01
rfaD        927  0.53   23  147   0  0 0.00
rfaE       1431  0.50   27  132   0  0 0.00
rfaF       1041  0.53   21  126   0  0 0.00
rnc         684  0.40   14   57   0  0 0.00
rodA       1116  0.43   26  254   0  0 0.00
rpiA        660  0.66   21  110   1  1 0.01
ruvA        615  0.54   15   59   0  0 0.00
ruvB       1008  0.72   34  221   1  1 0.01
ruvC        573  0.43   10   63   0  0 0.00
serA       1233  0.64   37  403   0  0 0.00
serB        945  0.52   26  225   0  0 0.00
sspA        639  0.53   17   54   0  0 0.00
talB        954  0.24   12   34   0  0 0.00
thiL       1038  0.41   20   54   1  1 0.02
trmE       1386  0.19   13   37   0  0 0.00
trpA        807  0.45   18  121   2  2 0.02
trpC       1434  0.54   36  215   3  4 0.02
truA        810  0.53   16   59   1  1 0.02
tyrA       1134  0.49   29  332   2  4 0.01
vacJ        753  0.71   27  751   8 15 0.02
xerD        894  0.54   19   56   0  0 0.00
yfeA        882  0.40   18   63   0  0 0.00
yfeD        816  0.36   16   49   0  0 0.00
1677
85