Wednesday, May 7, 2008

Python for engineering restriction sites

Sometimes one needs to mutate a segment of DNA to introduce a new restriction site, with the requirement that the change be silent. I'll show here a program I wrote a few years ago to do that. At the time, when I searched, I could not find such a program available on the web.

The first thing we need is a list of restriction enzymes and their sites. You can get lists from New England Biolabs and others in various formats (which typically have to be massaged a bit to be useful). I used a module from Biopython called Restriction_Dictionary, and wrote a module of my own to filter and reformat this information the way I wanted it. I also used a second module of my own called GeneticCode2. Because the main script imports these modules, they must have the '.py' file extension. As usual, I have posted '.txt' files so they'll simply show up in your browser, so if you decide to try out this example, be sure to rename the files appropriately.
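
To give an idea of what the filtering step involves, here is a minimal sketch, not the module I actually wrote. The function name six_cutters and the sample table are made up for illustration. As far as I know, Biopython exposes the enzyme table as rest_dict in Bio.Restriction.Restriction_Dictionary, with each recognition sequence stored under a 'site' key, but treat that layout as an assumption and check it against your Biopython version.

```python
def six_cutters(rest_dict):
    """Return {site: [enzyme names]} for unambiguous 6 bp recognition sites.

    rest_dict is assumed to map enzyme name -> record with a 'site' key.
    """
    sites = {}
    for name, record in rest_dict.items():
        site = record['site'].upper()
        # Keep only 6 bp sites written entirely in A, C, G, T.
        if len(site) == 6 and set(site) <= set('ACGT'):
            sites.setdefault(site, []).append(name)
    for names in sites.values():
        names.sort()
    return sites


# Small hand-made stand-in for the real enzyme table.
sample = {
    'VspI':   {'site': 'ATTAAT'},
    'KasI':   {'site': 'GGCGCC'},
    'PaeR7I': {'site': 'CTCGAG'},
    'XhoI':   {'site': 'CTCGAG'},   # same site as PaeR7I
    'HincII': {'site': 'GTYRAC'},   # ambiguous bases, skipped
    'AluI':   {'site': 'AGCT'},     # 4-cutter, skipped
}

for site, names in sorted(six_cutters(sample).items()):
    print(site, ', '.join(names))
```

Grouping by site rather than by enzyme means that enzymes sharing a recognition sequence are reported together instead of producing duplicate hits.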

The GeneticCode file is interesting in its own right. It builds the dictionary that holds the genetic code in just a few lines.
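
A minimal sketch of the idea, assuming the usual trick of listing the amino acids in standard-table order. This is not a copy of the posted file, and the variable names are mine.

```python
bases = 'TCAG'

# One-letter amino acids in standard-table order: the first base varies
# slowest and the third base fastest; '*' marks a stop codon.
amino_acids = ('FFLLSSSSYY**CC*W'
               'LLLLPPPPHHQQRRRR'
               'IIIMTTTTNNKKSSRR'
               'VVVVAAAADDEEGGGG')

# Generate the 64 codons in the same order and pair them up.
codons = [a + b + c for a in bases for b in bases for c in bases]
genetic_code = dict(zip(codons, amino_acids))

print(genetic_code['ATG'], genetic_code['TGG'], genetic_code['TAA'])
```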



However, because the dictionary is generated by code, I needed to proofread it carefully. To do that, I needed to print it out the way the standard code is usually displayed, and that in turn meant writing my own comparison function. It's worth a look.
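
Here is a sketch of how such a printout could be done in current Python, where a sort key takes the place of a comparison function. The names table_key and format_table are hypothetical, and the layout (first base picks the block, third base the row, second base the column) is my assumption about what "as the standard code is displayed" means.

```python
from itertools import product

BASES = 'TCAG'
AMINO = ('FFLLSSSSYY**CC*W' 'LLLLPPPPHHQQRRRR'
         'IIIMTTTTNNKKSSRR' 'VVVVAAAADDEEGGGG')
CODE = dict(zip((''.join(p) for p in product(BASES, repeat=3)), AMINO))


def table_key(codon):
    """Sort key for table layout: block by first base, row by third, column by second."""
    first, second, third = (BASES.index(b) for b in codon)
    return (first, third, second)


def format_table(code):
    """Return the code as text, four codons per row, one block per first base."""
    ordered = sorted(code, key=table_key)
    lines = []
    for i in range(0, 64, 4):
        row = ordered[i:i + 4]
        lines.append('  '.join('%s %s' % (c, code[c]) for c in row))
        if i % 16 == 12:
            lines.append('')        # blank line between first-base blocks
    return '\n'.join(lines).rstrip()


print(format_table(CODE))
```

The sort is needed even though the dictionary is built in a fixed order, because the table reads across by second base while the codons are generated with the third base changing fastest.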

The main script is called extrasites. It goes through the gene codon by codon, replaces each codon with each of its synonymous codons, and checks whether the change creates a new restriction site. It's restricted to 6-cutters because they are the most useful. A 6 bp site that overlaps the changed codon can extend at most 5 bp to either side of it, so we evaluate each potential mutant in the context of two flanking codons (6 bp) both upstream and downstream.
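
The scan described above can be sketched as follows. This is an illustration under my own assumptions, not the extrasites script itself: the function names, the `first` offset parameter, and the tiny SITES table are all made up, and only the strand as written is searched.

```python
from itertools import product

BASES = 'TCAG'
AMINO = ('FFLLSSSSYY**CC*W' 'LLLLPPPPHHQQRRRR'
         'IIIMTTTTNNKKSSRR' 'VVVVAAAADDEEGGGG')
CODE = dict(zip((''.join(p) for p in product(BASES, repeat=3)), AMINO))

# Tiny illustrative table of 6 bp sites: {site: enzyme name}.
SITES = {'ATTAAT': 'VspI', 'GGCGCC': 'KasI',
         'CTTGAG': 'BpuEI', 'CTCGAG': 'PaeR7I'}


def synonyms(codon):
    """All other codons that encode the same amino acid (or stop)."""
    return [c for c in CODE if CODE[c] == CODE[codon] and c != codon]


def find_new_sites(seq, sites, flank=2, first=1):
    """Yield (codon number, old, new, enzyme, site) for each silent change
    that creates a site absent from the unchanged context.

    seq is read in frame from its first base; `first` is the number to
    give its first codon.
    """
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    for i, old in enumerate(codons):
        up = ''.join(codons[max(0, i - flank):i])
        down = ''.join(codons[i + 1:i + 1 + flank])
        wild = up + old + down
        for new in synonyms(old):
            mutant = up + new + down
            for site, enzyme in sites.items():
                if site in mutant and site not in wild:
                    yield (first + i, old, new, enzyme, site)


# Fragment around codon 8 of the run shown in this post (codons 6 to 10).
for num, old, new, enzyme, site in find_new_sites('GGTATTAACCATAAA', SITES, first=6):
    print('codon %d %s => %s  %s %s' % (num, old, new, enzyme, site))
```

Comparing the mutant against the unchanged context is what makes a hit "new": a site already present in the wild-type window is not reported. For a non-palindromic site such as CTTGAG, a fuller version would also look for the reverse complement.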

Here is part of a run for the Salmonella typhimurium hemA gene:

codon 8 AAC => AAT
GGTATT AAC CATAAA
GGTATT AAT CATAAA
VspI ATTAAT

codon 12 GCA => GCG
AAAACG GCA CCTGTA
AAAACG GCG CCTGTA
KasI GGCGCC

[snip]

codon 416 CTG => CTT
CTCGGG CTG GAGTAG
CTCGGG CTT GAGTAG
BpuEI CTTGAG

codon 416 CTG => CTC
CTCGGG CTG GAGTAG
CTCGGG CTC GAGTAG
PaeR7I CTCGAG

time elapsed = 0.065 seconds