Saturday, December 4, 2010

GO (Gene Ontology)


According to the main page:
The project provides a controlled vocabulary of terms for describing gene product characteristics and gene product annotation data from GO Consortium members, as well as tools to access and process this data.

The ontology is meant to be browsed on the web, but you can also download the ontology relationships (download link). The database is a text file of entries separated by blank lines, each of which looks like:

[Term]
id: GO:0008883
name: glutamyl-tRNA reductase activity
namespace: molecular_function
def: "Catalysis of the reaction: (S)-4-amino-5-oxopentanoate + NADP(+) + tRNA(Glu) = L-glutamyl-tRNA(Glu) + H(+) + NADPH." [EC:1.2.1.70, RHEA:12347]
subset: gosubset_prok
synonym: "L-glutamate-semialdehyde: NADP+ oxidoreductase (L-glutamyl-tRNAGlu-forming)" EXACT [EC:1.2.1.70]
xref: EC:1.2.1.70
xref: KEGG:R04109
xref: MetaCyc:GLUTRNAREDUCT-RXN
xref: RHEA:12347
is_a: GO:0016620 ! oxidoreductase activity, acting on the aldehyde or oxo group of donors, NAD or NADP as acceptor

and so on... The file begins with a single header block (its first line has no [ ] tag), and a few entries at the end start with [Typedef] rather than [Term]. A format guide is here. (I didn't read it).
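Here is a minimal sketch of reading that format in Python. It assumes entries are separated by blank lines and that every line inside an entry has the form "tag: value". The function name parse_terms and the shortened SAMPLE text (including its header line and its Typedef entry) are stand-ins made up for illustration, not taken from the project files.

```python
from collections import defaultdict

# Shortened, illustrative sample: a header, one [Term], and one [Typedef].
SAMPLE = """\
format-version: 1.2

[Term]
id: GO:0008883
name: glutamyl-tRNA reductase activity
namespace: molecular_function
xref: EC:1.2.1.70
xref: KEGG:R04109
is_a: GO:0016620 ! oxidoreductase activity, acting on the aldehyde or oxo group of donors, NAD or NADP as acceptor

[Typedef]
id: part_of
name: part_of
"""


def parse_terms(text):
    """Return {GO id: {tag: [values]}} for every [Term] entry in text."""
    terms = {}
    for stanza in text.strip().split("\n\n"):
        lines = stanza.strip().splitlines()
        if not lines or lines[0] != "[Term]":
            continue  # skip the header and the [Typedef] entries
        entry = defaultdict(list)
        for line in lines[1:]:
            # Split on the first ": " only, since values can contain colons.
            tag, sep, value = line.partition(": ")
            if sep:
                entry[tag].append(value)  # tags such as xref can repeat
        if entry["id"]:
            terms[entry["id"][0]] = dict(entry)
    return terms


terms = parse_terms(SAMPLE)
for tag in sorted(terms["GO:0008883"]):
    print(tag, terms["GO:0008883"][tag])
```

Every tag maps to a list because tags such as xref (and is_a, for terms with more than one parent) can appear several times in one entry.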

The goal is to follow all the 'is_a' links. When I looked at GO previously, I found some cyclical references, so I put in a test for whether a particular item has been seen before, not unlike the tree-traversal code here. The following is the output for the example entry, GO:0008883.

We print the details for this item (long values are truncated), and then we follow the chain of 'is_a' links all the way up to GO:0003674 molecular_function.

GO:0008883
def
"Catalysis of the reaction: (S)-4-amino-5-oxopenta ..
id
GO:0008883
is_a
GO:0016620 ! oxidoreductase activity, acting on th ..
name
glutamyl-tRNA reductase activity
namespace
molecular_function
subset
gosubset_prok
synonym
"L-glutamate-semialdehyde: NADP+ oxidoreductase (L ..
xref
EC:1.2.1.70
KEGG:R04109
MetaCyc:GLUTRNAREDUCT-RXN
RHEA:12347
GO:0008883 glutamyl-tRNA reductase activity
GO:0016620 oxidoreductase activity, acting on the aldehyde or oxo group of donors, NAD or NADP as acceptor
GO:0016903 oxidoreductase activity, acting on the aldehyde or oxo group of donors
GO:0016491 oxidoreductase activity
GO:0003824 catalytic activity
GO:0003674 molecular_function
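The chain above can be reproduced with a small traversal that remembers which IDs it has already seen. This is only a sketch, not the code from the project zip: the TERMS table is a hand-copied miniature of the six entries listed above, and the function name ancestors is made up for illustration.

```python
# Illustrative miniature of the parsed data: GO id -> (name, list of is_a parent ids).
TERMS = {
    "GO:0008883": ("glutamyl-tRNA reductase activity", ["GO:0016620"]),
    "GO:0016620": ("oxidoreductase activity, acting on the aldehyde or oxo group "
                   "of donors, NAD or NADP as acceptor", ["GO:0016903"]),
    "GO:0016903": ("oxidoreductase activity, acting on the aldehyde or oxo group "
                   "of donors", ["GO:0016491"]),
    "GO:0016491": ("oxidoreductase activity", ["GO:0003824"]),
    "GO:0003824": ("catalytic activity", ["GO:0003674"]),
    "GO:0003674": ("molecular_function", []),
}


def ancestors(start, terms):
    """Return start and everything reachable via is_a, each id at most once."""
    seen = set()
    order = []
    stack = [start]
    while stack:
        go_id = stack.pop()
        if go_id in seen or go_id not in terms:
            continue  # already visited (guards against cycles) or unknown id
        seen.add(go_id)
        order.append(go_id)
        # Reverse so that the first-listed parent is visited first.
        stack.extend(reversed(terms[go_id][1]))
    return order


for go_id in ancestors("GO:0008883", TERMS):
    print(go_id, TERMS[go_id][0])
```

The seen set is what keeps a cyclical reference from looping forever: an ID that comes up a second time is simply skipped.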

I'll be doing more with this, so I put the zipped project files up on Dropbox (here). I will update the zip at that link as I work more on the project. You will need to add the GO database to the /db folder for it to run. It's about 20 MB.