Friday, August 19, 2011

Learning to use RPy2 (5)


Here is the result of my experiment with the rpy2-bioconductor-extensions-0.2.


import rpy2.robjects as robjects
from rpy2.robjects.packages import importr
import bioc.biobase as biobase
r = robjects.r
g = robjects.globalenv

importr('ALL')
r('data("ALL")')
data = g['ALL']
eset = biobase.ExpressionSet(data)
print eset

Run as a script.

> python script.py

First (above) section output:

ExpressionSet (storageMode: lockedEnvironment)
assayData: 12625 features, 128 samples
element names: exprs
protocolData: none
phenoData
sampleNames: 01005 01010 ... LAL4 (128 total)
varLabels: cod diagnosis ... date last seen (21 total)
varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
pubMedIds: 14684422 16243790
Annotation: hgu95av2

Part 2:

expr_data = eset.exprs
print len(expr_data)
L = tuple(expr_data[:5])
print [round(t,3) for t in L]

output:

1616000
[7.597, 5.046, 3.9, 5.904, 5.925]

Part 3:

print list(eset.slotnames())
phenoData = eset.do_slot('phenoData')
print 'phenoData', id(phenoData)
print phenoData

output:

['experimentData', 'assayData', 'phenoData', 'featureData', 'annotation', 'protocolData', '.__classVersion__']
phenoData 0x10bbc35a8
An object of class "AnnotatedDataFrame"
sampleNames: 01005 01010 ... LAL4 (128 total)
varLabels: cod diagnosis ... date last seen (21 total)
varMetadata: labelDescription

Part 4:

pd = eset.get_phenodata()
print 'pd', id(pd)
print pd
print list(pd.slotnames())
print

output:

pd 0x10bbc3488
An object of class "AnnotatedDataFrame"
sampleNames: 01005 01010 ... LAL4 (128 total)
varLabels: cod diagnosis ... date last seen (21 total)
varMetadata: labelDescription

['varMetadata', 'data', 'dimLabels', '.__classVersion__']

Part 5:

data = pd.do_slot('data')
print 'data'
print type(data)
mb = pd.pdata.rx2('mol.biol')
print list(mb)[:10]
levels = r['levels']
print levels(mb)
print list(data.rx2('mol.biol'))[:10]

output:

data
<class 'rpy2.robjects.vectors.DataFrame'>
[2, 4, 2, 1, 4, 4, 4, 4, 4, 2]
[1] "ALL1/AF4" "BCR/ABL" "E2A/PBX1" "NEG" "NUP-98" "p15/p16"

[2, 4, 2, 1, 4, 4, 4, 4, 4, 2]

Part 6:

fd = eset.get_featuredata
print fd
# fd() gives AttributeError, no _featureData
#
featureNames = robjects.r['featureNames']
fn = featureNames(eset)
print len(fn), fn[0]

output:

<bound method ExpressionSet.get_featuredata of >
12625 1000_at

In the last part, the "new" way doesn't seem to work yet. So I did as before.
Well, that's it for Bioconductor and Rpy2 for now. The presentation could be a lot better organized, but at least we figured out how to do most things.