Sunday, November 15, 2009

Python bytes

About ten days ago I posted about looking at the passwords stored in /etc/xgrid/agent/controller-password and similar files.

The code I put up is rather silly. It uses a hard-coded dict to translate bytes (in string rep) to hexadecimal. This is not worth spending too much time on, especially since Python 3 has a whole 'nother attitude about strings and bytes, but I thought I would at least show a simple and more correct (I hope) Python 2 approach to this issue.

So, of course we have bits and bytes on the machine, and strings exist only on screen or on paper. We can represent bits and bytes as integers or as chars, and vice versa. I'm sure everyone knows we can go from int to chr and back again:


>>> chr(78)
'N'
>>> ord('P')
80


My understanding is that we should view integers as the natural intermediate form for conversion of bits and bytes from base 2 to other bases.


>>> bin(15)
'0b1111'


Here the same value shows up in two forms: as an int (15) and as the string representation of its binary digits ('0b1111'). Python also has string reps for hexadecimal and octal:


>>> hex(15)
'0xf'
>>> oct(15)
'017'


We can go from binary or hex back to int, but we need to specify the base:


>>> int('0b1111',2)
15
>>> int('0xf',16)
15


We don't actually need the leading '0x' or '0':

>>> int('f',16)
15
>>> int('17',8)
15
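As an aside on the conversions above, here is a small sketch of letting int() work out the base from the prefix by passing a base of 0. This is my own illustration rather than anything from the earlier post. Note that I use the '0o' octal prefix, which to my knowledge is accepted by both Python 2.6+ and Python 3, whereas the bare leading '0' form is Python 2 only.

```python
# Base 0 asks int() to infer the base from the literal's prefix.
values = [int('0b1111', 0), int('0xf', 0), int('0o17', 0)]
print(values)  # [15, 15, 15]

# With an explicit base, the prefix is optional.
print(int('1111', 2) == int('0b1111', 2))  # True
```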


So, the other day I should have just done:


>>> bin(int('0xf',16))
'0b1111'
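One catch with going through int is that leading zeros are lost, so '0x0f' and '0xf' both come back as '0b1111'. If you want a fixed four bits per hex digit, something like the following sketch should do it. The helper name hex_to_bits is hypothetical, made up for this illustration, and it assumes the input contains only hex digits after an optional '0x'.

```python
def hex_to_bits(hex_string):
    """Return a bit string with exactly four bits per hex digit.

    Hypothetical helper for illustration; assumes valid hex input.
    """
    # Drop an optional '0x' / '0X' prefix.
    if hex_string.lower().startswith('0x'):
        hex_string = hex_string[2:]
    # '04b' formats each digit as binary, zero-padded to width 4.
    return ''.join(format(int(digit, 16), '04b') for digit in hex_string)

print(hex_to_bits('0xf'))  # 1111
print(hex_to_bits('6e'))   # 01101110
```

This replaces a hard-coded lookup dict with a conversion through int, one digit at a time.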


When reading data from a file:


FH = open('script.py','rb')
data = FH.read(8)
FH.close()

print type(data)
print len(data)
print data



<type 'str'>
8
from bin


Although the file was opened in "binary" mode, the type actually read was <type 'str'>, and when the data are printed, they look like a string. Nevertheless, the data do respond well to a function that operates on binary data and converts it to a hexadecimal string representation.


from binascii import b2a_hex
# one two-character hex string per byte
L = [b2a_hex(b) for b in data]
print L
# hex string -> int
L = [int(h,16) for h in L]
print L
# int -> char
print [chr(i) for i in L]



['66', '72', '6f', '6d', '20', '62', '69', '6e']
[102, 114, 111, 109, 32, 98, 105, 110]
['f', 'r', 'o', 'm', ' ', 'b', 'i', 'n']
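The same three lists can also be had without binascii, by treating the data as a bytearray, which yields ints directly when iterated. This is a sketch of an alternative, not the approach used above. The literal b'from bin' is a stand-in I'm assuming for the 8 bytes read from the file, and the code should behave the same on Python 2.6+ and Python 3.

```python
data = b'from bin'  # assumed stand-in for the 8 bytes read from script.py

# Iterating over a bytearray gives one int per byte.
ints = list(bytearray(data))
# '%02x' formats each int as two lowercase hex digits.
hexes = ['%02x' % i for i in ints]
chars = [chr(i) for i in ints]

print(hexes)  # ['66', '72', '6f', '6d', '20', '62', '69', '6e']
print(ints)   # [102, 114, 111, 109, 32, 98, 105, 110]
print(chars)  # ['f', 'r', 'o', 'm', ' ', 'b', 'i', 'n']
```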


The result is rather different if we use the same function on the data as a whole:


L = b2a_hex(data)
print len(L)
print L
for i in range(0,len(L),2):
    h = L[i:i+2]
    print h,
    print chr(int(h,16))



16
66726f6d2062696e
66 f
72 r
6f o
6d m
20
62 b
69 i
6e n


In this case, the 8 bytes are converted to 16 hexadecimal characters, and to do the conversion to ints and chars we must read the hexadecimal in two-character chunks.
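If the goal is only to get the original bytes back, binascii also provides a2b_hex, the inverse of b2a_hex, so the chunking doesn't have to be done by hand. A minimal sketch, again assuming b'from bin' as a stand-in for the data read from the file:

```python
from binascii import a2b_hex, b2a_hex

data = b'from bin'  # assumed stand-in for the 8 bytes read from script.py

hexed = b2a_hex(data)
# Each byte becomes exactly two hex characters.
print(len(hexed) == 2 * len(data))  # True

# a2b_hex consumes the hex two characters at a time and rebuilds the bytes.
restored = a2b_hex(hexed)
print(restored == data)  # True
```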

Does that make sense?