Note: This is the second part of a multi-part post and builds upon the work done in the first post.
In the last post, we took a block .dat file and analyzed it using hexdump. In the process, we learned the structure of a Bitcoin block. We also looked at the structure of transactions present in the block and analyzed the first transaction, which is also known as the coinbase transaction.
Today we put our learning into practice by building a basic block parser in Python. The parser will take the name of a binary file containing a block as input, build a Block object and print a pretty, parsed version of the block.
Let’s call our parser block.py. We will be working with the file blk00003.dat that we began using in the previous post. So, when invoking the parser our command would look something like this:
python block.py blk00003.dat
First, fire up your favorite code editor and save an empty Python file named block.py in the directory (e.g. ~/coinlogic/) where you copied blk00003.dat.
Once it is open, paste this as the contents of the file:
def parseBlockFile(blockfile):
    print('Parsing block file: %s' % blockfile)

if __name__ == "__main__":
    import sys
    usage = "Usage: python {0} "
    if len(sys.argv) < 2:
        print(usage.format(sys.argv[0]))
    else:
        parseBlockFile(sys.argv[1])
I have also uploaded this version of the file to GitHub here. When you run it:
coinlogic.info@proto $>python block.py blk00003.dat
Parsing block file: blk00003.dat
Great! Now that we have our file set up, let’s start creating the block parser. We will create a class called Block to represent the block, with a method called parseBlockFile() that takes the block file name as a parameter, parses the file and prints the result.
Find this version of block.py on GitHub as well.
class Block(object):
    """A block to be parsed from file"""
    def __init__(self):
        # every field must be set on self, otherwise it is only a local variable
        self.magic_no = -1
        self.blocksize = 0
        self.blockheader = None
        self.transaction_cnt = 0
        self.transactions = None
        self.blockfile = None

    def parseBlockFile(self, blockfile):
        print('Parsing block file: %s' % blockfile)

def parseBlockFile(blockfile):
    block = Block()
    block.parseBlockFile(blockfile)

if __name__ == "__main__":
    import sys
    usage = "Usage: python {0} "
    if len(sys.argv) < 2:
        print(usage.format(sys.argv[0]))
    else:
        parseBlockFile(sys.argv[1])
Running it, we get:
coinlogic.info@proto $>python block.py blk00003.dat
Parsing block file: blk00003.dat
Magic number
Now we are in a position to start parsing the block. We will open the file in binary mode and read it field by field, consuming exactly as many bytes as each field occupies. The first field is the 4-byte magic number, stored as a little-endian integer. We can use Python’s struct module to convert this 4-byte little-endian representation into a Python integer like this:
struct.unpack('<I', f.read(4))[0]  # '<' = little-endian, 'I' = unsigned 32-bit integer
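As a quick sanity check of the byte order, the short sketch below decodes the four magic bytes in the order they sit on disk (f9 be b4 d9). The variable names are only for illustration. The '<' prefix pins the interpretation to little-endian with a standard 4-byte size, whatever machine runs the parser; the big-endian reading is shown only for contrast.

```python
import struct

# The magic number as it appears on disk: least significant byte first.
raw = bytes([0xF9, 0xBE, 0xB4, 0xD9])

# '<I' = little-endian, standard-size (4-byte) unsigned integer.
little = struct.unpack('<I', raw)[0]
# '>I' = big-endian, shown only for contrast.
big = struct.unpack('>I', raw)[0]

print(hex(little))  # 0xd9b4bef9
print(hex(big))     # 0xf9beb4d9
```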
Since we will be reading integers of 1, 2, 4 and 8 bytes, we can create some utility functions to help us read these values. Each function accepts any file or stream object that has a read() method. Here is what these functions might look like:
import struct

# '<' forces little-endian byte order and standard sizes,
# independent of the machine running the parser.
def read_uint1(stream):
    return ord(stream.read(1))

def read_uint2(stream):
    return struct.unpack('<H', stream.read(2))[0]

def read_uint4(stream):
    return struct.unpack('<I', stream.read(4))[0]

def read_uint8(stream):
    return struct.unpack('<Q', stream.read(8))[0]
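Because these helpers only need an object with a read() method, they can be exercised on an in-memory io.BytesIO buffer instead of a real block file. The variant below is a sketch, not the post’s code: it routes every read through a hypothetical _read helper that raises EOFError on a short read, since struct.unpack would otherwise fail with a less descriptive struct.error when a file is truncated.

```python
import io
import struct

def _read(stream, fmt):
    """Read exactly struct.calcsize(fmt) bytes and unpack one value."""
    size = struct.calcsize(fmt)
    data = stream.read(size)
    if len(data) != size:
        raise EOFError('expected %d bytes, got %d' % (size, len(data)))
    return struct.unpack(fmt, data)[0]

def read_uint1(stream):
    return _read(stream, '<B')

def read_uint2(stream):
    return _read(stream, '<H')

def read_uint4(stream):
    return _read(stream, '<I')

def read_uint8(stream):
    return _read(stream, '<Q')

# In-memory stand-in for a block file: 1 + 2 + 4 + 8 bytes, all little-endian.
buf = io.BytesIO(b'\x01' + b'\x02\x00' + b'\x03\x00\x00\x00' + b'\x04' + b'\x00' * 7)
values = [read_uint1(buf), read_uint2(buf), read_uint4(buf), read_uint8(buf)]
print(values)  # [1, 2, 3, 4]

try:
    read_uint4(buf)  # nothing left to read
except EOFError as exc:
    print('short read:', exc)  # short read: expected 4 bytes, got 0
```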
Now that we have a function that can read a 4-byte integer, we are ready to read our magic number. Let’s open the file in binary mode by passing ‘rb’ as the mode to open() and hand the file object to our utility function to read the integer.
The updated Block class looks like this:
class Block(object):
    """A block to be parsed from file"""
    def __init__(self):
        self.magic_no = -1
        self.blocksize = 0
        self.blockheader = None
        self.transaction_cnt = 0
        self.transactions = None
        self.blockfile = None

    def parseBlockFile(self, blockfile):
        print('Parsing block file: %s\n' % blockfile)
        # 'rb' opens the file in binary mode, so read() returns raw bytes
        with open(blockfile, 'rb') as bf:
            self.magic_no = read_uint4(bf)
            print('magic_no:\t0x%8x' % self.magic_no)
The full block.py file at this stage can be found at github here.
Let’s run this file and see if we get a print out of the magic number.
coinlogic.info@proto $>python block.py blk00003.dat
Parsing block file: blk00003.dat

magic_no:	0xd9b4bef9
Awesome! So this matches the expected value of the block magic number. Things are looking good.
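Here the magic number is checked by eye. A parser can also check it automatically and stop early on a file that is not a block file. The sketch below does that; read_magic and MAINNET_MAGIC are hypothetical names, and 0xd9b4bef9 is the Bitcoin mainnet value printed above.

```python
import io
import struct

MAINNET_MAGIC = 0xD9B4BEF9  # the value printed above for blk00003.dat

def read_magic(stream, expected=MAINNET_MAGIC):
    """Read the 4-byte magic number and fail loudly if it is not the expected one."""
    data = stream.read(4)
    if len(data) != 4:
        raise EOFError('file ended before the magic number')
    magic = struct.unpack('<I', data)[0]
    if magic != expected:
        raise ValueError('bad magic number 0x%08x (expected 0x%08x)' % (magic, expected))
    return magic

print('0x%08x' % read_magic(io.BytesIO(b'\xf9\xbe\xb4\xd9')))  # 0xd9b4bef9
try:
    read_magic(io.BytesIO(b'\x00\x00\x00\x00'))
except ValueError as exc:
    print(exc)  # bad magic number 0x00000000 (expected 0xd9b4bef9)
```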
Blocksize
Next, we will read the block size in a similar fashion:
self.blocksize = read_uint4(bf)
print('size: \t%u bytes' % self.blocksize)
The version of block.py at this stage is here. Running this version, we see that we have successfully parsed the size.
Parsing block file: blk00003.dat

magic_no:	0xd9b4bef9
size: 	30000 bytes
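Since the magic number and the size are both 4-byte little-endian unsigned integers sitting back to back, they can also be read with a single struct.unpack call. This sketch does not touch blk00003.dat: it builds a hypothetical 8-byte preamble in memory with struct.pack, reusing the size printed above.

```python
import io
import struct

# Hypothetical in-memory preamble: magic number, then block size (30000, as printed above).
preamble = struct.pack('<II', 0xD9B4BEF9, 30000)

stream = io.BytesIO(preamble)
# '<II' = two little-endian unsigned 32-bit integers, 8 bytes in total.
magic_no, blocksize = struct.unpack('<II', stream.read(8))
print('magic_no:\t0x%08x' % magic_no)
print('size: \t%d bytes' % blocksize)
```

The size counts the bytes that follow these eight (the header plus the transactions), so the next block’s magic number begins 8 + blocksize bytes after the start of the current one.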
Block Header
Next we will parse the block header. This is a complex structure and deserves a class of its own, which we will appropriately call BlockHeader. Parsing the header also introduces some new data types, so we begin by creating a few more utility functions: readers for 32-byte hashes, the timestamp and VarInts, plus a helper that returns a pretty hexadecimal string for a hash.
import binascii

def read_hash32(stream):
    return stream.read(32)[::-1]  # stored little-endian; reverse for the usual display order

def read_merkle32(stream):
    return stream.read(32)[::-1]  # reverse it, as above

def read_time(stream):
    utctime = read_uint4(stream)  # Unix timestamp: seconds since 1970-01-01 UTC
    # TODO: convert to datetime object
    return utctime

def read_varint(stream):
    ret = read_uint1(stream)
    if ret < 0xfd:    # one-byte int
        return ret
    if ret == 0xfd:   # uint16_t in next 2 bytes
        return read_uint2(stream)
    if ret == 0xfe:   # uint32_t in next 4 bytes
        return read_uint4(stream)
    return read_uint8(stream)  # ret == 0xff: uint64_t in next 8 bytes

def get_hexstring(bytebuffer):
    # two hex digits per byte; a per-byte '%x' would drop the leading zero of bytes below 0x10
    return binascii.hexlify(bytebuffer).decode('ascii')
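A VarInt stores values below 0xfd in a single byte. Otherwise, the first byte (0xfd, 0xfe or 0xff) announces that a 2-, 4- or 8-byte little-endian integer follows. The Python 3 sketch below spells that rule out with a few hand-built encodings, which are illustrative and not taken from blk00003.dat. It also shows why the hex helper needs two digits per byte.

```python
import io
import struct

def read_varint(stream):
    """Decode a Bitcoin VarInt from a binary stream (Python 3 bytes)."""
    prefix = stream.read(1)[0]  # indexing bytes yields an int in Python 3
    if prefix < 0xFD:
        return prefix  # the value fits in the prefix byte itself
    if prefix == 0xFD:
        return struct.unpack('<H', stream.read(2))[0]
    if prefix == 0xFE:
        return struct.unpack('<I', stream.read(4))[0]
    return struct.unpack('<Q', stream.read(8))[0]  # prefix == 0xFF

encodings = {
    b'\x40': 64,
    b'\xfd\x00\x01': 256,
    b'\xfe\x00\x00\x01\x00': 65536,
    b'\xff\x00\x00\x00\x00\x01\x00\x00\x00': 2 ** 32,
}
decoded = {raw: read_varint(io.BytesIO(raw)) for raw in encodings}
print(list(decoded.values()))  # [64, 256, 65536, 4294967296]

# Two hex digits per byte keep every byte's leading zero.
digest = b'\x00\x0e'
print(''.join('%x' % b for b in digest))    # 0e   (bytes printed with as few digits as possible)
print(''.join('%02x' % b for b in digest))  # 000e
```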
Now we can create the BlockHeader class itself:
class BlockHeader(object):
    """BlockHeader represents the header of the block"""
    def __init__(self):
        super(BlockHeader, self).__init__()
        self.version = None
        self.prevhash = None
        self.merklehash = None
        self.time = None
        self.bits = None
        self.nonce = None

    def parse(self, stream):
        # TODO: error checking
        self.version = read_uint4(stream)
        self.prevhash = read_hash32(stream)
        self.merklehash = read_merkle32(stream)
        self.time = read_time(stream)
        self.bits = read_uint4(stream)
        self.nonce = read_uint4(stream)

    def __str__(self):
        return ("\n\t\tVersion: %d \n\t\tPreviousHash: %s \n\t\tMerkle: %s"
                " \n\t\tTime: %s \n\t\tBits: %8x \n\t\tNonce: %8x" %
                (self.version,
                 get_hexstring(self.prevhash),
                 get_hexstring(self.merklehash),
                 str(self.time),
                 self.bits,
                 self.nonce))

    def __repr__(self):
        return self.__str__()  # delegate to __str__ via the instance
As you can see, we overrode the __str__ and __repr__ methods to print the contents of the class in a pretty format.
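Every header field has a fixed width, so the whole header can also be described by one struct format string: 4 + 32 + 32 + 4 + 4 + 4 = 80 bytes. The sketch below is an alternative to the field-by-field reader, not the post’s code. HEADER_FORMAT and parse_header are hypothetical names. The sample header is synthetic: its hash fields are placeholders, and the other fields reuse the values printed in the run that follows.

```python
import struct

# version, previous hash, merkle root, time, bits, nonce; all little-endian
HEADER_FORMAT = '<I32s32sIII'
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 4 + 32 + 32 + 4 + 4 + 4 = 80

def parse_header(raw):
    """Unpack an 80-byte block header into a dict (hypothetical helper)."""
    version, prevhash, merkle, time_, bits, nonce = struct.unpack(HEADER_FORMAT, raw)
    return {
        'version': version,
        'prevhash': prevhash[::-1].hex(),  # reversed for display, like read_hash32
        'merklehash': merkle[::-1].hex(),
        'time': time_,
        'bits': bits,
        'nonce': nonce,
    }

# Synthetic header: placeholder hashes, other fields as printed in the next run.
sample = struct.pack(HEADER_FORMAT, 1, b'\x00' * 31 + b'\x0e', b'\x11' * 32,
                     1311016847, 0x1A0ABBCF, 0x0AA64562)
fields = parse_header(sample)
print(HEADER_SIZE)             # 80
print(fields['prevhash'][:4])  # 0e00
```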
Finally, we update the parseBlockFile() method of Block class to instantiate a BlockHeader and parse it.
self.blockheader = BlockHeader()
self.blockheader.parse(bf)
print('Block header:\t%s' % self.blockheader)
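The version of block.py at this stage is available here. Running it, we can now see that the block header is successfully parsed as well! Note that the PreviousHash and Merkle values below have fewer than the 64 hex digits a 32-byte hash should have. They were printed with a per-byte '%x' format, which drops the leading zero of every byte below 0x10.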
coinlogic.info@proto $>python block.py blk00003.dat
Parsing block file: blk00003.dat

magic_no:	0xd9b4bef9
size: 	30000 bytes
Block header:	
		Version: 1 
		PreviousHash: 0000000e5b44cb9b537e79227ba232b45cbd5af5112f186d3bca221 
		Merkle: 4955619e67e7856c365dd9e2e0f1d3f2e1227bacbf52d621b93476f1e5349 
		Time: 1311016847 
		Bits: 1a0abbcf 
		Nonce:  aa64562
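The Time field above is a raw Unix timestamp, and read_time() still carries a TODO to turn it into a datetime object. One way to do it is sketched below with the standard datetime module. The to_datetime name is hypothetical, and interpreting the value as UTC follows the header’s definition as seconds since the Unix epoch.

```python
from datetime import datetime, timezone

def to_datetime(unix_seconds):
    """Convert a header timestamp (seconds since the Unix epoch) to an aware UTC datetime."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)

block_time = to_datetime(1311016847)  # the Time value printed above
print(block_time.isoformat())  # 2011-07-18T19:20:47+00:00
```

Returning a timezone-aware datetime avoids silently converting to the local time zone of whoever runs the parser.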
Transaction Count
After the block header comes a VarInt holding the number of transactions in the block. We can read it easily with our utility function:
self.transaction_cnt = read_varint(bf)
print('Transactions: \t%d' % self.transaction_cnt)
Running this version would print transaction count after the block header:
coinlogic.info@proto $>python block.py blk00003.dat
Parsing block file: blk00003.dat

magic_no:	0xd9b4bef9
size: 	30000 bytes
Block header:	
		Version: 1 
		PreviousHash: 0000000e5b44cb9b537e79227ba232b45cbd5af5112f186d3bca221 
		Merkle: 4955619e67e7856c365dd9e2e0f1d3f2e1227bacbf52d621b93476f1e5349 
		Time: 1311016847 
		Bits: 1a0abbcf 
		Nonce:  aa64562
Transactions: 	64
You can browse the complete file at this stage here.
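To see all the pieces line up in order, the self-contained sketch below packs a synthetic block into memory (magic number, size, 80-byte header, transaction-count VarInt) and reads it back in the same order as parseBlockFile(). It is a layout check only. No transactions follow the count, so this is not a valid block, and apart from values echoed from the output above, the bytes are made up.

```python
import io
import struct

HEADER_FORMAT = '<I32s32sIII'  # 80-byte header layout

def read_varint(stream):
    prefix = stream.read(1)[0]
    if prefix < 0xFD:
        return prefix
    fmt = {0xFD: '<H', 0xFE: '<I', 0xFF: '<Q'}[prefix]
    return struct.unpack(fmt, stream.read(struct.calcsize(fmt)))[0]

# Build a synthetic block: header + transaction count (64, as a one-byte VarInt).
header = struct.pack(HEADER_FORMAT, 1, b'\x00' * 32, b'\x00' * 32,
                     1311016847, 0x1A0ABBCF, 0x0AA64562)
body = header + bytes([64])
blob = struct.pack('<II', 0xD9B4BEF9, len(body)) + body

# Read it back in the same order as parseBlockFile().
stream = io.BytesIO(blob)
magic_no, blocksize = struct.unpack('<II', stream.read(8))
fields = struct.unpack(HEADER_FORMAT, stream.read(struct.calcsize(HEADER_FORMAT)))
transaction_cnt = read_varint(stream)
print('magic 0x%08x, size %d, version %d, transactions %d'
      % (magic_no, blocksize, fields[0], transaction_cnt))
# magic 0xd9b4bef9, size 81, version 1, transactions 64
```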
Next
In the next post, we will create a Transaction class and parse all the transactions in this block.
If you are enjoying the posts, you can support me by donating to my tip address: 1GdiiJNGCE8jnytNQUC2FVhMankAJUyQrn