The Bitcoin Protocol – 2 – Block parser

Note: This is the second part of a multi-part post and builds upon the work done in the first post.

In the last post, we took a block .dat file and analyzed it using hexdump. In the process, we learned the structure of a Bitcoin block. We also looked at the structure of transactions present in the block and analyzed the first transaction, which is also known as the coinbase transaction.

Today we put our learning into practice by building a basic block parser in Python. The parser takes the name of a binary file containing a block as input, builds a Block object and prints a pretty version of the parsed block.

Let’s call our parser block.py. We will be working with the file blk00003.dat that we began using in the previous post. So, when invoking the parser our command would look something like this:

python block.py blk00003.dat

First, fire up your favorite code editor and save an empty Python file named block.py in the directory (e.g. ~/coinlogic/) where you copied blk00003.dat.

Once open, paste this as the contents of the file:

def parseBlockFile(blockfile):
   print 'Parsing block file: %s' % blockfile

if __name__ == "__main__":
   import sys
   usage = "Usage: python {0} <blockfile>"
   if len(sys.argv) < 2:
      print usage.format(sys.argv[0])
   else:
      parseBlockFile(sys.argv[1])

I have also uploaded this version of the file to GitHub here. When you run it:

coinlogic.info@proto $>python block.py blk00003.dat 
Parsing block file: blk00003.dat

Great! Now that we have our file set up, let’s start creating the block parser. We will create a class called Block to represent the block. This class gets a method called parseBlockFile() that takes the block file name as a parameter, parses the file and prints the result.

This version of block.py is on GitHub as well.

class Block(object):
   """A block to be parsed from file"""
   def __init__(self):
      self.magic_no = -1
      self.blocksize = 0
      self.blockheader = None
      # these need the self. prefix, otherwise they are only local variables
      self.transaction_cnt = 0
      self.transactions = None
      self.blockfile = None

   def parseBlockFile(self, blockfile):
      print 'Parsing block file: %s' % blockfile

def parseBlockFile(blockfile):
   block = Block()
   block.parseBlockFile(blockfile)

if __name__ == "__main__":
   import sys
   usage = "Usage: python {0} <blockfile>"
   if len(sys.argv) < 2:
      print usage.format(sys.argv[0])
   else:
      parseBlockFile(sys.argv[1])

Running it, we get:

coinlogic.info@proto $>python block.py blk00003.dat 
Parsing block file: blk00003.dat

Magic number

Now we are in a position to start parsing the block. We will open the file in binary mode and read it field by field. The first field is a 4-byte magic number stored as a little-endian integer. We can use Python’s struct module to convert these 4 bytes into a Python integer. The '<' prefix in the format string makes the little-endian byte order explicit instead of relying on the byte order of the machine running the script:

struct.unpack('<I', f.read(4))[0]
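To see why the byte order matters, here is a small sketch that unpacks the same four bytes both ways. The bytes are the ones that produce the magic number printed later in this post; the variable names are only for illustration, and print() is used so the snippet also runs on Python 3.

```python
import struct

# The first four bytes of the block file, in the order they sit on disk.
raw = b'\xf9\xbe\xb4\xd9'

little = struct.unpack('<I', raw)[0]  # explicit little-endian
big = struct.unpack('>I', raw)[0]     # explicit big-endian, for contrast

print('little-endian: 0x%08x' % little)  # little-endian: 0xd9b4bef9
print('big-endian:    0x%08x' % big)     # big-endian:    0xf9beb4d9
```

Only the little-endian reading gives the familiar 0xd9b4bef9.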

Since we will be reading integers of 1, 2, 4 and 8 bytes, we can create some utility functions to read these values. Each function accepts any file or stream object that has a read() method. Here is what these functions might look like:

import struct

# The '<' prefix forces little-endian byte order on every platform.

def read_uint1(stream):
    return ord(stream.read(1))

def read_uint2(stream):
    return struct.unpack('<H', stream.read(2))[0]

def read_uint4(stream):
    return struct.unpack('<I', stream.read(4))[0]

def read_uint8(stream):
    return struct.unpack('<Q', stream.read(8))[0]
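Because these helpers only need a read() method, we can try them on an in-memory stream before touching a real file. The sketch below is one way to do that, assuming io.BytesIO as the stand-in stream. The read_exact() helper is my own hypothetical addition, not part of block.py; it turns a truncated read into a clear error.

```python
import io
import struct

def read_exact(stream, n):
    """Hypothetical helper: return exactly n bytes or raise EOFError."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError('wanted %d bytes, got %d' % (n, len(data)))
    return data

def read_uint4(stream):
    return struct.unpack('<I', read_exact(stream, 4))[0]

# Magic number, a size field of 30000 (0x7530), then one stray byte.
stream = io.BytesIO(b'\xf9\xbe\xb4\xd9' + b'\x30\x75\x00\x00' + b'\x01')

magic = read_uint4(stream)
size = read_uint4(stream)

try:
    read_uint4(stream)   # only one byte is left, so this fails
    truncated = False
except EOFError:
    truncated = True

print('0x%08x' % magic)  # 0xd9b4bef9
print(size)              # 30000
print(truncated)         # True
```

Without the length check, struct.unpack() would raise a less descriptive struct.error on the short read.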

Now that we have a function that can read a 4 byte integer, we are ready to read our magic number. Let’s open the file as binary by passing ‘rb’ as mode to open() and pass it to our utility function to read the integer.

The updated Block class looks like this:

class Block(object):
   """A block to be parsed from file"""
   def __init__(self):
      self.magic_no = -1
      self.blocksize = 0
      self.blockheader = None
      self.transaction_cnt = 0
      self.transactions = None
      self.blockfile = None

   def parseBlockFile(self, blockfile):
      print 'Parsing block file: %s\n' % blockfile
      # 'rb' opens the file in binary mode
      with open(blockfile, 'rb') as bf:
         self.magic_no = read_uint4(bf)
         print 'magic_no:\t0x%08x' % self.magic_no

The full block.py file at this stage can be found on GitHub here.

Let’s run this file and see if we get a printout of the magic number.

coinlogic.info@proto $>python block.py blk00003.dat 
Parsing block file: blk00003.dat

magic_no:	0xd9b4bef9

Awesome! So this matches the expected value of the block magic number. Things are looking good.
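Instead of comparing the value by eye, the parser could check it. Here is a minimal sketch of that idea; the MAINNET_MAGIC constant and the read_magic() function are names I made up for this example, and the expected value is simply the one printed above.

```python
import io
import struct

MAINNET_MAGIC = 0xD9B4BEF9  # the value our parser printed above

def read_magic(stream):
    """Hypothetical helper: read the magic number and reject anything else."""
    magic = struct.unpack('<I', stream.read(4))[0]
    if magic != MAINNET_MAGIC:
        raise ValueError('unexpected magic number: 0x%08x' % magic)
    return magic

good = read_magic(io.BytesIO(b'\xf9\xbe\xb4\xd9'))

try:
    read_magic(io.BytesIO(b'\x00\x00\x00\x00'))
    rejected = False
except ValueError:
    rejected = True

print('0x%08x' % good)  # 0xd9b4bef9
print(rejected)         # True
```

Failing early like this saves us from printing garbage if we are handed the wrong kind of file.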

Blocksize

Next, we will read the block size in a similar fashion:

         self.blocksize = read_uint4(bf)
         print 'size:    \t%u bytes' % self.blocksize

The version of block.py at this stage is here. Running this version, we see that we have successfully parsed the size.

Parsing block file: blk00003.dat

magic_no:	0xd9b4bef9
size:    	30000 bytes

Block Header

Next we will parse the block header. This is a more complex structure and deserves a class of its own, which we will call BlockHeader. We also encounter some new data types when parsing the header, so we begin by creating some more utility functions to help us parse 32-byte hashes and VarInts, and to get a pretty string representation of the 32-byte hashes.

def read_hash32(stream):
   return stream.read(32)[::-1] #stored little-endian, so reverse it for display

def read_merkle32(stream):
   return stream.read(32)[::-1] #reverse it, same as read_hash32

def read_time(stream):
   utctime = read_uint4(stream)
   #Todo: convert to datetime object
   return utctime

def read_varint(stream):
   ret = read_uint1(stream)

   if ret < 0xfd: #one byte int
      return ret
   if ret == 0xfd: #uint16_t in next 2 bytes
      return read_uint2(stream)
   if ret == 0xfe: #uint32_t in next 4 bytes
      return read_uint4(stream)
   if ret == 0xff: #uint64_t in next 8 bytes
      return read_uint8(stream)
   return -1

def get_hexstring(bytebuffer):
   # %02x keeps the leading zero of each byte; a bare %x drops it, which is
   # why the hashes in the sample output below are shorter than 64 characters
   return ''.join(('%02x'%ord(a)) for a in bytebuffer)
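To get a feel for the VarInt encoding, here is a small round-trip sketch. The reader follows the same rules as read_varint() above but is repeated so the snippet runs on its own. The write_varint() function is a hypothetical inverse that I added only for this example; it is not part of block.py.

```python
import io
import struct

def read_varint(stream):
    prefix = ord(stream.read(1))
    if prefix < 0xfd:                                  # value fits in the prefix
        return prefix
    if prefix == 0xfd:                                 # uint16 follows
        return struct.unpack('<H', stream.read(2))[0]
    if prefix == 0xfe:                                 # uint32 follows
        return struct.unpack('<I', stream.read(4))[0]
    return struct.unpack('<Q', stream.read(8))[0]      # 0xff: uint64 follows

def write_varint(value):
    """Hypothetical inverse of read_varint, for illustration only."""
    if value < 0xfd:
        return struct.pack('<B', value)
    if value <= 0xffff:
        return b'\xfd' + struct.pack('<H', value)
    if value <= 0xffffffff:
        return b'\xfe' + struct.pack('<I', value)
    return b'\xff' + struct.pack('<Q', value)

samples = [64, 252, 253, 1000, 70000, 2 ** 40]
encoded = [write_varint(v) for v in samples]
decoded = [read_varint(io.BytesIO(e)) for e in encoded]
lengths = [len(e) for e in encoded]

print(decoded == samples)  # True
print(lengths)             # [1, 1, 3, 3, 5, 9]
```

Small counts cost a single byte, and the encoding only grows when the value does.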

Now we can create the BlockHeader class itself:

class BlockHeader(object):
   """BlockHeader represents the header of the block"""
   def __init__(self):
      super( BlockHeader, self).__init__()
      self.version = None
      self.prevhash = None
      self.merklehash = None
      self.time = None
      self.bits = None
      self.nonce = None

   def parse(self, stream):
      #TODO: error checking

      self.version = read_uint4(stream)
      self.prevhash = read_hash32(stream)
      self.merklehash = read_merkle32(stream)
      self.time = read_time(stream)
      self.bits = read_uint4(stream)
      self.nonce = read_uint4(stream)

   def __str__(self):
      return "\n\t\tVersion: %d \n\t\tPreviousHash: %s \n\t\tMerkle: %s \n\t\tTime: %s \n\t\tBits: %8x \n\t\tNonce: %8x" % (self.version, \
               get_hexstring(self.prevhash), \
               get_hexstring(self.merklehash), \
                str(self.time), \
                self.bits, \
                self.nonce)

   def __repr__(self):
      # call the method on the instance; a bare __str__(self) raises NameError
      return self.__str__()

As you can see, we overrode the __str__ and __repr__ methods to print the contents of the class in a pretty format.

Finally, we update the parseBlockFile() method of the Block class to instantiate a BlockHeader and parse it.
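The Time field is printed as raw Unix seconds, and read_time() still carries a Todo for converting it. Here is one possible sketch of that conversion, using 1311016847, the time value carried by this block's header. The to_utc_datetime() name is hypothetical.

```python
import datetime

def to_utc_datetime(unix_seconds):
    """Hypothetical helper: Unix seconds to a naive datetime in UTC."""
    epoch = datetime.datetime(1970, 1, 1)
    return epoch + datetime.timedelta(seconds=unix_seconds)

block_time = to_utc_datetime(1311016847)
print(block_time.isoformat())  # 2011-07-18T19:20:47
```

Adding a timedelta to the epoch avoids any dependence on the local timezone of the machine running the parser.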

         self.blockheader = BlockHeader()
         self.blockheader.parse(bf)
         print 'Block header:\t%s' % self.blockheader
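Since every field in the header has a fixed width, the whole header could also be unpacked with a single struct call. The sketch below shows that alternative; it is not how block.py does it. HEADER_FORMAT and unpack_header() are names I made up, and the hashes in the test data are filler, not real block data.

```python
import struct

# version, previous hash, merkle root, time, bits, nonce (all little-endian)
HEADER_FORMAT = '<I32s32sIII'
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 4 + 32 + 32 + 4 + 4 + 4 bytes

def unpack_header(raw):
    """Hypothetical one-shot alternative to BlockHeader.parse()."""
    version, prevhash, merkle, time, bits, nonce = struct.unpack(HEADER_FORMAT, raw)
    return {
        'version': version,
        'prevhash': prevhash[::-1],    # reversed for display, as in read_hash32
        'merklehash': merkle[::-1],
        'time': time,
        'bits': bits,
        'nonce': nonce,
    }

# Synthetic header built only for this example.
fake_prev = bytes(bytearray(range(32)))
fake_merkle = b'\xab' * 32
raw = struct.pack(HEADER_FORMAT, 1, fake_prev, fake_merkle,
                  1311016847, 0x1a0abbcf, 0x0aa64562)

header = unpack_header(raw)
print(HEADER_SIZE)                # 80
print('%08x' % header['bits'])    # 1a0abbcf
```

The field-by-field version in BlockHeader.parse() is longer, but it reads from a stream directly and is easier to extend with error checking.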

The version of block.py at this stage is available here. Running it, we can see that the block header is now parsed as well!

coinlogic.info@proto $>python block.py blk00003.dat 
Parsing block file: blk00003.dat

magic_no:	0xd9b4bef9
size:    	30000 bytes
Block header:	
		Version: 1 
		PreviousHash: 0000000e5b44cb9b537e79227ba232b45cbd5af5112f186d3bca221 
		Merkle: 4955619e67e7856c365dd9e2e0f1d3f2e1227bacbf52d621b93476f1e5349 
		Time: 1311016847 
		Bits: 1a0abbcf 
		Nonce:  aa64562

Transaction Count

After the block header comes a VarInt holding the count of transactions in this block. We can read it easily using our utility function:

         self.transaction_cnt = read_varint(bf)
         print 'Transactions: \t%d' % self.transaction_cnt

Running this version prints the transaction count after the block header:

coinlogic.info@proto $>python block.py blk00003.dat 
Parsing block file: blk00003.dat

magic_no:	0xd9b4bef9
size:    	30000 bytes
Block header:	
		Version: 1 
		PreviousHash: 0000000e5b44cb9b537e79227ba232b45cbd5af5112f186d3bca221 
		Merkle: 4955619e67e7856c365dd9e2e0f1d3f2e1227bacbf52d621b93476f1e5349 
		Time: 1311016847 
		Bits: 1a0abbcf 
		Nonce:  aa64562
Transactions: 	64

You can browse the complete file at this stage here.

Next

In the next post, we will create a Transaction object and parse all the transactions in this block.

If you are enjoying the posts, you can support me by donating to my tip address: 1GdiiJNGCE8jnytNQUC2FVhMankAJUyQrn