Note: This is the fourth in a multipart series of posts explaining various aspects of the Bitcoin protocol.
In the previous posts, we first looked at a Bitcoin block and studied a binary .dat file to see how the various elements of a block are laid out. After studying the block, we wrote a basic block parser in Python.
But the question remains: how did these blocks get onto your computer? Today we seek the answer by looking at the networking and messaging layer of the Bitcoin network.
We will analyze network traffic generated by bitcoin-qt and peek into the messages it exchanges with other nodes. Since Bitcoin is a peer-to-peer network, the protocol includes mechanisms for discovering other nodes. Further, since not all nodes run the same version of the protocol, there are mechanisms for handling version mismatches.
Wireshark
A convenient way to analyze network traffic is with a network protocol analyzer called Wireshark. You can download Wireshark from this page. I recommend using the development version, as it includes a Bitcoin dissector that matches the latest protocol.
Once you download and install Wireshark, fire it up.
- If bitcoin-qt is running, quit it.
- In the filter toolbar of Wireshark, type bitcoin and click Apply (see the Wireshark documentation to learn more about display filters).
- Once the filter has been applied, begin capturing live network traffic.
- Now start bitcoin-qt.
As bitcoin-qt connects to the Bitcoin network, you will see packets appearing in the display window.
Messages
Bitcoin nodes connect to other nodes via TCP. Nodes typically listen for messages on port 8333 (although the protocol allows a node to listen on any port). Every message passed on the Bitcoin network has a well-defined structure.
Message structure
Field Size | Description | Data type | Comments |
---|---|---|---|
4 | magic | uint32_t | Magic value indicating message origin network, and used to seek to next message when stream state is unknown |
12 | command | char[12] | ASCII string identifying the packet content, NULL padded (non-NULL padding results in the packet being rejected) |
4 | length | uint32_t | Length of payload in number of bytes |
4 | checksum | uint32_t | First 4 bytes of sha256(sha256(payload)) |
? | payload | uchar[] | The actual data |
The 4-byte magic number for the main Bitcoin network is 0xD9B4BEF9. Like the other integer fields in the header, it is sent over the wire in little-endian order, so it appears as the bytes F9 BE B4 D9.
Following the magic number is a 12-byte field naming the command being sent in the message. This is an ASCII string padded with zero bytes. For example, a version message has the 12-byte command ‘v’ ‘e’ ‘r’ ‘s’ ‘i’ ‘o’ ‘n’ 0x00 0x00 0x00 0x00 0x00.
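To make the byte order and padding concrete, here is a minimal Python sketch of the first two header fields. The names MAINNET_MAGIC, encode_magic, encode_command and decode_command are illustrative helpers, not part of any library; the rejection rule in decode_command follows the "non-NULL padding is rejected" comment in the table above.

```python
import struct

MAINNET_MAGIC = 0xD9B4BEF9  # magic value for the main Bitcoin network


def encode_magic(magic: int = MAINNET_MAGIC) -> bytes:
    """Serialize the magic value as a little-endian uint32."""
    return struct.pack("<I", magic)


def encode_command(name: str) -> bytes:
    """Encode an ASCII command name, NULL-padded to exactly 12 bytes."""
    raw = name.encode("ascii")
    if len(raw) > 12:
        raise ValueError("command names are at most 12 bytes")
    return raw.ljust(12, b"\x00")


def decode_command(field: bytes) -> str:
    """Strip the NULL padding, rejecting any non-NULL byte after the name."""
    name, _, padding = field.partition(b"\x00")
    if padding.strip(b"\x00"):
        raise ValueError("non-NULL byte in command padding")
    return name.decode("ascii")


print(encode_magic().hex(" "))    # f9 be b4 d9
print(encode_command("version"))  # b'version\x00\x00\x00\x00\x00'
```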
Next we have a 4-byte little-endian integer giving the length of the payload in bytes. In our example packet it is 0x00000064 = 100 bytes.
The next 4 bytes are a checksum of the payload. It is computed by taking the sha256 hash of the sha256 hash of the payload and keeping only the first 4 bytes of the resulting 32-byte hash. In our example packet, the bytes 0x35 0xd1 0x4f 0xef (right after the length) are the checksum bytes.
Note: If the packet has no payload, the payload length is 0x0 and the checksum bytes are 0x5d 0xf6 0xe0 0xe2. The checksum in this case is generated as
- sha256(sha256(<empty string>))
- = sha256(0xe3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855)
- = 0x5df6e0e2761359d30a8275058e299fcc0381534545f55cf43e41983f5d4c9456
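The checksum rule fits in a couple of lines of Python using the standard hashlib module. A detail that is easy to get wrong: the outer sha256 is applied to the raw 32-byte digest of the inner one, not to its hex string. The function name checksum is just an illustrative choice.

```python
import hashlib


def checksum(payload: bytes) -> bytes:
    """First 4 bytes of sha256(sha256(payload)), as used in the message header."""
    inner = hashlib.sha256(payload).digest()   # raw 32 bytes, not hex text
    return hashlib.sha256(inner).digest()[:4]


print(hashlib.sha256(b"").hexdigest())  # e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
print(checksum(b"").hex())              # 5df6e0e2
```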
The remaining length bytes in the packet are the payload. How each payload is decoded depends on the protocol version and the command.
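Putting the header fields together, here is a minimal Python sketch that frames a payload into a message and splits one message back off the front of a byte buffer. It assumes the mainnet magic value, and build_message and parse_message are hypothetical names (this is not bitcoin-qt's code). It simply raises on a bad magic value rather than attempting the resynchronization mentioned in the table.

```python
import hashlib
import struct

MAINNET_MAGIC = 0xD9B4BEF9
# magic (uint32), command (char[12]), length (uint32), checksum (4 bytes):
# 24 bytes, little-endian, no padding between fields
HEADER = struct.Struct("<I12sI4s")


def checksum(payload: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]


def build_message(command: str, payload: bytes, magic: int = MAINNET_MAGIC) -> bytes:
    """Prefix a payload with the 24-byte header ("12s" NULL-pads the command)."""
    header = HEADER.pack(magic, command.encode("ascii"), len(payload), checksum(payload))
    return header + payload


def parse_message(data: bytes, magic: int = MAINNET_MAGIC):
    """Split one message off the front of data.

    Returns (command, payload, remaining_bytes). Raises ValueError if the
    buffer is incomplete, belongs to another network, or fails the checksum.
    """
    if len(data) < HEADER.size:
        raise ValueError("incomplete header")
    msg_magic, raw_command, length, expected = HEADER.unpack_from(data)
    if msg_magic != magic:
        raise ValueError("magic does not match this network")
    end = HEADER.size + length
    if len(data) < end:
        raise ValueError("incomplete payload")
    payload = data[HEADER.size:end]
    if checksum(payload) != expected:
        raise ValueError("checksum mismatch")
    return raw_command.rstrip(b"\x00").decode("ascii"), payload, data[end:]


msg = build_message("verack", b"")  # an empty-payload message
print(len(msg))             # 24
print(parse_message(msg))   # ('verack', b'', b'')
```

Returning the leftover bytes lets a caller reading from a TCP socket peel messages off its buffer one at a time. A real implementation would probably distinguish "need more data" from "corrupt data" instead of raising the same exception for both.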
Version Message
When a Bitcoin node starts up, the first thing it does after configuring its listening ports and so on is send a version message to a known node on the Bitcoin network. This is the first step in establishing a connection to a remote node: no two nodes can communicate without first exchanging version messages.
Note: How the first node to connect to is found depends on the node implementation. For bitcoin-qt there are well-defined rules, as described here.
The version message is a way of advertising a node's presence and capabilities to other nodes on the network. The command in the top-level message structure is the zero-padded string “version”.
The version message has a variable-length payload with the following structure:
Field Size | Description | Data type | Comments |
---|---|---|---|
4 | version | int32_t | Identifies protocol version being used by the node |
8 | services | uint64_t | bitfield of features to be enabled for this connection |
8 | timestamp | int64_t | standard UNIX timestamp in seconds |
26 | addr_recv | net_addr | The network address of the node receiving this message |
– | – | – | Fields below are present only when version >= 106 |
26 | addr_from | net_addr | The network address of the node emitting this message |
8 | nonce | uint64_t | Node random nonce, randomly generated every time a version packet is sent. This nonce is used to detect connections to self. |
? | user_agent | var_str | User Agent (0x00 if string is 0 bytes long) |
4 | start_height | int32_t | The last block received by the emitting node |
1 | relay | bool | Whether the remote peer should announce relayed transactions or not, see BIP 0037, since version >= 70001 |
Let’s now walk through the payload of an example version message captured on the wire.
The payload starts with the 4-byte int32 protocol version. In our example packet, this is 0x00011171 = 70001.
Following this is an 8-byte bitfield specifying the features to be enabled for this connection. At the time of writing, the only defined service is NODE_NETWORK:
Value | Name | Description |
---|---|---|
1 | NODE_NETWORK | This node can be asked for full blocks instead of just headers. |
Consequently, the value of this 8-byte field is 0x0000000000000001.
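Decoding these first two fields is a single struct call, since both are little-endian and sit back to back. In this sketch the 12 input bytes are rebuilt from the values quoted above rather than copied from a capture, and read_version_and_services is a hypothetical helper name.

```python
import struct

NODE_NETWORK = 1  # services bit: this node can serve full blocks


def read_version_and_services(payload: bytes):
    """Decode the int32 protocol version and the uint64 services bitfield.

    Both are little-endian and occupy the first 12 bytes of the payload.
    """
    return struct.unpack_from("<iQ", payload, 0)


# First 12 bytes of the example payload, rebuilt from the values above.
head = bytes.fromhex("71110100" "0100000000000000")
version, services = read_version_and_services(head)
print(version)                        # 70001
print(bool(services & NODE_NETWORK))  # True
```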
After the services field comes another 8-byte field: a 64-bit integer timestamp counting seconds since the Unix epoch. For our example packet, this is Feb 22, 2014 15:36:27 PST.
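A minimal sketch of decoding this field with Python's standard datetime module follows. The post does not show the raw timestamp bytes, so the value 1393112187 below is an assumption worked out from the quoted date (Feb 22, 2014 15:36:27 PST is 23:36:27 UTC); read_timestamp is a hypothetical helper.

```python
import struct
from datetime import datetime, timedelta, timezone


def read_timestamp(field: bytes) -> datetime:
    """Decode the 8-byte little-endian int64 timestamp as an aware UTC datetime."""
    (seconds,) = struct.unpack("<q", field)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


PST = timezone(timedelta(hours=-8), "PST")

# Assumed value: the Unix time for Feb 22, 2014 15:36:27 PST, derived from
# the date quoted above (the raw bytes are not shown in the post).
field = struct.pack("<q", 1393112187)
print(read_timestamp(field).astimezone(PST))  # 2014-02-22 15:36:27-08:00
```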
Next comes the 26-byte address of the node receiving this message, i.e. the remote node you are initiating a connection to. Network addresses in Bitcoin messages have the following structure:
Field Size | Description | Data type | Comments |
---|---|---|---|
4 | time | uint32 | The time (version >= 31402). Not present in the version message. |
8 | services | uint64_t | same service(s) listed in version |
16 | IPv6/4 | char[16] | IPv6 address, network byte order. The original client only supports IPv4 and only reads the last 4 bytes to get the IPv4 address. However, the IPv4 address is written into the message as a 16-byte IPv4-mapped IPv6 address (12 bytes 00 00 00 00 00 00 00 00 00 00 FF FF, followed by the 4 bytes of the IPv4 address). |
2 | port | uint16_t | port number, network byte order |
Note that in the version message the time field is not prefixed, so each address is 26 bytes instead of 30.
So in our payload, each address consists of 8-byte services, a 16-byte IP address and a 2-byte port.
Another key point about network addresses is that the address and port fields are not little-endian: they use network byte order (big-endian), whereas services is little-endian. In our example payload, the receiving node portion decodes as follows.
We have services = 0x0000000000000001. The address is 12 bytes of 00 00 00 00 00 00 00 00 00 00 FF FF followed by the 4 bytes of the IPv4 address, i.e. 0x5b.0x79.0x0e.0x2d, or in decimal 91.121.14.45. The last 2 bytes, 0x20 0x8d, form a 2-byte big-endian integer 0x208d = 8333. So, in summary, we are connecting to 91.121.14.45:8333.
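Here is a minimal Python sketch of the 26-byte version-message form of a network address, mixing little-endian services with big-endian address and port exactly as described above. It leans on the standard ipaddress module to unwrap the IPv4-mapped form; parse_version_net_addr is a hypothetical name, and the input bytes are rebuilt from the values in the text.

```python
import ipaddress
import struct


def parse_version_net_addr(field: bytes):
    """Parse a 26-byte net_addr from a version message (no leading time field).

    services is little-endian; the IP address and port use network
    (big-endian) byte order.
    """
    if len(field) != 26:
        raise ValueError("a version-message net_addr is 26 bytes")
    (services,) = struct.unpack_from("<Q", field, 0)
    ip = ipaddress.IPv6Address(field[8:24])
    (port,) = struct.unpack_from(">H", field, 24)
    # Unwrap IPv4-mapped addresses (::ffff:a.b.c.d) to plain dotted IPv4.
    host = ip.ipv4_mapped or ip
    return services, str(host), port


addr_recv = bytes.fromhex(
    "01 00 00 00 00 00 00 00"                 # services (little-endian)
    " 00 00 00 00 00 00 00 00 00 00 ff ff"    # IPv4-mapped prefix
    " 5b 79 0e 2d"                            # 91.121.14.45
    " 20 8d"                                  # port 8333 (big-endian)
)
print(parse_version_net_addr(addr_recv))  # (1, '91.121.14.45', 8333)
```

The same function decodes the emitting node's address that follows it in the payload.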
Next comes our own address, a.k.a. the emitting node's address.
Again, our node's services field is 0x0000000000000001. The address is again an IPv4 address, so the first 12 bytes of the 16-byte field are 00000000000000000000ffff, followed by our node's own IPv4 address 0x0a.0x01.0x4c.0xe1, i.e. 10.1.76.225 (here a private LAN address rather than the address the outside world sees). Our listening port is 8333 as well, so the last 2 bytes are 0x20 0x8d, the same as for the receiving node.
Next is a 64-bit random nonce, which the node uses to detect connections to itself: if it receives a version message carrying a nonce it generated, it has connected to itself.
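The self-connection check can be sketched as follows, assuming a node simply remembers the nonces it has put into its own outgoing version messages. NonceTracker is a hypothetical helper for illustration, not how bitcoin-qt actually structures this logic.

```python
import secrets
import struct


class NonceTracker:
    """Hypothetical helper remembering nonces sent in our own version messages."""

    def __init__(self) -> None:
        self.sent = set()

    def new_nonce(self) -> bytes:
        """Pick a random 64-bit nonce, remember it, return 8 little-endian bytes."""
        nonce = secrets.randbits(64)
        self.sent.add(nonce)
        return struct.pack("<Q", nonce)

    def is_self_connection(self, nonce_field: bytes) -> bool:
        """True if an incoming version message carries one of our own nonces."""
        (nonce,) = struct.unpack("<Q", nonce_field)
        return nonce in self.sent


tracker = NonceTracker()
outgoing = tracker.new_nonce()
print(tracker.is_self_connection(outgoing))  # True
```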
After the nonce comes a variable-length string containing the name of the node implementation, a.k.a. the user agent. A VarStr is a VarInt giving the length of the string, followed by the string itself (we discussed VarInts in our first post).
Field Size | Description | Data type | Comments |
---|---|---|---|
? | length | var_int | Length of the string in bytes |
? | string | char[] | The string itself (can be empty) |
And a VarInt is encoded as:
Value | Storage length | Format |
---|---|---|
< 0xfd | 1 | uint8_t |
<= 0xffff | 3 | 0xfd followed by the length as uint16_t |
<= 0xffffffff | 5 | 0xfe followed by the length as uint32_t |
> 0xffffffff | 9 | 0xff followed by the length as uint64_t |
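The VarInt table translates directly into a small encoder/decoder pair. This is a minimal sketch with hypothetical function names; decode_varint returns the offset just past the VarInt so that callers can keep reading the payload.

```python
import struct


def encode_varint(n: int) -> bytes:
    """Encode n using the smallest VarInt form from the table."""
    if n < 0xFD:
        return struct.pack("<B", n)
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)


def decode_varint(data: bytes, offset: int = 0):
    """Decode a VarInt at offset; return (value, offset just past it)."""
    prefix = data[offset]
    if prefix < 0xFD:
        return prefix, offset + 1
    fmt, size = {0xFD: ("<H", 2), 0xFE: ("<I", 4), 0xFF: ("<Q", 8)}[prefix]
    (value,) = struct.unpack_from(fmt, data, offset + 1)
    return value, offset + 1 + size


print(encode_varint(15).hex())                # 0f
print(encode_varint(0xFD).hex())              # fdfd00
print(decode_varint(bytes.fromhex("fd0302")))  # (515, 3)
```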
So in our example payload, the user agent field begins with its length byte.
The length is 0x0f = 15 bytes. The actual string following the length is:
Hex: 2f 53 61 74 6f 73 68 69 3a 30 2e 38 2e 36 2f
ASCII: / S a t o s h i : 0 . 8 . 6 /
Note: A VarStr is not null-terminated; its VarInt length prefix alone tells you where it ends.
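Reading a VarStr is then a VarInt read followed by a slice. In the sketch below, the input is the 16 bytes shown above (length byte plus string), and read_var_str is a hypothetical helper. Decoding as UTF-8 with replacement is an assumption made to stay robust against odd peers; for an ASCII user agent like this one it makes no difference.

```python
import struct


def decode_varint(data: bytes, offset: int = 0):
    """Decode a VarInt at offset; return (value, offset just past it)."""
    prefix = data[offset]
    if prefix < 0xFD:
        return prefix, offset + 1
    fmt, size = {0xFD: ("<H", 2), 0xFE: ("<I", 4), 0xFF: ("<Q", 8)}[prefix]
    return struct.unpack_from(fmt, data, offset + 1)[0], offset + 1 + size


def read_var_str(data: bytes, offset: int = 0):
    """Read a VarStr at offset; return (string, offset just past it)."""
    length, offset = decode_varint(data, offset)
    end = offset + length
    if end > len(data):
        raise ValueError("VarStr runs past the end of the buffer")
    # No terminator: the VarInt length alone marks where the string ends.
    return data[offset:end].decode("utf-8", errors="replace"), end


user_agent_field = bytes.fromhex("0f 2f 53 61 74 6f 73 68 69 3a 30 2e 38 2e 36 2f")
print(read_var_str(user_agent_field))  # ('/Satoshi:0.8.6/', 16)
```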
The last element in this version payload is the start height: the height of the latest block in our database. As a Bitcoin node adds more blocks to its database, this height grows as well. When the node finds another node with a higher start height, it can request newer blocks from that node to update its database. (The optional relay flag from the table is absent here: our example payload is exactly 4 + 8 + 8 + 26 + 26 + 8 + 16 + 4 = 100 bytes, matching the length field in the header.)
So, in our example payload, the height is 0x0003aa65 = 240229.
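Combining the pieces, here is a minimal Python sketch of a version payload parser that walks the fields in the order of the table above. It assumes version >= 106 and treats the trailing relay byte as optional, which is what lets it accept the 100-byte example. The test payload is rebuilt from the values quoted in this post; the nonce is a placeholder and the timestamp is derived from the quoted date, so neither is the captured bytes. VersionPayload and parse_version are hypothetical names.

```python
import ipaddress
import struct
from dataclasses import dataclass
from typing import Optional, Tuple

NetAddr = Tuple[int, str, int]  # (services, host, port)


@dataclass
class VersionPayload:
    version: int
    services: int
    timestamp: int
    addr_recv: NetAddr
    addr_from: NetAddr
    nonce: int
    user_agent: str
    start_height: int
    relay: Optional[bool]  # None when the sender omitted the optional relay byte


def _read_varint(data: bytes, off: int) -> Tuple[int, int]:
    prefix = data[off]
    if prefix < 0xFD:
        return prefix, off + 1
    fmt, size = {0xFD: ("<H", 2), 0xFE: ("<I", 4), 0xFF: ("<Q", 8)}[prefix]
    return struct.unpack_from(fmt, data, off + 1)[0], off + 1 + size


def _read_net_addr(data: bytes, off: int) -> Tuple[NetAddr, int]:
    # 26-byte version-message form: services (LE), 16 IP bytes, port (BE).
    (services,) = struct.unpack_from("<Q", data, off)
    ip = ipaddress.IPv6Address(data[off + 8:off + 24])
    (port,) = struct.unpack_from(">H", data, off + 24)
    return (services, str(ip.ipv4_mapped or ip), port), off + 26


def parse_version(payload: bytes) -> VersionPayload:
    """Decode a version payload field by field (assumes version >= 106)."""
    version, services, timestamp = struct.unpack_from("<iQq", payload, 0)
    off = 20
    addr_recv, off = _read_net_addr(payload, off)
    addr_from, off = _read_net_addr(payload, off)
    (nonce,) = struct.unpack_from("<Q", payload, off)
    off += 8
    ua_len, off = _read_varint(payload, off)
    user_agent = payload[off:off + ua_len].decode("utf-8", errors="replace")
    off += ua_len
    (start_height,) = struct.unpack_from("<i", payload, off)
    off += 4
    relay = bool(payload[off]) if off < len(payload) else None
    return VersionPayload(version, services, timestamp, addr_recv, addr_from,
                          nonce, user_agent, start_height, relay)


MAPPED = bytes(10) + b"\xff\xff"  # IPv4-mapped IPv6 prefix
example = (
    struct.pack("<iQq", 70001, 1, 1393112187)  # timestamp derived from the quoted date
    + struct.pack("<Q", 1) + MAPPED + bytes([91, 121, 14, 45]) + struct.pack(">H", 8333)
    + struct.pack("<Q", 1) + MAPPED + bytes([10, 1, 76, 225]) + struct.pack(">H", 8333)
    + struct.pack("<Q", 0x0123456789ABCDEF)    # placeholder nonce, not from the capture
    + b"\x0f/Satoshi:0.8.6/"
    + struct.pack("<i", 240229)
)
v = parse_version(example)
print(len(example))                  # 100
print(v.user_agent, v.start_height)  # /Satoshi:0.8.6/ 240229
print(v.addr_recv, v.relay)          # (1, '91.121.14.45', 8333) None
```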
Conclusion
This completes our look at the version message. We will look at some of the other network messages in the next few posts, and then combine them to see how nodes, by exchanging a few well-defined messages with each other, build such a resilient network.
If you are enjoying the posts, you can support me by donating to my tip address: 1GdiiJNGCE8jnytNQUC2FVhMankAJUyQrn