Bitcoin: Parsing the chainstate
If you look at the files stored by the Bitcoin Core client, you'll see the big blocks directory, but another one caught my attention recently: the chainstate.
$ du -sh .bitcoin/*
4.0K .bitcoin/bitcoin.conf
176G .bitcoin/blocks
2.8G .bitcoin/chainstate
This directory holds a LevelDB database containing all currently unspent transaction outputs. In Bitcoin, the unspent transaction outputs (also called UTXOs) are one of the most important things in the protocol, as they are the only coins that can be spent. And this database is much smaller than the blockchain itself (2.8G versus 176G here).
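To make "one record per UTXO" concrete: as far as I know, since Bitcoin Core 0.15 each unspent output is stored as its own record, whose key is the byte 'C', followed by the 32-byte txid in internal byte order and the output index encoded with Bitcoin Core's own VARINT format (MSB base-128, with a +1 offset on every continuation byte). The sketch below decodes such a key under that assumption; `read_varint`, `parse_coin_key` and `Outpoint` are hypothetical names, not part of the chainstate tool.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper: reads one Bitcoin Core-style VARINT (MSB base-128,
// +1 offset on every continuation byte) at data[pos] and advances pos.
// No overflow check: this is a sketch, not a hardened parser.
uint64_t read_varint(const std::string& data, size_t& pos) {
    uint64_t n = 0;
    while (pos < data.size()) {
        unsigned char byte = static_cast<unsigned char>(data[pos++]);
        n = (n << 7) | (byte & 0x7F);
        if (byte & 0x80) {
            n++;          // continuation byte: apply the +1 offset
        } else {
            return n;     // high bit clear: last byte of the number
        }
    }
    throw std::runtime_error("truncated VARINT");
}

struct Outpoint {
    std::string txid_hex;  // txid in the usual byte-reversed display order
    uint64_t vout;         // index of the output inside that transaction
};

// Hypothetical helper: parses a key assumed to be 'C' || txid (32 bytes) || VARINT(vout).
Outpoint parse_coin_key(const std::string& key) {
    if (key.size() < 34 || key[0] != 'C')
        throw std::runtime_error("not a coin record");
    Outpoint out;
    char hex[3];
    for (int i = 32; i >= 1; --i) {  // key[1..32] is the txid; print it reversed
        std::snprintf(hex, sizeof hex, "%02x",
                      static_cast<unsigned>(static_cast<unsigned char>(key[i])));
        out.txid_hex += hex;
    }
    assert(out.txid_hex.size() == 64);  // 32 bytes -> 64 hex digits
    size_t pos = 33;                     // the VARINT starts right after the txid
    out.vout = read_varint(key, pos);
    return out;
}
```

With this encoding, output index 128 is stored as the two bytes 0x80 0x00, and 255 as 0x80 0x7f.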
I've recently developed a tool that opens this database and parses every record to compute the balance of every address. In a few minutes, this tool, which I called chainstate, goes through the records one by one: it deobfuscates each record (the chainstate database is obfuscated, simply by a XOR with a key that is stored inside the directory too), parses the output script (p2pk, p2pkh, p2sh, p2wpkh, p2wsh) to determine what type of address we are dealing with, and outputs the address.
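The XOR step is simple enough to show in a few lines. As far as I know, Bitcoin Core keeps the obfuscation key in the same LevelDB, under the key `"\x0e\x00obfuscate_key"`, as a length-prefixed vector of 8 bytes, and only record values (not keys) are XORed with it, the key being repeated cyclically. The sketch below assumes those 8 key bytes have already been extracted; `xor_deobfuscate` is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: XORs every byte of a stored value with the
// obfuscation key, repeating the key as many times as needed.
// XOR is its own inverse, so the same call also re-obfuscates.
std::string xor_deobfuscate(const std::string& value, const std::string& key) {
    assert(!key.empty() && "obfuscation key must not be empty");
    std::string out(value.size(), '\0');
    for (size_t i = 0; i < value.size(); ++i)
        out[i] = static_cast<char>(value[i] ^ key[i % key.size()]);
    return out;
}
```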
Decoding works by reusing the logic of Bitcoin Core's DecompressScript function:
switch (script_type) {
    case 0x00: // P2PKH: 20-byte pubkey hash
        addr = get_addr(current_prefix.pubkey_prefix, value);
        cout << "DUP HASH160 " << value.size() << " " << addr << " EQUALVERIFY CHECKSIG" << endl;
        break;
    case 0x01: // P2SH: 20-byte script hash
        addr = get_addr(current_prefix.script_prefix, value);
        cout << "HASH160 " << value.size() << " " << addr << " EQUAL" << endl;
        break;
    case 0x02: // P2PK with a compressed public key (prefix byte + 32-byte x coordinate)
    case 0x03:
        addr = get_addr(current_prefix.pubkey_prefix, str_to_ripesha(old_value));
        cout << "PUSHDATA(33) " << addr << " CHECKSIG" << endl;
        break;
    case 0x04: // P2PK with an uncompressed public key, stored in compressed form
    case 0x05: // (0x04/0x05 encode the parity of y), so decompress it first
        memset(pub, 0, PUBLIC_KEY_SIZE);
        pubkey_decompress(script_type, value.c_str(), (unsigned char*) &pub, &publen);
        addr = get_addr(current_prefix.pubkey_prefix, str_to_ripesha(string((const char*)pub, PUBLIC_KEY_SIZE)));
        cout << "PUSHDATA(65) " << addr << " CHECKSIG" << endl;
        break;
    case 0x1c: // raw 22-byte script (0x1c - 6): P2WPKH
    case 0x28: // raw 34-byte script (0x28 - 6): P2WSH
        addr = rebuild_bech32(value);
        cout << (script_type == 0x1c ? "P2WPKH " : "P2WSH ") << addr << endl;
        break;
    default: // other raw scripts are not handled in this excerpt
        break;
}
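The switch only covers the script part of a record. As far as I know, a record value in Bitcoin Core's format is VARINT(height * 2 + coinbase flag), then the VARINT of the compressed amount, then the compressed script: a VARINT size (the script_type above) followed by its payload. The sketch below mirrors, under that assumption, the amount compression from Bitcoin Core's compressor.cpp, needed to sum balances, and the size rule behind the 0x1c and 0x28 cases. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mirrors Bitcoin Core's amount compression: trailing decimal zeros
// are folded into an exponent e (at most 9).
uint64_t compress_amount(uint64_t n) {
    if (n == 0) return 0;
    int e = 0;
    while ((n % 10) == 0 && e < 9) { n /= 10; e++; }
    if (e < 9) {
        int d = static_cast<int>(n % 10);  // last non-zero digit
        assert(d >= 1 && d <= 9);
        n /= 10;
        return 1 + (n * 9 + d - 1) * 10 + e;
    }
    return 1 + (n - 1) * 10 + 9;
}

// Inverse of compress_amount: returns the amount in satoshis.
uint64_t decompress_amount(uint64_t x) {
    if (x == 0) return 0;
    x--;
    int e = static_cast<int>(x % 10);  // number of trailing zeros
    x /= 10;
    uint64_t n;
    if (e < 9) {
        int d = static_cast<int>(x % 9) + 1;  // last non-zero digit
        x /= 9;
        n = x * 10 + d;
    } else {
        n = x + 1;
    }
    while (e > 0) { n *= 10; e--; }
    return n;
}

// Payload length after the size VARINT: 0x00-0x05 are the special
// compressed forms, any other value is a raw script of nSize - 6 bytes.
size_t script_payload_size(uint64_t nSize) {
    if (nSize == 0x00 || nSize == 0x01) return 20;  // P2PKH / P2SH hash160
    if (nSize >= 0x02 && nSize <= 0x05) return 32;  // P2PK: x coordinate only
    return static_cast<size_t>(nSize - 6);          // e.g. 0x1c -> 22, 0x28 -> 34
}
```

Amounts are compressed this way because most values end in runs of decimal zeros: 50 BTC (5,000,000,000 satoshis) is stored as just 50.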
Because the Litecoin and Dash Core clients use the same code, the tool supports them too!
To use it, first copy the chainstate directory (this is mandatory, since the tool overwrites some pointers in it and Bitcoin Core would then have to reindex, which is really slow), then, well, wait:
$ cp -Rp ~/.bitcoin/chainstate state
$ ./chainstate >/tmp/cs.output 2>/tmp/cs.errors
$ head /tmp/cs.output
last block: 0000000000000000004e0f5635ad8b2e58ebd0a4f02c68c604d1b5697425ce72
$ wc -l /tmp/cs.output /tmp/cs.errors
59516004 /tmp/cs.output
409643 /tmp/cs.errors
59925647 total