This is the updated dataset, containing all Bitcoin transactions in the first 508241 blocks (approximately up to 9 Feb 2018). All files are in TSV format, compressed with gzip or xz. Line endings are LF, and the files have no header row.

Note: transaction and address IDs differ from the ones used in our previously published datasets! Please do not mix the old and new data.

This dataset contains the following files:

bitcoin_2018_bh.dat.gz (~20 MiB) -- block hashes (mapping from the blockIDs used in all other outputs); format: blockID, hash, block_timestamp, n_txs
bitcoin_2018_txh.dat.gz (~12 GiB) -- transaction hashes (mapping from the txIDs used in all other outputs); format: txID, hash
bitcoin_2018_addresses.dat.gz (~9.9 GiB) -- mapping from address IDs to address strings (the IDs are used in all other outputs); format: addrID, address string
bitcoin_2018_tx.dat.xz (~248 MiB) -- transaction overview (mapping to block, and number of inputs / outputs); format: txID, blockID, n_inputs, n_outputs
bitcoin_2018_txin.dat.xz (~7.1 GiB) -- transaction inputs; format: txID, input_seq, prev_txID, prev_output_seq, addrID, sum
bitcoin_2018_txout.dat.xz (~4.8 GiB) -- transaction outputs; format: txID, output_seq, addrID, sum
bitcoin_2018_multiple.dat.gz (~4 MiB) -- transaction outputs with multiple addresses (multisig); the txout and txin files include only the first address, while this file includes all of them; format: txID, output_seq, addrID
bitcoin_2018_nonstandard.dat.gz (~12 MiB) -- nonstandard transaction outputs; format: txID, output_seq
bitcoin_2018_addr_sccs.dat.gz (~1.6 GiB) -- separately generated address contraction dataset, see below for more info; format: addrID, userID

All output files use numeric IDs to refer to transactions, blocks and addresses (these are all counters starting from 0).
These are mapped to hashes and address strings in the three files txh.dat, bh.dat and addresses.dat respectively. A special value of -1 for a txID means a bug in the processing (this should not happen). A special value of -1 for an addrID means that the address could not be decoded. This is not necessarily an error: there are certain nonstandard transactions where this can happen. The flow of bitcoins can still be followed in these cases, as all transaction inputs are linked to the corresponding previous transaction outputs in txin.dat. All sums are in Satoshis (1e-8 BTC).

Transaction inputs and outputs include a sequence number (input_seq and output_seq respectively), which identifies the input / output. These are counters starting from 0 for each transaction. I am not sure whether these correspond to the numbering used by other Bitcoin clients, but they can be used to map inputs to previous outputs. The txin.dat file includes this information: the prev_txID and prev_output_seq columns refer to the previous transaction output that is being spent. Mining rewards (coinbase transactions) can be identified by having zero inputs. For all other transactions, the sum of inputs should be at least the sum of outputs (the difference is the transaction fee), but this is not checked explicitly during processing.

Note: some files are compressed with xz, which gives a higher compression ratio. For other files, the extra processing time did not seem to be worth it, so they are compressed with standard gzip. This is the case for the files containing Bitcoin addresses and transaction hashes, which are basically random data, so the only "compression" comes from storing them in a binary format instead of human-readable hex / base32 / base58.

The modified bitcoind client used to generate this dataset can be downloaded here: https://github.com/dkondor/bitcoin/tree/0.16. Further code which can be used to convert this dataset to a weighted directed graph (list of edges) is available here: https://github.com/dkondor/txedges.
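As an illustration of the column layouts described above, the following minimal Python sketch reads the compressed TSV files and computes, for each non-coinbase transaction, the difference between the sum of its inputs and the sum of its outputs. The helper names read_rows and transaction_fees are hypothetical and not part of the dataset tools. The sketch assumes that coinbase transactions simply have no rows in txin.dat, and that the sum column in txin.dat holds the value of the output being spent. It keeps per-transaction totals in memory, so it is better suited to a filtered subset than to the full files.

```python
import gzip
import lzma
from collections import defaultdict


def read_rows(path):
    """Yield each line of a headerless, compressed TSV file as a list of strings."""
    # Pick the decompressor from the file extension (.xz or .gz).
    opener = lzma.open if path.endswith(".xz") else gzip.open
    with opener(path, "rt") as handle:
        for line in handle:
            yield line.rstrip("\n").split("\t")


def transaction_fees(txin_path, txout_path):
    """Return {txID: inputs minus outputs, in Satoshis} for transactions with inputs."""
    in_sum = defaultdict(int)
    out_sum = defaultdict(int)
    # txin columns: txID, input_seq, prev_txID, prev_output_seq, addrID, sum
    for row in read_rows(txin_path):
        in_sum[int(row[0])] += int(row[5])
    # txout columns: txID, output_seq, addrID, sum
    for row in read_rows(txout_path):
        out_sum[int(row[0])] += int(row[3])
    # Assumption: coinbase transactions never appear in txin, so they are skipped.
    return {tx: total - out_sum.get(tx, 0) for tx, total in in_sum.items()}
```

A negative value in the result would point to a transaction where the outputs exceed the inputs, which, as noted above, is not checked during processing.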
The "address contraction" dataset describes a possible grouping of Bitcoin addresses into the entities / users that control them. It uses a simple heuristic: all input addresses of a transaction are assumed to be controlled by the same entity. See https://github.com/dkondor/sccs32s for the steps used to create it from the transaction inputs.
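The common-input heuristic can be sketched with a small union-find structure in Python. This is only an illustration of the idea, not the procedure implemented in sccs32s. The function names find and group_addresses are hypothetical, and the group labels produced here (the smallest addrID in each group) are arbitrary, so they will not match the userID values in the addr_sccs file.

```python
def find(parent, a):
    """Return the representative of a, compressing the path along the way."""
    parent.setdefault(a, a)
    root = a
    while parent[root] != root:
        root = parent[root]
    # Second pass: point every node on the path directly at the root.
    while parent[a] != root:
        parent[a], a = root, parent[a]
    return root


def group_addresses(inputs):
    """Map each addrID to a group label, given (txID, addrID) pairs from txin."""
    parent = {}
    first_addr = {}  # txID -> first decodable input address seen for it
    for tx_id, addr_id in inputs:
        if addr_id < 0:  # -1 marks an address that could not be decoded
            continue
        anchor = first_addr.setdefault(tx_id, addr_id)
        root_a = find(parent, anchor)
        root_b = find(parent, addr_id)
        if root_a != root_b:
            # Merge the two groups, keeping the smaller addrID as the label.
            parent[max(root_a, root_b)] = min(root_a, root_b)
    return {addr: find(parent, addr) for addr in list(parent)}
```

For example, the pairs (0, 1), (0, 2), (1, 2), (1, 3) put addresses 1, 2 and 3 in the same group, because address 2 appears as an input together with 1 in one transaction and with 3 in another.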