In Part 1 of this series, I explored the definitional attributes of blockchain. In Part 2, I want to delve into the database properties of the blockchain.
Part 2: Blockchain as a database
Storing a variety of objects in an organized manner is an art as well as a science. For example, books in a library are organized using a hierarchical scheme known as the Dewey Decimal Classification, which Melvil Dewey devised in 1873 and first published in 1876.
When data about a class of objects is organized in a database, the science lies in data semantics and in retrieving data consistently, subject to any constraints that exist among the objects as well as among the attributes of an object. The data model encapsulates all these issues of data organization from the perspective of the users of the data. Historically, data models have evolved along with the exponential growth in the computing power and storage capacity of computer hardware.
Currently, the relational data model is the predominant data model used by enterprises. E.F. Codd, while working as a computer scientist at IBM, first proposed the relational data model in his 1970 paper titled “A Relational Model of Data for Large Shared Data Banks”. In his 1981 Turing Award lecture, he pointed out three main objectives that a relational database system tries to achieve. These are:
- data independence objective: the database draws a sharp boundary between the logical view of data (i.e. the business view) and the physical view of data (i.e. machine storage, or the technical view).
- communicability objective: a simple, intuitive way of organizing data so that business users can share a common understanding of it.
- set-processing objective: application of the principles of set theory to the processing of two or more datasets, such as the union of two sets, a subset of a set, or the complement of a set.
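The set-processing objective can be illustrated with a minimal Python sketch. The two datasets and the customer identifiers below are hypothetical and serve only to show union, intersection, difference, and subset operations; a relational database would express the same operations in SQL.

```python
# Hypothetical customer identifiers held in two separate datasets.
retail_customers = {"C001", "C002", "C003", "C004"}
online_customers = {"C003", "C004", "C005"}

# Union: customers that appear in either dataset.
all_customers = retail_customers | online_customers

# Intersection: customers that appear in both datasets.
both_channels = retail_customers & online_customers

# Difference: retail customers who never bought online.
retail_only = retail_customers - online_customers

# Complement of the online set, taken relative to all known customers.
not_online = all_customers - online_customers

# Subset test: every two-channel customer is also a retail customer.
is_subset = both_channels <= retail_customers

print(sorted(all_customers))
print(sorted(both_channels))
print(sorted(retail_only))
print(is_subset)
```

In this small example the difference and the complement give the same result, because every customer who is not online is by construction a retail-only customer.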
Let us examine the extent to which a blockchain meets the above objectives. A caveat is in order here. Blockchain was designed to work not as a distributed database but only as a distributed ledger. A ledger does not qualify as an enterprise-level database, as argued in Part 1. So lacking some features of a standard distributed database does not negate the usefulness of blockchain in many areas. However, my firm view is that cryptocurrencies, as currently offered by many blockchain platforms, are destined to fail because of the anonymity designed into their monetary transactions.
The first objective, data independence, is clearly lacking in every blockchain data management framework. The data structures used by the Bitcoin and Ethereum platforms are designed to ensure the integrity of all transactions and to let miner nodes verify them through the consensus algorithm as quickly as possible. For these reasons, Bitcoin uses hash trees known as Merkle trees and Ethereum uses Merkle Patricia tries (see Kamil Jezek (2020)). As a result, from a business user's perspective, the data structure is too complex and opaque for decision-making purposes, which also works against the second objective, communicability. It may be said that the transactional databases of cryptocurrencies were not designed to meet such requirements.
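To see why such a structure favours integrity over readability, here is a minimal sketch of a Merkle root computation in Python. It is a simplified illustration, not Bitcoin's exact procedure: the transaction strings are hypothetical, and real Bitcoin hashes transaction IDs with specific byte-ordering rules that are omitted here. The sketch does follow the commonly described scheme of double SHA-256 hashing and duplicating the last hash when a level has an odd number of entries.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    """Double SHA-256, the hash commonly described for Bitcoin's Merkle tree."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Fold a list of raw leaves into a single 32-byte root hash."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [sha256d(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Odd number of hashes: pair the last one with a copy of itself.
            level.append(level[-1])
        level = [sha256d(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical transactions, written as plain byte strings for readability.
transactions = [b"alice->bob:5", b"bob->carol:2", b"carol->dave:1"]
root = merkle_root(transactions)

# Changing a single transaction changes the root, which is what lets nodes
# detect tampering by comparing one short hash.
tampered_root = merkle_root([b"alice->bob:50", b"bob->carol:2", b"carol->dave:1"])

print(root.hex())
print(root != tampered_root)
```

The root is a compact fingerprint of all transactions in a block, which is convenient for verification but tells a business user nothing about the transactions themselves.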
As regards the third objective, the current implementation of blockchain technology for permissionless access to transactional data, and the associated implementation of a consensus algorithm, does not even aim at segmenting data by the attributes of those undertaking transactions or of the transactions themselves. So this objective is absent by definition for cryptocurrency-oriented blockchains.
But business use cases for blockchain need not be confined to the world of cryptocurrencies. If we take the definition of blockchain as a growing list of records, then it should be possible to marry blockchain with a proper relational database and derive the benefits of both technologies: the immutability of blockchain and access to enriched transactional data for analysis. A number of applications have been created to query blockchain data. Some of these applications are listed below.
1. Bitquery is an OLAP system built to provide business intelligence on data stored in a blockchain. Data in this system is sourced from a blockchain using GraphQL and stored in a multidimensional OLAP cube. https://bitquery.io/
2. BitIodine is a tool proposed by Michele Spagnuolo et al. (2014) for analysing and profiling the Bitcoin network. The authors have suggested a methodology to “automatically parse the blockchain, cluster addresses, classify addresses and users, graph, export and visualize elaborated information from the Bitcoin network.” The authors claim that their methodology can identify illegal or criminal use of cryptocurrency, as in the case of the “Silk Road” incident.
3. Chainalysis is another query tool built on blockchain data for investigating cryptocurrency transactions. https://www.chainalysis.com/
4. Nansen is another commercial software application for analyzing blockchain data. It has built a repository of more than 70 million crypto wallets. Like the applications described above, its key feature is the ability to monitor the flow of funds from one address to another. https://www.nansen.ai/about
5. Abe is another software tool that reads “the Bitcoin block file, transforms and loads the data into a database, and presents a web interface similar to Bitcoin Block Explorer. Abe runs on PostgreSQL, MySQL’s InnoDB engine, and SQLite. Other SQL databases may work with minor changes.”
6. Kondor and his associates at Eötvös Loránd University in Hungary have analysed Bitcoin data and created a Bitcoin Transaction Network dataset that provides Bitcoin transaction data as extracted with the bitcoind client. The data is provided as a tab-separated (TSV) file.
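The general idea behind tools of this kind, loading blockchain records into a relational database and querying them with SQL, can be sketched with Python's built-in sqlite3 module. The table layout, addresses, and amounts below are hypothetical, and the single sender and receiver per row is a deliberate simplification: real Bitcoin transactions have multiple inputs and outputs, and this is not the schema used by Abe or any other tool listed above.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE transfers (
        tx_id        TEXT PRIMARY KEY,
        block_height INTEGER,
        sender       TEXT,
        receiver     TEXT,
        amount_sat   INTEGER  -- amount in satoshis, kept as an integer
    )
    """
)

# Hypothetical records, as they might look after parsing block data.
rows = [
    ("tx1", 100, "addr_A", "addr_B", 500),
    ("tx2", 100, "addr_B", "addr_C", 200),
    ("tx3", 101, "addr_A", "addr_C", 150),
    ("tx4", 102, "addr_C", "addr_A", 50),
]
conn.executemany("INSERT INTO transfers VALUES (?, ?, ?, ?, ?)", rows)

# Set-oriented query: total amount received per address.
received = conn.execute(
    "SELECT receiver, SUM(amount_sat) FROM transfers "
    "GROUP BY receiver ORDER BY receiver"
).fetchall()

# Segmentation by a transaction attribute: transfers in one block.
block_100 = conn.execute(
    "SELECT tx_id FROM transfers WHERE block_height = ? ORDER BY tx_id", (100,)
).fetchall()

print(received)   # [('addr_A', 50), ('addr_B', 500), ('addr_C', 350)]
print(block_100)  # [('tx1',), ('tx2',)]
```

Once the records sit in relational tables, the grouping and filtering that the blockchain's native structure does not support become ordinary SQL queries.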
It is important to note here that the analysis of blockchain data involves the analysis of graph data. Graph analytics has been used extensively in social network analysis. Such analysis can provide insight into the flow of money or value from one node in a blockchain network to another and can identify, with a certain probability, the addresses that belong to a particular wallet. The Graph protocol, nicknamed the “Google of blockchains”, was created for indexing and querying data from blockchains, starting with Ethereum. Initially, it provided a hosted service for free, but it has now been announced that the company will cease to provide the hosted service in 2023.
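As a rough illustration of the graph view, the sketch below builds a directed graph of transfers and traces every address that funds from a starting address could have reached. The addresses and amounts are hypothetical, and a breadth-first search is only one simple way to follow flows; the commercial tools mentioned earlier use far richer heuristics that are not shown here.

```python
from collections import defaultdict, deque

# Hypothetical transfers: (sender, receiver, amount in satoshis).
transfers = [
    ("addr_A", "addr_B", 500),
    ("addr_B", "addr_C", 200),
    ("addr_B", "addr_D", 100),
    ("addr_E", "addr_A", 50),
]

# Directed graph: graph[sender][receiver] = total amount sent along that edge.
graph = defaultdict(dict)
for sender, receiver, amount in transfers:
    graph[sender][receiver] = graph[sender].get(receiver, 0) + amount


def downstream(start: str) -> set:
    """Return all addresses reachable from `start` by following transfers."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # .get avoids creating empty entries for addresses that never send.
        for nxt in graph.get(node, {}):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


print(sorted(downstream("addr_A")))  # ['addr_B', 'addr_C', 'addr_D']
```

Reachability only shows where funds could have gone. Attributing addresses to a wallet, as noted above, is probabilistic and needs additional evidence.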
Finally, it is quite clear that a blockchain should be considered a repository of transactions but not a database proper. In this age of the internet, when 2.5 quintillion (2.5 × 10^18) bytes of data are produced every day, it would be next to impossible to adhere to Codd’s objectives when storing even petabytes (1 petabyte = 1 million GB) of data in a proper database. For example, Hadoop, which was designed to handle Big Data, is a framework that allows files of structured as well as unstructured data stored on multiple computers to be accessed, retrieved, and analyzed. So blockchain has its own uses, but an enterprise cannot, or rather should not, store all its business data in a blockchain. For example, Walmart Canada has successfully built a private blockchain on Hyperledger Fabric to solve supply-chain challenges, but it rests on top of a legacy system (see Kate Vitasek et al. (2022)).
Dinh Tien Tuan Anh et al. (2017), Untangling Blockchain: A Data Processing View of Blockchain Systems, IEEE, DOI 10.1109/TKDE.2017.2781227
Jules Azad Emery, Matthieu Latapy (2021), Full Bitcoin Blockchain Data Made Easy, IEEE/ACM International Conference on Advances in Social Network Analysis and Mining (ASONAM 2021), Nov 2021, The Hague (virtual), Netherlands. hal-03443053
Kamil Jezek (2020), Ethereum Data Structures, August 2020, https://doi.org/10.1145/1122445.1122456
Kate Vitasek, John Bayliss, Loudon Owen, and Neeraj Srivastava (2022), How Walmart Canada Uses Blockchain to Solve Supply-Chain Challenges, Harvard Business Review, January 2022
Kondor D, Pósfai M, Csabai I, Vattay G (2014), Do the Rich Get Richer? An Empirical Analysis of the Bitcoin Transaction Network, PLoS ONE 9(2): e86197. doi:10.1371/journal.pone.0086197
McGinn D, McIlwraith D, Guo Y, Toward Open Data Blockchain Analytics: A Bitcoin Perspective, Royal Society Open Science
Spagnuolo M, Maggi F, Zanero S (2014), BitIodine: Extracting Intelligence from the Bitcoin Network, in: Christin N, Safavi-Naini R (eds), Financial Cryptography and Data Security, FC 2014, Lecture Notes in Computer Science, vol 8437
Xu Cheng, Ce Zhang, Jianliang Xu (2019), vChain: Enabling Verifiable Boolean Range Queries over Blockchain Databases, 2019 International Conference on Management of Data (SIGMOD ’19), June 30–July 5, 2019
Yue Kwok-Bun, Karthika Chandrasekar, and Hema Gullapalli (2019), Storing and Querying Bitcoin Blockchain Using SQL Databases, Information Systems Education Journal, Vol 17(4)