Hash Functions: The Heart of Blockchain
By Sumi Maria Abraham, R&D Engineer, Kerala Blockchain Academy
Blockchain technology is undoubtedly one of the defining technological innovations of our time. Through Distributed Ledger Technology (DLT), it has changed how digital transactions are verified and stored. However, to understand how a blockchain works, one first needs to grasp the basic idea of hashing. If you are new to blockchain and wish to understand how it works, then this concept will come in handy. I’ll try to simplify hashing in this article, though it’s a bit technical.
So let’s get started.
A blockchain is a distributed ledger that stores a large amount of transaction data. This data has to be retrieved from time to time for verification, and this is where hash functions become essential.
What is Hashing?
Hashing is the process of converting input data of any length to a fixed-length value (the hash value). Because it is extremely unlikely that two different inputs produce the same value, the hash value acts as a digital fingerprint of the input data.
The 256-Bit String
There are various hash functions: MD5, the SHA family, LANMAN, etc. Bitcoin uses SHA-256, whereas Ethereum uses Keccak-256. We’ll limit our discussion to SHA-256, a hashing algorithm commonly used in blockchains.
SHA stands for the Secure Hash Algorithm, and SHA-256 converts the input data to a 256-bit value.
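To make this concrete, here is a minimal Python sketch that uses the standard library's hashlib module. The input string is just a made-up example; whatever the input, the result is 32 bytes (256 bits), which is usually printed as 64 hexadecimal characters.

```python
import hashlib

# Hypothetical input chosen for illustration only
message = "Hello, blockchain!"

# hashlib works on bytes, so the text is encoded first
digest = hashlib.sha256(message.encode("utf-8")).digest()
hex_digest = digest.hex()

print(hex_digest)          # the hash as a hexadecimal string
print(len(digest) * 8)     # number of bits in the hash
print(len(hex_digest))     # number of hexadecimal characters
```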
Characteristics of Hash Functions
Hash functions have many interesting properties.
Fixed-length output: Hash functions take an input of any length and produce an output of a fixed length. So even if the input is large, the hash value takes only a fixed amount of space.
Deterministic: Hash functions will always give the same output for a particular input.
Diffusion / Avalanche effect: Even if the input is changed slightly, there will be a notable and unpredictable change in the hash value.
To check whether data has been altered, comparing hash values is easier than comparing the original values.
As you can see in the example, a malicious user changed the transaction data and revised the amount from 1000 to 10000 INR. Though the change is small, the hash value changes significantly, so the alteration is easy to detect.
Irreversible: Hash functions are one-way; they are computationally impractical to reverse.
No practical method is known for decoding a hash value and retrieving the original data, other than guessing candidate inputs and hashing each one.
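Collision Resistance: It is computationally infeasible to find two different inputs that give the same hash value. Collisions must exist in principle, because an unlimited number of inputs map to a finite set of outputs, but for SHA-256 none has been found so far.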
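The properties above can be tried out with a short Python sketch. The two transaction strings are hypothetical stand-ins for the 1000 versus 10000 INR example, and sha256_hex is a helper name introduced here for illustration.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of a text string as hexadecimal."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hypothetical transaction records (not taken from any real ledger)
original = "Alice pays Bob 1000 INR"
tampered = "Alice pays Bob 10000 INR"

h1 = sha256_hex(original)
h2 = sha256_hex(tampered)

# Deterministic: hashing the same input again gives the same value
same_again = sha256_hex(original) == h1

# Fixed length: a much larger input still gives 64 hex characters
long_hash = sha256_hex(original * 1000)

# Avalanche effect: count how many of the 256 bits differ
diff_bits = bin(int(h1, 16) ^ int(h2, 16)).count("1")

print(h1)
print(h2)
print("deterministic:", same_again)
print("lengths:", len(h1), len(long_hash))
print("bits that differ:", diff_bits)
```

Typically around half of the 256 bits differ between the two hashes, even though only one character was added to the input.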
Let us see how the input data gets hashed in SHA-256.
The input data is first converted to binary format. It is then padded with extra bits so that the total length is a multiple of 512 bits. The padded input is divided into blocks of 64 bytes (512 bits), and a fixed set of initial hash values serves as the starting input to the computation. As shown in the figure below, each block gets processed by a series of functions.
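The padding and splitting steps can be sketched in a few lines of Python. This follows the padding rule of the SHA-256 standard (a single 1 bit, then zeros, then the original length as a 64-bit big-endian number), but it covers only the preprocessing, not the hash computation itself. The function names sha256_pad and split_blocks are made up for this illustration.

```python
def sha256_pad(message: bytes) -> bytes:
    """Pad a message so its length is a multiple of 64 bytes (512 bits)."""
    bit_length = len(message) * 8
    # Append a single 1 bit followed by seven 0 bits (the byte 0x80)
    padded = message + b"\x80"
    # Append zero bytes until 8 bytes remain free in the last block
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    # Append the original length in bits as a 64-bit big-endian integer
    padded += bit_length.to_bytes(8, "big")
    return padded

def split_blocks(padded: bytes) -> list:
    """Split the padded message into 64-byte blocks."""
    return [padded[i:i + 64] for i in range(0, len(padded), 64)]

padded = sha256_pad(b"abc")
blocks = split_blocks(padded)
print(len(padded), len(blocks))
```

A short message such as "abc" fits into a single 64-byte block, while a 56-byte message needs two blocks, because the 0x80 byte and the length field no longer fit in the first one.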
The processing function consists of several stages; we will not go into the details of the implementation. The final hash value is 256 bits long. It is in binary format (a string of 0s and 1s), which is usually converted to a hexadecimal representation (a string of the characters 0–9 and A–F) for ease of handling.
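The relationship between the binary and hexadecimal forms can be checked with a small Python sketch. The input b"blockchain" is an arbitrary example; each hexadecimal character stands for 4 bits, so 256 bits become 64 characters.

```python
import hashlib

# Arbitrary example input
digest = hashlib.sha256(b"blockchain").digest()

# The hash as a string of 0s and 1s (8 bits per byte)
binary_string = "".join(f"{byte:08b}" for byte in digest)

# The same hash in hexadecimal, upper case to match 0-9 and A-F
hex_string = digest.hex().upper()

print(len(binary_string))  # 256
print(len(hex_string))     # 64
print(hex_string)
```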
Blockchain and Hashing
In a blockchain, hash functions are used for various purposes.
For instance, the size of a transaction in a blockchain may vary, and hashing converts it to a uniform fixed-length format. So instead of storing the original transaction, its hash representation can be stored, which saves space and helps keep the data private.
Secondly, within a block, the list of transactions is represented using a Merkle tree data structure. A Merkle tree is constructed from the transaction hashes, and the Merkle root (root hash) represents the whole list of transactions in that block. Whenever any transaction gets altered, the root hash value will also change, which helps in verifying data integrity. Read about the Merkle tree here.
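Here is a minimal sketch of how a Merkle root could be computed in Python. It is a simplification under stated assumptions: it uses a single SHA-256 per node and duplicates the last hash when a level has an odd number of entries, and the transactions are made-up strings. Real blockchains differ in details such as the exact hash function and byte ordering.

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(transactions: list) -> bytes:
    """Compute a simplified Merkle root over a list of byte strings."""
    if not transactions:
        raise ValueError("need at least one transaction")
    # Leaves of the tree: the hash of every transaction
    level = [sha256(tx) for tx in transactions]
    while len(level) > 1:
        # Assumption: duplicate the last hash if the count is odd
        if len(level) % 2 == 1:
            level.append(level[-1])
        # Hash each pair of neighbours to build the next level up
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

# Hypothetical transactions
txs = [b"Alice pays Bob 1000 INR", b"Bob pays Carol 200 INR", b"Carol pays Dan 50 INR"]
root = merkle_root(txs)

# Altering a single transaction changes the root
altered = [b"Alice pays Bob 10000 INR"] + txs[1:]
print(root.hex())
print(merkle_root(altered).hex())
```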
Blockchain networks like Bitcoin use hashing for producing a block. Multiple nodes (miners) in the network compete to create a block, and a consensus mechanism decides which block should be considered final. Bitcoin uses a proof of work (PoW) consensus algorithm, in which a block is accepted only if its hash value falls below a target set by the network. Check this link for a detailed explanation of PoW.
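The idea behind proof of work can be illustrated with a toy Python sketch. This is not Bitcoin's actual procedure (Bitcoin applies SHA-256 twice to the block header and compares the result with a numeric target); here the simplified rule is that the hexadecimal hash must start with a given number of zeros. The function name mine, the difficulty parameter, and the block data are all assumptions made for this example.

```python
import hashlib

def mine(block_data: str, difficulty: int = 4):
    """Try nonces until the hash starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        candidate = f"{block_data}{nonce}".encode("utf-8")
        block_hash = hashlib.sha256(candidate).hexdigest()
        if block_hash.startswith(prefix):
            return nonce, block_hash
        nonce += 1

# Hypothetical block contents
nonce, block_hash = mine("block with some transactions", difficulty=4)
print(nonce, block_hash)
```

Finding a valid nonce takes many attempts, but anyone can verify the result with a single hash computation. That asymmetry is what makes the scheme useful.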
Hashing is also used for generating an account address. A user operates an account using a key pair, which consists of a secret private key and a public key that can be shared openly. The private key is usually selected randomly. The public key is generated from the private key using a one-way mathematical function (cryptographic function). The account address is generated from the public key using a hash function.
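The last step, deriving an address from a public key, can be sketched as follows. Please treat this purely as a toy: the public key is a hard-coded placeholder rather than a real key, and taking the last 20 bytes of a SHA-256 hash is an assumption made for illustration, not the scheme of any actual network. For comparison, Bitcoin hashes the public key with SHA-256 followed by RIPEMD-160, and Ethereum takes the last 20 bytes of the Keccak-256 hash of the public key.

```python
import hashlib

def toy_address(public_key: bytes) -> str:
    """Derive a 20-byte toy address from a public key (illustrative only)."""
    key_hash = hashlib.sha256(public_key).digest()
    # Assumption: keep only the last 20 bytes, shown as hexadecimal
    return key_hash[-20:].hex()

# Placeholder bytes standing in for a real public key
placeholder_public_key = bytes.fromhex("02" + "11" * 32)

address = toy_address(placeholder_public_key)
print(address)
```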
Besides all this, hashing plays a vital role in the basic design of the blockchain itself. The blocks in a blockchain are “chained” via hash values, and each block stores the hash value of the previous block’s header. This connection of two blocks forms a chain of blocks.
If any modification is applied to a block, its hash value will also change. The previous-block hash stored in the next block will then no longer match, and because each block's own hash depends on that stored value, the mismatch carries through all the blocks that follow, ultimately breaking the chain.
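The chaining and the effect of tampering can be demonstrated with a small Python sketch. It is a simplification: each block here is a plain dictionary and the whole block is hashed, whereas real blockchains hash a structured block header. The names block_hash, build_chain, and first_broken_link, as well as the transaction strings, are hypothetical.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash a block by serialising it to JSON with a fixed key order."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def build_chain(payloads: list) -> list:
    """Create blocks that each store the hash of the previous block."""
    chain = []
    prev_hash = "0" * 64  # the first block has no predecessor
    for index, data in enumerate(payloads):
        block = {"index": index, "data": data, "prev_hash": prev_hash}
        chain.append(block)
        prev_hash = block_hash(block)
    return chain

def first_broken_link(chain: list):
    """Return the index of the first block whose stored hash mismatches."""
    for i in range(1, len(chain)):
        if chain[i]["prev_hash"] != block_hash(chain[i - 1]):
            return i
    return None

chain = build_chain(["Alice pays Bob 1000 INR",
                     "Bob pays Carol 200 INR",
                     "Carol pays Dan 50 INR"])
print(first_broken_link(chain))   # None: the chain is intact

# A malicious user edits the first block
chain[0]["data"] = "Alice pays Bob 10000 INR"
print(first_broken_link(chain))   # 1: block 1 no longer matches block 0
```

To hide the change, the attacker would have to recompute the stored hash in every later block as well, and on a network like Bitcoin that also means redoing the proof of work for each of them.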
Everybody in the blockchain network holds a copy of the blockchain. This poses a security risk, since anyone could try to alter the data in their copy. Hash functions make such alterations detectable, which helps preserve data integrity and contributes to the immutability property of blockchain.
So, what do you say: isn’t hashing critical to blockchain?