Hashing algorithms for cryptocurrency blockchains: types and how they work
Hashes and hashing algorithms are key concepts that blockchain newcomers encounter early on, and they always go hand in hand with security.
Decentralized networks and consensus mechanisms, such as Bitcoin or Ethereum with thousands of nodes connected over p2p, have to operate without trust between participants and need an efficient verification system. Such networks need compact ways to represent information that let participants perform fast and secure checks.
The block is one of the main components of Bitcoin and Ethereum. A block contains transactions, timestamps and other important metadata. The ability to compress large amounts of information about the global state of the network into a short standard message, which can be easily verified whenever necessary, plays a huge role in security. That message is the hash.
If you change even one character of the input, you get a completely different hash!
Cryptographic hashes are used everywhere, from password storage to file integrity checks. Their essence is a deterministic algorithm that takes input data and produces a string of fixed length. The same input will therefore always be converted to the same output.
Determinism is not the only important property: changing even a single bit of the input must produce a completely different hash. This is known as the avalanche effect.
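As a minimal illustration of both properties, here is a short sketch using Python's standard hashlib (the input strings are arbitrary examples):

```python
import hashlib

# Determinism: the same input always yields the same fixed-length digest.
a = hashlib.sha256(b"blockchain").hexdigest()
b = hashlib.sha256(b"blockchain").hexdigest()
print(a == b)  # True

# Avalanche effect: changing a single character produces a completely
# different digest of the same length (256 bits = 64 hex characters).
c = hashlib.sha256(b"blockchaiN").hexdigest()
print(a)
print(c)
print(len(a), len(c))  # 64 64
```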
The problem with hashing algorithms is that collisions are inevitable. Since the output string has a fixed length, there are always other inputs that produce the same hash. Collisions are bad: if an attacker can create collisions on demand, they can pass off malicious files or data carrying the "correct" hash as legitimate. A good hash function must make it extremely difficult for attackers to craft inputs with the same hash value.
Computing a hash should also not be too cheap, otherwise attackers can find collisions more easily. Hashing algorithms must be resistant to preimage attacks: given a hash, it must be as hard as possible to reverse the computation steps and recover the original input from which the hash was created (the preimage).
In other words, for s = hash(x), finding x from s must be practically impossible.
So, a "decent" hashing algorithm should have three properties:
- Changing one bit of the input should cause an avalanche effect and produce a completely different hash;
- A very low probability of collisions;
- Efficiency that does not come at the cost of collision resistance.
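To get a feel for why preimage resistance matters in practice, here is a toy brute-force sketch (the search is deliberately limited to a digest prefix of a few hex characters, which is the only reason it finishes; a full 256-bit digest is far beyond brute force):

```python
import hashlib
import itertools

def find_partial_preimage(target_prefix: str) -> str:
    """Brute-force an input whose SHA256 digest starts with target_prefix."""
    for i in itertools.count():
        candidate = str(i)
        digest = hashlib.sha256(candidate.encode()).hexdigest()
        if digest.startswith(target_prefix):
            return candidate

# Matching 4 hex characters (16 bits) takes ~2**16 attempts on average;
# every extra hex character multiplies the work by 16, so recovering a
# preimage for the full 64-character digest is computationally infeasible.
print(find_partial_preimage("ab12"))
```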
Cracking hashes
One of the first hashing algorithm standards was MD5. This algorithm was popular for verifying file integrity (checksums) and for storing hashed passwords in web application databases.
Its operation is quite simple: it produces a fixed-length 128-bit string for every input and uses standard one-way operations over several rounds to compute a deterministic output. Because of the short output length and the simplicity of these operations, MD5 is very easy to crack, in particular by the "birthday" attack.
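For reference, here is a quick look at the short MD5 output with Python's hashlib (the input is an arbitrary example):

```python
import hashlib

digest = hashlib.md5(b"an arbitrary message").hexdigest()
print(digest)            # 32 hex characters
print(len(digest) * 4)   # 128 bits of output -- small enough to attack
```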
The birthday attack
There is a well-known result that if there are 23 people in a room, the chance that two of them share a birthday is about 50%. With 70 people in the room, that probability rises to 99.9%. This follows from the pigeonhole principle (also known as the Dirichlet principle): if you place 100 pigeons in 99 boxes, at least one box must hold two pigeons. A fixed output length means a fixed number of possible hash values, so collisions are guaranteed to exist; the only question is how hard they are to find.
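The same arithmetic applies to hashes. Here is a sketch using the standard birthday-bound approximation (the formula 1 - exp(-k(k-1)/2N) is the usual estimate, not something specific to this article):

```python
import math

def collision_probability(k: int, n_outputs: int) -> float:
    """Approximate chance that k random samples drawn from n_outputs
    possible values contain at least one collision (birthday bound)."""
    return 1 - math.exp(-k * (k - 1) / (2 * n_outputs))

print(collision_probability(23, 365))        # ~0.50, the birthday paradox
print(collision_probability(70, 365))        # ~0.999
# For a 128-bit hash like MD5, on the order of 2**64 random inputs is
# already enough for a substantial chance of a collision.
print(collision_probability(2**64, 2**128))  # ~0.39
```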
MD5 is so vulnerable to collisions that even a modest 2.4 GHz Pentium processor can compute hash collisions in just a few seconds. Moreover, the widespread use of this hash in the early days of the web created millions of MD5 preimages that can be found simply by searching Google for the hash.
Diversity and evolution of hashing algorithms
Start: SHA1 and SHA2
The NSA (yes, that NSA) has long been a pioneer in setting standards for hashing algorithms. It was the NSA that proposed the Secure Hash Algorithm, or SHA1, with a fixed 160-bit output.
Unfortunately, SHA1 only slightly improved on MD5 by increasing the output length and the number and complexity of its one-way operations; it offered no fundamental improvement in protection against more powerful machines attempting different attack vectors.
How to improve the hashing algorithm?
SHA3
In 2006, the US National Institute of Standards and Technology (NIST) announced a competition to create an alternative to the SHA2 algorithm that would differ fundamentally in its design and become a new standard. The winner, a scheme of hashing algorithms called KECCAK (pronounced "kechak"), became SHA3.
Although SHA3 shares its name with the previous algorithms, it differs greatly in its internal structure thanks to the cryptographic sponge construction. This mechanism uses pseudorandom permutations to absorb and squeeze out data, and serves as a source of randomness for inputs that will enter the hashing algorithm later.
SHA3 maintains an internal state that holds more bits of information than the output value, which lets it overcome the limitations of previous algorithms. It became a NIST standard in 2015.
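Python's hashlib exposes the standardized SHA3 for comparison (note that this is a generic illustration: Ethereum's KECCAK256, mentioned below, uses the original Keccak padding and therefore produces different digests than the finalized SHA3-256 standard):

```python
import hashlib

# Standardized SHA3-256: the same 256-bit output length as SHA256,
# but built internally on the Keccak sponge construction.
print(hashlib.sha3_256(b"blockchain").hexdigest())
print(hashlib.sha256(b"blockchain").hexdigest())
```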
Hashing and proof of work
When it came to integrating a hashing algorithm into blockchain protocols, Bitcoin chose SHA256 for its proof-of-work algorithm, while Ethereum, which appeared later, used a modified version of SHA3 (KECCAK256). When choosing a hash function for a proof-of-work blockchain, the efficiency of hash computation matters a great deal.
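A toy sketch of what that work actually looks like (the block string and the four-zero difficulty target are made-up examples, vastly easier than real Bitcoin difficulty):

```python
import hashlib

def mine(block_data: bytes, difficulty: int = 4) -> int:
    """Find a nonce such that SHA256(block_data + nonce) starts with
    `difficulty` zero hex characters -- a simplified proof of work."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(block_data + str(nonce).encode()).hexdigest()
        if digest.startswith(target):
            return nonce
        nonce += 1

print(mine(b"block header with transactions"))
# Real mining is the same loop with double SHA256 and a far lower target,
# which is why raw hashing speed -- and specialized hardware -- dominates.
```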
Bitcoin's SHA256 can be computed extremely efficiently with special hardware known as application-specific integrated circuits (ASICs). Much has been written about the use of ASICs in mining pools and how they lead to the centralization of computation. Proof of work encourages the formation of pools of computationally efficient machines, increasing the so-called "hash power", the number of hashes a machine can compute over a given period of time.
[Ed.: Mining in a pool rather than solo does not increase a participant's hash power. At a fixed hash power, the rate at which the participant's equipment finds hashes is the same as in solo mining. However, the combined hash power of all pool participants lets the pool find blocks more often than any individual participant could alone. The pool therefore finds blocks more frequently and pays out the block reward to all members per unit of time, in proportion to the hash power each contributes, minus the pool's commission. For example, a participant mining Bitcoin solo at 1 Ghash/s might expect a "block found" event, and the 12.5 BTC reward for it, sometime within five years (as an illustration). Joining a pool with the same 1 Ghash/s (assuming difficulty stays constant over those five years) lets the same miner receive the equivalent of 12.5 BTC in small portions, spread evenly over the same five years. This solves the miner's problem of covering real-world fiat expenses to keep the farm running.]
Ethereum uses a modified SHA3 algorithm, KECCAK256. In addition, Ethereum's proof-of-work algorithm, Dagger-Hashimoto, was designed to be memory-hard, making it expensive for hardware to compute.
Bitcoin and double SHA256
Bitcoin hashes data with SHA256 in an unusual way: it runs the algorithm twice within the protocol. Note that this is NOT a countermeasure against birthday attacks, because if hash(x) = hash(y), then hash(hash(x)) = hash(hash(y)). Double SHA256 is used instead to defend against length-extension attacks.
In a length-extension attack, if attackers know the length of the original input, they can use its hash value, which exposes part of the function's internal state, to compute a valid hash for the same message with extra data appended to it, without knowing the message itself. Because of this weakness of SHA256, inherited from the SHA2 family, Bitcoin computes the hash twice.
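A minimal sketch of Bitcoin-style double hashing with hashlib (the message here is an arbitrary example; Bitcoin applies this to serialized block headers and transactions):

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    """Hash the data twice, as Bitcoin does, to blunt length-extension attacks."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

print(double_sha256(b"block header").hex())
```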
Ethereum 2.0 and BLAKE
SHA3 was not the only invention to come out of the 2006 NIST competition; second place went to the BLAKE algorithm. For sharding in Ethereum 2.0, more efficient hashing is practically a requirement, and development teams are exploring it seriously. The BLAKE2b hashing algorithm, a heavily optimized version of BLAKE, is under close scrutiny because of its fantastic efficiency compared to KECCAK256 combined with a high level of security.
On a modern processor, computing using BLAKE2b is three times faster than using KECCAK.
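hashlib also ships BLAKE2b, so the comparison is easy to try (digest_size=32 keeps the output at 256 bits to match KECCAK256; this is a rough single-machine measurement and absolute numbers will vary by CPU and build):

```python
import hashlib
import timeit

data = b"x" * 1_000_000  # 1 MB of arbitrary data

print(hashlib.blake2b(data, digest_size=32).hexdigest())
print(hashlib.sha3_256(data).hexdigest())

# Rough timing of 100 hashes of the same 1 MB buffer with each algorithm.
print(timeit.timeit(lambda: hashlib.blake2b(data, digest_size=32).digest(), number=100))
print(timeit.timeit(lambda: hashlib.sha3_256(data).digest(), number=100))
```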
Outlook: the future of hashing algorithms
It seems that the future of hashing algorithms comes down to either (1) increasing the complexity of the internal hashing operations, or (2) increasing the length of the hash output, in the hope that attackers' computers cannot compute collisions fast enough.
To keep networks secure, developers rely on the difficulty of inverting one-way operations, that is, of finding their preimages. To keep a hashing algorithm secure, finding two inputs with the same hash output must be made as hard as possible, even though an infinite number of potential collisions exists.
What about quantum computing? Will hashing algorithms withstand it?
In short, based on what we know today, the answer is yes: hashing algorithms will stand the test of time against quantum computation. Quantum computing can break problems that have a rigorous underlying mathematical structure which can be exploited with clever tricks and theory, such as RSA encryption. Hashing algorithms, by contrast, have far less exploitable structure in their internal construction.
Quantum computers do speed up the search over unstructured problems such as hashing, but in the end they would still be mounting a brute-force attack, just as ordinary computers do today.
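As a rough illustration of why that speed-up is not fatal (these numbers are an added back-of-the-envelope estimate, not from the original article): Grover's algorithm turns a classical ~2^n preimage search into ~2^(n/2) hash evaluations.

```python
# Back-of-the-envelope estimate (assumed numbers, for illustration only).
n = 256
grover_evaluations = 2 ** (n // 2)       # ~3.4e38 hash evaluations

# Assume a hypothetical machine performing 10**12 evaluations per second.
rate = 10 ** 12
seconds_per_year = 60 * 60 * 24 * 365
years = grover_evaluations / (rate * seconds_per_year)
print(f"~{years:.2e} years")             # on the order of 10**19 years
```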
Whichever algorithms end up in the protocols, one thing is clear: a computationally efficient future lies ahead, and we have to choose the right tools for the job wisely so that the algorithms stand the test of time.