
Self-verification to ensure data integrity

The hash function, used for digital signatures, could help ensure data integrity. But it only identifies tampering; it does not correct it.

Atanu Biswas, Bimal Roy | New Delhi
Last Updated : Aug 08 2018 | 9:15 PM IST
Last October, a huge industrial scandal engulfed Kobe Steel Ltd when Japan’s third-biggest steelmaker said its staff had falsified data about the strength and durability of some aluminum and copper products used in planes, trains and potentially a space rocket. Kobe Steel’s admission renewed concerns about the integrity of Japanese manufacturers. Data integrity is immensely important, especially in today’s data-driven world.

Selling the same property to multiple buyers, or distorting the database of land transactions in any way, can have serious consequences. Last October, two conmen were arrested after they sold a property worth Rs 50 million in southwest Delhi to multiple people using fake documents and Aadhaar cards. Falsification of health care data receives much less attention, but is no less serious. As early as March 20, 2015, The Washington Post named 2015 the ‘Year of the health-care hack’. According to US Department of Health and Human Services data reviewed by the newspaper, the data of more than 120 million people had been compromised in more than 1,100 separate breaches at organisations handling protected health data since 2009. That is a third of the US population. It is impossible to tell how much of that data was falsified.

We are now living in an ocean of data, and the volume is increasing with the ever-expanding domain of the Internet of Things. Big data is going to be replaced by even bigger data. So it is important to ensure the integrity of the data, as it is going to be increasingly difficult to keep track of the safety of this huge data-mine.

There are three dimensions to data integrity: secure communication, safe storage, and auditing and verification of the data at each point where changes are made, so that bugs, modifications and other alterations can be detected. While strong cryptographic methods should be applied for security, continual self-verification can detect any tampering at the earliest.

Most stored data can be viewed as multi-dimensional arrays. For example, the transaction details of a financial institution can be stored as a two-way table, where the rows correspond to the customers and the columns to the transaction time points. A new row is added for every new customer, and a new column is added for every new transaction. With this ever-expanding data structure, it is normally difficult to identify tampering or falsification of any part of the data due to technical errors, cyber-attacks and other security flaws. However, the use of a ‘hash’ function can easily detect any discrepancy.

A ‘hash’ function, applied to a piece of data, produces a binary string, typically 256 bits long. This is in line with the recent norms of NIST of the US government. Although a ‘hash’ function is many-to-one, it is computationally infeasible to construct another text having the same ‘hash’. Two excellent properties make ‘hash’ functions attractive to use: they are one-way and collision-free. ‘One-wayness’ is like Prince Abhimanyu of the Mahabharata, who could enter the chakravyuh but could not get out: computing the ‘hash’ from the data is easy, but recovering the data from the ‘hash’ is infeasible. ‘Collision-free’ means that it is computationally infeasible to produce two documents having the same ‘hash’ value. The ‘SHA3’ family of ‘hash’ functions is in use nowadays.
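As a small illustration of the properties described above, the following Python sketch computes a 256-bit SHA3 ‘hash’ with the standard hashlib module. The sample records and the helper name sha3_256_hex are made up for this example and are not part of any particular system.

```python
import hashlib


def sha3_256_hex(data: bytes) -> str:
    """Return the SHA3-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha3_256(data).hexdigest()


# Two hypothetical records that differ in a single character.
original = b"customer 17 paid Rs 5000"
altered = b"customer 17 paid Rs 5001"

digest_original = sha3_256_hex(original)
digest_altered = sha3_256_hex(altered)

# Each hexadecimal character encodes 4 bits, so 64 characters give 256 bits.
print(len(digest_original) * 4)           # 256
print(digest_original == digest_altered)  # False
```

Even a one-character change in the record produces an entirely different digest, which is what makes the comparison of stored and recomputed hashes a practical tamper check.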

Thus, we may periodically ‘hash’ every row and every column in the following way. Suppose that at time point ‘t1’ there were ‘r1’ rows and ‘c1’ columns, whose integrity had already been ensured, and the data was hashed both row-wise and column-wise, yielding a total of (r1+c1) hashes. At the next time point ‘t2’, the system contains ‘r2’ rows and ‘c2’ columns, with (r2-r1) new customers and (c2-c1) new transaction points. At this new time point, we again ‘hash’ each row and each column of the previous r1×c1 data matrix. The row and column hashes from the two time points can then be compared for any mismatch. If both the hash of the i-th row and the hash of the j-th column differ between the two time points, we immediately know that there has been some tampering or falsification for the i-th customer at his/her j-th transaction, and we can take the necessary action. Then we ‘hash’ the new r2×c2 data matrix of time point ‘t2’, and store the hashes for comparison at the next time point ‘t3’. Such application of the ‘hash’ function for data storage and verification can be done very quickly by a simple computer programme.
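The procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table is a list of lists held in memory, SHA3-256 is used as the ‘hash’, and the function names (hash_cells, snapshot, find_mismatches) and the sample figures are hypothetical.

```python
import hashlib


def hash_cells(cells):
    """SHA3-256 over a sequence of cells; each cell is length-prefixed so
    that, for example, ("12", "3") and ("1", "23") hash differently."""
    h = hashlib.sha3_256()
    for cell in cells:
        encoded = str(cell).encode("utf-8")
        h.update(len(encoded).to_bytes(8, "big"))
        h.update(encoded)
    return h.hexdigest()


def snapshot(table, n_rows, n_cols):
    """Hash the leading n_rows x n_cols block row-wise and column-wise."""
    row_hashes = [hash_cells(table[i][:n_cols]) for i in range(n_rows)]
    col_hashes = [hash_cells(table[i][j] for i in range(n_rows))
                  for j in range(n_cols)]
    return row_hashes, col_hashes


def find_mismatches(old, new):
    """Return the indices of rows and columns whose hashes differ."""
    old_rows, old_cols = old
    new_rows, new_cols = new
    bad_rows = [i for i, (a, b) in enumerate(zip(old_rows, new_rows)) if a != b]
    bad_cols = [j for j, (a, b) in enumerate(zip(old_cols, new_cols)) if a != b]
    return bad_rows, bad_cols


# Time t1: r1 = 2 customers, c1 = 3 transaction points.
table = [[100, 250, 75],
         [40, 0, 310]]
stored = snapshot(table, 2, 3)  # r1 + c1 = 5 hashes

# Time t2: one new customer, one new transaction point,
# and one old cell (row 1, column 1) silently changed from 0 to 999.
table = [[100, 250, 75, 20],
         [40, 999, 310, 5],
         [60, 10, 0, 80]]
bad_rows, bad_cols = find_mismatches(stored, snapshot(table, 2, 3))
print(bad_rows, bad_cols)  # [1] [1]

# Once any mismatch has been investigated, hash the full r2 x c2 matrix
# and keep it for the comparison at t3.
stored = snapshot(table, 3, 4)
```

Indices here are counted from zero, so the output points at the second customer's second transaction. When several cells are altered at once, the intersections of the flagged rows and columns give candidate cells rather than a unique location.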

This kind of checking for data integrity could be undertaken in every sector, whether financial, health care, educational or industrial, public or private. With this kind of continual self-verification of data, Kobe Steel could have easily identified the falsification of its crucial data much earlier, and severe damage could have been avoided. Also, some suitably modified form of this model of verification might be useful for protecting census and other government databases all over the world. This method could even be useful as proof that electronic voting machines (EVMs) have not been tampered with after the voting is done. Every EVM can be designed to calculate a ‘hash’ of its own sequence of votes. If all these hashes are stored just after the voting is over, and verified again before the counting, it might give concrete proof that there was no tampering in between.
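The EVM idea can be illustrated with a short Python sketch. It assumes, purely for illustration, that a machine's record is an ordered list of candidate labels; the function name sequence_digest and the sample votes are hypothetical, and real EVMs are not claimed to work this way.

```python
import hashlib
import hmac


def sequence_digest(votes):
    """SHA3-256 over an ordered sequence of votes, each length-prefixed."""
    h = hashlib.sha3_256()
    for vote in votes:
        encoded = vote.encode("utf-8")
        h.update(len(encoded).to_bytes(4, "big"))
        h.update(encoded)
    return h.hexdigest()


# Digest stored just after the voting is over.
votes_at_close = ["A", "C", "B", "A"]
sealed = sequence_digest(votes_at_close)

# Before counting: an untouched record reproduces the stored digest.
votes_before_count = ["A", "C", "B", "A"]
untouched = hmac.compare_digest(sealed, sequence_digest(votes_before_count))

# A record with one vote changed does not.
tampered = ["A", "C", "A", "A"]
detected = not hmac.compare_digest(sealed, sequence_digest(tampered))

print(untouched, detected)  # True True
```

The check is only meaningful if the stored digest is kept somewhere the machine's record cannot reach; anyone able to alter both the votes and the stored digest could simply recompute it.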

The hash function, which is widely used in digital signature technology, might thus provide a way to ensure data integrity in many settings. However, it can only identify tampering; it cannot correct it.
The authors are professors at the Indian Statistical Institute, Kolkata


Disclaimer: These are personal views of the writer. They do not necessarily reflect the opinion of www.business-standard.com or the Business Standard newspaper