How JavaScript uses hashing
Privacy and the security of information has become more important now than ever. For as long as information - valuable information - has existed, there have always been people plotting to obtain it in nefarious ways. Having established this context, today we will discuss hashing.
Before anything else, we answer the occasion's most important question: what is hashing?
Hashing is defined as taking a value and applying a mathematical operation (a hashing algorithm or hashing function) to it and getting a new value (known as a hash, hash value or digest message).
Yes, it is a form of cryptography.
What is this hashing useful for? Well, one of the most important areas in which it is applied is in information security. For example, when storing user passwords in a database, simply storing string values as they are is insecure and this is where hashing comes in. Password values are hashed and the resulting values are stored in the database. The stored hashes are then used to authenticate user password inputs thus making the operation more secure.
Hashing can also be used in version control since hashing algorithms will always yield the same result when provided with the same value. Thus, the hashes that result from two different copies of the same file can be used to determine whether or not changes have occured; that is, if the resulting hashes are different, then the two files are no longer the same.
Another application is in what are called hash tables. Hash tables are simply data structures (like arrays and objects) that speed up the process of finding data. Hash tables speed up data so well that their complexity is O(1)! Binary search has a complexity of O(log N) which is much less efficient. Hash tables exist as key-value pairs that hold data.
When for example, one wants to store a database consisting of individuals and their telephone numbers, their names will be hashed to integer values and become the keys while their telephone numbers become the values associated with the aforementioned keys. Since the keys are integer values, they are used as indices. This means that when searching a hash table, locating values is very easy since the indexing system means we know exactly where data is located in the hash table.
When in the realm of content security, it is not unexpected to think about encryption. Encryption and hashing are two different implementations of cryptogrpahy and one should not be confused with the other. The two terms are not interchangeable.
Hashing involves producing an output from a function that performs several iterations. Hashing functions use states which they then change with each iteration in order to use them again for the next iteration. The net result of all these iterations is that reversing hashing becomes very difficult due to the shear number of resources required to reverse it.
To illustrate this, let's look at an example, how does one determine the initial values of a and b in this expression: a + b = 100? There are several possibilities here but the problem increases in complexity if the number on the right-hand side of the equal sign is suddenly 1 000 000 000 000.
So, hashing is used when it is not necessary to know what the initial input was and all that is required is making sure that values are what they are supposed to be.
Encryption on the other hand is reversible and in addition to functions, it includes a secret key that it uses to find the initial value. Encryption is useful if initial values are needed despite having to be stored as encrypted values. For example, credit card numbers can be stored as encrypted values in a database but the encryption needs to be reversed in order to access them.
Next, we will talk about two JavaScript data structures; maps and objects. Both maps and objects store data in key-value pairs and in fact, a map object is an implementation of the Object data structure. However, there is a key (no pun-intended) difference: maps remember the insertion order of key-value pairs while objects do not. Maps are therefore optimized for iteration and are used to store data sets when iteration is important, after all, maps were created with iteration in mind.
A comprehensive list of key (again, no pun-intended) differences between maps and objects can be found here. Differences when it comes to aspects like performance, size and iteration - to mention a few - are discussed in depth.
Technically, Objects implement hashing due to the nature of their key-value pair implementation. On the other hand, hash maps which are custom hash tables implemented using the map architecture can be created. This is achieved through the creation of classes containing methods for adding data, retrieving data and a hash function to implement hashing when operations are performed.
This article elegantly explains how a hash map can be implemented in JavaScript. A video explanation is also provided.
Now, let's get back to hashing. As with most designs, hashing does pose unique challenges that need to be addressed. The biggest challenge is what is referred to as collisions. Collisions occur when two or more key values yield the same hash. This leads to us having the same hash for different values in the data set and thus a dilemma arises. We will look at two ways of getting around this problem; separate chaining and open addressing.
Separate chaining involves the use of an array of linked lists to store hashes. A linked list is an ordered set of data elements, each containing a link to its successor. If hash that already exists in the data set is produced, it is simply appended to the end of a linked list of its brethren.
Open addressing on the other hand stores every key directly into an array. The array is always kept larger than the set of keys by some constant factor so that every cell in the array is either empty or contains a key. Search sequences are then used to traverse the array when performing operations such as searching, inserting and deleting keys. The main drawback with open addressing is clustering wherein keys cluster together and thus make performing operations more expensive due to a greater number of iterations required to find keys or empty cells.
To get around this, there are three implementations of search sequences, namely: linear probing, quadratic probing and double hashing.
Linear probing involves checking each location in the array until the desired location is found; the desired location can either be an empty cell or an existing key value.
Quadratic probing uses a quadratic expression to avoid the aforementioned clusters during probing. This means that the chances of finding an empty cell for insertion or a desired key value are greater than in linear probing which does not actively avoid clusters.
Lastly, double hashing works by using two hash functions. One hash function determines the inital place to place the key and a second determines the size of the jumps in the probe sequence. The jumps help avoid clusters. Double hashing makes for shorter probe sequences than linear and quadratic probing at the expense of greater computational costs when determining probe sequences.

Comments
Post a Comment