lect17

Relevant topics in the textbook:

Data Structures and Other Objects Using C++ Chapter 12.2 & 12.3

Hashing

The ability to address unique key values in an array whose size may be smaller than the set of possible key values.
Generally, keys are unique values that identify some record of information.
- In order to put data (value) in the appropriate place in the collection, a hash function is required.
  - The hash function outputs a position in the collection based on the key value
  - Hash function outputs should be uniformly distributed
    - Or else all data would try to be stored in the same index.
  - For example, if you have array of 100 elements, a simple hash function could be:
    - key % 100 = [0-99]

Hash Table

Examples of hashing applications:
- As a data structure (search)
- Cryptography:
  - password verification
  - file verification
Hashing is very efficient for searching for data in an array.
- Recall binary search: O(log n) search time (if elements are sorted).
- Linear search: O(n) search time.
- Hash Table search: O(1) average search time in unsorted order.
  - Hash Table searching provides instant access to an element in an array since the hash function computes the index where the data is stored.

Collisions

It’s possible that two elements may be indexed to the same location.
- This is known as collisions

Open-address Hashing

Collisions are resolved by placing the item in the next open spot in the array.
For example, if a record is hashed to position i and data already exists in that index, then check the next available spot i+1, i+2, etc.
- Known as linear probing
- What is a problem with this mechanism?
  - Delayed Insertion – inserting an item in a crowded hash table takes time since it must look for an empty spot.
    - Elements are inserted farther away from their actual hashed index.
  - Clustering – When different keys are hashed to the same index, there may be groups of data records grouped in the same place.

Open Address

Example

//Makefile
CXX=g++

main: main.o hash.o
	${CXX} -o main -std=c++11 main.o hash.o

clean:
	rm -f *.o main

//hash.h
#include<list>
class Hash 
{ 
	private:
		int BUCKET;    // No. of buckets 
		int curr_size;
		std::list<int> *table; // Pointer to an array containing buckets 
	public: 
		Hash(int V);  // Constructor
		void insertItem(int x); // inserts a key into hash table 
		void searchItem(int key);
		void deleteItem(int key); // deletes a key from hash table 
		int hashFunction(int x); // hash function to map values to key
	 	void displayHash(); 
}; 

//hash.cpp
#include<iostream>
#include "hash.h"

using namespace std;

Hash::Hash(int b) 
{ 
	this->BUCKET = b; 
	table = new list<int>[BUCKET]; 
} 
  
void Hash::insertItem(int key) 
{ 
	if(curr_size == this->BUCKET){
		cout << "Cannot insert " << key << " -- table is full" << endl;
		return;
	}

	int index = hashFunction(key); 

	cout << "Inserting:" << key << " at index: " << index << endl;
	if(table[index].empty()){
		table[index].push_back(key);  
	}
	else{
		int new_index = index;
		while(true){
			cout << new_index << " was occupied -- trying next index " << endl;
			new_index = (new_index+1)%(this->BUCKET);
			if(table[new_index].empty()){
				table[new_index].push_back(key); 
				break;			
			}
		}
	}
	curr_size++;
} 
  
void Hash::deleteItem(int key) 
{ 
	// get the hash index of key 
	int index = hashFunction(key); 

	if(*(table[index].begin()) == key){
		table[index].erase(table[index].begin()); 
	}
	else{
		int new_index = index;
		while(true){
			new_index = (new_index+1)%(this->BUCKET);
			
			if(*(table[new_index].begin()) == key){
				table[new_index].erase(table[new_index].begin());
				curr_size--;
				break;			
			}
			if(new_index == index){
				cout << "Item not found" << endl;
				break;
			}
		}
	}

} 

void Hash::searchItem(int key) 
{ 
	// get the hash index of key 
	int index = hashFunction(key);
	if(*(table[index].begin()) == key){
		cout << "Item: " << key << " found at: " << index <<  endl;
	}
	else{
		int new_index = index;
		while(true){
			new_index = (new_index+1)%(this->BUCKET);
			
			if(*(table[new_index].begin()) == key){
				cout << "Item: " << key << " found at: " << new_index <<  endl; 
				break;			
			}
			if(new_index == index){
				cout << "Item not found" << endl;
				break;
			}
		}
	}
} 
  
// function to display hash table 
void Hash::displayHash() { 
	cout << endl << "Printing:" << endl;
	for (int i = 0; i < BUCKET; i++) { 
		cout << i; 
		for (auto x : table[i]) 
			cout << " --> " << x; 
		cout << endl; 
	} 
	cout << endl;
} 

int Hash::hashFunction(int x) { 
	return (x % BUCKET); 
} 

//main.cpp
#include "hash.h"

   
int main() 
{ 
	// array that contains keys to be mapped 
	int a[] = {15, 11, 27, 8, 12}; 
	//int a[] = {15, 11, 27, 8, 12, 18, 22}; 
	//int a[] = {15, 11, 27, 8, 12, 18, 22, 29}; 
	int n = sizeof(a)/sizeof(a[0]); // length of the array

	
	Hash h(7);   // 7 is count of buckets in hash table 

	// insert the keys into the hash table:
	h.displayHash();
	for (int i = 0; i < n; i++){ 
		h.insertItem(a[i]);   
		h.displayHash();
  	}

	h.searchItem(12); 
	h.deleteItem(12);
	h.displayHash();

	h.searchItem(8); 
	h.deleteItem(8); 
	h.displayHash();  

	return 0; 
} 

Double Hashing

Technique that uses a 2nd hash function when resolving a collision.
- If a hash function index results in a collision, then use the 2nd hash function to determine how far to step in the array to look for an empty slot.
- Helps reduce the clustering effect.
Problems
- If hash2 function is large, there is a possibility that we will go out of bounds.
- Depending on the table size and hash2, it is possible that the index won’t be uniformly distributed.

Double Hashing

Example

// Add to hash.h
int hashFunction2(int x); // hash function to map values to key

// modify hash.cpp
  
void Hash::insertItem(int key) 
{ 

	if(curr_size == this->BUCKET){
		cout << "Cannot insert " << key << " -- table is full" << endl;
		return;
	}
	int index = hashFunction(key); 

	cout << "Inserting: " << key << " at index: " << index << endl;
	// second hashing:
	int index2 = hashFunction2(key);
	int i = 0;
	while(true){
		int new_index = (index + i * index2) % (this->BUCKET); 
		
		if(table[new_index].empty()){
			table[new_index].push_back(key); 
			break;			
		}
		cout << new_index << " was occupied -- trying next index " << new_index << endl;
		i++;
	}
	curr_size++;
} 
  
void Hash::deleteItem(int key) 
{ 
	// get the hash index of key 
	int index = hashFunction(key); 
	int index2 = hashFunction2(key);
	int i = 0;
	while(true){
		int new_index = (index + i * index2) % (this->BUCKET); 
		
		if(*(table[new_index].begin()) == key){
			table[new_index].erase(table[new_index].begin());
			curr_size--;
			break;
						
		}
		i++;
	}
} 

void Hash::searchItem(int key) 
{ 

	int index = hashFunction(key); 
	int index2 = hashFunction2(key);
	int i = 0;
	while(true){
		int new_index = (index + i * index2) % (this->BUCKET); 
		
		if(*(table[new_index].begin()) == key){
			cout << "Item: " << key << " found at: " << new_index <<  endl; 
			break;		
		}
		i++;
	}
} 
  
int Hash::hashFunction2(int x) { 
	return 5 - (x % 5); 
} 

Chained Hashing

The biggest problem to open-address hashing is
- If the table is full, no more elements can be added.
- Similar to a vector, it could expand the capacity “under-the-hood” when needed, but…
  - All elements will probably have to be rehashed
  - New capacity shouldn’t be wasteful (too big) or too small
Chained hashing (chaining)
- If a collision occurs, then we store a series of data records in a list that the index in the hash table references.
- Linked Lists are a common collection to store collided data records.

Chaining

Example

// modify hash.cpp

void Hash::insertItem(int key) 
{ 
	int index = hashFunction(key); 
	table[index].push_back(key);  
	cout << "Inserting:" << key << " at index: " << index << endl;
} 
  
void Hash::deleteItem(int key) 
{ 
	// get the hash index of key 
	int index = hashFunction(key); 
	
	// find the key in (inex)th list 
	list <int> :: iterator i; 
	for (i = table[index].begin(); i != table[index].end(); i++) { 
		if (*i == key) break; 
	} 

  	// if key is found in hash table, remove it 
	if (i != table[index].end()) 
		table[index].erase(i); 
} 

void Hash::searchItem(int key) 
{ 
	// get the hash index of key 
	int index = hashFunction(key); 

	// find the key in (inex)th list 
	list <int> :: iterator i; 
	for (i = table[index].begin(); i != table[index].end(); i++) { 
		if (*i == key) break; 
	} 

  	// if key is found in hash table: 
	if (i != table[index].end()){
		cout << "Item: " << key << " found at: " << index <<  endl;
	}
	else{
		cout << "Item not found" << endl;
	}
} 
  

Useful visualization tool can be found in here