Implementing a Hash Set Data Structure in Python

Introduction

A Hash Set is a data structure that stores unique elements in no particular order. Just like a hash map, a hash set uses a hash table for storage, but it only stores keys, not key-value pairs. In this blog post, we’ll discuss the hash set data structure and implement one from scratch in Python.

Hash Set Structure

A hash set stores unique elements in an array, usually referred to as “buckets”. A hash function is used to map each element to an index in this array. Since it only stores keys, you can think of it as a simplified hash map.

Example:

Hash Set:
  ---------------
  | 0 | -> | 1 |
  ---------------
  | 1 | -> | 3 | -> | 9 |
  ---------------
  | 2 | -> | 2 |
  ---------------
  | 3 | -> | 4 |
  ---------------
  | 4 | -> | 8 |
  ---------------

In the above example, the hash set has 5 buckets. Each bucket can contain zero or more elements, stored as linked lists.

Python Implementation

Let’s implement a basic hash set in Python. Similar to our hash map implementation, we’ll use Python lists and custom linked list nodes.

Node Class

First, we’ll implement a Node class to hold elements.

class Node:
    def __init__(self, val):
        self.val = val
        self.next = None

HashSet Class

Now, let’s build the HashSet class with essential functionalities.

class HashSet:
    def __init__(self, size):
        self.size = size
        self.set = [None] * self.size

    def _hash(self, val):
        return hash(val) % self.size

    def add(self, val):
        val_hash = self._hash(val)
        if self.set[val_hash] is None:
            self.set[val_hash] = Node(val)
        else:
            cur_node = self.set[val_hash]
            while cur_node:
                if cur_node.val == val:
                    return
                if cur_node.next is None:
                    cur_node.next = Node(val)
                    return
                cur_node = cur_node.next

    def contains(self, val):
        val_hash = self._hash(val)
        if self.set[val_hash] is not None:
            cur_node = self.set[val_hash]
            while cur_node:
                if cur_node.val == val:
                    return True
                cur_node = cur_node.next
        return False

    def remove(self, val):
        val_hash = self._hash(val)
        if self.set[val_hash] is None:
            return
        cur_node = prev_node = self.set[val_hash]
        while cur_node:
            if cur_node.val == val:
                if cur_node == prev_node:
                    self.set[val_hash] = cur_node.next
                else:
                    prev_node.next = cur_node.next
                return
            prev_node = cur_node
            cur_node = cur_node.next

Method Details:

  • add: Adds a new element to the hash set if it doesn’t already exist.
  • contains: Checks if an element exists in the hash set.
  • remove: Removes an element from the hash set.

Putting It All Together

Here’s an example demonstrating how to use our HashSet class.

# Create a new HashSet
h = HashSet(5)

# Add elements
h.add(1)
h.add(3)
h.add(2)

# Check existence
print(h.contains(1))  # Output: True
print(h.contains(4))  # Output: False

# Remove an element
h.remove(1)
print(h.contains(1))  # Output: False

Using Python’s In-Built HashSet

Python has a built-in hash set data structure called set. It’s extremely simple to use and optimized for performance:

# Create a new set
my_set = set()

# Add elements
my_set.add(1)
my_set.add(2)
my_set.add(3)

# Check existence
print(1 in my_set)  # Output: True

# Remove an element
my_set.remove(1)
print(1 in my_set)  # Output: False

If you find this guide useful, please share with others, and spread the knowledge!

Leave a Comment

Your email address will not be published. Required fields are marked *