5. Maps, Sets & Hashtables

CS331 - 2021 Spring

Boris Glavic

bglavic@iit.edu

Maps

The Map ADT

  • A map is a collection data structure that associates keys with values
  • put(k,v) - associate key k with value v in the map
  • get(k) - return the value v associated with key k

Examples

  • map students to their majors:
    • { Peter -> CS, Alice -> CS, Bob -> CS }
  • record the number of occurrences of a character in a string:
    • abba would be encoded as { 'a' -> 2, 'b' -> 2 }
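
As a small illustration of the character-count example, the sketch below uses Python's built-in dict in the role of the map. The helper name count_chars is an assumption made for this sketch, not something from the slides.

```python
def count_chars(s):
    """Return a map from each character in s to its number of occurrences."""
    counts = {}
    for ch in s:
        # get(k) with a default of 0, then put(k, v) with the incremented count
        counts[ch] = counts.get(ch, 0) + 1
    return counts

print(count_chars("abba"))  # {'a': 2, 'b': 2}
```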

Map Implementations

  • we will discuss two instances of the map ADT:
    • hashtables: will be discussed next
    • binary search trees: we will discuss these later

Hashtables

Motivation

  • Consider a map with a key domain that is an integer range [0,n)
    • e.g., only the integers 0 to 9 are valid keys
  • We can allocate an array buckets of size n
    • each possible key has one entry in the array
  • The value v associated with a key k is stored in buckets[k]
    • we can use None to indicate that a key k is not in the map

Naive Array-backed Map Example

  • Consider the map { 1 -> 'a', 3 -> 'b', 5 -> 'c' } with a keyspace [0,6)
  • We can encode this map as the following array:
    •   `[ None, 'a', None, 'b', None, 'c' ]`
    •   `[  0  ,  1 ,  2  ,  3 ,  4  ,  5  ]`
  • put(k,v): buckets[k] = v
  • get(k): buckets[k]
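
The array-backed idea can be sketched as a small class. The class name ArrayMap is an assumption made for illustration, and the sketch does no bounds checking, so callers are expected to pass only keys in [0,n).

```python
class ArrayMap:
    """Naive map for integer keys in [0, n): one array slot per possible key."""

    def __init__(self, n):
        self.buckets = [None] * n  # None marks "key not in the map"

    def put(self, k, v):
        self.buckets[k] = v

    def get(self, k):
        return self.buckets[k]


m = ArrayMap(6)
m.put(1, 'a')
m.put(3, 'b')
m.put(5, 'c')
print(m.buckets)  # [None, 'a', None, 'b', None, 'c']
```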

Discussion

  • put and get are O(1) using this implementation
  • this only works for small integer keyspaces [0,n)
  • how to deal with large and/or non-integer keyspaces, e.g., strings?
    • => need to map keys into an integer range!

Hash Functions

  • A hash function h maps keys from a keyspace into an integer range [0,n)

  • We can use hash functions to extend our array-based idea to other keyspaces

  • Example:

    • keyspace: [100000,5000000]
    • target space: [0,19]
    • h(x) = x % 20
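
To make the example concrete, here is a minimal sketch of h applied to a few keys from the keyspace. The sample keys are chosen arbitrarily for illustration.

```python
def h(x):
    """Map an integer key into the target space [0, 19]."""
    return x % 20


for key in (100000, 123457, 5000000):
    print(key, "->", h(key))
# 100000 -> 0
# 123457 -> 17
# 5000000 -> 0
```

Note that two different keys (100000 and 5000000) end up at the same position 0.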

Hashtables

  • A hashtable is an array of size n and a hash function h: keys -> [0,n)
    • the elements of the array are called buckets
  • put(k,v): buckets[h(k)] = v
  • get(k): buckets[h(k)]

Collisions

  • The number of buckets is typically much smaller than the size of the key space
  • => we have to deal with collisions where multiple keys map to the same bucket
    • this is the case if h(k1) = h(k2) for two keys k1 and k2
  • how do we know which key is stored in a bucket?
    • need to store keys in the buckets too!
  • how do we deal with the case when we want to store both k1 and k2 with h(k1) = h(k2) in the hashtable?
    • need to allow multiple buckets as the “place” for a key => open addressing
    • need to store multiple key-value pairs per bucket => chaining
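
The problem can be reproduced with a hashtable that stores a single value per bucket. This is a minimal sketch, assuming n = 20 and h(k) = k % n; the keys and values are made up for illustration.

```python
n = 20
buckets = [None] * n


def h(k):
    return k % n


def put(k, v):
    buckets[h(k)] = v      # overwrites whatever was stored in this bucket


def get(k):
    return buckets[h(k)]


put(100000, 'x')
put(5000000, 'y')          # h(5000000) == h(100000) == 0: a collision
print(get(100000))         # prints y, although 'x' was stored for this key
```

Without the key stored in the bucket, get cannot tell that the value it returns belongs to a different key.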

Chaining

  • Instead of having a single entry per bucket, make each bucket a list of (key,value) pairs
  • Example
    • h(x) = x % 4
    • { 5 -> 'a', 9 -> 'b', 0 -> 'c' }
[ 0 ] -> (0, 'c')
[ 1 ] -> (5, 'a') -> (9, 'b')
[ 2 ] ->
[ 3 ] ->

Operations with chaining

  • get operation
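def get(k):
    # walk the chain stored in the bucket that k hashes to
    lc = buckets[h(k)]
    while lc:
        if lc.key == k:
            return lc.val
        lc = lc.next
    # reached the end of the chain: k is not in the map
    return None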

Operations with chaining

  • put operation
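def put(k,v):
    lc = buckets[h(k)]
    while lc:
        if lc.key == k:
            # key already present: overwrite its value
            lc.val = v
            return
        lc = lc.next
    # key not found: prepend a new cell to the chain
    newlc = ListCell(k,v,buckets[h(k)])
    buckets[h(k)] = newlc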
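
Putting get and put together, a self-contained version might look like the sketch below. The ListCell fields (key, val, next) follow the snippets above; the class name ChainedHashtable, the helper chain, and the choice of h(k) = k % n for integer keys are assumptions made for this sketch.

```python
class ListCell:
    """One (key, value) entry in a bucket's singly linked list."""

    def __init__(self, key, val, next=None):
        self.key = key
        self.val = val
        self.next = next


class ChainedHashtable:
    def __init__(self, n=4):
        self.n = n
        self.buckets = [None] * n   # each bucket is the head of a chain (or None)

    def h(self, k):
        return k % self.n           # assumes integer keys

    def get(self, k):
        lc = self.buckets[self.h(k)]
        while lc:
            if lc.key == k:
                return lc.val
            lc = lc.next
        return None

    def put(self, k, v):
        i = self.h(k)
        lc = self.buckets[i]
        while lc:
            if lc.key == k:
                lc.val = v          # existing key: overwrite
                return
            lc = lc.next
        self.buckets[i] = ListCell(k, v, self.buckets[i])  # new key: prepend

    def chain(self, i):
        """Return the (key, value) pairs stored in bucket i, head first."""
        pairs, lc = [], self.buckets[i]
        while lc:
            pairs.append((lc.key, lc.val))
            lc = lc.next
        return pairs


t = ChainedHashtable(4)
t.put(9, 'b')
t.put(5, 'a')        # collides with 9; prepended to the same chain
t.put(0, 'c')
print(t.get(9))      # b
print(t.chain(1))    # [(5, 'a'), (9, 'b')]
```

Because new cells are prepended, the order within a chain is the reverse of the insertion order.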

Complexity

Hash Function Considerations

  • What makes a function a good hash function?

Sets

The Set ADT

  • A set is an unordered collection of elements
  • In contrast to lists or arrays, there are no duplicates
    • e.g., contrast [1,2,3,2,3,3] with {1,2,3}
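
The contrast can be checked directly with Python's built-in list and set types; the variable names are only for illustration.

```python
items = [1, 2, 3, 2, 3, 3]      # a list keeps duplicates
unique = set(items)             # a set keeps each element once

print(unique == {1, 2, 3})      # True
print(len(items), len(unique))  # 6 3
```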

Hashsets

  • A common way to implement sets is to use a hashtable.
  • For the set we only care about which keys are in the hashtable
    • => we can associate the same fixed value with every key (element of the set) in the hashtable

Hashset Example

  • let e be the element we use as value for every key
  • the set {1,2,3}
  • is encoded as { 1 -> e, 2 -> e, 3 -> e }
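
A minimal sketch of this encoding, using Python's built-in dict (itself a hashtable) as the underlying map. The class name HashSet and the choice of True as the value e are assumptions made for this sketch; any fixed value would work.

```python
class HashSet:
    """Set built on a map: every element is a key mapped to the same value."""

    _E = True   # the fixed value e associated with every key

    def __init__(self, elements=()):
        self._map = {}
        for x in elements:
            self.add(x)

    def add(self, x):
        self._map[x] = HashSet._E

    def contains(self, x):
        return x in self._map

    def delete(self, x):
        self._map.pop(x, None)   # no error if x is absent

    def __len__(self):
        return len(self._map)


s = HashSet([1, 2, 3, 2])
print(len(s))         # 3
print(s.contains(2))  # True
s.delete(2)
print(s.contains(2))  # False
```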

Hashset Complexity

  • the add, contains and delete operations run in expected O(1) time for hashsets, assuming the hash function spreads the keys evenly over the buckets

Recap

  • the Map ADT
    • hashtables as an efficient implementation
  • the Set ADT
    • hashtables as an efficient implementation