3. List Data Structures

CS331 - 2021 Spring

Boris Glavic

bglavic@iit.edu

The List ADT

Abstract Data Type

  • An abstract data type describes the behavior of a data type independent of implementation
  • for an ADT we specify:
    • supported operations (API) and their semantics
    • possibly restrictions on computational complexity

List Operations

A list stores a sequence of $n$ elements. Supported operations:

  • len(l): return the number of items in the list
  • l[i]: return the item at position i
  • l[i] = e: replace the item at position i
  • l.prepend(e): adds e to the beginning of the list
  • l.append(e): appends e to the end of the list
  • l1 + l2: concatenates two lists
  • del l[i]: delete the element at position i
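One way to write such an API down in Python is an abstract base class. The sketch below is hypothetical (the name `ListADT` is ours): it only fixes which operations exist, mapping them onto Python's special methods (`__len__`, `__getitem__`, `__setitem__`, `__add__`, `__delitem__`), and leaves the storage of the elements to concrete subclasses.

```python
from abc import ABC, abstractmethod


class ListADT(ABC):
    """Hypothetical interface: fixes the List ADT's operations, not their implementation."""

    @abstractmethod
    def __len__(self):
        """len(l): number of items in the list."""

    @abstractmethod
    def __getitem__(self, i):
        """l[i]: item at position i."""

    @abstractmethod
    def __setitem__(self, i, e):
        """l[i] = e: replace the item at position i."""

    @abstractmethod
    def prepend(self, e):
        """Add e to the beginning of the list."""

    @abstractmethod
    def append(self, e):
        """Add e to the end of the list."""

    @abstractmethod
    def __add__(self, other):
        """l1 + l2: concatenate two lists."""

    @abstractmethod
    def __delitem__(self, i):
        """del l[i]: delete the element at position i."""
```

Because every method is abstract, `ListADT()` raises a `TypeError`; only a subclass that implements all operations (for example an array-backed or a linked list) can be instantiated.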

List Operations - Examples

  • len([1,2,3])
    • => 3
  • l = [1,2,3,6,7]; l[3]
    • => 6
  • l = [1,2,3]; l[1] = 15
    • => [1,15,3]
  • l = [1,2,3]; del l[1]
    • => [1,3]

  • [1,2,3] + [4,1,3]
    • => [1,2,3,4,1,3]
  • l = [1,2,3,6,7]; l.prepend(15)
    • => [15,1,2,3,6,7]
  • l = [1,2,3,6,7]; l.append(15)
    • => [1,2,3,6,7,15]
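Python's built-in `list` supports all of these operations except `prepend`; `insert(0, e)` is the closest built-in equivalent. The snippet below replays the examples above with the built-in type.

```python
# Each ADT example above, expressed with Python's built-in list.
length = len([1, 2, 3])      # len(l)        -> 3

nums = [1, 2, 3, 6, 7]
fourth = nums[3]             # l[i]          -> 6

upd = [1, 2, 3]
upd[1] = 15                  # l[i] = e      -> [1, 15, 3]

dele = [1, 2, 3]
del dele[1]                  # del l[i]      -> [1, 3]

cat = [1, 2, 3] + [4, 1, 3]  # l1 + l2       -> [1, 2, 3, 4, 1, 3]

pre = [1, 2, 3, 6, 7]
pre.insert(0, 15)            # l.prepend(15) -> [15, 1, 2, 3, 6, 7]

app = [1, 2, 3, 6, 7]
app.append(15)               # l.append(15)  -> [1, 2, 3, 6, 7, 15]

print(length, fourth, upd, dele, cat, pre, app)
```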

Array-backed Lists

Agenda

  • Introduce arrays
  • Discuss implementation of lists using array
  • Discuss runtime complexity
  • Discuss optimizations

Arrays

  • Arrays are sequences of a fixed size
  • Elements of arrays can be modified, but the length of an array cannot be changed after creation
  • Supported operations
    • len(a) - return the number of elements in the array
    • a[i] - return the element at position i
    • a[i] = e - replace the element at position i with e

Arrays

  • Arrays are expected to be laid out as a consecutive area in memory
  • => the address of element i can be computed directly from the start address of the array and i, so accessing and modifying an element is in $O(1)$
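The ArrayList implementation in this chapter relies on a `MyArray` class that is not defined in these notes. A minimal stand-in, assuming that a Python list allocated once at a fixed size may play the role of a consecutive memory block, could look like this (the class body is hypothetical):

```python
class MyArray:
    """Hypothetical fixed-size array used as the backing store of ArrayList.

    A Python list is allocated once and never resized, so only len,
    index access, and index assignment are supported.
    """

    def __init__(self, size):
        # allocate all slots up front; the length never changes afterwards
        self._slots = [None] * size

    def __len__(self):
        return len(self._slots)

    def _check(self, idx):
        # reject out-of-range positions instead of silently growing
        if not 0 <= idx < len(self._slots):
            raise IndexError("array index out of range")

    def __getitem__(self, idx):
        self._check(idx)
        return self._slots[idx]

    def __setitem__(self, idx, value):
        self._check(idx)
        self._slots[idx] = value
```

Writing past the end raises an `IndexError` instead of growing the array, which is exactly the restriction the list implementation has to work around.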

Storing Lists in Arrays

  • Arrays are sequences => we can use them to store lists
  • However, list operations may change the length of a list, while arrays have a fixed size
    • solution: when the list outgrows the array during an append operation:
      1. allocate a new, larger array
      2. copy the existing elements over to the new array
      3. append the new element

Implementation

class ArrayList:
    def __init__(self,n=10):
        self.data = MyArray(n)
        self.len = 0

    def extend(self, newsize):
        newa = MyArray(newsize)
        for i in range(0, self.len):
            newa[i] = self.data[i]
        self.data = newa

    def __len__(self):
        return self.len

    def __getitem__(self, idx):
        return self.data[idx]

    def __setitem__(self, idx, value):
        self.data[idx] = value

    def append(self, value):
        if len(self.data) <= self.len:
            self.extend(len(self.data) + 1)
        self.data[self.len] = value
        self.len += 1

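    def prepend(self, value):
        if len(self.data) <= self.len:
            self.extend(len(self.data) + 1)
        # shift items one slot to the right, starting from the back,
        # so that no item is overwritten before it has been moved
        for i in range(self.len, 0, -1):
            self.data[i] = self.data[i-1]
        self.data[0] = value
        self.len += 1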

    def __delitem__(self, idx):
        for i in range(idx+1, self.len):
            self.data[i-1] = self.data[i]
        self.len += -1

Worst-case Runtime Complexity

  • len(l), l[i], l[i] = e
    • all in $O(1)$ (this is an array!)
  • extend
    • $O(n)$ where $n$ is the new size
    • we have to allocate a new array of size $n$ and copy each item over
  • l.append(e):
    • $O(n)$ because of extend
  • l.prepend(e):
    • $O(n)$ because of extend
    • also requires $O(n)$ items to be moved
  • del l[i]: delete the element at position i
    • worst case is deleting the first element => $O(n)$ items need to be moved
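The worst case of a single append understates the problem: with growth by one, every append to a full array triggers an extend, so the k-th append copies k-1 items and n appends copy $0 + 1 + \dots + (n-1) = n(n-1)/2$ items, i.e., $O(n^2)$ in total. The hypothetical helper below only simulates these counts, assuming an initial capacity of 0 (with the default capacity of 10 the total shrinks by a constant but stays quadratic).

```python
def copies_grow_by_one(n):
    """Count the items copied by extend when n items are appended to an
    empty array list that grows by one slot at a time (simulation only)."""
    capacity, length, copies = 0, 0, 0
    for _ in range(n):
        if capacity <= length:   # array is full: extend by one slot
            copies += length     # extend copies every stored item
            capacity += 1
        length += 1
    return copies


for n in (10, 100, 1000):
    print(n, copies_grow_by_one(n))   # 45, 4950, 499500
```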

Improving Append and Prepend

  • instead of growing the array by one item at a time, we can choose a different growth rate
  • if we always double the size, then appending $n$ items triggers only about $\log_2 n$ extend calls (array copies)

Updated Implementation (append)

    def append(self, value):
        if len(self.data) <= self.len:
            # double the capacity; max(...) handles an initial capacity of 0
            self.extend(max(1, 2 * len(self.data)))
        self.data[self.len] = value
        self.len += 1
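Putting the pieces together, a runnable version of the array-backed list with the doubling append might look as follows. It is a sketch that assumes a minimal `MyArray` stand-in (the notes leave `MyArray` undefined) and adds a hypothetical `extends` counter so that the number of resize operations can be observed; `prepend` and `del` are omitted for brevity.

```python
class MyArray:
    """Minimal fixed-size array stand-in (hypothetical)."""

    def __init__(self, size):
        self._slots = [None] * size

    def __len__(self):
        return len(self._slots)

    def __getitem__(self, idx):
        return self._slots[idx]

    def __setitem__(self, idx, value):
        self._slots[idx] = value


class ArrayList:
    def __init__(self, n=10):
        self.data = MyArray(n)   # backing array with capacity n
        self.len = 0             # number of used slots
        self.extends = 0         # hypothetical counter of resize operations

    def extend(self, newsize):
        # allocate a larger array and copy the used prefix over
        newa = MyArray(newsize)
        for i in range(self.len):
            newa[i] = self.data[i]
        self.data = newa
        self.extends += 1

    def __len__(self):
        return self.len

    def __getitem__(self, idx):
        return self.data[idx]

    def append(self, value):
        if len(self.data) <= self.len:
            # double the capacity; max(...) handles an initial capacity of 0
            self.extend(max(1, 2 * len(self.data)))
        self.data[self.len] = value
        self.len += 1


lst = ArrayList()
for x in range(100):
    lst.append(x)
print(len(lst), len(lst.data), lst.extends)   # 100 160 4
```

Starting from the default capacity of 10, the 100 appends trigger only 4 extend calls (10 → 20 → 40 → 80 → 160).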

Amortized Runtime Analysis

  • sometimes the runtimes of individual executions of an operation vary widely
  • in these cases, the worst-case runtime of a single operation does not give us a good understanding of the runtime behavior of our algorithm
  • amortized runtime analysis:
    • reason about the average runtime per operation: the worst-case runtime of executing a sequence of $n$ operations, divided by $n$

Amortized Runtime for Append

  • starting from an empty list (with an initial capacity of 1), we append $n$ items
  • assume for now that $n = 2^m$ for some $m$
  • as mentioned before, this results in $\log_2 n = m$ extend calls, which copy arrays holding
    • 1, 2, 4, 8, …, n/2 items

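The extend calls copy $1, 2, 4, \dots, n/2$ items and allocate arrays of sizes $2, 4, \dots, n$. Since an extend to size $s$ costs $O(s)$, the total cost of all extend calls is bounded by

$$T(n) \le \sum_{i=0}^{m} 2^i = 2^{m+1} - 1 = 2n - 1 = O(n)$$

(the identity $\sum_{i=0}^{m} 2^i = 2^{m+1} - 1$ is proved below). Spread over the $n$ appends, this is $O(n)/n = O(1)$ amortized time per append.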

  • What about when $n$ is not a power of 2?
  • We can round $n$ up to the next power of two $n'$; since $n \le n' < 2n$, the cost is at most $2n' - 1 < 4n$
  • => the asymptotic runtime is not affected
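The counting argument can be checked numerically. The hypothetical helper below simulates n appends to an empty list that starts with capacity 1 and doubles when full, and records how many items each extend call copies; for $n = 2^m$ it yields exactly $m$ calls copying $1, 2, 4, \dots, n/2$ items, and for any $n$ the total number of copied items stays below $2n$.

```python
def doubling_copy_sizes(n):
    """Return, for each extend call, the number of items it copies when n
    items are appended to an empty array list that starts with capacity 1
    and doubles its capacity when full (simulation only)."""
    capacity, length, sizes = 1, 0, []
    for _ in range(n):
        if capacity <= length:     # full: double and copy everything
            sizes.append(length)
            capacity *= 2
        length += 1
    return sizes


print(doubling_copy_sizes(16))         # [1, 2, 4, 8]: m = 4 calls for n = 2**4
print(sum(doubling_copy_sizes(1000)))  # 1023, below 2 * 1000
```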

Proof of $\sum_{i=0}^{m} 2^i = 2^{m+1} - 1$

  • We prove this claim by induction on $m$

  • Base case: $m=0$: $$\sum_{i=0}^{0} 2^i = 2^0 = 1 = 2^{1} - 1$$

  • Induction step:
    • induction hypothesis: the claim holds for $m=k$. We need to show that this implies that it holds for $m=k+1$:
$$\sum_{i=0}^{k+1} 2^i = \left(\sum_{i=0}^{k} 2^i\right) + 2^{k+1}$$
$$= 2^{k+1} - 1 + 2^{k+1} \quad \text{(by the induction hypothesis)}$$
$$= 2 \cdot 2^{k+1} - 1 = 2^{k+2} - 1$$

Linked Lists

Linked Structures

  • Instead of using a monolithic sequence like an array, we can implement lists using cells that each store a single value and are linked together to record the order of the elements in the list

List Cells

  • We can implement such a list cell as a class in Python:
    • field val stores the value
    • field next stores the next list cell in the sequence
class ListCell:

    def __init__(self,val,nxt):
        self.val = val
        self.next = nxt

Constructing Lists from Cells

  • We can construct a list by creating list cells and linking them together
# the list [0,1,2]

# create list cells to hold the values
l1 = ListCell(0,None)
l2 = ListCell(1,None)
l3 = ListCell(2,None)

# link them together
l1.next = l2
l2.next = l3
  • alternatively we can directly pass the following list cell to the constructor:
head = ListCell(0,ListCell(1,ListCell(2,None)))
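To read the values back out, we start at the head and follow the next links until we reach None, which marks the end of the list. A minimal sketch (the helper name `to_pylist` is hypothetical):

```python
class ListCell:
    def __init__(self, val, nxt):
        self.val = val
        self.next = nxt


def to_pylist(cell):
    """Collect the values of a chain of cells into a Python list
    by following next links until reaching None."""
    values = []
    while cell is not None:
        values.append(cell.val)
        cell = cell.next
    return values


head = ListCell(0, ListCell(1, ListCell(2, None)))
print(to_pylist(head))   # [0, 1, 2]
```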

Lists as a recursive data type

  • note that defining lists this way is recursive; a list is either:
    • the empty list [] (represented by None), or
    • a value v followed by a list
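The recursive definition translates directly into recursive functions: one case for the empty list (None) and one case for a value followed by a list. The functions `length` and `contains` below are hypothetical illustrations of this pattern.

```python
class ListCell:
    def __init__(self, val, nxt):
        self.val = val
        self.next = nxt


def length(cell):
    """Number of cells, following the recursive definition of lists."""
    if cell is None:                 # the empty list []
        return 0
    return 1 + length(cell.next)     # a value followed by a list


def contains(cell, v):
    """Membership test built the same way."""
    if cell is None:
        return False
    return cell.val == v or contains(cell.next, v)


l = ListCell(0, ListCell(1, ListCell(2, None)))
print(length(l), contains(l, 2), contains(l, 5))   # 3 True False
```

Python limits recursion depth (1000 frames by default), so for long lists an iterative loop that follows the next links is the safer choice.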

A Linked List

  • we need to keep track of which list cell is the first element (the head) of the list
class List:

    class ListCell:
        def __init__(self,val,nxt):
            self.val = val
            self.next = nxt

    def __init__(self):
        self.head = None
        self.len = 0

Linked List Operations - Prepend

  • We just have to create a new cell to hold the inserted value
  • The new cell becomes the new head of the list
  • This is $O(1)$!
def prepend(self,val):
    l = self.ListCell(val,self.head)
    self.head = l
    self.len += 1

Linked List Operations - Index Access

  • we only have access to the first element of the list
  • to find the list cell for the ith element (counting from 0), we have to follow i links starting from the head
  • we ignore validity checks (whether the position exists in the list) here
def __getitem__(self,idx):
    el = self.head
    for i in range(0,idx):
        el = el.next
    return el.val

Linked List Helpers - Get ith list cell

  • for the following operations it will be useful to have a method that returns the ith cell in the list
  • its worst-case complexity is $O(n)$ (it may have to go through all elements in the list)
def get_cell(self,idx):
    el = self.head
    for i in range(0,idx):
        el = el.next
    return el

Linked List Operations - Delete

  • since the list is no longer monolithic, we can delete a cell by letting its predecessor point to its successor
  • again we do not sanity check the parameters here
  • $O(n)$ (based on get_cell)
def __delitem__(self, idx):
    self.len += -1
    if idx == 0:
        self.head = self.head.next
    else:
        el = self.get_cell(idx-1)
        el.next = el.next.next

Linked List Operations - Insert & Append

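  • insert & append can be implemented like delete by using get_cell (append is an insert at position len(l))
  • these operations are in $O(n)$
def insert(self, idx, val):
    if idx == 0:
        self.prepend(val)  # inserting at the front is a prepend
        return
    el = self.get_cell(idx-1)  # the cell that will precede the new one
    el.next = self.ListCell(val, el.next)
    self.len += 1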
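Assembled into one class, the linked-list operations from the previous slides can be run together. This is a sketch: `append` (defined as an insert at position `len(l)`) and the printing helper `to_pylist` are hypothetical additions, and as on the slides, positions are not validated.

```python
class List:
    class ListCell:
        def __init__(self, val, nxt):
            self.val = val
            self.next = nxt

    def __init__(self):
        self.head = None   # first cell, or None for the empty list
        self.len = 0

    def __len__(self):
        return self.len

    def prepend(self, val):
        # O(1): the new cell becomes the head
        self.head = self.ListCell(val, self.head)
        self.len += 1

    def get_cell(self, idx):
        # follow idx links from the head (positions count from 0)
        el = self.head
        for _ in range(idx):
            el = el.next
        return el

    def __getitem__(self, idx):
        return self.get_cell(idx).val

    def __delitem__(self, idx):
        if idx == 0:
            self.head = self.head.next
        else:
            prev = self.get_cell(idx - 1)
            prev.next = prev.next.next   # unlink the cell at idx
        self.len -= 1

    def insert(self, idx, val):
        if idx == 0:
            self.prepend(val)
            return
        prev = self.get_cell(idx - 1)
        prev.next = self.ListCell(val, prev.next)
        self.len += 1

    def append(self, val):
        # hypothetical helper: insert at the end, O(n) because of get_cell
        self.insert(self.len, val)

    def to_pylist(self):
        out, el = [], self.head
        while el is not None:
            out.append(el.val)
            el = el.next
        return out


lst = List()
for x in [1, 2, 3]:
    lst.append(x)     # [1, 2, 3]
lst.prepend(0)        # [0, 1, 2, 3]
lst.insert(2, 15)     # [0, 1, 15, 2, 3]
del lst[3]            # [0, 1, 15, 3]
print(lst.to_pylist(), len(lst))   # [0, 1, 15, 3] 4
```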

Doubly Linked Lists

  • we can enable more flexible navigation (forwards and backwards) by extending list cells to point to their predecessor.
class DoublyLinkedCell:

    def __init__(self,val,nxt,prev):
        self.val = val
        self.next = nxt
        self.prev = prev

Circular Lists

  • if we make a doubly-linked list circular (the next cell of the last cell is the head, and the prev cell of the head is the last cell), then head.prev gives us $O(1)$ access to the last element of the list
    • append is now $O(1)$
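A minimal sketch of such a circular doubly-linked list, assuming the class name `CircularList` and the helper `to_pylist` (both hypothetical): append reaches the last cell through `head.prev` without any traversal, and prepend reuses append and then moves the head back by one cell.

```python
class DoublyLinkedCell:
    def __init__(self, val, nxt, prev):
        self.val = val
        self.next = nxt
        self.prev = prev


class CircularList:
    """Hypothetical circular doubly-linked list: head.prev is the last cell."""

    def __init__(self):
        self.head = None
        self.len = 0

    def append(self, val):
        if self.head is None:
            cell = DoublyLinkedCell(val, None, None)
            cell.next = cell.prev = cell   # a single cell points to itself
            self.head = cell
        else:
            last = self.head.prev          # O(1) access to the last cell
            cell = DoublyLinkedCell(val, self.head, last)
            last.next = cell
            self.head.prev = cell
        self.len += 1

    def prepend(self, val):
        # append, then make the new last cell the head: also O(1)
        self.append(val)
        self.head = self.head.prev

    def to_pylist(self):
        out, el = [], self.head
        for _ in range(self.len):
            out.append(el.val)
            el = el.next
        return out


c = CircularList()
for x in [1, 2, 3]:
    c.append(x)
c.prepend(0)
print(c.to_pylist(), c.head.prev.val)   # [0, 1, 2, 3] 3
```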

Runtime Complexity Summary

| Operation      | Array-list                                          | Linked List                     |
|----------------|-----------------------------------------------------|---------------------------------|
| prepend        | $O(n)$                                              | $O(1)$                          |
| append         | $O(1)$ amortized with doubling ($O(n)$ worst case)  | $O(1)$ (circular doubly-linked) |
| insert         | $O(n)$                                              | $O(n)$                          |
| index access   | $O(1)$                                              | $O(n)$                          |
| delete element | $O(n)$                                              | $O(n)$                          |
| extend         | $O(n)$                                              | $O(1)$ (circular doubly-linked) |
| length         | $O(1)$                                              | $O(1)$                          |