2. Runtime Complexity & Analysis

CS331 - 2021 Spring

Boris Glavic

bglavic@iit.edu

Runtime Analysis

  • To understand the behavior of an algorithm we want to understand how the runtime of the algorithm is affected by the size of its input
  • This is called runtime complexity

Empirical Runtime Analysis

Empirical measurements

  • Empirically measure runtime:
    • Run our algorithms with different input sizes
    • Measure runtime
    • Try to fit a function to the empirically observed runtimes (see the sketch below)
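
For instance, a minimal measurement sketch in Python (the function under test my_sum and the choice of input sizes are our assumptions, not part of the slides):

import timeit

def my_sum(l: list):
    total = 0
    for x in l:
        total += x
    return total

# time my_sum on growing inputs; repeat each size to smooth out noise
for n in [1_000, 10_000, 100_000, 1_000_000]:
    data = list(range(n))
    t = timeit.timeit(lambda: my_sum(data), number=10)
    print(f"n = {n:>9}: {t:.4f}s for 10 runs")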

Problems with empirical measurements

  • We only vary the input size
    • runtime may also depend on how adversarial the input is
    • determining how characteristics of the input other than size affect runtime can be hard
  • We may observe random effects
    • other process running on the machine kicks in
    • OS decides to do some heavy task while we are measuring
  • Our results are machine- / language- / environment-dependent
    • Our results may not translate well to other environments
  • It may be hard to determine the growth rate of our empirically measured results
    • Multiple functions may fit our data with similar error (see the fitting sketch below)
    • Which one is the correct one?
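
To illustrate this ambiguity, here is a sketch (the synthetic timings and the use of numpy.polyfit are our assumptions) that fits both a linear and a quadratic model to the same noisy measurements:

import numpy as np

# synthetic "measured" runtimes: roughly linear, plus measurement noise
sizes = np.array([100, 200, 400, 800, 1600])
times = 0.002 * sizes + np.random.normal(0, 0.05, size=sizes.shape)

# fit a degree-1 and a degree-2 polynomial to the same data points
lin  = np.polyfit(sizes, times, deg=1)
quad = np.polyfit(sizes, times, deg=2)

# over this limited range of sizes both fits can have comparable error
print("linear fit:   ", np.poly1d(lin))
print("quadratic fit:", np.poly1d(quad))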

Take-away

  • We want to reason about:
    • how bad can it get (worst-case runtime)
    • expectation (average-case runtime)
    • best result we may get (best-case runtime)
  • We want to reason about runtime independent of the environment the algorithm is run in!

Problem Definition

  • Given an algorithm, we want to find a function T(n) that computes the runtime of the algorithm for an input of size n
  • $T(n)$ = number of instructions the algorithm executes to compute the output
  • We want to do this for worst-case / average-case / best-case
  • In this class we will mostly focus on worst-case analysis

How to measure input size?

  • Number of bits
    • independent of encoding
    • works for everything
    • not always intuitive

How to measure input size?

  • Number of elements in the input
    • e.g., sort(array)
      • input size is the number of elements to sort
    • correspondence to number of bits
      • assume that values are of fixed size

How to measure input size?

  • Size of an input element
    • e.g., factorial(n)
      • input size is the magnitude of the input number
    • correspondence to number of bits
      • measure the number’s size in bits

What about algorithms that have multiple inputs?

  • e.g., gcd(m,n) - find greatest common divisor of m and n
    • Option 1: $max(m,n)$ - use the largest input
    • Option 2: vary only one input and treat the others as constants
      • we may get different runtime behavior depending on which input we vary (not for gcd though)
    • for gcd, option 1 is probably the right choice
  • Example for option 2:
    • find(str,doc) - find occurrences of string str in document doc; here we would vary the size of doc and treat str as a constant

Big-O Notation

How to reason about runtime independent of an environment?

  • How to think about T(n)?
def sum(l: list):       # input size: n = len(l)
    sum = 0             # cost: c1  #executions: 1
    for x in l:         # cost: c2  #executions: n
        sum += x        # cost: c3  #executions: n
    print(sum)          # cost: c4  #executions: 1

$$T(n) = c_1 \cdot 1 + c_2 \cdot n + c_3 \cdot n + c_4 \cdot 1$$ $$= n \cdot (c_2 + c_3) + c_1 + c_4$$

How to reason about runtime independent of an environment?

  • The constant costs of executing a statement are environment-specific
    • Let’s ignore them (treat each $c_i$ as 1)! $$T(n) = c_1 \cdot 1 + c_2 \cdot n + c_3 \cdot n + c_4 \cdot 1$$ reduces to $$T(n) = 2n + 2$$

How to reason about runtime independent of an environment?

  • Asymptotically (as we continue to increase the input size) only the term with the greatest growth rate determines the runtime
n = 1   => T(n) = 2*1   + 2 = 4
n = 10  => T(n) = 2*10  + 2 = 22
n = 100 => T(n) = 2*100 + 2 = 202
  • This behaves like $2 \cdot n$
    • or $n$ if we ignore constants

Asymptotic Behavior of Functions

  • Big-O notation
    • a function $f(x): \mathbb{N} \to \mathbb{N}$ is in $O(g(x))$ for a function $g(x)$ if there exist constants $n_0$ and $c$ such that for all $x > n_0$ we have $f(x) < c \cdot g(x)$
  • Intuitively, $f(x) = O(g(x))$ means $g$ grows at least as fast as $f$

Asymptotic Runtime of Sum

def sum(l: list):       # input size: n = len(l)
    sum = 0             # cost: c1  #executions: 1
    for x in l:         # cost: c2  #executions: n
        sum += x        # cost: c3  #executions: n
    print(sum)          # cost: c4  #executions: 1

$$T(n) = O(n)$$

  • Proof sketch: e.g., choose $g(n)=n$, $c=3$, and $n_0=3$; then for all $n > n_0$:
f(4) = 2*4+2 = 10 < 12 = 3*g(4)
f(5) = 2*5+2 = 12 < 15 = 3*g(5)
...
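
A quick numeric sanity check of these witnesses (the helper names f and bound are ours):

def f(n):
    return 2 * n + 2   # exact instruction count of sum from above

def bound(n):
    return 3 * n       # c * g(n) with c = 3 and g(n) = n

# verify f(n) < c*g(n) for a range of n beyond n0 = 3
assert all(f(n) < bound(n) for n in range(4, 10_000))
print("2n+2 < 3n holds for all tested n > 3")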

Note that!

  • runtime is algorithm-specific not problem-specific
  • Example: computing Fibonacci numbers: $$F_0 = F_1 = 1$$ $$F_n = F_{n-1} + F_{n-2}$$
F2 = 1 + 1 = 2
F3 = 2 + 1 = 3
F4 = 3 + 2 = 5
F5 = 5 + 3 = 8
...

Dumb fibonacci(n) with $O(2^n)$ runtime

def fibonacci(n):
    if n == 0 or n == 1:
        return 1
    else:
        return fibonacci(n-1) + fibonacci(n-2)
#calls to compute F0 is 1 = F0
#calls to compute F1 is 1 = F1
#calls to compute F2 is 3 = #calls(F1) + #calls(F0) + 1
#calls to compute F3 is 5 = #calls(F2) + #calls(F1) + 1
#calls to compute F4 is 9 = #calls(F3) + #calls(F2) + 1
  • same growth rate as the Fibonacci sequence, which is $O(2^n)$!
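
A small instrumented version (fib_count is our helper name) reproduces these call counts:

def fib_count(n):
    # returns (value, #calls) for the naive recursion above
    if n <= 1:
        return 1, 1
    v1, c1 = fib_count(n - 1)
    v2, c2 = fib_count(n - 2)
    return v1 + v2, c1 + c2 + 1  # +1 counts the current call

for n in range(5):
    value, calls = fib_count(n)
    print(f"#calls to compute F{n}: {calls}")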

Smart fibonacci(n) with $O(n)$ runtime

def fibonacci(n):
    f = [ 1, 1 ]                   # F0 and F1
    for m in range(2, n+1):        # build up F2 ... Fn bottom-up
        f.append(f[m-1] + f[m-2])  # each Fm is computed exactly once
    return f[n]
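
For example (the expected values follow the Fibonacci numbers listed above):

print([fibonacci(n) for n in range(6)])  # [1, 1, 2, 3, 5, 8]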

Example Analysis: Insertion Sort

The Sorting Problem

  • Given a list of $n$ items, sort the list according to a total order $\leq$ for the elements
Input  = [5, 3, 9, 1, 8]
Output = [1, 3, 5, 8, 9]
  • Input size: $n$
  • Sorting algorithms have been studied since the dawn of Computer Science and even before that!

Insertion Sort

  • We will learn about several sorting algorithms in this course
  • For the sake of runtime analysis let us consider Insertion Sort as an example of such algorithms

Insertion Sort

  • Let’s break the problem into smaller, more manageable parts
  • We can divide the problem of sorting a list of $n$ elements into two parts:
    1. sort the first $n-1$ elements
    2. insert the $n^{th}$ element in the right position in sort order

Insertion Sort - Example

  • Input: [5, 3, 9, 1, 8]
  • First $n-1$ elements sorted: [1, 3, 5, 9]
  • Insert $n^{th}$ element 8 at the right position (before 9): [1, 3, 5, 8, 9]
  • But how do we sort the first $n-1$ elements?
    • Apply the rule recursively: split the list of $n-1$ elements into a list of length $n-2$ and a final element

Insertion Sort

  • We implement insertion sort using a counter that keeps track of the prefix of the list that is sorted. Once this counter reaches $n$ we are done (the full list is sorted):
def insertion_sort(lst):
    for i in range(1, len(lst)):
        ...  # find position of lst[i] in the sorted prefix lst[0:i]

Insertion Sort

  • How do we find the position of lst[i] in the sorted prefix lst[0:i]?
  • Trickle lst[i] down by comparing it with its predecessor until the predecessor is smaller
def insertion_sort(lst):
    for i in range(1, len(lst)): # after i iterations the first i elements are sorted
        for j in range(i, 0, -1): # trickle the ith element down to its position within the sorted prefix
            if lst[j] < lst[j-1]:
                lst[j], lst[j-1] = lst[j-1], lst[j]
            else:
                break # found final position of element
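
For example (insertion_sort sorts the list in place):

lst = [5, 3, 9, 1, 8]
insertion_sort(lst)
print(lst)  # [1, 3, 5, 8, 9]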

Runtime analysis

  • Let’s analyze the worst-case runtime!
def insertion_sort(lst):
  for i in range(1, len(lst)):              # n-1
    for j in range(i, 0, -1):               # sum(i)
      if lst[j] < lst[j-1]:                 # sum(i)
        lst[j], lst[j-1] = lst[j-1], lst[j] # < sum(i)
      else:                                 # < n
        break                               # < n

$$T(n) \leq n-1 + \sum_{i=1}^{n} i + \sum_{i=1}^{n} i + \sum_{i=1}^{n} i + n + n$$

Runtime analysis

$$T(n) \leq n-1 + \sum_{i=1}^{n} i + \sum_{i=1}^{n} i + \sum_{i=1}^{n} i + n + n$$

$$= 3 \cdot \sum_{i=1}^{n} i + 3n -1$$

  • Recall that $\sum_{i=1}^n i = \frac{n \cdot (n+1)}{2}$

$$T(n) \leq 3 \frac{n \cdot (n+1)}{2} + 3n -1$$
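
Expanding the right-hand side makes the quadratic term explicit:

$$3 \cdot \frac{n \cdot (n+1)}{2} + 3n - 1 = \frac{3}{2} n^2 + \frac{9}{2} n - 1$$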

Runtime analysis

  • This is quadratic growth

$$T(n) = O(n^2)$$

Searching in a Sequence

  • Input: list l of length n and element e to search for
  • Output: true if e is in l and false otherwise
  • Go through the list sequentially and inspect every element. Stop once the end of the list is reached or the element has been found
def linear_search(lst, e):
    for x in lst:
        if x == e:
            return True
    return False

Runtime Analysis

  • What is the worst case?
    • => the element is not in the list
  • In this case: iterate through whole list
  • => $O(n)$
def linear_search(lst, e):
    for x in lst:
        if x == e:
            return True
    return False

Binary Search

  • Input is assumed to be sorted
  • Compare the middle element lst[mid] of the list with e:
    • if lst[mid] == e then return True
    • if lst[mid] > e then recursively search in lst[:mid]
    • if lst[mid] < e then recursively search in lst[mid+1:]

Binary Search

def binary_search(lst, e):
    low = 0
    hi  = len(lst) - 1
    mid = (low + hi) // 2
    while low <= hi and lst[mid] != e:
        if lst[mid] < e:
            low = mid + 1
        else:
            hi  = mid - 1
        mid = (low + hi) // 2
    if low <= hi and lst[mid] == e:
        return True
    else:
        return False
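
For example, on a sorted list:

lst = [1, 3, 5, 8, 9]
print(binary_search(lst, 8))  # True
print(binary_search(lst, 4))  # False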

Runtime Analysis

  • how often is the main loop executed?
  • in each loop iteration the distance hi - low shrinks by a factor of ~2
  • how long does it take until low and hi meet in the worst case?
  • => $O(\log n)$ (see the derivation below)
while low <= hi and lst[mid] != e:
    if lst[mid] < e:
        low = mid + 1
    else:
        hi  = mid - 1
    mid = (low + hi) // 2
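
A sketch of why this is logarithmic: if the search range initially has size $n$, then after $k$ iterations it has size at most $n / 2^k$, and the loop must stop once the range size drops below 1:

$$\frac{n}{2^k} < 1 \iff n < 2^k \iff k > \log_2 n$$

So the loop body runs at most about $\log_2 n$ times, i.e., $T(n) = O(\log n)$.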