2. Runtime Complexity & Analysis

CS331 - 2021 Spring

Boris Glavic

bglavic@iit.edu

Runtime Analysis

  • To understand the behavior of an algorithm we want to understand how the runtime of the algorithm is affected by the size of its input
  • This is called runtime complexity

Empirical Runtime Analysis

Empirical measurements

  • Empirically measure runtime:
    • Run our algorithms with different input sizes
    • Measure runtime
    • Try to fit a function to the empirically observed runtimes (see the sketch below)
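
For instance, a minimal measurement sketch in Python (the function under test my_sum and the choice of input sizes are our assumptions, not part of the slides):

import timeit

def my_sum(l: list):
    total = 0
    for x in l:
        total += x
    return total

# time my_sum on growing inputs; repeat each size to smooth out noise
for n in [1_000, 10_000, 100_000, 1_000_000]:
    data = list(range(n))
    t = timeit.timeit(lambda: my_sum(data), number=10)
    print(f"n = {n:>9}: {t:.4f}s for 10 runs")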

Problems with empirical measurements

  • We only vary the input size
    • runtime may also depend on how adversarial the input is
    • determining how characteristics of the input other than size affect runtime can be hard
  • We may observe random effects
    • other process running on the machine kicks in
    • OS decides to do some heavy task while we are measuring
  • Our results are machine- / language- / environment-dependent
    • Our results may not translate well to other environments
  • It may be hard to determine the growth rate of our empirically measured results
    • Multiple functions may fit our data with similar error (see the fitting sketch below)
    • Which one is the correct one?
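
To illustrate this ambiguity, here is a sketch (the synthetic timings and the use of numpy.polyfit are our assumptions) that fits both a linear and a quadratic model to the same noisy measurements:

import numpy as np

# synthetic "measured" runtimes: roughly linear, plus measurement noise
sizes = np.array([100, 200, 400, 800, 1600])
times = 0.002 * sizes + np.random.normal(0, 0.05, size=sizes.shape)

# fit a degree-1 and a degree-2 polynomial to the same data points
lin  = np.polyfit(sizes, times, deg=1)
quad = np.polyfit(sizes, times, deg=2)

# over this limited range of sizes both fits can have comparable error
print("linear fit:   ", np.poly1d(lin))
print("quadratic fit:", np.poly1d(quad))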

Take-away

  • We want to reason about:
    • how bad can it get (worst-case runtime)
    • expectation (average-case runtime)
    • best result we may get (best-case runtime)
  • We want to reason about runtime independent of the environment the algorithm is run in!

Problem Definition

  • Given an algorithm, we want to find a function T(n) that computes the runtime of the algorithm for an input of size n
  • $T(n)$ = number of instructions the algorithm executes to compute the output
  • We want to do this for worst-case / average-case / best-case
  • In this class we will mostly focus on worst-case analysis

How to measure input size?

  • Number of bits
    • independent of encoding
    • works for everything
    • not always intuitive

How to measure input size?

  • Number of elements in the input
    • e.g., sort(array)
      • input size is the number of elements to sort
    • correspondence to number of bits
      • assume that values are of fixed size

How to measure input size?

  • Size of an input element
    • e.g., factorial(n)
      • input size is the magnitude of the input number
    • correspondence to number of bits
      • measure the number’s size in bits

What about algorithms that have multiple inputs?

  • e.g., gcd(m,n) - find greatest common divisor of m and n
    • Option 1: $max(m,n)$ - use the largest input
    • Option 2: vary only one input and treat the others as constants
      • we may get different runtime behavior depending on which input we vary (not for gcd though)
    • for gcd, option 1 is probably the right choice
  • Example for option 2:
    • find(str,doc) - find occurrences of string str in document doc; here we would vary the size of doc and treat str as a constant

Big-O Notation

How to reason about runtime independent of an environment?

  • How to think about T(n)?
def sum(l: list):       # input size: n = len(l)
    sum = 0             # cost: c1  #executions: 1
    for x in l:         # cost: c2  #executions: n
        sum += x        # cost: c3  #executions: n
    print(sum)          # cost: c4  #executions: 1

$$T(n) = c_1 \cdot 1 + c_2 \cdot n + c_3 \cdot n + c_4 \cdot 1$$ $$= n \cdot (c_2 + c_3) + c_1 + c_4$$

How to reason about runtime independent of an environment?

  • The constant costs of executing a statement are environment-specific
    • Let’s ignore them (treat each $c_i$ as 1)! $$T(n) = c_1 \cdot 1 + c_2 \cdot n + c_3 \cdot n + c_4 \cdot 1$$ reduces to $$T(n) = 2n + 2$$

How to reason about runtime independent of an environment?

  • Asymptotically (as we continue to increase the input size) only the term with the greatest growth rate determines the runtime
n = 1   => T(n) = 2*1   + 2 = 4
n = 10  => T(n) = 2*10  + 2 = 22
n = 100 => T(n) = 2*100 + 2 = 202
  • This behaves like $2 \cdot n$
    • or $n$ if we ignore constants

Asymptotic Behavior of Functions

  • Big-O notation
    • a function $f(x): \mathbb{N} \to \mathbb{N}$ is in $O(g(x))$ for a function $g(x)$ if there exist constants $n_0$ and $c$ such that for all $x > n_0$ we have $f(x) < c \cdot g(x)$
  • Intuitively, $f(x) = O(g(x))$ means $g$ grows at least as fast as $f$

Asymptotic Runtime of Sum

def sum(l: list):       # input size: n = len(l)
    sum = 0             # cost: c1  #executions: 1
    for x in l:         # cost: c2  #executions: n
        sum += x        # cost: c3  #executions: n
    print(sum)          # cost: c4  #executions: 1

$$T(n) = O(n)$$

  • Proof sketch: e.g., choose $g(n)=n$, $c=3$, and $n_0=3$; then for all $n > n_0$:
f(4) = 2*4+2 = 10 < 12 = 3*g(4)
f(5) = 2*5+2 = 12 < 15 = 3*g(5)
...
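
A quick numeric sanity check of these witnesses (the helper names f and bound are ours):

def f(n):
    return 2 * n + 2   # exact instruction count of sum from above

def bound(n):
    return 3 * n       # c * g(n) with c = 3 and g(n) = n

# verify f(n) < c*g(n) for a range of n beyond n0 = 3
assert all(f(n) < bound(n) for n in range(4, 10_000))
print("2n+2 < 3n holds for all tested n > 3")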

Note that!

  • runtime is algorithm-specific not problem-specific
  • Example: computing Fibonacci numbers: $$F_0 = F_1 = 1$$ $$F_n = F_{n-1} + F_{n-2}$$
F2 = 1 + 1 = 2
F3 = 2 + 1 = 3
F4 = 3 + 2 = 5
F5 = 5 + 3 = 8
...

Dumb fibonacci(n) with $O(2^n)$ runtime

def fibonacci(n):
    if n == 0 or n == 1:
        return 1
    else:
        return fibonacci(n-1) + fibonacci(n-2)
#calls to compute F0 is 1 = F0
#calls to compute F1 is 1 = F1
#calls to compute F2 is 3 = #calls(F1) + #calls(F0) + 1
#calls to compute F3 is 5 = #calls(F2) + #calls(F1) + 1
#calls to compute F4 is 9 = #calls(F3) + #calls(F2) + 1
  • same growth rate as the Fibonacci sequence, which is $O(2^n)$!
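
A small instrumented version (fib_count is our helper name) reproduces these call counts:

def fib_count(n):
    # returns (value, #calls) for the naive recursion above
    if n <= 1:
        return 1, 1
    v1, c1 = fib_count(n - 1)
    v2, c2 = fib_count(n - 2)
    return v1 + v2, c1 + c2 + 1  # +1 counts the current call

for n in range(5):
    value, calls = fib_count(n)
    print(f"#calls to compute F{n}: {calls}")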

Smart fibonacci(n) with $O(n)$ runtime

def fibonacci(n):
    f = [ 1, 1 ]                   # F0 and F1
    for m in range(2, n+1):        # build up F2 ... Fn bottom-up
        f.append(f[m-1] + f[m-2])  # each Fm is computed exactly once
    return f[n]
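
For example (the expected values follow the Fibonacci numbers listed above):

print([fibonacci(n) for n in range(6)])  # [1, 1, 2, 3, 5, 8]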

Example Analysis: Insertion Sort

The Sorting Problem

  • Given a list of $n$ items, sort the list according to a total order $\leq$ for the elements
Input  = [5, 3, 9, 1, 8]
Output = [1, 3, 5, 8, 9]
  • Input size: $n$
  • Sorting algorithms have been studied since the dawn of Computer Science and even before that!

Insertion Sort

  • We will learn about several sorting algorithms in this course
  • For the sake of runtime analysis let us consider Insertion Sort as an example of such algorithms

Insertion Sort

  • Let’s break the problem into smaller, more manageable parts
  • We can divide the problem of sorting a list of $n$ elements into two parts:
    1. sort the first $n-1$ elements
    2. insert the $n^{th}$ element in the right position in sort order

Insertion Sort - Example

  • Input: [5, 3, 9, 1, 8]
  • First $n-1$ elements sorted: [1, 3, 5, 9]
  • Insert $n^{th}$ element 8 at the right position (before 9): [1, 3, 5, 8, 9]
  • But how do we sort the first $n-1$ elements?
    • Apply the rule recursively: split the list of $n-1$ elements into a list of length $n-2$ and a final element

Insertion Sort

  • We implement insertion sort using a counter that keeps track of the prefix of the list that is sorted. Once this counter reaches $n$ we are done (the full list is sorted):
def insertion_sort(lst):
    for i in range(1, len(lst)):
        ...  # find position of lst[i] in the sorted prefix lst[0:i]

Insertion Sort

  • How do we find the position of lst[i] in the sorted prefix lst[0:i]?
  • Trickle lst[i] down by comparing it with its predecessor until the predecessor is smaller
def insertion_sort(lst):
    for i in range(1, len(lst)): # after i iterations the first i elements are sorted
        for j in range(i, 0, -1): # trickle the ith element down to its position within the sorted prefix
            if lst[j] < lst[j-1]:
                lst[j], lst[j-1] = lst[j-1], lst[j]
            else:
                break # found final position of element
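
For example (insertion_sort sorts the list in place):

lst = [5, 3, 9, 1, 8]
insertion_sort(lst)
print(lst)  # [1, 3, 5, 8, 9]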

Runtime analysis

  • Let’s analyze the worst-case runtime!
def insertion_sort(lst):
  for i in range(1, len(lst)):              # n-1
    for j in range(i, 0, -1):               # sum(i)
      if lst[j] < lst[j-1]:                 # sum(i)
        lst[j], lst[j-1] = lst[j-1], lst[j] # < sum(i)
      else:                                 # < n
        break                               # < n

$$T(n) \leq n-1 + \sum_{i=1}^{n} i + \sum_{i=1}^{n} i + \sum_{i=1}^{n} i + n + n$$

Runtime analysis

$$T(n) \leq n-1 + \sum_{i=1}^{n} i + \sum_{i=1}^{n} i + \sum_{i=1}^{n} i + n + n$$

$$= 3 \cdot \sum_{i=1}^{n} i + 3n -1$$

  • Recall that $\sum_{i=1}^n i = \frac{n \cdot (n+1)}{2}$

$$T(n) \leq 3 \frac{n \cdot (n+1)}{2} + 3n -1$$
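
Expanding the right-hand side makes the quadratic term explicit:

$$3 \cdot \frac{n \cdot (n+1)}{2} + 3n - 1 = \frac{3}{2} n^2 + \frac{9}{2} n - 1$$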

Runtime analysis

  • This is quadratic growth

$$T(n) = O(n^2)$$

Searching in a Sequence

  • Input: list l of length n and element e to search for
  • Output: true if e is in l and false otherwise
  • Go through the list sequentially and inspect every element. Stop once the end of the list is reached or the element has been found
def linear_search(lst, e):
    for x in lst:
        if x == e:
            return True
    return False

Runtime Analysis

  • What is the worst case?
    • => the element is not in the list
  • In this case: iterate through whole list
  • => $O(n)$
def linear_search(lst, e):
    for x in lst:
        if x == e:
            return True
    return False

Binary Search

  • Input is assumed to be sorted
  • Compare the middle element lst[mid] of the list with e:
    • if lst[mid] == e then return True
    • if lst[mid] > e then recursively search in lst[:mid]
    • if lst[mid] < e then recursively search in lst[mid+1:]

Binary Search

def binary_search(lst, e):
    low = 0
    hi  = len(lst) - 1
    mid = (low + hi) // 2
    while low <= hi and lst[mid] != e:
        if lst[mid] < e:
            low = mid + 1
        else:
            hi  = mid - 1
        mid = (low + hi) // 2
    if low <= hi and lst[mid] == e:
        return True
    else:
        return False
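
For example, on a sorted list:

lst = [1, 3, 5, 8, 9]
print(binary_search(lst, 8))  # True
print(binary_search(lst, 4))  # False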

Runtime Analysis

  • how often is the main loop executed?
  • in each loop iteration the distance hi - low shrinks by a factor of ~2
  • how long does it take until low and hi meet in the worst case?
  • => $O(\log n)$ (see the derivation below)
while low <= hi and lst[mid] != e:
    if lst[mid] < e:
        low = mid + 1
    else:
        hi  = mid - 1
    mid = (low + hi) // 2
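
A sketch of why this is logarithmic: if the search range initially has size $n$, then after $k$ iterations it has size at most $n / 2^k$, and the loop must stop once the range size drops below 1:

$$\frac{n}{2^k} < 1 \iff n < 2^k \iff k > \log_2 n$$

So the loop body runs at most about $\log_2 n$ times, i.e., $T(n) = O(\log n)$.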