Sept 19, 2022
An Array of n items of type T:
Challenge: Operations that modify the array size
require copying the array.
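For example, appending even one element to a plain Array means allocating a bigger array and copying everything over (a sketch, not from the slides; growByOne is a made-up helper):

def growByOne[T: scala.reflect.ClassTag](arr: Array[T], elem: T): Array[T] =
{
  val bigger = new Array[T](arr.length + 1)   /* allocate a new, larger array */
  Array.copy(arr, 0, bigger, 0, arr.length)   /* copy all n existing items: Θ(n) */
  bigger(arr.length) = elem                   /* place the new element at the end */
  bigger                                      /* the old array gets discarded */
}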
Solution: Reserve extra space in the array!
An ArrayBuffer of type T:
class ArrayBuffer[T] extends Buffer[T]
{
  var used = 0                                             /* slots currently in use */
  var data = Array.fill[Option[T]](INITIAL_SIZE) { None }  /* backing array; empty slots hold None */

  def length = used

  def apply(i: Int): T =
  {
    /* Sanity-check inputs */
    if(i < 0 || i >= used){ throw new IndexOutOfBoundsException(i) }
    return data(i).get
  }

  /* ... */
}
What the heck is Option[T]?
val x = functionThatCanReturnNull()
x.frobulate()
java.lang.NullPointerException (in production)
val x = functionThatCanReturnNull()
if(x == null) { /* handle this case */ }
else { x.frobulate() }
Problem: It's easy to miss this test
(and bring down a million-dollar server)!
val x = functionThatReturnsOption()
x.frobulate()
error: value frobulate is not a member of Option[MyClass]
At compile time.
Bonus: an Option[T] can be treated like a Seq[T] of 0 or 1 elements
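A quick sketch of working with Option (not from the slides; functionThatReturnsOption and its values are made up): the compiler forces you to acknowledge the None case before you can touch the value.

def functionThatReturnsOption(): Option[String] =
  if(scala.util.Random.nextBoolean()) { Some("widget") } else { None }

val x = functionThatReturnsOption()

/* Pattern matching makes both cases explicit */
x match {
  case Some(value) => println(value.length)    /* safe: value is a plain String */
  case None        => println("nothing here")  /* the "null" case can't be forgotten */
}

/* ... or treat it as a collection of 0 or 1 elements */
println(x.map { _.toUpperCase }.getOrElse("(empty)"))
println(x.toList)                              /* List("widget") or List() */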
Digression over!
def remove(target: Int): T =
{
  /* Sanity-check inputs */
  if(target < 0 || target >= used){ throw new IndexOutOfBoundsException(target) }
  /* Remember the removed element so we can return it */
  val removed = data(target).get
  /* Shift elements left */
  for(i <- target until (used-1)){
    data(i) = data(i+1)
  }
  /* Update metadata */
  data(used-1) = None
  used -= 1
  return removed
}
What is the complexity?
O(data.size) (i.e., O(n)), or more precisely Θ(used−target)
T_remove(n) is O(n) and Ω(1)
(these bounds are "tight")
We usually parameterize runtime complexity by the data structure's size, but we can also measure runtime in terms of other parameters (e.g., used and target).
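For comparison (not part of the class above), the standard library's scala.collection.mutable.ArrayBuffer behaves the same way on remove: it hands back the removed element and shifts everything after it one slot to the left.

import scala.collection.mutable

val buf = mutable.ArrayBuffer("a", "b", "c", "d")
println(buf.remove(1))   /* prints b: remove returns the element it took out */
println(buf)             /* ArrayBuffer(a, c, d): c and d shifted left */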
def append(elem: T): Unit =
{
  if(used == data.size){ /* 🙁 case: the array is full */
    /* assume newLength > data.size, but pick it later */
    val newData = Array.copyOf(original = data, newLength = ???)
    /* Array.copyOf fills the new slots with null, not None, so we init them */
    for(i <- data.size until newData.size){ newData(i) = None }
    /* Swap in the bigger array */
    data = newData
  }
  /* Append element, update metadata */
  data(used) = Some(elem)
  used += 1
}
What is the complexity?
O(data.size) (i.e., O(n)) ... but ...
T_append(n) is O(n) and Ω(1)
(these bounds are also "tight", so no Θ-bound)
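Putting the pieces together, here is a minimal runnable sketch of the buffer so far (my assembly, not the course's reference code: Buffer[T] is dropped, GrowableBuffer and grow are made-up names, the still-unchosen newLength becomes a growth-policy parameter, and Array.copyOf assumes Scala 2.13+; remove works exactly as shown earlier).

class GrowableBuffer[T](grow: Int => Int, initialSize: Int = 10)
{
  var used = 0                                           /* slots currently in use */
  var data = Array.fill[Option[T]](initialSize) { None } /* backing array */

  def length = used

  def apply(i: Int): T =
  {
    if(i < 0 || i >= used){ throw new IndexOutOfBoundsException(i) }
    data(i).get
  }

  def append(elem: T): Unit =
  {
    if(used == data.size){                               /* 🙁 case: out of room */
      val newData = Array.copyOf(data, grow(data.size))
      for(i <- data.size until newData.size){ newData(i) = None }
      data = newData                                     /* swap in the bigger array */
    }
    data(used) = Some(elem)
    used += 1
  }
}

/* Usage: for example, double the capacity on every overflow (one possible policy; see below) */
val buf = new GrowableBuffer[Int](grow = _ * 2, initialSize = 4)
for(i <- 1 to 100){ buf.append(i) }
println(buf(99))      /* 100 */
println(buf.length)   /* 100 */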
How often do we hit the 🙁 case?
For n appends into an empty buffer (growing the array by 1 each time it fills)...
While used ≤ INITIAL_SIZE: ∑_{i=0}^{IS} Θ(1)   (writing IS for INITIAL_SIZE)
And after: ∑_{i=IS+1}^{n} Θ(i)
Total for n insertions: Θ(n²)
For n appends into an empty buffer (growing the array by 10 each time it fills)...
While used ≤ INITIAL_SIZE: ∑_{i=0}^{IS} Θ(1)
And after: ∑_{i=IS+1}^{n} [ Θ(i) if i ≡ IS (mod 10), Θ(1) otherwise ]
... or ... (∑_{i=IS+1}^{n} Θ(1)) + (∑_{j=0}^{(n−IS+1)/10} Θ((IS+1+j)·10))
Total for n insertions: Θ(n²)
For n appends into an empty buffer (doubling the array each time it fills)...
While used ≤ INITIAL_SIZE: ∑_{i=0}^{IS} Θ(1)
And after: ∑_{i=IS+1}^{n} [ Θ(i) if i = IS·2^k for some k ∈ ℕ, Θ(1) otherwise ]
How many of these Θ(i) resizes ("boxes" of work) for n inserts? Θ(log(n))
How much work in box j? Θ(2^j)
How much work for n inserts? ∑_{j=0}^{Θ(log(n))} Θ(2^j)
Total for n insertions: Θ(n) (a geometric series is dominated by its largest term, here Θ(n))
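To see those three totals concretely, here is a hypothetical little simulation (copiesFor and the numbers are mine) that just counts how many element copies the resizes cause:

def copiesFor(n: Int, grow: Int => Int, initialSize: Int = 10): Long =
{
  var capacity = initialSize
  var used = 0
  var copies = 0L
  for(_ <- 1 to n){
    if(used == capacity){         /* 🙁 case: resize before this append */
      copies += used              /* a resize copies every stored element */
      capacity = grow(capacity)
    }
    used += 1
  }
  copies
}

val n = 100000
println(copiesFor(n, _ + 1))      /* grow by 1:  roughly n²/2 copies,  Θ(n²) */
println(copiesFor(n, _ + 10))     /* grow by 10: roughly n²/20 copies, Θ(n²) */
println(copiesFor(n, _ * 2))      /* doubling:   fewer than 2n copies, Θ(n) */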
append(elem) is O(n)
n calls to append(elem) are O(n) in total
This bound on the total cost of n calls is guaranteed, even in the worst case.
(It would be nice if we had a name for this...)
If n calls to a function take O(T(n))...
We say the Amortized Runtime is O(T(n)/n)
e.g., the amortized runtime of append is O(n/n) = O(1)
(even though the worst-case runtime is O(n))
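Worked out for the doubling strategy (my arithmetic, writing IS for INITIAL_SIZE and treating it as a constant): the resize costs form a geometric series,
∑_{j=0}^{⌊log₂(n/IS)⌋} IS·2^j = IS·(2^{⌊log₂(n/IS)⌋+1} − 1) ≤ 2n
so n appends cost Θ(n) for the writes plus O(n) for the resizes, i.e., Θ(n) total, and the amortized cost per append is Θ(n)/n = Θ(1), even though a single append can cost Θ(n).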