Lecture 4: Data Structures

Written by Krung Sinapiromsaran
July 2557




Outline

  • Relationship between algorithms and data structures
  • Physical memory
  • Concrete versus Abstract data structure
  • Contiguous versus Linked
  • Dynamic arrays – amortized analysis
  • Abstract Data Type (ADT)
  • Linear abstract data types: lists, stacks, queues, deques
  • Dictionaries
  • Linear implementations of dictionaries

Objective

  • Explain the relationship between algorithms and data structures
  • Distinguish between concrete data structures and abstract data structures
  • Analyze and explain the use of contiguous and linked structures to implement ADTs such as stacks, queues and deques
  • Analyze the dictionary operations under different implementations

Relationship between algorithms and data structures

  • Our algorithms will operate on data. We need a way to store this data.
  • We want to be able to perform abstract operations on this data:
    • adding a student to an enrollment database
    • searching for a student with a certain name
    • listing all students taking a certain module
  • Data structures are like the building blocks of algorithms
  • Using abstract structures such as sets, lists, dictionaries, trees, graphs etc. lets us think algorithmically at a more abstract level
  • But, using a poor choice of data structure or a poor choice of implementation of a data structure can make your algorithm asymptotically worse

Relationship between algorithms and data structures

  • Implementations of abstract data structures are now included in standard libraries of almost every programming language
  • So you may well think:
    “I’m never going to have to implement any of these concepts, why should I care about data structures?”
    Answer part 1: This is good. Reinventing the wheel is pointless, such libraries will save you time.
    Answer part 2: If you don’t know how the data structure is implemented, you won’t know the efficiency of different operations – this can drastically affect the running time of your algorithms
  • Understanding the mechanics of data structures is crucial to understanding algorithm efficiency and becoming a good designer of new algorithms

Physical memory

Fact: we store data structures in the memory of a computer.
What does the memory of our computer look like?

  • Organised into banks, rows, columns etc.
  • We supply a bank number, row number etc. (= an address), and the memory returns us the contents: address → contents


Concrete versus Abstract data structure

  • We therefore have two levels of thinking about data structures:
    Concrete: concerned with addresses in physical memory
    Abstract: concerned only with abstract operations supported
  • Example:
    Concrete: arrays, linked lists
    Abstract: sets, lists, dictionaries, trees, graphs
  • But our implementations of abstractions must be in terms of the concrete structures with which our computer operates

Contiguous versus Linked

  • We can subdivide concrete data structures into two classes:
  • Contiguous: Composed of a single block of memory
  • Linked: Composed of multiple distinct chunks of memory tied together by pointers

Contiguous data structures -- Arrays and Records


Benefits of using contiguous array structures

  • We can retrieve an array element from its index in constant time, O(1), meaning it costs us asymptotically nothing to look up a record – this is a really big deal
  • They consist solely of data: no space is wasted on links
  • Physical continuity/memory locality: if we look up element $i$, there is a high probability we will look up element $i+1$ next – this is exploited by cache memory in modern computer architectures
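The constant-time lookup comes from simple address arithmetic. A minimal sketch (the base address and element size below are illustrative values, not tied to any real machine):

```python
# Address arithmetic behind O(1) array indexing.
# If an array starts at base address B and each element occupies s bytes,
# element i lives at address B + i*s: one multiply and one add, i.e. O(1),
# regardless of how large the array is.

def element_address(base: int, elem_size: int, i: int) -> int:
    """Return the (hypothetical) address of element i."""
    return base + i * elem_size

base = 0x1000      # assumed start address of the array
size = 8           # assumed element size in bytes

assert element_address(base, size, 0) == 0x1000
assert element_address(base, size, 3) == 0x1018   # 0x1000 + 3*8
```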

Drawbacks of using contiguous array structures

  • Inflexible: we have to decide in advance how much space we want when the array is allocated
  • Once the block of memory for the array has been allocated, that’s it – we’re stuck with the size we’ve got
  • If we try to write past the end of the array (overflow), we’ll be intruding on memory allocated for something else, causing a segmentation fault
  • We can compensate by always allocating arrays larger than we think we’ll need, but this wastes a lot of space
  • Inflexible: think about removing or inserting sequences of records in the middle of an array

Dynamic arrays

  • A potential way around the problem of having to decide array size in advance is to use dynamic arrays
  • We could start with an array of size 1
  • Each time we run out of space (i.e. want to write to index $m+1$ in an array of size $m$) we find a block of free memory, allocate a new array increasing the array size from $m$ to $2 m$ and copy all the contents across
  • Q: If we currently have $n$ items in our dynamic array, how many doubling operations will we have executed so far?
  • A: $⌈\log_2 n⌉$
  • The expensive part is copying every element into the new larger array when we have to resize
  • Q: How expensive is this?
  • A: Linear: $O(n)$

Dynamic arrays

  • The trickier question to answer is this
  • Q: What is the worst case complexity of inserting into a dynamic array?
  • A: It depends on whether we’ve filled up the array or not:
    Not full: Just insert the element = O(1)
    Full: Allocate new array, copy everything across, add new element = $O(n)$
  • We can’t give a definitive answer on the worst case complexity – it depends!

Dynamic Arrays

Let's imagine we've just copied our data to a larger array:

Down arrow
  • We can now make $n$ insertions at cost O(1) each before we have to do any more copying
  • The $(n+1)$-th insertion will cost us $2n$ = $O(n)$ (copying all $2n$ items across)
  • Total work for these $n+1$ insertions is about $3n$, i.e. O(1) per insertion on average
  • $n$ insertions into a dynamic array is complexity $O(n)$
  • $n$ insertions into our standard array is also complexity $O(n)$ ...
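The doubling scheme can be sketched in a few lines; the class below is an illustrative toy (names like `DynamicArray` and the `copies` counter are mine, added so the total copying work can be observed directly):

```python
# Minimal dynamic-array sketch: double the capacity on overflow and count
# how many element copies resizing costs in total.
class DynamicArray:
    def __init__(self):
        self.capacity = 1
        self.length = 0
        self.data = [None] * self.capacity
        self.copies = 0                    # total elements copied so far

    def append(self, item):
        if self.length == self.capacity:   # full: allocate double, copy across
            new_data = [None] * (2 * self.capacity)
            for i in range(self.length):
                new_data[i] = self.data[i]
                self.copies += 1
            self.data = new_data
            self.capacity *= 2
        self.data[self.length] = item
        self.length += 1

a = DynamicArray()
n = 1024
for x in range(n):
    a.append(x)
# Copies total 1 + 2 + 4 + ... + n/2 = n - 1 < 2n, so n appends are O(n).
assert a.copies < 2 * n
```

Running this with n = 1024 gives 1023 copies in total: linear in n, hence O(1) amortized per append, matching the analysis above.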

Amortized analysis

  • This sort of analysis is called amortized analysis
  • Meaning: average cost of an operation over a sequence of operations
  • Different to average-case analysis (which is averaging over probability distribution of possible inputs)
  • Key idea of dynamic arrays: insertions will “usually” be fast, accessing elements will always be O(1)
  • In Big Oh terms, a dynamic array is no more inefficient than a standard array

Linked Structures

  • The alternative to contiguous structures is linked structures, e.g. a linked list:


Schematic representation of linked structures


Alternative: keep a pointer to the item before as well as the item after, giving a doubly linked list.

Benefits of using linked list structures

  • We don’t need to worry about allocating space in advance, can use any free blocks of space in memory
  • We only run out of space when the whole memory is actually full
  • Very flexible: think about adding sublists or deleting items
  • More efficient for moving large records (leave data in same place in memory, just change some pointers)

Drawbacks of using linked list structures

  • Wasted space: we’re storing both pointers and data.
  • To find the $p$-th item, we must start at the beginning and follow pointers until we get there
  • In the worst case, if there are $n$ items in the list and we want the last one, we have to do $n$ lookups
  • So retrieving an element from its position in the list is $O(n)$
  • This is a real problem.
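A minimal singly linked list sketch makes the trade-off concrete (class and method names here are mine, chosen for illustration): inserting at the head is O(1), but reaching position p means following p pointers.

```python
# Singly linked list sketch: O(1) insertion at the head,
# O(n) access by position (we must walk the chain of pointers).
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node

class LinkedList:
    def __init__(self):
        self.head = None

    def push_front(self, value):           # O(1): just rewire the head
        self.head = Node(value, self.head)

    def get(self, p):                      # O(n): follow p links from the head
        node = self.head
        for _ in range(p):
            node = node.next
        return node.value

lst = LinkedList()
for v in [30, 20, 10]:
    lst.push_front(v)                      # list is now 10 -> 20 -> 30
assert lst.get(0) == 10
assert lst.get(2) == 30
```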

Abstract Data Type

  • We’ve seen concrete data structures which dealt with arranging data in memory
  • Abstract Data Types offer a higher level view of our interactions with data
  • Comprised of: (1) Data (2) Operations that allow us to interact with this data
  • We describe the behaviour of our data structures in terms of abstract operations
  • We can therefore use them without thinking:
    “Add this item to this list, I don’t care how you do it or how you are storing the list”

Abstract Data Type

  • However, the way these operations are implemented will affect efficiency.
  • There are different implementations of the same abstract operations.
  • We want the ones we will use most commonly to be the most efficient.
  • We will look briefly at 3 ADTs today: stacks, queues and dictionaries

Stacks (Last-In First-Out: LIFO)

  • Operations: push (add an item to the top), pop (remove the item from the top)
  • Stacks crop up in recursive algorithms
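A stack with both operations in O(1) can be sketched on top of a Python list, using the list's end as the top (the `Stack` class name and underflow message are mine):

```python
# Stack sketch backed by a Python list: the list's end acts as the top,
# so push and pop are both O(1) (amortized for push, as the list is a
# dynamic array underneath).
class Stack:
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError("underflow: pop from empty stack")
        return self._items.pop()

s = Stack()
for x in [1, 2, 3]:
    s.push(x)
assert s.pop() == 3    # last in, first out
assert s.pop() == 2
```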


Queues (First-In First-Out: FIFO)

  • Operations: enqueue (add an item at the back), dequeue (remove the item at the front)
  • Queues crop up when we want to process items in the order they arrived.
  • Later we will see that adding nodes of a tree to a stack or queue and then retrieving them results in different tree traversal strategies.
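A queue sketch with O(1) operations at both ends, built on `collections.deque` (a plain Python list would make removing from the front O(n), since every remaining element would shift):

```python
# Queue sketch using collections.deque, which supports O(1) appends and
# pops at both ends.
from collections import deque

class Queue:
    def __init__(self):
        self._items = deque()

    def enqueue(self, item):
        self._items.append(item)       # add at the back

    def dequeue(self):
        if not self._items:
            raise IndexError("underflow: dequeue from empty queue")
        return self._items.popleft()   # remove from the front

q = Queue()
for x in [1, 2, 3]:
    q.enqueue(x)
assert q.dequeue() == 1    # first in, first out
assert q.dequeue() == 2
```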

Deques

  • Operations: add or remove an item at either end (front or back)
  • More versatile variant of a queue
  • Short for double-ended queue, pronounced “deck”
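Python's standard library already provides a deque, `collections.deque`, with O(1) adds and removes at both ends:

```python
# collections.deque is a double-ended queue: items can be added or
# removed at either end in O(1).
from collections import deque

d = deque()
d.append(1)        # add at the back
d.append(2)
d.appendleft(0)    # add at the front: deque is now 0, 1, 2
assert d.pop() == 2        # remove from the back
assert d.popleft() == 0    # remove from the front
assert list(d) == [1]
```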

Stacks and Queues Implemented as Arrays

  • Stacks as arrays: we only need to keep track of the length
  • Queues as arrays: we keep track of the front and back indices
  • Exercise: think up similar instructions for linked list implementations
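The array-backed queue is usually done as a circular buffer, so the front and back indices wrap around and no element ever has to move. A sketch under that assumption (the `ArrayQueue` name and fields are mine):

```python
# Circular-buffer queue sketch on a fixed-size array: the front index and
# the size determine the back index modulo the capacity, so enqueue and
# dequeue are both O(1) and no elements are ever shifted.
class ArrayQueue:
    def __init__(self, capacity):
        self.data = [None] * capacity
        self.capacity = capacity
        self.front = 0     # index of the oldest item
        self.size = 0      # number of stored items

    def enqueue(self, item):
        if self.size == self.capacity:
            raise IndexError("overflow: queue is full")
        back = (self.front + self.size) % self.capacity
        self.data[back] = item
        self.size += 1

    def dequeue(self):
        if self.size == 0:
            raise IndexError("underflow: queue is empty")
        item = self.data[self.front]
        self.front = (self.front + 1) % self.capacity
        self.size -= 1
        return item

q = ArrayQueue(3)
for x in [1, 2, 3]:
    q.enqueue(x)
assert q.dequeue() == 1
q.enqueue(4)               # wraps around into the freed slot
assert [q.dequeue() for _ in range(3)] == [2, 3, 4]
```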

Stacks and Queues

  • All operations on stacks and queues are O(1), implemented as either arrays or linked lists
  • Popping an empty stack or dequeueing an empty queue is called underflow
  • Trying to add an item when the memory limit of the chosen implementation has been reached is called overflow

Dictionary

  • Perhaps the most important ADT is the dictionary
  • An element in a dictionary contains two parts:
  • A key – used to address an item
  • A datum – associated with the key
  • Keys are unique; the dictionary is a function from keys to data
  • Think of our standard notion of a dictionary: key = word, datum = definition
  • Dictionaries are of huge practical importance
  • Google search is effectively a dictionary which pairs keywords with websites
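Python's built-in `dict` realises exactly this ADT (underneath it uses a hash table, not the linear implementations analysed below):

```python
# The dictionary ADT via Python's built-in dict: unique keys map to data.
definitions = {}
definitions["stack"] = "last-in first-out collection"    # Insert
definitions["queue"] = "first-in first-out collection"   # Insert

assert "stack" in definitions                            # IsPresent
assert definitions["queue"].startswith("first")          # Lookup
del definitions["stack"]                                 # Delete
assert "stack" not in definitions
```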

Dictionary Operations

  • Some common operations: Lookup(D, k), Insert(D, k), Delete(D, k), IsPresent(D, k)
  • Others might include size(D), modify(D,k,v), IsEmpty(D) and so on
  • Implementing dictionaries such that the above operations are efficient requires careful choice of ADT implementation
  • Q: What will the complexity of these operations be if we implement them with an array or a linked list? Will the data being sorted make a difference?

Dictionary Operations

Complexity of dictionary operations implemented with an array for an $n$ entry dictionary:

Dictionary operation | Unsorted array | Sorted array
Lookup(D, k)         |       ?        |      ?
Insert(D, k)         |       ?        |      ?
Delete(D, k)         |       ?        |      ?
IsPresent(D, k)      |       ?        |      ?

Dictionary Operations

Complexity of dictionary operations implemented with an array for an n entry dictionary:

Dictionary operation | Unsorted array | Sorted array
Lookup(D, k)         | O($n$)         | O($\log n$)
Insert(D, k)         | O(1)           | O($n$)
Delete(D, k)         | O($n$)         | O($n$)
IsPresent(D, k)      | O($n$)         | O($\log n$)
  • For a sorted array, we can use binary search to find an item
  • Q: Can you explain the difference in cost for insert and delete?
  • A: We pay a higher cost for maintaining the sorted order: when we insert or delete, we have to shuffle the items above up or down. In the worst case this is every entry
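A sorted-array dictionary sketch showing both costs (class name and fields are mine; `bisect` does the binary search):

```python
# Sorted-array dictionary sketch: binary search gives O(log n) lookup,
# but insertion must shift every later entry along, costing O(n) in the
# worst case.
import bisect

class SortedArrayDict:
    def __init__(self):
        self.keys = []      # kept in sorted order
        self.values = []    # values[i] belongs to keys[i]

    def insert(self, key, value):
        i = bisect.bisect_left(self.keys, key)   # O(log n) to find the slot
        self.keys.insert(i, key)                 # O(n): shifts items above
        self.values.insert(i, value)

    def lookup(self, key):
        i = bisect.bisect_left(self.keys, key)   # O(log n) binary search
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        raise KeyError(key)

d = SortedArrayDict()
for k, v in [("carol", 3), ("alice", 1), ("bob", 2)]:
    d.insert(k, v)
assert d.keys == ["alice", "bob", "carol"]       # kept sorted
assert d.lookup("bob") == 2
```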

Dictionary Operations

Complexity of dictionary operations implemented with a linked list for an n entry dictionary:

Dictionary operation | Unsorted list | Sorted list
Lookup(D, k)         |       ?       |      ?
Insert(D, k)         |       ?       |      ?
Delete(D, k)         |       ?       |      ?
IsPresent(D, k)      |       ?       |      ?

Dictionary Operations

Complexity of dictionary operations implemented with a linked list for an n entry dictionary:

Dictionary operation | Unsorted list | Sorted list
Lookup(D, k)         | O($n$)        | O($n$)
Insert(D, k)         | O(1)          | O($n$)
Delete(D, k)         | O($n$)        | O($n$)
IsPresent(D, k)      | O($n$)        | O($n$)
  • We can no longer use binary search to locate an item in the sorted case
  • So we trade off the flexibility of a linked structure against reduced efficiency for lookup operations

Conclusion

  • We’ve seen the difference between concrete and abstract, linked and contiguous
  • We’ve seen some important examples of ADTs
  • Linear implementations of dictionaries aren’t very efficient
  • Using a sorted array makes dictionary lookups fast

Comments and Suggestions