Lecture 10

In this lecture we look at a few common data structures: ArrayList, LinkedList, Stack, Queue, HashMap and HashSet. They are all used to keep collections of data elements but the difference is in the order in which these elements are accessible and in the complexity of the different operations.

ArrayList

ArrayList represents an array of elements but unlike the plain arrays the length of the array list can grow dynamically. This lets us to add and remove elements without knowing in advance how many elements we will have at the end. Internally the array list is implemented with a plain array. The array list object actually has two instance variables: a reference to an array and an integer which indicates the length of the array list:

public class ArrayList {
    private int length;
    private Object[] content;
}

The allocated array is usually bigger than the actual length of the array list. When we create a new array list, its length variable is set to zero but the content variable is set to point to an array that has more than one element. When we add new elements to the array, then the length variable is incremented and the new element is added to the array. In this way the length variable indicates how many of the elements in the content variable are actually valid. The following illustrates a possible configuration of the array list:

When all elements in the content variable are filled in, then the implementation simply allocates a new bigger array. The old elements are copied to the new array and the elements that need to be added are appended in the new array. The advantage of the array list is that the access to an element with a random index takes a constant time, i.e. O(1), while still being able to extend the list with more elements.

The behaviour of the array list is not that good if we want to add or remove elements from the middle of the list. Both adding and removing elements from the end is efficient since this is reduced to an increment or decrement of the length variable. However, if we want to remove an element from the middle then in the worst case we have to move all elements in the list. Consider the following picture which demostrates what happens if the first element in the list must be removed:

In the beginning the list contains three elements but if we want to remove the first element then we have to move the second and the third element with one step to the left.

Exactly the opposite happens if we want to insert an element in the beginning:

We will have to move all existing elements with one step to the right in order to make room for the new first element.

The conclusion is that the insertion and the removal of random elements has complexity O(n) while the access to random elements remains constant O(1). The array list is a useful data structure when a random acceess is frequently needed while the insertions and deletions are rare.

LinkedList

The linked list is an alternative to the array list where the insertion and the removal of elements is always possible in a constant time but the price for this is that the access to a random element takes O(n) time. Still it is possible to iterate sequentially over the list in a linear time, i.e. the transition from one element to the next one takes only a constant time. The data structure is implemented with the class java.util.LinkedList.

In the linked list every element is represented with an object which contains a reference to the element and a reference to the object encapsulating the next element, i.e. the nodes of the list are of the following class:

public class Node {
   Object element;
   Node next;
}

Here the variable element refers to the value for the current node, while the variable next refers to the next node in the list. For example, if we have a list with the elements ["Kim", "Sam", "Tom"], then the list is represented with the chain of nodes:

It is not difficult to see that the iteration over the elements of list is possible in linear time. We need only a simple loop which goes through all elements:

Node node = list;
while (node != null) {
   System.out.println(node.element);
   node = node.next;
}

Accessing an element with a random index is still possible by iteration and counting. However, this means that accessing the element with index i will take O(i) time:

Node node = list;
while (node != null) {
   if (index == 0)
     return node.element;
   index--;
   node = node.next;
}

For comparison the access to a random element in an array list is possible in a constant time and the sequantial iteration over the elements is still linear.

The advantage of the linked list is that insertion of an element at any possition is possible in a constant time, if we have a reference to the node after which we want to do the insertion. We just need to manipulate the variable next:

node = new Node("Leo", prev.next);
prev.next = node;
What happens is easier to see on a picture:

In order to insert the new element "Leo" we create a new node which wraps the element. The variable next for the new node points to the node which follows the node after which we want to do the insertion. In this case this is the element "Sam". After that we update the variable next for the node "Kim" to point to the new node for "Leo".

Similarly we can do the removal of an element in a constant time. If we want to remove "Sam" from the list ["Kim", "Sam", "Tom"] then we just have to update the variable next from the node for "Kim" to point to the node for "Tom":

Complexity

Both the array list and the linked list represent one and the same abstract structure, i.e. a sequence of elements. The difference, however, is in the complexity of the different operations. The following table summarizes the comlexity of the different operations:
AccessInsertDeleteIteration
ArrayListO(1)O(n)O(n)O(n)
LinkedListO(n)O(1)O(1)O(n)

The choice of the data structure must be governed by the frequency of the different operations. If a fast access to a random element is needed then the array list is the preferred structure. For example the binary search algorithm requires access to the element in the middle of the array. If the data structure requires linear time instead of constant time access, then the binary search actually becomes slower than the linear search. Still in many algorithms we just need to sequentially iterate over the elements. In such cases both the array list and the linked list offer efficient access. The advantage of the linked list is that in addition it offers constant time insertion and removal.

Stacks and Queues

It is very often the case that we need a sequence of elements but we don't need to access them in an arbitrary order. Basically we just want to be able to add a new element and to be able to remove one element.

Depending on the order in which we add and remove elements there are two basic data structures: stack and queue. The stack is like a pile of books - we can only add a new book on the top and we can remove only the last book. Similarly with the queue we can add a new element only at the end of the queue but contrary to the stack we remove an element only from the front of the queue:

The stack is also known as a structure with LIFO (Last In - First Out) access. The queue is a FIFO (First In - First Out) structure.

The stack can be implemented efficiently both with an array list and with a linked list. The following table shows the method calls that are needed for implementing the three stack operations, i.e. creation of a new stack, pushing an element into the stack, and poping the element from the stack:
ArrayListLinkedList
newlist = new ArrayList();list = new LinkedList();
pushlist.add(element);list.push(element);
popelement = list.remove(list.size()-1);element = list.pop();

The efficient implementation for a queue requires that elements can be removed from the top of the queue in a constant time. This is possible only by using a linked list. The operations with a queue are: creation of a new queue, offering an element to the end of the queue, and polling the first element from the queue. The methods that the linked list offers are:
newlist = new LinkedList();
offerlist.offer(element);
pollelement = list.poll();

HashMap and HashSet

The HashSet is another kind of collection where it is guaranteed that the elements form a set. In other words duplicated elements are not allowed. The HashMap is similar but here there is a value associated to every element in the set. For example the following code creates a HashMap which associates a telephone number to every name:

HashMap<String,String> phonebook = new HashMap<String,String>();
phonebook.put("John", "1234");
phonebook.put("Mary", "7125");

Once the map is populated we can retrieve a phone number by name:

System.out.println(phonebook.get("John"));

Similarly it is also possible to remove an element by its key:

phonebook.remove("John");

The class HashSet has similar methods except that it contains only keys and not values:
boolean add(Object o)adds an element to the set
boolean remove(Object o)removes the element
boolean contains(Object o)returns true if the element is already in the set

The Java Collections

Since the different data structures has a lot of common, they are organized in a hierarchy by using interfaces. In the picture bellow the yellow color represents an interface and the green color represents a class:

By using the interface List for instance, one and the same code can be made to work both an array list and a linked list. After all the two data structures are equivalent on an abstract level. However, the different operations in the different structures have different complexity and this has to be taken into account when designing algorithms.

Exercises