Wednesday, June 3, 2020

Java LinkedList vs ArrayList


Last week I have interviewed a Senior Java Developer, having more than 10 years of experience in the Java domain.
I wanted to ask some "warm-up" questions to break the ice in the beginning of the interview, so I've asked about the differences between LinkedList and ArrayList. 
To my surprise, he had failed providing answers.
If you claim to be a Java expert, you must know the answer to this question.


What is a LinkedList?


A LinkedList, as its name implies, is a list based on list of elements.
Java keep a doubly linked list, which means each element keeps a pointer to its previous element, and its next element.




The upside of using a LinkedList is that we can, with relatively low cost, update it.

For example, adding an element in the middle of the list is simple:
  • Create a new node, and update its pointer to the next element.
  • Update the previous element pointer to point the the new node.

There are several downsides for using a LinkedList.

First, the memory footprint is higher, as we need to keep two pointers per each list element (for the previous element, and for the next element). 
Each pointer required additional 4 bytes or 8 bytes on 32 bits JVM or 64 bits JVM.
So in most cases this means 2 X 8 bytes = 16 additional bytes per each element.

Notice that each element is actually a java object:  LinkedList.Node
This means also overhead on the garbage collector per the size of the list.


Second, to get the Nth element, we need to scan all elements until the Nth position. 
This means that data access to the Nth element has cost of O(N).


What is an ArrayList?


An ArrayList is a list that internally uses a dynamic sized array to store the elements.
The dynamic array size is automatically increased when its capacity is reached.
The related size increase is based on this formula (simplified version, see the actual JRE source code for full code):

int newCapacity = oldCapacity + (oldCapacity >> 1);

This means we'll expand the size by 50% upon capacity reach.




There are several upsides for using an ArrayList.

First is the memory usage, which is almost identical to the actual elements size.
Notice that we actually keep an array of pointers, so we do have a pointer to keep, but as we must keep a pointer to access an object, we can ignore this overhead.

Second is the access to the Nth element, which is using a direct access by:
  array start location + N*pointer size

The downside of ArrayList is updates.
To add or remove element from an ArrayList, we need to shift all elements after the added elements.

In addition, add of a new element, would sometime cause inflation of the array, which means, allocation of a new array, and copy of the elements, but in average, over many add operations, this is still O(1).


Summary


We should use ArrayList when:
  • We have a lot of elements, and we want to reduce memory footprint
  • We do not update the list, but only adding elements at the end of the list
We should use LinkedList when:
  • We do not have many elements
  • We update the list a lot


  LinkedList  ArrayList 
 add O(1) O(1) on average
 remove and add in specific index O(1) O(N)
 get specific index O(N) O(1)


Some Performance Tests


I wanted to verify some of the statement made before, and I've run the following test code:


package com.alon.listing;

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Random;

public class ListPerf {
private static final int MAX_SIZE = 1_000_000;
private final List<Integer> list;

public ListPerf(List<Integer> list) {
this.list = list;
}

public void stressTests() {
addTests();
fetchTests();
deleteTests();
deleteFirstTests();
}

private void timerRun(int times, String description, Runnable runnable) {
long start = System.currentTimeMillis();
for (int i = 0; i < times; i++) {
runnable.run();
}
float singleTime = System.currentTimeMillis() - start;
singleTime = singleTime * 1000 / times;
System.out.println(description + " timing " + String.format("%.3fms", singleTime));
}

private void addTests() {
Random random = new Random();
timerRun(1_000_000, "add", () -> {
list.add(random.nextInt());
});
}

private void fetchTests() {
Random random = new Random();
timerRun(1000, "fetch", () -> {
list.get(random.nextInt(list.size()));
});
}

private void deleteTests() {
Random random = new Random();
timerRun(1000, "delete", () -> {
list.remove(random.nextInt(list.size()));
});
}

private void deleteFirstTests() {
timerRun(1000, "delete first", () -> {
list.remove(0);
});
}

public static void main(String[] args) {
System.out.println("=== LinkedList tests ===");
new ListPerf(new LinkedList<>()).stressTests();

System.out.println("=== ArrayList tests ===");
new ListPerf(new ArrayList<>()).stressTests();
}
}


and get the following results:


=== LinkedList tests ===
add timing 0.173ms
fetch timing 1534.000ms
delete timing 1317.000ms
delete first timing 0.000ms
=== ArrayList tests ===
add timing 0.060ms
fetch timing 0.000ms
delete timing 133.000ms
delete first timing 261.000ms


Notice that the delete test has better results on the ArrayList, which is opposite than the expected result.
The reason is that removal of an Nth element requires first to locate the element, hence the cost is higher on the LinkedList.

When removing the first element, there is no need to scan the list, and hence the LinkedList performance is much better.





1 comment: