Collections: HashSet & TreeSet
What are they & when would you want to use one over the other?
In Java, you may find yourself wanting to use a
Set rather than a
List. With a
List, you can specify the order of the elements within. To find an element at a unknown location, you would need to traverse the entire list. Suffice to say, this is not ideal. Instead, we have the option of foregoing the order of the elements in exchange for finding a matching element quicker. The ordering of the elements is whatever is best for that data structure internally. Enter the
HashSet and a
TreeSetare two examples of concrete implementations of a
Set. We’ll take a look at what they both are and when you may want to use one over the other.
One important thing to note with the
Set abstract collection type is that there are no duplicates. If you’re considering using a concrete implementation of
Set, such as a
HashSet or a
TreeSet, it is worth bearing this in mind.
To understand a
HashSet, it can help to first understand what a hash table is. If you already know what one is, feel free to skip or gloss over the following section.
Hash tables are excellent for finding elements quickly. A hash table calculates an integer, known as the ‘hash code’, for every object within it. This is done using the instance fields from the object such that different objects would produce a different hash code.
When you create your own class, you’re responsible for creating (or generating) your own hash code. One important thing to bear in mind: your implementation of
hashCode must work with
b ought to have the same hash code.
Back to HashSet
contains method of a
HashSet is quick: you can quickly determine whether your
HashSet includes a particular object.
If you use the
HashSet iterator, it may seem as if the elements are being visited in a random order. The reason for this is that elements could be stored across a hash table. If you want to use a
HashSet, you should be content to make this trade-off.
What if you wanted a sorted set? Enter
TreeSet. Irrespective of the order in which you add the elements into the
TreeSet, they will come out in sorted order. Let’s take a look at an example:
We get back the following:
Finland Nigeria Peru Spain
In other words, it’s sorted in alphabetical order.
TreeSet uses the tree data structure (a Red-Black tree). When a new element is added to a
TreeSet, it is added in the right place to ensure the
TreeSet always remains sorted. If you’re familiar with Red-Black trees, this should make sense. If not, take this on faith (or, better yet, look up Red-Black trees!).
Since the elements need to be ordered, you will have to provide a
Comparator if there is not one already for the type of object you want to store.
HashSet vs TreeSet
There are pros and cons to using a
TreeSet over a
HashSet. It takes a lot longer to add an element to a
TreeSet that it does inserting into a
HashSet. This is because you are paying for the added sorting overhead. If you want your elements to be sorted, a
TreeSet may be preferable.
It may be worth noting that there is a
TreeMap that is an implementation of a
Map if you need key-value pairs.
Photo by Ryan Hafey on Unsplash