Collections: HashSet & TreeSet

What are they & when would you want to use one over the other?

Collections: HashSet & TreeSet

In Java, you may find yourself wanting to use a Set rather than a List. With a List, you can specify the order of the elements within. To find an element at a unknown location, you would need to traverse the entire list. Suffice to say, this is not ideal. Instead, we have the option of foregoing the order of the elements in exchange for finding a matching element quicker. The ordering of the elements is whatever is best for that data structure internally. Enter theSet.

A HashSet and a TreeSetare two examples of concrete implementations of a Set. We’ll take a look at what they both are and when you may want to use one over the other.

Set type

One important thing to note with the Set abstract collection type is that there are no duplicates. If you’re considering using a concrete implementation of Set, such as a HashSet or a TreeSet, it is worth bearing this in mind.

HashSet

To understand a HashSet, it can help to first understand what a hash table is. If you already know what one is, feel free to skip or gloss over the following section.

Hash tables

Hash tables are excellent for finding elements quickly. A hash table calculates an integer, known as the ‘hash code’, for every object within it. This is done using the instance fields from the object such that different objects would produce a different hash code.

When you create your own class, you’re responsible for creating (or generating) your own hash code. One important thing to bear in mind: your implementation of hashCode must work with equals(). If a.equals(b) then a and b ought to have the same hash code.

Back to HashSet

The contains method of a HashSet is quick: you can quickly determine whether your HashSet includes a particular object.

If you use the HashSet iterator, it may seem as if the elements are being visited in a random order. The reason for this is that elements could be stored across a hash table. If you want to use a HashSet, you should be content to make this trade-off.

TreeSet

What if you wanted a sorted set? Enter TreeSet. Irrespective of the order in which you add the elements into the TreeSet, they will come out in sorted order. Let’s take a look at an example:

TreeSet countries = new TreeSet<String>();
countries.add("Spain");
countries.add("Finland");
countries.add("Nigeria");
countries.add("Peru");

for (String country : countries) {
    System.out.println(country);
}
Example of a TreeSet

We get back the following:

Finland
Nigeria
Peru
Spain

In other words, it’s sorted in alphabetical order.

A TreeSet uses the tree data structure (a Red-Black tree). When a new element is added to a TreeSet, it is added in the right place to ensure the TreeSet always remains sorted. If you’re familiar with Red-Black trees, this should make sense. If not, take this on faith (or, better yet, look up Red-Black trees!).

Since the elements need to be ordered, you will have to provide a Comparator if there is not one already for the type of object you want to store.

HashSet vs TreeSet

There are pros and cons to using a TreeSet over a HashSet. It takes a lot longer to add an element to a TreeSet that it does inserting into a HashSet. This is because you are paying for the added sorting overhead. If you want your elements to be sorted, a TreeSet may be preferable.

It may be worth noting that there is a TreeMap that is an implementation of a Map if you need key-value pairs.

Photo by Ryan Hafey on Unsplash