Guide to the Set Data Structure
When it comes to the full list of data structures, the Set data structure is one of my favorite topics. Even though sets are in the category of abstract data structures, you'll discover that they're quite practical. And in this guide we'll walk through a number of ways that sets can be used in real world programs.
Guide Tasks
  • Read Tutorial

When it comes to the full list of data structures, the Set data structure is one of my favorite topics. Even though sets are in the category of abstract data structures, you'll discover that they're quite practical. And in this guide we'll walk through a number of ways that sets can be used in real world programs.

Set Data Structure

Set Definition and Rules

So what exactly is a Set? A set is a data storage tool that follows a number of rules:

  • From a storage mechanism perspective, sets do not have to be in a specific order.
  • Sets cannot have repeated values.
  • In many cases sets are not changed after created.

Goal of Using a Set

Now let's discuss what we can use sets for. Since sets are an abstract data type, this means that they can be implemented using:

  • Arrays
  • Hash tables
  • And any other data structure that can conform to its rules

Usually you will choose to use a set when your goal is to check for inclusion. For example, in this guide's case study we're going to build a program that uses sets to implement a phone number validation tool. And by inclusion I mean that the goal of working with a set is to check to see if a value exists. This is in contrast to a data structure, such as a binary search tree, that has the goal of finding an element in a data collection.

Set Data Structure Example

In this guide we're going to use Ruby and its Set class to illustrate how sets can be used in programs.

Basic Usage for the Set Data Structure

Let's start by creating a new set and storing it in a variable. As mentioned previously, we're going to create a small program that will manage phone numbers.

In this imaginary application we'll say that we have a team of sales reps around the country that turn in new leads each day. To ensure that the sales reps are bring honest, our program needs to ensure that they:

  • Don't try to submit duplicate leads.
  • And that they don't try to steal their colleague's leads.

The Set data structure will be perfect for this program and since phone numbers are unique we can use them as our main data point.

Set Data Structure Syntax

Let's start by creating sets for several employees.

If you're following along in Ruby, you need to start by importing the set library.

require 'set'

With the library imported we can create our sets:

bob = Set.new ["123-456-7890", "234-567-8901", "345-678-9012"]
alice = Set.new ["567-890-1234", "678-901-2345"]

Running these two commands will create two sets containing a total of 5 phone numbers for Bob and Alice.

Following the Rules

Now let's see if our sets follow the rules. We can attempt to add a duplicate phone number to Bob's set with the code:

bob.add("123-456-7890")

Running this code will output the following set:

{"123-456-7890", "234-567-8901", "345-678-9012"}

As you can see, no new items were added and Bob wasn't able to pad his lead numbers. So this part of the program is working perfectly.

An additional rule that our program has to follow is that no two sale's people can have duplicate leads. And this is where the Union functionality comes in.

Union

The union process in a set is the mechanism of joining sets together. This fits our requirement of ensuring that no two salespeople can share a lead quite nicely.

Let's add one of Bob's phone numbers to Alice's set. We can do this with the add method in Ruby:

alice.add("123-456-7890")

This will update Alice's set with the new value, as shown here:

{"567-890-1234", "678-901-2345", "123-456-7890"}

Now our program will implement the union functionality in order to block any duplicates from being included in the same master set. The pipe (|) is the union operator in Ruby, so the code for combining the sets is:

bob | alice

Running this code will generate a combined set of:

{"123-456-7890", "234-567-8901", "345-678-9012", "567-890-1234", "678-901-2345"}

As you may notice, Alice's duplicate phone number has automatically been removed and the set follows the rules associated with sets. Namely that a set contains no duplicate values.

Intersection

Now the union process is great for automatically merging sets, but what happens if we want to know which items are duplicates? That's where the intersection process comes into play.

In Ruby the intersection method is called with the & character. So if we want to find the duplicates between sets we can run the following code:

bob & alice

And this will output the duplicate set shared between them:

{"123-456-7890"}

Set difference

Depending on your use case, there may be times when you want to know the values that are unique between sets. For our phone management program this will output the items that are unique between Bob and Alice. Ruby utilizes the - operator to find the difference.

The following code:

bob - alice

Will result in the output of:

{"234-567-8901", "345-678-9012"}

Summary

Sets are a great tool to have in your data structure arsenal. I enjoy working with the sets when the situation calls for it. But keep in mind that, just like any other data structure, a set is beneficial only when its used in situation that calls for its specific features.