# Set Collection: Algorithms and data structures for beginners

A set is a collection that implements the basic mathematical operations on sets: intersections *(intersection),* union (union), the difference (difference) and symmetric difference (symmetric difference). Each of the algorithms we will analyze in the appropriate section.

## Lots of

In mathematics, sets are collections of objects that have something in common. For example, we can declare a set of even positive integers:

[2, 4, 6, 8, 10, ...]

Or many odd positive integers:

[1, 3, 5, 7, 9, ...]

There are no common elements in these two sets. Let’s now look at the many divisors of the number 100:

[1, 2, 4, 5, 10, 20, 25, 50, 100]

Now we can find out which divisors of the number 100 are odd, simply by looking at the set of odd numbers and at the set of dividers and choosing those from the numbers that are present in both. We can also answer the question: “Which odd numbers are not divisors of a hundred?” Or “Which positive integers, even or odd numbers, are not divisors of a hundred?”.

At first glance, this does not seem very useful, but it is an artificial example. Suppose that we have a multitude of all the employees of an enterprise and a multitude of employees who have passed a monthly check. Then we can easily answer the question: “Which of the workers failed the test?”.

We can also add different sets and build a more complex query, for example: “Which of the full-time employees of the sales department who have a corporate credit card failed to complete a mandatory refresher course?”.

## Set class

The class `Set`

implements the interface `IEnumerable`

and accepts an argument of the type that is a successor `IComparable`

, since the algorithms need to check the elements for equality.

The elements of our set will be stored in an instance of the standard .NET class `List`

, but in practice, tree structures, such as a binary search tree, are usually used for storage. The choice of internal representation will affect the complexity of the algorithms for working with the set. For example, when using the list, the method `Contains`

will run in *O (n)* time, while the set implemented using the tree will work on average in *O (log n)* time.

In addition to the methods for working with the set, the class `Set`

also has a constructor that accepts `IEnumerable`

initial elements.

public class Set: IEnumerable where T: IComparable { private readonly List _items = new List (); public Set () { } public Set (IEnumerable items) { AddRange (items); } public void Add (T item); public void AddRange (IEnumerable items); public bool Remove (T item); public bool Contains (T item); public int Count { get; } public Set Union (Set other); public Set Intersection (Set other); public Set Difference (Set other); public Set SymmetricDifference (Set other); public IEnumerator GetEnumerator (); System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator (); }

### Add method

**Behavior:**Adds elements to the set. If the item is already present in the set, an exception is thrown`InvalidOperationException`

.**Difficulty:***O (n)*

When implementing the method, `Add`

it is necessary to decide whether we will resolve duplicate elements. For example, if we have a look:

[1, 2, 3, 4]

If the user tries to add the number 3, the result is:

[1, 2, 3, 3, 4]

In some cases this is permissible, but in our implementation we will not support duplicates. If you imagine, for example, a multitude of college students, it would be illogical to add the same student to it twice. In fact, an attempt to add an element that is already present is most likely a mistake, and our class implementation `Set`

perceives it that way.

The method `Add`

uses the method `Contains`

that will be discussed later.

public void Add (T item) { if (Contains (item)) { throw new InvalidOperationException ("Item already exists in Set"); } _items.Add (item); }

### AddRange method

**Behavior:**Adds multiple items to the set. If any of the added elements are present in the set, or there are duplicate elements in the added elements, an exception is thrown`InvalidOperationException`

.**Complexity:***O (m · n),*where*m*is the number of inserted elements and*n*is the number of elements of the set.

public void AddRange (IEnumerable items) { foreach (T item in items) { Add (item); } }

### Remove method

**Behavior:**Removes the specified item from the set and returns`true`

. If there is no item, returns`false`

.**Difficulty:***O (n)*

public bool Remove (T item) { return _items.Remove (item); }

### Contains method

**Behavior:**Returns`true`

if the set contains the specified element. Otherwise returns`false`

.**Difficulty:***O (n)*

public bool Contains (T item) { return _items.Contains (item); }

### Count method

**Behavior:**Returns the number of elements in the set, or 0 if the set is empty.**Difficulty:***O (1)*

public int Count { get { return _items.Count; } }

### GetEnumerator method

**Behavior:**Returns an iterator to iterate over the elements of a set.**Complexity:**Obtaining an iterator is*O (1)*, traversing the elements of a set is*O (n)*.

public IEnumerator GetEnumerator () { return _items.GetEnumerator (); } System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator () { return _items.GetEnumerator (); }

## Algorithms

### Union method

**Behavior:**Returns the set obtained by combining it with the specified one.**Complexity:***O (m · n)*, where*m*and*n*are the number of elements of the transmitted and current sets, respectively.

A union of sets is a set that contains elements that are present in at least one of the two.

For example, there are two sets (highlighted in red):

During a join operation, elements of both sets are selected. If an element is present in both sets, only one copy is taken. The union of the two sets is shown in yellow.

Such a diagram that clearly demonstrates operations on sets is called a Venn diagram.

Another example using sets of integers:

[1, 2, 3, 4] union [3, 4, 5, 6] = [1, 2, 3, 4, 5, 6]

public Set Union (Set other) { Set result = new Set (_items); foreach (T item in other._items) { if (! Contains (item)) { result.Add (item); } } return result; }

### Intersection method

**Behavior:**Returns the set obtained by intersecting it with the specified one.**Complexity:***O (m · n)*, where*m*and*n*are the number of elements of the transmitted and current sets, respectively.

The intersection of sets contains only those elements that are in both sets. Using the Venn diagram, the intersection can be represented as:

Or using whole numbers:

[1, 2, 3, 4] intersect [3, 4, 5, 6] = [3, 4]

public Set Intersection (Set other) { Set result = new Set (); foreach (T item in _items) { if (other._items.Contains (item)) { result.Add (item); } } return result; }

### Difference Method

**Behavior:**Returns a set that is the difference of the current with the specified.**Complexity:***O (m · n)*, where*m*and*n*are the number of elements of the transmitted and current sets, respectively.

The set difference is all the elements that are contained in one set (the volume in which the method is called), but not in the other (the volume that is passed by the argument). The Venn diagram for set difference will look like this:

Or using whole numbers:

[1, 2, 3, 4] difference [3, 4, 5, 6] = [1, 2]

public Set Difference (Set other) { Set result = new Set (_items); foreach (T item in other._items) { result.Remove (item); } return result; }

### Symmetric Difference Method

**Behavior:**Returns a set that is a symmetric current difference with the specified one.**Complexity:***O (m · n)*, where*m*and*n*are the number of elements of the transmitted and current sets, respectively.

The symmetric difference – these are all elements that are contained only in one of the considered sets. The Venn diagram for the symmetric difference will look like this:

Or using whole numbers:

[1, 2, 3, 4] symmetric difference [3, 4, 5, 6] = [1, 2, 5, 6]

You may have noticed that the symmetric difference is “the reverse intersection”. Given this, let’s try to find it using existing operations.

So, we want to get a set that contains all the elements of the two sets, except those in both. In other words, we want to get the difference of the union of two sets and their intersections.

Or, if we look at the steps:

[1, 2, 3, 4] union [3, 4, 5, 6] = [1, 2, 3, 4, 5, 6] [1, 2, 3, 4] intersection [3, 4, 5, 6] = [3, 4] [1, 2, 3, 4, 5, 6] set difference [3, 4] = [1, 2, 5, 6]

What gives us the desired results `[1, 2, 5, 6]`

.

public Set SymmetricDifference (Set other) { Set union = Union (other); Set intersection = Intersection (other); return union.Difference (intersection); }

### IsSubset method

You may be wondering why we did not add a method `IsSubset`

that checks whether one set is entirely contained in the other. For example:

[1, 2, 3] is subset [0, 1, 2, 3, 4, 5] = true

While:

[1, 2, 3] is subset [0, 1, 2] = false

The fact is that this test can be carried out using existing methods. For example:

[1, 2, 3] difference [0, 1, 2, 3, 4, 5] = []

The empty set as a result tells us that all elements of the first set are contained in the second, and therefore, the first is a subset of the second. Another way to check this is:

[1, 2, 3] intersection [0, 1, 2, 3, 4, 5] = [1, 2, 3]

If as a result we get a set with the same number of elements as the original, then it is a subset of the second.

In the general case, of course, a class `Set`

may have a method `IsSubset`

(which can be implemented more efficiently). However, it should be remembered that this is not some kind of new opportunity, but just another application of existing ones.

## To be continued

That’s all, and next time we move on to the final topic of this series of articles – sorting algorithms.