Set
is a
Collection
that cannot contain duplicate elements. It models the mathematical set
abstraction. The Set
interface contains only methods
inherited from Collection
and adds the restriction that
duplicate elements are prohibited. Set
also adds a stronger
contract on the behavior of the equals
and hashCode
operations, allowing Set
instances to be compared meaningfully
even if their implementation types differ. Two Set
instances are
equal if they contain the same elements.
The following is the Set
interface.
public interface Set<E> extends Collection<E> { // Basic operations int size(); boolean isEmpty(); boolean contains(Object element); boolean add(E element); //optional boolean remove(Object element); //optional Iterator<E> iterator(); // Bulk operations boolean containsAll(Collection<?> c); boolean addAll(Collection<? extends E> c); //optional boolean removeAll(Collection<?> c); //optional boolean retainAll(Collection<?> c); //optional void clear(); //optional // Array Operations Object[] toArray(); <T> T[] toArray(T[] a); }
Set
implementations:
HashSet
, TreeSet
, and LinkedHashSet
.
HashSet
, which stores its elements in a hash table, is the best-performing
implementation; however it makes no guarantees concerning the order of iteration.
TreeSet
, which stores its elements in a red-black tree,
orders its elements based on their values; it is substantially slower than
HashSet
.
LinkedHashSet
, which is implemented as
a hash table with a linked list running through it, orders its elements based
on the order in which they were inserted into the set (insertion-order).
LinkedHashSet
spares its clients from the unspecified, generally
chaotic ordering provided by HashSet
at a cost that is only
slightly higher.
Here's a simple but useful Set
idiom. Suppose you have a
Collection
, c
, and you want to create another
Collection
containing the same elements but with all
duplicates eliminated. The following one-liner does the trick.
Collection<Type> noDups = new HashSet<Type>(c);
Set
(which, by definition, cannot contain a
duplicate), initially containing all the elements in c
. It uses
the standard conversion constructor described in the
The Collection Interface section.
Here is a minor variant of this idiom that preserves the order of the original collection while removing duplicate element.
Collection<Type> noDups = new LinkedHashSet<Type>(c);
The following is a generic method that encapsulates the preceding idiom, returning a Set
of the same generic type as the one passed.
public static <E> Set<E> removeDups(Collection<E> c) { return new LinkedHashSet<E>(c); }
size
operation returns the number of elements in the
Set
(its cardinality). The isEmpty
method
does exactly what you think it would. The add
method adds the
specified element to the Set
if it's not already present and
returns a boolean indicating whether the element was added.
Similarly, the remove
method removes the specified element from
the Set
if it's present and returns a boolean indicating whether
the element was present. The iterator
method returns an
Iterator
over the Set
.
The following
program
takes the
words in its argument list and prints out any duplicate words, the number of
distinct words, and a list of the words with duplicates eliminated.
import java.util.*; public class FindDups { public static void main(String[] args) { Set<String> s = new HashSet<String>(); for (String a : args) if (!s.add(a)) System.out.println("Duplicate detected: " + a); System.out.println(s.size() + " distinct words: " + s); } }
java FindDups i came i saw i left
Duplicate detected: i Duplicate detected: i 4 distinct words: [i, left, saw, came]
Collection
by its interface
type (Set
) rather than by its implementation type
(HashSet
). This is a strongly recommended programming
practice because it gives you the flexibility to change implementations merely by
changing the constructor. If either of the variables used
to store a collection or the parameters used to pass it around are declared
to be of the Collection
's implementation type rather than its interface type, all such variables and parameters must be changed in order to change its implementation type.
Furthermore, there's no guarantee that the resulting program will work. If the program uses any nonstandard operations present in the original implementation type but not in the new one, the program will fail. Referring to collections only by their interface prevents you from using any nonstandard operations.
The implementation type of the Set
in the preceding example is
HashSet
, which makes no guarantees as to the order of the elements
in the Set
. If you want the program to print the word list in
alphabetical order, merely change the Set
's implementation
type from HashSet
to TreeSet
. Making this trivial
one-line change causes the command line in the previous example to
generate the following output.
java FindDups i came i saw i left Duplicate detected: i Duplicate detected: i 4 distinct words: [came, i, left, saw]
Set
s;
when applied, they perform standard set-algebraic operations. Suppose
s1
and s2
are sets. Here's what bulk
operations do:
s1.containsAll(s2)
— returns true
if
s2
is a subset of s1
.
(s2
is a subset
of s1
if set s1
contains all of the elements in
s2
.)
s1.addAll(s2)
— transforms s1
into the
union of s1
and s2
. (The union of two sets
is the set containing all of the elements contained in either set.)
s1.retainAll(s2)
— transforms s1
into the
intersection of s1
and s2
. (The intersection
of two sets is the set containing only the elements common to both
sets.)
s1.removeAll(s2)
— transforms s1
into the
(asymmetric) set difference of s1
and s2
.
(For example, the set difference of s1
minus s2
is the set containing
all of the elements found in s1
but not in s2
.)
Set<Type> union = new HashSet<Type>(s1); union.addAll(s2); Set<Type> intersection = new HashSet<Type>(s1); intersection.retainAll(s2); Set<Type> difference = new HashSet<Type>(s1); difference.removeAll(s2);
Set
in the preceding
idioms is HashSet
, which is, as already mentioned,
the best all-around Set
implementation in the Java platform.
However, any general-purpose
Set
implementation could be substituted.
Let's revisit the FindDups
program. Suppose you
want to know which words in the argument list occur only once and which
occur more than once, but you do not want any duplicates printed out
repeatedly. This effect can be achieved by generating two sets —
one containing every word in the argument list and the other containing only
the duplicates. The words that occur only once are the set
difference of these two sets, which we know how to compute. Here's how
the resulting program
looks.
import java.util.*; public class FindDups2 { public static void main(String[] args) { Set<String> uniques = new HashSet<String>(); Set<String> dups = new HashSet<String>(); for (String a : args) if (!uniques.add(a)) dups.add(a); // Destructive set-difference uniques.removeAll(dups); System.out.println("Unique words: " + uniques); System.out.println("Duplicate words: " + dups); } }
i came i saw i left
),
the program yields the following output.
Unique words: [left, saw, came] Duplicate words: [i]
Set<Type> symmetricDiff = new HashSet<Type>(s1); symmetricDiff.addAll(s2); Set<Type> tmp = new HashSet<Type>(s1); tmp.retainAll(s2)); symmetricDiff.removeAll(tmp);
Set
s
beyond what they do for any other Collection
.
These operations are described in
The Collection Interface section.