From Day 01:
Word games seem to be very popular...
The fundamental component of a word game is a dictionary. You need a dictionary to validate words entered by the user.
You can think of a dictionary as being one of:
Happily, List
, Set
, and SortedSet
are all sub-interfaces of Collection
and Collection
has a method named contains
that returns true
if
the collection contains a specified element. Implementing
a dictionary is as simple as creating a Collection
.
You can find open source lists of English words fairly easily on the Internet (here). I have concatenated several of the lists together to form a dictionary of common English words that you can download here. The dictionary file has one word per line.
create a collection for the dictionary open the dictionary file for each line of the file read the word add the word to the dictionary
import java.util.Collection; import java.util.ArrayList; import java.util.Scanner; import java.io.*; public class Dictionary { public static void main(String[] args) throws IOException { PrintStream output = System.out; Scanner input = new Scanner(System.in); Collection<String> dictionary = new ArrayList<String>(); final String DICT_NAME = "dictionary.txt"; Scanner dictionaryInput = new Scanner(new File(DICT_NAME)); while (dictionaryInput.hasNextLine()) { String word = dictionaryInput.nextLine().trim(); dictionary.add(word); } output.printf("Dictionary has %d words%n", dictionary.size()); } }
Looking up a word in the dictionary is as easy as invoking
the contains
method from the Collection
interface:
prompt the user for a word while there is still input read the word if the word is in the dictionary output "in the dictionary" else output "not in the dictionary" prompt the user for a word
output.print("Enter a word to look up: ");
while (input.hasNextLine())
{
String word = input.nextLine().trim().toLowerCase();
if (dictionary.contains(word))
{
output.printf("%s is in the dictionary%n%n", word);
}
else
{
output.printf("%s is not in the dictionary%n%n", word);
}
output.print("Enter a word to look up: ");
}
See Dictionary.java.
contains
As a client, you should not care how a method is implemented.
However, you might speculate that contains
must
search the entire List
to determine if an element
is in the collection. In fact, the ArrayList
version of contains
has complexity O(N).
get
and set
have O(1) complexity
for ArrayList
.
add
to the end of the list has O(1) amortized
complexity for ArrayList
(adding N elements
to the end of a list has complexity O(N)).
add
to a particular location inside the list
has O(N) complexity (because elements need to be moved
down the list to make room for the new element).
We can avoid the O(N) search for a word in the dictionary
by sorting the list and using Collections.binarySearch
.
After reading in the list of words:
Collections.sort(dictionary);
To look up a word:
if (Collections.binarySearch(dictionary, word) >= 0) { output.printf("%s is in the dictionary%n%n", word); } else { output.printf("%s is not in the dictionary%n%n", word); }
Set
From the Java Tutorial:
A Set
is a Collection
that cannot
contain duplicate elements.
It models the mathematical set abstraction.
The Set
interface contains only methods
inherited from Collection
and adds the
restriction that duplicate elements are prohibited.
Set
public interface Set<E> extends Collection<E> { // Basic operations int size(); boolean isEmpty(); boolean contains(Object element); boolean add(E element); //optional boolean remove(Object element); //optional Iterator<E> iterator(); // Bulk operations boolean containsAll(Collection<?> c); boolean addAll(Collection<? extends E> c); //optional boolean removeAll(Collection<?> c); //optional boolean retainAll(Collection<?> c); //optional void clear(); //optional // Array Operations Object[] toArray(); <T> T[] toArray(T[] a); }
Set
Notice that Set
has no get
method.
You must use iteration (traversal) to retrieve elements
from a Set
.
Set
HashSet
add
, remove
, contains
TreeSet
add
, remove
, contains
Comparable
Because our dictionary was declared to be of type
Collection<String>
we can simply change
the actual type to any Set
:
Collection<String> dictionary = new HashSet<String>();
or
Collection<String> dictionary = new TreeSet<String>();
We can write a single program to test all of the collections we are interested in; the program asks the user for a choice of collection and we create a collection of appropriate type:
Collection<String> dictionary = null; // Type of collection output.println("Choose a collection"); output.println("1. ArrayList"); output.println("2. LinkedList"); output.println("3. TreeSet"); output.println("4. HashSet"); int collectionType = input.nextInt(); if (collectionType == 1) { dictionary = new ArrayList<String>(); } else if (collectionType == 2) { dictionary = new LinkedList<String>(); } else if (collectionType == 3) { dictionary = new TreeSet<String>(); } else { dictionary = new HashSet<String>(); }
See DictionaryAny.java.
Number of words searched for |
Relative average time | ||||
---|---|---|---|---|---|
ArrayList | LinkedList | ArrayList with binary search |
TreeSet | HashSet | |
100 | 1,817 | 2,891 | 7 | 6 | 1 |
200 | 3,554 | 5,803 | 13 | 12 | 2 |
400 | 7,062 | 11,344 | 24 | 26 | 4 |
800 | 14,597 | 22,209 | 46 | 49 | 8 |
1,600 | 28,004 | 43,998 | 94 | 96 | 15 |
ArrayList
has
similar search efficiency to a sorted set
Suppose you have a Collection
and you want to find all of the
elements that occur one or more times. You can use a Set
:
// cseAcct is a Collection of CSE accounts // Collection<String> cseAcct = ... Set<String> uniqueAcct = new HashSet<String>(cseAcct);
If you want the elements to be sorted, use a TreeSet
instead:
Set<String> uniqueAcct = new TreeSet<String>(cseAcct);
In both examples, a constructor is used that creates a set and populates it using the elements from the collection.
Suppose you have a Collection
and you want to find all of the duplicated
elements. You can use a pair of Set
s:
// cseAcct is a Collection of CSE accounts // Collection<String> cseAcct = ... Set<String> uniqueAcct = new HashSet<String>(); Set<String> duplicateAcct = new HashSet<String>(); for (String acct : cseAcct) { if (!uniqueAcct.add(acct)) { duplicateAcct.add(acct); } }
See Duplicates.java
Suppose you have a set containing the accounts of students who completed assignment 1 and a second set containing the accounts of students who completed assignment 2. To find the set of students who completed either assignments, you compute the union of the two sets.
s1.addAll(s2)
: transforms s1
into the union of
s1
and s2
.
// s1 contains the accounts of students who completed assignment 1 // s2 contains the accounts of students who completed assignment 2 Set<String> eitherAssignment = new HashSet<String>(s1); eitherAssignment.retainAll(s2);
Suppose you have a set containing the accounts of students who completed assignment 1 and a second set containing the accounts of students who completed assignment 2. To find the set of students who completed both assignments, you compute the intersection of the two sets.
s1.retainAll(s2)
: transforms s1
into the intersection of
s1
and s2
.
// s1 contains the accounts of students who completed assignment 1 // s2 contains the accounts of students who completed assignment 2 Set<String> bothAssignments = new HashSet<String>(s1); bothAssignments.retainAll(s2);
Suppose you have a set containing the accounts of students who completed assignment 1 and a second set containing the accounts of students who completed assignment 2. To find the set of students who completed only assignment 1, you compute the set difference:
s1.removeAll(s2)
: transforms s1
into the set difference of
s1
and s2
.
// s1 contains the accounts of students who completed assignment 1 // s2 contains the accounts of students who completed assignment 2 Set<String> onlyAssignment1 = new HashSet<String>(s1); onlyAssignment1.removeAll(s2);