CSE1020 Lab 10

Part 2: Binary Search

Spell-Checking a Document

The following program should use a dictionary to spell check a document (in this case, the first chapter of Charles Dickens' novel A Christmas Carol). Copy-and-paste the program into your editor.

import java.util.*;
import java.io.*;

public class SpellCheck
{
   public static void main(String[] args) throws IOException
   {
      PrintStream out = System.out;
      
      if (args.length != 1)
      {
         out.println("Usage: java SpellCheck <list-type>");
         out.println("where <list-type> is linked or array");
         System.exit(1);
      }
      
      // create the dictionary list
      String listType = "ArrayList";
      List<String> dictionary = new ArrayList<String>();
      if (args[0].equals("linked"))
      {
         listType = "LinkedList";
         dictionary = new LinkedList<String>();
      }
      
      // read in the dictionary using a scanner
      final String DICT_NAME = "/cse/course/1020/wordsEn.txt";
      Scanner dictionaryInput = new Scanner(new File(DICT_NAME));
      while (dictionaryInput.hasNextLine())
      {
         String word = dictionaryInput.nextLine().trim();
         dictionary.add(word);
      }
      
      // read in the document
      Scanner textScanner = new Scanner(new File("/cse/course/1020/AChristmasCarol.txt"));      
      Set<String> document = new TreeSet<String>();
      while (textScanner.hasNext())
      {
         String text = textScanner.next();
         text = text.replaceAll("-", " ");
         // remove all punctuation
         text = text.replaceAll("[\".,();:?!]", "").trim();
         String[] words = text.split("\\s");
         for (String word : words)
         {
            if (!word.isEmpty())
            {
               document.add(word.toLowerCase());
            }
         }
      }
      
      long startTime = System.nanoTime();
      /*
         YOUR LOOP GOES HERE
      */
      long estimatedTime = System.nanoTime() - startTime;
      out.printf("Took %d ns to spellcheck the document using a %s%n%n",
                 estimatedTime, listType);
      
   }
}
 

Add a loop to the program (see comment in red) so that the program actually spell checks each word in the document. Do not use traversal of the dictionary to look up words; instead, look at the API for java.util.Collections and use the method binarySearch. Recall that traversal requires O(n) time to search for a word, whereas binary search requires O(log n) time.

Your program should generate the following output (notice that our dictionary does not include abbreviations, contractions, possessive forms, and some plurals); you should get the same output using a linked list (with a different amount of time, of course):

java SpellCheck array
'change is not in the dictionary
'em is not in the dictionary
'merry is not in the dictionary
abels is not in the dictionary
abrahams is not in the dictionary
arm's is not in the dictionary
banker's is not in the dictionary
belshazzars is not in the dictionary
blindman's is not in the dictionary
broadwise is not in the dictionary
can't is not in the dictionary
christmas' is not in the dictionary
clerk's is not in the dictionary
cornhill is not in the dictionary
couldn't is not in the dictionary
country's is not in the dictionary
day's is not in the dictionary
didn't is not in the dictionary
dunstan is not in the dictionary
ebenezer is not in the dictionary
ghost's is not in the dictionary
grocers' is not in the dictionary
hamlet's is not in the dictionary
i'd is not in the dictionary
i'll is not in the dictionary
ironmongery is not in the dictionary
it's is not in the dictionary
life's is not in the dictionary
man's is not in the dictionary
marley is not in the dictionary
marley's is not in the dictionary
mayor's is not in the dictionary
men's is not in the dictionary
merchant's is not in the dictionary
moment's is not in the dictionary
morrow's is not in the dictionary
neighbouring is not in the dictionary
o'clock is not in the dictionary
overflowings is not in the dictionary
paul's is not in the dictionary
people's is not in the dictionary
pharaoh's is not in the dictionary
poulterers' is not in the dictionary
prophet's is not in the dictionary
scrooge's is not in the dictionary
sheba is not in the dictionary
solemnised is not in the dictionary
son's is not in the dictionary
spectre's is not in the dictionary
spirit's is not in the dictionary
thank'ee is not in the dictionary
there's is not in the dictionary
vision's is not in the dictionary
wailings is not in the dictionary
what's is not in the dictionary
won't is not in the dictionary
wouldn't is not in the dictionary
years' is not in the dictionary
you'll is not in the dictionary
you're is not in the dictionary
Took 13006120 ns to spellcheck the document using a ArrayList
 

Is there any significant difference in speed between ArrayList and LinkedList for binary search?

Submit

Submit your program using the command:

submit 1020 L10 SpellCheck.java

And then continue to Part 3.