Answer
For this week's lab, you will use two of the classes in the Java Collections Framework: HashSet and TreeSet. You will use these classes to implement a spell checker.
In this lab, you will need to use some of the methods that are defined in the Set interface. Recall that if set is a Set, then the following methods are defined:
set.size() -- Returns the number of items in the set.
set.add(item) -- Adds the item to the set, if it is not already there.
set.contains(item) -- Checks whether the set contains the item.
set.isEmpty() -- Checks whether the set is empty.
You will also need to be able to traverse a set, using either an iterator or a for-each loop.
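As a quick refresher, the sketch below exercises the four Set methods listed above and shows both ways of traversing a set. The class name SetMethodsDemo and its helper methods are made up for this illustration and are not part of the lab.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

class SetMethodsDemo {

    // Builds a small set; the duplicate "apple" is stored only once.
    static Set<String> sampleSet() {
        Set<String> set = new HashSet<>();
        set.add("apple");
        set.add("banana");
        set.add("apple"); // already present, so the set is unchanged
        return set;
    }

    // Counts the items by traversing the set with an explicit iterator.
    static int countWithIterator(Set<String> set) {
        int count = 0;
        Iterator<String> iter = set.iterator();
        while (iter.hasNext()) {
            iter.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Set<String> set = sampleSet();
        System.out.println(set.size());             // 2
        System.out.println(set.contains("banana")); // true
        System.out.println(set.isEmpty());          // false
        for (String item : set) {                   // for-each traversal
            System.out.println(item);
        }
        System.out.println(countWithIterator(set)); // 2
    }
}
```

A HashSet makes no promise about the order in which the for-each loop visits its items, which is why the loop output above is not annotated.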
Reading a Dictionary
The file words.txt (in the code directory) contains a list of English words, with one word on each line. You will look up words in this list to check whether they are correctly spelled. To make the list easy to use, you can store the words in a set. Since there is no need to have the words stored in order, you can use a HashSet for maximum efficiency.
Use a Scanner to read the file. You can create a scanner, filein, for reading from a file with a statement such as:
filein = new Scanner(new File("/classes/s09/cs225/words.txt"));
A file can then be processed, token by token, in a loop such as:
while (filein.hasNext()) {
    String tk = filein.next();
    process(tk); // do something with the token
}
(For the wordlist file, a token is simply a word.)
Start your main program by reading the words from words.txt and storing them in a HashSet. For the purposes of this program, convert all words to lower case before putting them in the set. To make sure that you've read all the words, check the size of the set. (It should be 72875.) You could also use the contains method to check for the presence of some common word in the set.
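One possible shape for this step is sketched below. The class name DictionaryLoader and the method name readWords are assumptions made for the example; the file path is the one quoted earlier in the lab and will probably need to be changed on your machine. Passing the Scanner in as a parameter keeps the reading loop separate from the choice of file.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashSet;
import java.util.Scanner;

class DictionaryLoader {

    // Reads every token from the scanner and stores it, in lower case, in a set.
    static HashSet<String> readWords(Scanner filein) {
        HashSet<String> words = new HashSet<>();
        while (filein.hasNext()) {
            String tk = filein.next();
            words.add(tk.toLowerCase());
        }
        return words;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // Path taken from the lab text; adjust it to wherever words.txt lives.
        Scanner filein = new Scanner(new File("/classes/s09/cs225/words.txt"));
        HashSet<String> dictionary = readWords(filein);
        filein.close();
        // Sanity checks suggested by the lab: the size and a common word.
        System.out.println("Words read: " + dictionary.size());
        System.out.println("Contains \"the\": " + dictionary.contains("the"));
    }
}
```

Note that new Scanner(new File(...)) can throw FileNotFoundException, so main either declares it, as here, or catches it.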
Checking the Words in a File
Once you have the list of words in a set, it's easy to read the words from a file and check whether each word is in the set. Start by letting the user select a file. You can either let the user type the name of the file or you can use the following metho
}
Use a Scanner to read the words from the selected file. In order to skip over any non-letter characters in the file, you can use the following command just after creating the scanner (where in is the variable name f
(In this statement, "[^a-zA-Z]+" is a regular expression that matches any sequence of one or more non-letter characters. This essentially makes the scanner treat any non-letter the way it would ordinarily treat a space.)
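The statement itself is not reproduced here. One standard way to install such a pattern is Scanner's useDelimiter method, and the sketch below assumes that is what is meant. The class and method names are invented for the example, and the scanner reads from a string rather than a file so the effect is easy to see.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DelimiterDemo {

    // Splits text into words, treating any run of non-letters as a separator.
    static List<String> lettersOnlyTokens(String text) {
        Scanner in = new Scanner(text);
        in.useDelimiter("[^a-zA-Z]+"); // assumed form of the statement described above
        List<String> tokens = new ArrayList<>();
        while (in.hasNext()) {
            tokens.add(in.next());
        }
        in.close();
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [Hello, world, It, s]
        System.out.println(lettersOnlyTokens("Hello, world! It's 2009."));
    }
}
```

One side effect worth knowing about: the apostrophe counts as a non-letter, so "It's" is split into "It" and "s".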
You can then go through the file, read each word (converting it to lower case) and check whether the set contains the word. At this point, just print out any word that you find that is not in the dictionary.
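The checking loop might look something like the following sketch. The names MisspelledWords and findMisspelled are hypothetical, and the method collects the unknown words in a list instead of printing them directly, purely so that the result is easy to inspect. It applies the non-letter delimiter pattern quoted above through Scanner's useDelimiter method.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Scanner;
import java.util.Set;

class MisspelledWords {

    // Returns, in order of appearance, each word read from the scanner
    // that is not in the dictionary. Words are lower-cased before lookup.
    static List<String> findMisspelled(Scanner in, Set<String> dictionary) {
        in.useDelimiter("[^a-zA-Z]+"); // skip over non-letter characters
        List<String> unknown = new ArrayList<>();
        while (in.hasNext()) {
            String word = in.next().toLowerCase();
            if (!dictionary.contains(word)) {
                unknown.add(word);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        // A tiny stand-in dictionary; the lab uses the set read from words.txt.
        Set<String> dictionary = new HashSet<>();
        dictionary.add("the");
        dictionary.add("cat");
        dictionary.add("sat");
        Scanner in = new Scanner("The cat szat, the catt sat.");
        for (String word : findMisspelled(in, dictionary)) {
            System.out.println(word); // prints szat, then catt
        }
    }
}
```

A word that is misspelled several times is reported each time it occurs; collecting the results in a set instead of a list would report it once.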
Providing a List of Possible Correct Spellings
A spell checker shouldn't just tell you what words are misspelled -- it should also give you a list of possible correct spellings for that word. Write a method
static TreeSet<String> corrections(String badWord, HashSet<String> dictionary)
that creates and returns a TreeSet containing variations on badWord that are contained in the dictionary. In your main program, when you find a word that is not in the set of legal words, pass that word to this method (along with the set). Take the return value and output any words that it contains; these are the suggested correct spellings of the misspelled word. Here, for example, is part of the output from a sample program when it was run with the HTML source of this page as inp
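The text above does not say which variations to try, so the sketch below assumes four common single-edit variations: deleting one letter, swapping two neighboring letters, changing one letter, and inserting one letter. These choices, the class name Corrections, and the helper addIfWord are assumptions made for illustration; the lab may have a different list in mind.

```java
import java.util.HashSet;
import java.util.TreeSet;

class Corrections {

    // Returns the dictionary words that are one simple edit away from badWord.
    // The set of edits tried here is an assumption, not taken from the lab.
    static TreeSet<String> corrections(String badWord, HashSet<String> dictionary) {
        TreeSet<String> found = new TreeSet<>();
        int n = badWord.length();
        // Delete any one letter.
        for (int i = 0; i < n; i++) {
            addIfWord(badWord.substring(0, i) + badWord.substring(i + 1), dictionary, found);
        }
        // Swap any two neighboring letters.
        for (int i = 0; i < n - 1; i++) {
            String swapped = badWord.substring(0, i) + badWord.charAt(i + 1)
                    + badWord.charAt(i) + badWord.substring(i + 2);
            addIfWord(swapped, dictionary, found);
        }
        for (char ch = 'a'; ch <= 'z'; ch++) {
            // Change any one letter to ch.
            for (int i = 0; i < n; i++) {
                addIfWord(badWord.substring(0, i) + ch + badWord.substring(i + 1), dictionary, found);
            }
            // Insert ch at any position, including both ends.
            for (int i = 0; i <= n; i++) {
                addIfWord(badWord.substring(0, i) + ch + badWord.substring(i), dictionary, found);
            }
        }
        return found;
    }

    // Adds candidate to found only if it is a dictionary word.
    private static void addIfWord(String candidate, HashSet<String> dictionary,
                                  TreeSet<String> found) {
        if (dictionary.contains(candidate)) {
            found.add(candidate);
        }
    }

    public static void main(String[] args) {
        HashSet<String> dictionary = new HashSet<>();
        dictionary.add("cat");
        dictionary.add("act");
        dictionary.add("at");
        dictionary.add("dog");
        System.out.println(corrections("ct", dictionary));  // [act, at, cat]
        System.out.println(corrections("cta", dictionary)); // [cat]
    }
}
```

Because the result is a TreeSet, the suggestions come out in alphabetical order and a word reachable by more than one edit appears only once.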
Explanation: