Class WordFreq
java.lang.Object
WordFreq
public class WordFreq
extends java.lang.Object
This program creates a list of words and word frequencies
from an input text file.
Output is written to the console.
Invocation:
PROMPT>java WordFreq file [-w] [-f] [-v] [-s] [-x] [-l]
where file = input text file
options: (default is no output)
-w = output words
-f = output frequencies
-v = verbose option (prints the total, unique, and non-word counts)
-s = sort 'by word' (default is 'by count')
-x = exclude words with characters other than a-z or A-Z
-l = convert all characters to lowercase
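The -x rule can be illustrated with a small check. This is a minimal sketch, not the program's actual source: the class name NonWordSketch and the method isWord are hypothetical, and the only thing taken from the description is the rule "characters other than a-z or A-Z". Judging by the sample output, tokens rejected by this rule appear to be the ones tallied as non-words, but that is an inference.

```java
class NonWordSketch {
    // Hypothetical helper: true only when every character is an ASCII letter
    // (a-z or A-Z), which is the rule stated for the -x option.
    static boolean isWord(String token) {
        if (token.isEmpty()) {
            return false;
        }
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            boolean letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            if (!letter) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"chickens", "don't", "2001", "Free"};
        for (String s : samples) {
            System.out.println(s + " -> " + (isWord(s) ? "kept" : "excluded"));
        }
    }
}
```

Under this reading of the rule, a contraction such as don't would be excluded by -x, because the apostrophe is not a letter.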
The input text is tokenized
using Java's StringTokenizer
class with "\n\t\" .,-?;:()!"
as the delimiter string. Note that the single quote character is omitted
from the delimiters, so contractions remain single tokens. As an example, the text
Free-range chickens are the best, don't you think?
is broken into the following tokens:
Free
range
chickens
are
the
best
don't
you
think
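The tokenization described above can be reproduced with a few lines of Java. This is a minimal sketch, not the program's actual source: the class name TokenizeSketch and the method tokenize are hypothetical, while the delimiter string and the sample sentence are taken from the description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizeSketch {
    // Delimiter string quoted in the description; the single quote is absent,
    // so "don't" is not split.
    static final String DELIMITERS = "\n\t\" .,-?;:()!";

    // Hypothetical helper: splits text into tokens using the delimiters above.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, DELIMITERS);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Free-range chickens are the best, don't you think?";
        for (String token : tokenize(text)) {
            System.out.println(token);
        }
    }
}
```

Because the hyphen is a delimiter, Free-range yields two tokens, while the apostrophe in don't is kept.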
Here are some example invocations:
PROMPT>java WordFreq phrases2.txt -w -f -v
the 189
a 108
is 85
to 57
of 54
you 49
...
yes 1
yet 1
young 1
zoom 1
total words: 2713
unique words: 1164
non-words: 0
PROMPT>java WordFreq GreatExpectations.txt -v -x
total words: 183958
unique words: 11509
non-words: 2735
PROMPT>type temp.txt
Hello World
hello world
PROMPT>java WordFreq temp.txt -w -f
Hello 1
World 1
hello 1
world 1
PROMPT>java WordFreq temp.txt -w -f -l
hello 2
world 2
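The counting behaviour shown in the temp.txt examples can be sketched as follows. This is an illustration under stated assumptions, not the program's actual source: the class CountSketch and its methods count and byCount are hypothetical, and breaking ties alphabetically in the 'by count' ordering is an assumption that merely agrees with the listings above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

class CountSketch {
    // Delimiter string quoted in the description.
    static final String DELIMITERS = "\n\t\" .,-?;:()!";

    // Hypothetical helper: counts tokens, optionally lowercasing them first (the -l idea).
    // A TreeMap keeps the words in String order, which matches a 'by word' listing.
    static Map<String, Integer> count(String text, boolean lowercase) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer st = new StringTokenizer(text, DELIMITERS);
        while (st.hasMoreTokens()) {
            String word = st.nextToken();
            if (lowercase) {
                word = word.toLowerCase(Locale.ROOT);
            }
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // Hypothetical helper: orders entries by descending count.
    // Assumption: ties are broken by word, ascending.
    static List<Map.Entry<String, Integer>> byCount(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int cmp = Integer.compare(b.getValue(), a.getValue());
            return cmp != 0 ? cmp : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        String text = "Hello World\nhello world";
        for (Map.Entry<String, Integer> e : byCount(count(text, false))) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
        for (Map.Entry<String, Integer> e : byCount(count(text, true))) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

With lowercasing off, Hello and hello are distinct keys; with it on, they merge into a single entry with a count of 2.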
Author:
Scott MacKenzie, 2001-2006
Method Summary
static void main(java.lang.String[] args)
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail
WordFreq
public WordFreq()
Method Detail
main
public static void main(java.lang.String[] args)
throws java.io.IOException
Throws:
java.io.IOException