java.lang.Object
  ErrorMatrix
public class ErrorMatrix
ErrorMatrix - a class to store the counts of presented-transcribed characters in evaluations of text entry methods.
Related publication:
For example, if the presented and transcribed text strings are

    quickly
    qucehkly

then the set of optimal alignments is

    quic--kly
    qu-cehkly

    quic-kly
    qucehkly

    qui-ckly
    qucehkly

    qu-ickly
    qucehkly

Counts are tallied for each occurrence of a c1-c2 character pair, where each count is the reciprocal of the alignment count for the particular presented/transcribed text string. Thus, for any phrase the counts are weighted by the number of alignments in which the c1-c2 entries occurred.
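Each of the four alignments above contains three errors. Assuming the alignments are optimal in the sense of attaining the minimum string distance (MSD) named in the example output further down, that number can be checked with a standard edit-distance computation. The sketch below is not taken from the ErrorMatrix source; the class name `MsdSketch` and the method `msd` are hypothetical.

```java
/** Illustrative sketch only; not the ErrorMatrix implementation. */
class MsdSketch {

    /**
     * Returns the minimum number of insertions, deletions, and substitutions
     * needed to turn the presented string into the transcribed string.
     */
    static int msd(String presented, String transcribed) {
        int rows = presented.length() + 1;
        int cols = transcribed.length() + 1;
        int[][] d = new int[rows][cols];
        for (int i = 0; i < rows; i++) {
            d[i][0] = i; // i presented characters deleted
        }
        for (int j = 0; j < cols; j++) {
            d[0][j] = j; // j transcribed characters inserted
        }
        for (int i = 1; i < rows; i++) {
            for (int j = 1; j < cols; j++) {
                int cost = presented.charAt(i - 1) == transcribed.charAt(j - 1) ? 0 : 1;
                int substitution = d[i - 1][j - 1] + cost;
                int deletion = d[i - 1][j] + 1;
                int insertion = d[i][j - 1] + 1;
                d[i][j] = Math.min(substitution, Math.min(deletion, insertion));
            }
        }
        return d[rows - 1][cols - 1];
    }

    public static void main(String[] args) {
        System.out.println(msd("quickly", "qucehkly")); // prints 3
    }
}
```

How the class enumerates all alignments that reach this distance is not shown here.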
"c1" is a character in the top alignment ("presented text"). "c2" is a character in the bottom alignment ("transcribed text"). Categories of c1-c2 entries include correct entries (c1 = c2), insertion errors (c1 = "-"), substitution errors (c1 != c2), and deletion errors (c2 = "-"). Various summary statistics are retrievable via the instance methods.
The first c1-c2 pair in the example above is q-q. There are four alignments, so the first q-q pair is given a count of 1 / 4 = 0.25. However, q-q appears in each of the 4 alignments; so the weighted count for q-q is 4 x 0.25 = 1.0 for the presented-transcribed text string. Clearly, "q" was entered correctly.
Although the processing above seems convoluted, it accommodates the situation where an error occurred and the type of error is ambiguous. Consider the "i" in "quickly" above. By examining the transcribed text string, it is evident there was an error. But what was the error? Since there are four alignments, there are four possible explanations. First, there may have been a deletion error, as seen in the top alignment with c2 = "-" (weight = 0.25). However, it is also possible there was a substitution error, as seen in the bottom three alignments. In these cases, we see two i-c substitutions (weight = 2 x 0.25 = 0.50) and one i-e substitution (weight = 0.25). In most text entry evaluations, it is not known which explanation is correct. The process described here accommodates this by weighting all explanations according to their presence in the alignments.
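The weights quoted above can be reproduced with a few lines of code. This is a minimal sketch, assuming the four alignments are already available as pairs of equal-length strings; the class name `WeightedTallySketch`, the method `tally`, and the two-character map keys are hypothetical and are not part of ErrorMatrix.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only; not the ErrorMatrix implementation. */
class WeightedTallySketch {

    /**
     * Tallies every c1-c2 pair across the alignments of one phrase.
     * Each occurrence adds 1 / (number of alignments) to the pair's count.
     * Keys are two-character strings: c1 followed by c2.
     */
    static Map<String, Double> tally(String[][] alignments) {
        Map<String, Double> counts = new LinkedHashMap<>();
        double weight = 1.0 / alignments.length;
        for (String[] alignment : alignments) {
            String top = alignment[0];    // presented text with "-" padding
            String bottom = alignment[1]; // transcribed text with "-" padding
            for (int k = 0; k < top.length(); k++) {
                String pair = String.valueOf(new char[] {top.charAt(k), bottom.charAt(k)});
                counts.merge(pair, weight, Double::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[][] alignments = {
            {"quic--kly", "qu-cehkly"},
            {"quic-kly", "qucehkly"},
            {"qui-ckly", "qucehkly"},
            {"qu-ickly", "qucehkly"},
        };
        Map<String, Double> counts = tally(alignments);
        System.out.println("q-q: " + counts.get("qq")); // 1.0  (correct entry)
        System.out.println("i--: " + counts.get("i-")); // 0.25 (deletion)
        System.out.println("i-c: " + counts.get("ic")); // 0.5  (substitution)
        System.out.println("i-e: " + counts.get("ie")); // 0.25 (substitution)
    }
}
```

The three explanations for "i" sum to 1.0, so the presented character is counted exactly once however the error is interpreted.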
The output is a matrix of size n x (n + 1), where n is the number of characters in the charDef array. The charDef array is assumed to contain the set of characters that appear in the presented text. The default charDef array contains 28 characters:

    _ a b c d e f g h i j k l m n o p q r s t u v w x y z -

The "_" symbol represents the SPACE character. Rows represent the presented character, while columns represent the transcribed character. The dash ("-") at the end represents either an insertion error or a deletion error, depending on whether the array entries represent rows or columns in the matrix (see below).
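As a rough illustration of the layout just described, the sketch below builds the 28-character default set, allocates an n x (n + 1) array, and looks up a character's index, returning -1 for a character outside the set as getIndex is documented to do. The names `CharDefSketch`, `defaultCharDef`, and `indexOf` are hypothetical, and the linear search is an assumption about how such a lookup might work, not the class's actual code.

```java
/** Illustrative sketch only; not the ErrorMatrix implementation. */
class CharDefSketch {

    /** Builds the default set described above: '_', then a-z, then '-'. */
    static char[] defaultCharDef() {
        char[] def = new char[28];
        def[0] = '_';
        for (int k = 0; k < 26; k++) {
            def[k + 1] = (char) ('a' + k);
        }
        def[27] = '-';
        return def;
    }

    /** Returns the index of c in charDef, or -1 if c is not present. */
    static int indexOf(char[] charDef, char c) {
        for (int k = 0; k < charDef.length; k++) {
            if (charDef[k] == c) {
                return k;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        char[] def = defaultCharDef();
        int n = def.length;
        double[][] matrix = new double[n][n + 1]; // n rows, n + 1 columns
        System.out.println(matrix.length + " x " + matrix[0].length); // 28 x 29
        System.out.println(indexOf(def, '_')); // 0
        System.out.println(indexOf(def, 'q')); // 17
        System.out.println(indexOf(def, '-')); // 27
        System.out.println(indexOf(def, '?')); // -1
    }
}
```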
The organization of the matrix is illustrated as follows:
As noted above, the error matrix also holds the counts for correct entries. These appear along the diagonal; that is, where the row index (i) equals the column index (j). So, getRowSum(i) retrieves the total number of occurrences of the character identified by index i. This includes the number of correct entries, as well as the number of substitution and deletion errors. If just the number of errors is desired, use one of getRowSubCount(i), getRowSubOtherCount(i), getRowDelCount(i), or getRowTotCount(i).
For an insertion error, c1 = "-"; thus, these errors do not appear along the row for a particular character but, instead, along the bottom row. The number of insertion errors for a character is retrieved using getCell(rows - 1, j), where j is the index of the character.
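The relationship between row sums, the diagonal, and the bottom (insertion) row can be sketched on a toy matrix. The code below uses a plain double[][] with a made-up three-character set {a, b, -} and made-up values; the helper names `rowSum`, `rowErrors`, and `insertions` are hypothetical stand-ins and only assume the layout described above.

```java
/** Illustrative sketch only; not the ErrorMatrix implementation. */
class RowStatsSketch {

    /** Sum of all entries in row i: every occurrence of the presented character. */
    static double rowSum(double[][] m, int i) {
        double sum = 0.0;
        for (double v : m[i]) {
            sum += v;
        }
        return sum;
    }

    /** Row sum minus the diagonal entry: occurrences that were not entered correctly. */
    static double rowErrors(double[][] m, int i) {
        return rowSum(m, i) - m[i][i];
    }

    /** Insertions of the character in column j, read from the bottom row. */
    static double insertions(double[][] m, int j) {
        return m[m.length - 1][j];
    }

    public static void main(String[] args) {
        // Toy set {a, b, -}: n = 3 rows and n + 1 = 4 columns. Values are invented.
        double[][] m = {
            {10.0, 1.0, 0.5, 0.0}, // presented 'a': 10 correct, 1 a-b substitution, 0.5 deletions
            {0.0, 8.0, 0.0, 0.0},  // presented 'b': 8 correct
            {0.25, 0.0, 0.0, 0.0}, // presented '-': 0.25 insertions of 'a'
        };
        System.out.println(rowSum(m, 0));     // 11.5
        System.out.println(rowErrors(m, 0));  // 1.5
        System.out.println(insertions(m, 0)); // 0.25
    }
}
```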
The ErrorMatrix class also includes a main method, serving as an application to build error tables or error matrices.
Example invocations:
    PROMPT>java ErrorMatrix
    usage: java ErrorMatrix file [-et] [-em] [-a] [-nd] [-pr] [-co]

      where file = a file containing presented/transcribed strings
            -et = output error table
            -em = output error matrix
            -a  = output alignments (use for debugging/demo)
            -nd = null diagonal cell entries in error matrix
            -pr = use probabilities instead of counts in error matrix
            -co = console output (looks better on display)

      (Note: default is no output)

    PROMPT>java ErrorMatrix ds2-phrases.txt -et -co
    Files: 10
    Phrases: 673
    MSD Error Rate = 2.2251% (mean across characters)
    MSD Error Rate = 2.1535% (mean across phrases)
    MSD Error Rate = 2.1102% (mean across files)

    Chr        Count       Ins       Sub       Del     Total
    --------------------------------------------------------
    _      2956.0000    0.0000    0.0027    0.0185    0.0213
    a      1248.0000    0.0000    0.0083    0.0089    0.0172
    b       231.0000    0.0000    0.0412    0.0151    0.0563
    c       457.0000    0.0000    0.0241    0.0153    0.0394
    d       527.0000    0.0000    0.0063    0.0127    0.0190
    e      2051.0000    0.0000    0.0060    0.0073    0.0133
    f       335.0000    0.0000    0.0119    0.0030    0.0149
    g       355.0000    0.0000    0.0087    0.0047    0.0134
    h       723.0000    0.0000    0.0055    0.0083    0.0138
    i      1154.0000    0.0000    0.0117    0.0082    0.0199
    j        45.0000    0.0000    0.0081    0.0141    0.0222
    k       178.0000    0.0000    0.0000    0.0056    0.0056
    l       647.0000    0.0000    0.0104    0.0100    0.0205
    m       387.0000    0.0000    0.0129    0.0103    0.0233
    n      1005.0000    0.0000    0.0162    0.0112    0.0274
    o      1356.0000    0.0000    0.0156    0.0095    0.0251
    p       324.0000    0.0000    0.0093    0.0031    0.0123
    q        35.0000    0.0000    0.0000    0.0000    0.0000
    r      1058.0000    0.0000    0.0112    0.0082    0.0194
    s      1056.0000    0.0000    0.0149    0.0123    0.0271
    t      1431.0000    0.0000    0.0054    0.0100    0.0154
    u       492.0000    0.0000    0.0129    0.0094    0.0224
    v       234.0000    0.0000    0.0085    0.0085    0.0171
    w       293.0000    0.0000    0.0205    0.0068    0.0273
    x        52.0000    0.0000    0.0000    0.0000    0.0000
    y       408.0000    0.0000    0.0025    0.0025    0.0049
    z        21.0000    0.0000    0.0635    0.0317    0.0952
    -        41.6463    1.0000    1.0000    0.0000    1.0000
    --------------------------------------------------------
    Cnt:  19100.6463   41.6463  183.7073  199.6463  425.0000
    --------------------------------------------------------
    Weightd
    mns(%):             0.2180    0.9618    1.0452    2.2251
    --------------------------------------------------------
    Presented characters:   19059
    Transcribed characters: 18901
    Alignment characters:   19100.64634146329
    Alignment Error_rate:   2.225056%
    --------------------------------------------------------
    Number of alignments by count...
    Occurrences:           642   20    5    1    1    1    0    0    0    3
    Number_of_Alignments:    1    2    3    4    5    6    7    8    9  >10
    Max= 495

The phrases file used in the example invocation above, ds2-phrases.txt, was built (using a separate program) from the sd1 data files from a text entry experiment. It is for the "Datestamp Method #2" condition described in Mobile text entry using three keys, by MacKenzie (NordiCHI 2002).
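The summary figures in the example output appear to be related by simple ratios: each weighted mean matches the corresponding count on the Cnt line divided by the number of alignment characters, and the alignment character total matches the presented characters plus the insertion count (19059 + 41.6463). The sketch below only re-derives the printed percentages from the printed counts; it is an observation about this one output, not a description of the program's internals, and the names `SummaryRatesSketch` and `percent` are hypothetical.

```java
import java.util.Locale;

/** Illustrative sketch only; re-derives figures printed in the example output. */
class SummaryRatesSketch {

    /** Expresses an error count as a percentage of the alignment characters. */
    static double percent(double errors, double alignmentChars) {
        return 100.0 * errors / alignmentChars;
    }

    public static void main(String[] args) {
        double alignmentChars = 19100.64634146329; // "Alignment characters" line
        double ins = 41.6463;                      // values from the "Cnt:" line
        double sub = 183.7073;
        double del = 199.6463;
        double total = 425.0;

        System.out.println(String.format(Locale.ROOT, "Ins   %.4f", percent(ins, alignmentChars)));   // 0.2180
        System.out.println(String.format(Locale.ROOT, "Sub   %.4f", percent(sub, alignmentChars)));   // 0.9618
        System.out.println(String.format(Locale.ROOT, "Del   %.4f", percent(del, alignmentChars)));   // 1.0452
        System.out.println(String.format(Locale.ROOT, "Total %.4f", percent(total, alignmentChars))); // 2.2251
    }
}
```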
If the data are destined for importing into a spreadsheet, it's best to use the ErrorMatrix application without the -co option. The table portion of the output, in this case, is comma-delimited, full precision. For example, use

    PROMPT>java ErrorMatrix ds2-phrases.txt -et

to build an error table (similar to the example above), or

    PROMPT>java ErrorMatrix ds2-phrases.txt -em -nd

to build an error matrix.
The matrix data are useful for creating a "confusion matrix": a matrix showing the counts (or probabilities) of presented characters vs. transcribed characters. For the above invocation, the data can be saved in a file and then imported into Excel, where it is a simple matter to generate a chart. A better-looking chart can be obtained using gnuplot.
Field Summary | |
---|---|
char[] | charDef: The default characters associated with the rows and columns in this ErrorMatrix. |
int | columns: An int representing the number of columns in the error matrix. |
int | rows: An int representing the number of rows in the error matrix. |
Constructor Summary | |
---|---|
ErrorMatrix() | Construct an ErrorMatrix using the default character set. |
ErrorMatrix(char[] custom) | Construct an ErrorMatrix using a custom character set. |
Method Summary | |
---|---|
void | enter(char c1, char c2, double count): Enter the specified count for a c1-c2 character pair into this ErrorMatrix. |
void | enter(StringPair[] sp): Enter the counts for an array of presented/transcribed text phrases into this ErrorMatrix. |
void | enter(java.lang.String s1, java.lang.String s2): Enter the counts for a presented/transcribed text phrase into this ErrorMatrix. |
double | getCell(int row, int col): Return the contents of the specified cell. |
double | getColInsCount(int idx): Return the number of insertions of the character in column idx. |
double | getColInsProb(int idx): Return the probability of an Insertion of the character in column idx. |
double | getColSum(int col): Return the sum of the entries in the specified column. |
double[] | getColSumArray(): Return an array containing the column sums. |
double | getDelCount(): Return the number of Deletion errors. |
double | getDelProb(): Return the Deletion error probability. |
java.lang.String | getHeader(): Return a comma-delimited string identifying the columns. |
int | getIndex(char c): Return the index of the specified character, or -1 if the character is not in the charDef array. |
double | getInsCount(): Return the number of Insertion errors. |
double | getInsProb(): Return the Insertion error probability. |
double[][] | getMatrix(): Return the error matrix. |
double[] | getRow(int idx): Return an array containing the specified row. |
double | getRowDelCount(int idx): Return the Deletion error count for the specified row. |
double | getRowDelProb(int idx): Return the Deletion error probability for the specified row. |
double | getRowInsProb(int idx): Return the Insertion error probability for the specified row. |
double | getRowSubCount(int idx): Return the Substitution error count for the specified row. |
double | getRowSubOtherCount(int idx): Return the Substitution "Other" error count for the specified row. |
double | getRowSubProb(int idx): Return the Substitution error probability for the specified row. |
double | getRowSum(int row): Return the sum of the specified row. |
double[] | getRowSumArray(): Return an array containing the row sums. |
double | getRowTotCount(int idx): Return the total error count for the specified row. |
double | getRowTotProb(int idx): Return the error probability for the specified row. |
double | getSubCount(): Return the number of Substitution errors. |
double | getSubOtherCount(): Return the number of Substitution errors where c2 == "Other". |
double | getSubProb(): Return the Substitution error probability. |
double | getSum(): Return the sum of all entries in the matrix. |
java.lang.String | getSymbol(int idx): Return a one-character String representing the symbol associated with the entries in a row or column. |
double | getTotCount(): Return the total number of errors. |
double | getTotProb(): Return the total error probability. |
static void | main(java.lang.String[] args): An application that uses the ErrorMatrix class. |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
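public char[] charDef
The default characters associated with the rows and columns in this ErrorMatrix. The first entry is '_', representing the SPACE character. The last entry is '-', representing an Insertion error for rows, or a Deletion error for columns. The remaining entries are the characters appearing in the presented text phrases, namely, a-z.

public int rows
An int representing the number of rows in the error matrix.

public int columns
An int representing the number of columns in the error matrix.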
Constructor Detail |
---|
public ErrorMatrix()
Construct an ErrorMatrix using the default character set.

public ErrorMatrix(char[] custom)
Construct an ErrorMatrix using a custom character set. The last character in the array should be '-', representing an insertion error for rows and a deletion error for columns.
Method Detail |
---|
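public void enter(char c1, char c2, double count)
Enter the specified count for a c1-c2 character pair into this ErrorMatrix.
Parameters:
c1 - the presented character
c2 - the transcribed character
count - the amount to increment the corresponding cell by

public int getIndex(char c)
Return the index of the specified character, or -1 if the character is not in the charDef array.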

public void enter(java.lang.String s1, java.lang.String s2)
Enter the counts for a presented/transcribed text phrase into this ErrorMatrix. This method does the work of converting the presented/transcribed strings into a set of alignments, and then scanning the alignments character-by-character to determine the appropriate increment for each c1-c2 pair. Use this method in a loop until all phrases are entered into the matrix, or put the phrases in a StringPair array and use a single call to the one-arg version of enter.
Parameters:
s1 - the presented text string
s2 - the transcribed text string

public void enter(StringPair[] sp)
Enter the counts for an array of presented/transcribed text phrases into this ErrorMatrix.
Parameters:
sp - an array of StringPair objects containing presented and transcribed text phrases.

public double[][] getMatrix()
public double getSum()
public double getCell(int row, int col)
public java.lang.String getHeader()
public java.lang.String getSymbol(int idx)
public double[] getRow(int idx)
public double getRowSum(int row)
public double[] getRowSumArray()
public double getRowTotCount(int idx)
public double getRowTotProb(int idx)
public double getColSum(int col)
public double[] getColSumArray()
public double getInsCount()
public double getInsProb()
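public double getRowInsProb(int idx)
Return the Insertion error probability for the specified row. The return value is 1.0 if the index is size - 1 (i.e., the row associated with Insertion errors). Otherwise, the return value is 0.0.

public double getColInsCount(int idx)
Return the number of insertions of the character in column idx.

public double getColInsProb(int idx)
Return the probability of an Insertion of the character in column idx. The probability is the ratio of the number of Insertion errors of the specified character to the total number of Insertion errors.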
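
The ratio described for getColInsProb can be sketched on a bare insertion row. The values below are invented, the names `ColInsProbSketch` and `colInsProb` are hypothetical, and returning 0.0 when there are no insertions at all is an assumption of this sketch, since the behaviour of the real method in that case is not documented here.

```java
/** Illustrative sketch only; not the ErrorMatrix implementation. */
class ColInsProbSketch {

    /**
     * Ratio of the insertions of the character in column j to all insertions.
     * Returns 0.0 when the row holds no insertions (an assumption of this sketch).
     */
    static double colInsProb(double[] insertionRow, int j) {
        double total = 0.0;
        for (double v : insertionRow) {
            total += v;
        }
        return total == 0.0 ? 0.0 : insertionRow[j] / total;
    }

    public static void main(String[] args) {
        double[] insertionRow = {0.5, 1.5, 2.0}; // invented insertion counts per column
        System.out.println(colInsProb(insertionRow, 1)); // 0.375
    }
}
```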
public double getSubCount()
public double getSubProb()
public double getSubOtherCount()
public double getRowSubCount(int idx)
public double getRowSubProb(int idx)
public double getRowSubOtherCount(int idx)
public double getDelCount()
public double getDelProb()
public double getRowDelCount(int idx)
public double getRowDelProb(int idx)
public double getTotCount()
public double getTotProb()
public static void main(java.lang.String[] args) throws java.io.IOException
An application that uses the ErrorMatrix class.
Throws:
java.io.IOException