Class MSD

java.lang.Object
  extended by MSD

public class MSD
extends java.lang.Object

MSD (Minimum String Distance) is a class that generates statistics related to the lexical distance between two strings: the MSD value, the MSD matrix, the set of optimal alignments, and two error rate measures. It includes a main method as a demonstration.

Example dialogue:

     PROMPT>java MSD
     usage: java MSD [-m] [-a] [-er]
     
     where -m  = output the MSD matrix
           -a  = output the set of optimal alignments
           -er = output the error rate
     
     PROMPT>java MSD -m -a -er
     ============================
     Minimum String Distance Demo
     ============================
     Enter pairs of strings (^z to exit)
     golfers
     gofpiers
     MSD = 3
     Error rate (old) = 37.5000%
     Error rate (new) = 36.3636%
           g  o  f  p  i  e  r  s
        0  1  2  3  4  5  6  7  8
     g  1  0  1  2  3  4  5  6  7
     o  2  1  0  1  2  3  4  5  6
     l  3  2  1  1  2  3  4  5  6
     f  4  3  2  1  2  3  4  5  6
     e  5  4  3  2  2  3  3  4  5
     r  6  5  4  3  3  3  4  3  4
     s  7  6  5  4  4  4  4  4  3
     Alignments: 4, mean size: 8.25
     golf--ers
     go-fpiers
     
     golf-ers
     gofpiers
     
     gol-fers
     gofpiers
     
     go-lfers
     gofpiers
     -------------
  

Author:
Scott MacKenzie, 2001, 2002, William Soukoreff, 2002

Constructor Summary
MSD(java.lang.String s1Arg, java.lang.String s2Arg)
          Create an MSD object.
 
Method Summary
 StringPair[] getAlignments()
          Returns pairs of alignment strings for this MSD object's s1/s2 string pair.
 double getErrorRate()
          Returns the text entry error rate as a percentage.
 double getErrorRateNew()
          Returns the revised measure for the MSD error rate.
 int[][] getMatrix()
          Returns the minimum string distance matrix.
 int getMSD()
          Returns the minimum string distance between the two strings.
 java.lang.String getS1()
          Returns the S1 string.
 java.lang.String getS2()
          Returns the S2 string.
static void main(java.lang.String[] args)
          Self-test main method to demonstrate the MSD class.
 double meanAlignmentSize()
          Returns the mean size of the alignment strings as a double.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MSD

public MSD(java.lang.String s1Arg,
           java.lang.String s2Arg)
Create an MSD object.

Parameters:
s1Arg - the 1st text string (the "presented" text)
s2Arg - the 2nd text string (the "transcribed" text)

Method Detail

getMatrix

public int[][] getMatrix()
Returns the minimum string distance matrix. The matrix has s1.length() + 1 rows and s2.length() + 1 columns. The value of the minimum string distance statistic is the bottom-right element, msdMatrix[s1.length()][s2.length()], where msdMatrix is the returned array.

Returns:
a two dimensional integer array containing the minimum string distance matrix.
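
The layout described above can be illustrated with a minimal sketch. It assumes the standard dynamic-programming recurrence for edit distance with unit costs; the class name MsdMatrixSketch and the method buildMatrix are hypothetical and are not part of the MSD class, whose actual implementation is not shown in this documentation.

```java
// Hypothetical stand-in for illustration; not the MSD class's own code.
class MsdMatrixSketch {

    // Builds a matrix with s1.length() + 1 rows and s2.length() + 1 columns.
    static int[][] buildMatrix(String s1, String s2) {
        int rows = s1.length() + 1;
        int cols = s2.length() + 1;
        int[][] d = new int[rows][cols];

        // First column: i deletions turn the first i characters of s1 into "".
        for (int i = 0; i < rows; i++) d[i][0] = i;
        // First row: j insertions turn "" into the first j characters of s2.
        for (int j = 0; j < cols; j++) d[0][j] = j;

        for (int i = 1; i < rows; i++) {
            for (int j = 1; j < cols; j++) {
                // A matching character costs nothing; a substitution costs 1.
                int cost = (s1.charAt(i - 1) == s2.charAt(j - 1)) ? 0 : 1;
                int substitute = d[i - 1][j - 1] + cost;
                int delete = d[i - 1][j] + 1;
                int insert = d[i][j - 1] + 1;
                d[i][j] = Math.min(substitute, Math.min(delete, insert));
            }
        }
        return d;
    }

    public static void main(String[] args) {
        int[][] d = buildMatrix("golfers", "gofpiers");
        // The bottom-right element holds the distance.
        System.out.println("MSD = " + d[7][8]); // MSD = 3
    }
}
```

For the "golfers" / "gofpiers" pair, this sketch yields the same 8 x 9 matrix as the example dialogue.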

getMSD

public int getMSD()
Returns the minimum string distance between the two strings: the minimum number of primitive operations that must be applied to one string to yield the other. The primitives are insert, delete, and substitute. In the example dialogue above, the distance between "golfers" and "gofpiers" is 3.

For details, see Soukoreff & MacKenzie (2001).

Returns:
an int equal to the minimum string distance.

getS1

public java.lang.String getS1()
Returns the S1 string (the "presented" text passed to the constructor).


getS2

public java.lang.String getS2()
Returns the S2 string (the "transcribed" text passed to the constructor).


getErrorRate

public double getErrorRate()
Returns the text entry error rate as a percentage. The error rate is computed by dividing the MSD statistic by the larger of the lengths of the presented and transcribed text strings, then multiplying by 100.
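
As a worked illustration of the formula above, the following sketch reproduces the "Error rate (old)" figure from the example dialogue. The class name ErrorRateSketch and its method are hypothetical; the MSD value is passed in rather than computed.

```java
// Hypothetical stand-in for illustration; not the MSD class's own code.
class ErrorRateSketch {

    // MSD divided by the longer of the two string lengths, times 100.
    // Two empty strings give 0/0 (NaN) here; how the MSD class handles
    // that case is not stated in this documentation.
    static double errorRate(int msd, String presented, String transcribed) {
        int longer = Math.max(presented.length(), transcribed.length());
        return (double) msd / longer * 100.0;
    }

    public static void main(String[] args) {
        // "golfers" has 7 characters and "gofpiers" has 8, so 3 / 8 * 100.
        System.out.println(errorRate(3, "golfers", "gofpiers")); // 37.5
    }
}
```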


main

public static void main(java.lang.String[] args)
                 throws java.io.IOException
Self-test main method to demonstrate the MSD class.

Throws:
java.io.IOException

getAlignments

public StringPair[] getAlignments()
Returns pairs of alignment strings for this MSD object's s1/s2 string pair. The alignment strings provide a convenient, human-readable way to show which transformations (insert, delete, substitute) the MSD algorithm employs. In effect, they explain the 'D' matrix.

Returns:
an array of StringPairs containing pairs of alignment strings
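
One way to derive such alignments is to walk the distance matrix backwards from the bottom-right cell and follow every move that is consistent with the cell values. The sketch below assumes that approach; the class AlignmentSketch and its methods are hypothetical, it returns plain String arrays rather than StringPair objects, and it is not taken from the MSD source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration; not the MSD class's own code.
class AlignmentSketch {

    // Standard edit-distance matrix: rows follow s1, columns follow s2.
    static int[][] buildMatrix(String s1, String s2) {
        int[][] d = new int[s1.length() + 1][s2.length() + 1];
        for (int i = 0; i <= s1.length(); i++) d[i][0] = i;
        for (int j = 0; j <= s2.length(); j++) d[0][j] = j;
        for (int i = 1; i <= s1.length(); i++) {
            for (int j = 1; j <= s2.length(); j++) {
                int cost = s1.charAt(i - 1) == s2.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(d[i - 1][j - 1] + cost,
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1));
            }
        }
        return d;
    }

    // Returns every optimal alignment as {aligned s1, aligned s2}.
    static List<String[]> alignments(String s1, String s2) {
        List<String[]> out = new ArrayList<>();
        walk(s1, s2, buildMatrix(s1, s2), s1.length(), s2.length(), "", "", out);
        return out;
    }

    // Walks from cell (i, j) back to (0, 0), branching on each move that
    // accounts for the value in the current cell.
    private static void walk(String s1, String s2, int[][] d, int i, int j,
                             String a1, String a2, List<String[]> out) {
        if (i == 0 && j == 0) {
            out.add(new String[] {a1, a2});
            return;
        }
        if (i > 0 && j > 0) {
            // Diagonal move: a match (cost 0) or a substitution (cost 1).
            int cost = s1.charAt(i - 1) == s2.charAt(j - 1) ? 0 : 1;
            if (d[i][j] == d[i - 1][j - 1] + cost) {
                walk(s1, s2, d, i - 1, j - 1,
                        s1.charAt(i - 1) + a1, s2.charAt(j - 1) + a2, out);
            }
        }
        // Upward move: a character of s1 has no counterpart in s2.
        if (i > 0 && d[i][j] == d[i - 1][j] + 1) {
            walk(s1, s2, d, i - 1, j, s1.charAt(i - 1) + a1, "-" + a2, out);
        }
        // Leftward move: a character of s2 has no counterpart in s1.
        if (j > 0 && d[i][j] == d[i][j - 1] + 1) {
            walk(s1, s2, d, i, j - 1, "-" + a1, s2.charAt(j - 1) + a2, out);
        }
    }

    public static void main(String[] args) {
        for (String[] pair : alignments("golfers", "gofpiers")) {
            System.out.println(pair[0]);
            System.out.println(pair[1]);
            System.out.println();
        }
    }
}
```

For "golfers" and "gofpiers" this sketch finds the same four alignment pairs as the example dialogue, though not necessarily in the same order.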

meanAlignmentSize

public double meanAlignmentSize()
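Returns the mean size of the alignment strings as a double. In the example dialogue above, the four alignments have sizes 9, 8, 8, and 8, giving a mean size of 8.25.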


getErrorRateNew

public double getErrorRateNew()
Returns the revised measure for the MSD error rate. The originally proposed MSD error rate was computed by dividing the MSD statistic by the larger of the sizes of the presented and transcribed text strings. That value differs slightly from the error rate calculated using our alignment-based error rate measure. The revised measure removes this discrepancy: it is computed by dividing the MSD statistic by the mean size of the alignment strings. In the example dialogue above, this is 3 divided by 8.25, reported as 36.3636%.
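
The arithmetic can be sketched as follows, using the four alignment strings from the example dialogue as input. The class ErrorRateNewSketch and its methods are hypothetical, and the multiplication by 100 is an assumption based on the "%" shown in the example dialogue; the text above does not state it.

```java
// Hypothetical stand-in for illustration; not the MSD class's own code.
class ErrorRateNewSketch {

    // Mean length of the alignment strings (one string per alignment pair;
    // both strings in a pair have the same length).
    static double meanAlignmentSize(String[] alignedStrings) {
        int total = 0;
        for (String s : alignedStrings) total += s.length();
        return (double) total / alignedStrings.length;
    }

    // MSD divided by the mean alignment size.
    // Assumption: scaled by 100 to express the result as a percentage.
    static double errorRateNew(int msd, double meanAlignmentSize) {
        return msd / meanAlignmentSize * 100.0;
    }

    public static void main(String[] args) {
        String[] aligned = {"golf--ers", "golf-ers", "gol-fers", "go-lfers"};
        double mean = meanAlignmentSize(aligned);
        System.out.println(mean);                  // 8.25
        System.out.println(errorRateNew(3, mean)); // approximately 36.3636
    }
}
```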