String Matching: Knuth-Morris-Pratt Algorithm
Heather Takeguchi

What is String Matching?
• Used for word find in documents, in spell checkers, and in Internet keyword searches
• Looks for an exact match of a pattern string
• Real search tools are more complicated: a search for 'string' also returns 'String' and 'stringbean'

How do you match strings?
• Finite-State Automata
• Brute-Force
• Knuth-Morris-Pratt (KMP)
• Visualization tool for Brute Force and KMP: www.dcc.ufmg.br/~cassia/smaa/english/

Virus Detection
• Detecting a virus is essentially searching for a pattern string in a larger text:
  1) Viral signature matching (search for a known contagious segment)
  2) Code enumeration (compare against an old, known copy of the file)
  3) Checksum methods (check the file, e.g. its size, for changes)

Variation-tolerant matching
• Fast substring matching
• Approximate string matching
  – Voice recognition
  – DNA sequencing

Example: pattern x = GATAA, text y = CAGATAAGAGAA, at most k = 1 error

Summary
• Exact string matching is well suited to tools such as grep and sed
• String matching is used in word find and in Internet keyword searches
• KMP runs in O(n + m) worst-case time versus O(nm) for Brute Force (text length n, pattern length m), though on typical text the practical gain is often slight
• Approximate string matching and fast substring matching extend string matching to a wider range of practical applications

Acknowledgements
• Virus detection: www.cse.uta.edu/~holder/courses/cse5311/lectures/applets/je/a24.html
• Speech recognition: www.kom.e-technik.tu-darmstadt.de/pr/workshop/chair/ACMMM98/electronic_proceedings/robertson/
• Approximate string matching: http://www-igm.univ-mlv.fr/~lecroq/seqcomp/node3.html
• Cormen et al., Introduction to Algorithms, chapter 34
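Sketch (Python) of the Finite-State Automata method listed under "How do you match strings?". This is a minimal illustration, not the presenter's code. The names build_transitions and automaton_match are hypothetical, and it assumes the alphabet is just the characters that appear in the text and pattern. The transition table uses the simple construction (roughly O(m³·|Σ|)). After that, matching reads each text character exactly once.

```python
def build_transitions(pattern: str, alphabet: set[str]) -> list[dict[str, int]]:
    """delta[q][a] = length of the longest prefix of pattern that is a
    suffix of pattern[:q] + a, i.e. the automaton's next state."""
    m = len(pattern)
    delta = [{} for _ in range(m + 1)]
    for q in range(m + 1):
        for a in alphabet:
            k = min(m, q + 1)
            # Shrink k until pattern[:k] is a suffix of what has been read.
            while k > 0 and not (pattern[:q] + a).endswith(pattern[:k]):
                k -= 1
            delta[q][a] = k
    return delta


def automaton_match(text: str, pattern: str) -> list[int]:
    """Return every index where pattern occurs in text."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    # Assumption: the alphabet is the set of characters in text and pattern.
    delta = build_transitions(pattern, set(text) | set(pattern))
    q, matches = 0, []
    for i, c in enumerate(text):
        q = delta[q][c]
        if q == len(pattern):  # accepting state: a full match ends at i
            matches.append(i - len(pattern) + 1)
    return matches


print(automaton_match("CAGATAAGAGAA", "GATAA"))  # [2]
```

Overlapping matches come for free: after reaching the accepting state, the table already says how much of the pattern is still matched.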
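Sketch (Python) of the Brute-Force method listed under "How do you match strings?". This is an illustrative assumption, not the presenter's implementation, and the name brute_force_match is hypothetical. It tries every alignment of the pattern against the text, which is why its worst case is O(nm).

```python
def brute_force_match(text: str, pattern: str) -> list[int]:
    """Return every index where pattern occurs in text (naive O(n*m) scan)."""
    n, m = len(text), len(pattern)
    matches = []
    # Try every possible alignment of the pattern against the text.
    for s in range(n - m + 1):
        # Compare character by character; stop at the first mismatch.
        j = 0
        while j < m and text[s + j] == pattern[j]:
            j += 1
        if j == m:
            matches.append(s)
    return matches


print(brute_force_match("abababca", "abab"))  # [0, 2]
```

After a mismatch the scan simply shifts by one and starts over. It discards what it already learned about the text, and that waste is what KMP avoids.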
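Sketch (Python) of Knuth-Morris-Pratt (KMP), the algorithm in the title. It follows the usual prefix-function formulation found in Cormen et al. but uses 0-based indexing. The names prefix_function and kmp_match are hypothetical, and this is a sketch rather than the presenter's code. On a mismatch, the prefix table says how far the pattern can shift without re-reading any text characters. That is where the O(n + m) bound comes from.

```python
def prefix_function(pattern: str) -> list[int]:
    """pi[q] = length of the longest proper prefix of pattern[:q+1]
    that is also a suffix of it."""
    pi = [0] * len(pattern)
    k = 0
    for q in range(1, len(pattern)):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


def kmp_match(text: str, pattern: str) -> list[int]:
    """Return every index where pattern occurs in text in O(n + m) time."""
    if not pattern:
        return list(range(len(text) + 1))
    pi = prefix_function(pattern)
    matches = []
    q = 0  # number of pattern characters currently matched
    for i, c in enumerate(text):
        # On a mismatch, fall back through pi instead of re-reading text.
        while q > 0 and pattern[q] != c:
            q = pi[q - 1]
        if pattern[q] == c:
            q += 1
        if q == len(pattern):
            matches.append(i - q + 1)
            q = pi[q - 1]  # continue so overlapping matches are found
    return matches


print(prefix_function("ababaca"))     # [0, 0, 1, 2, 3, 0, 1]
print(kmp_match("abababca", "abab"))  # [0, 2]
```

The text index i only ever moves forward. Each fallback through pi undoes an earlier increment of q, so the total work is linear.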
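Sketch (Python) of item 1, viral signature matching, from the "Virus Detection" slide. A file is scanned for known byte patterns. Everything here is hypothetical: the function name scan_for_signatures, the signature names, and the byte values are made up for illustration, and real scanners use far larger signature sets and faster multi-pattern algorithms. It relies only on Python's built-in bytes.find(sub, start).

```python
def scan_for_signatures(data: bytes, signatures: dict[str, bytes]) -> dict[str, list[int]]:
    """Return, for each named signature found in data, every offset where it occurs."""
    hits = {}
    for name, sig in signatures.items():
        offsets = []
        start = data.find(sig)
        while start != -1:
            offsets.append(start)
            start = data.find(sig, start + 1)  # +1 also catches overlapping hits
        if offsets:
            hits[name] = offsets
    return hits


# Hypothetical signatures and file contents, for illustration only.
SIGNATURES = {"demo-virus-a": b"\xde\xad\xbe\xef", "demo-virus-b": b"EVIL"}
sample = b"header\xde\xad\xbe\xefpayloadEVIL..."
print(scan_for_signatures(sample, SIGNATURES))
```

Any exact matcher from the "How do you match strings?" slide could replace bytes.find here. The structure of the scan stays the same.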
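Sketch (Python) of the "Variation-tolerant matching" example with x = GATAA, y = CAGATAAGAGAA and k = 1. The slides do not say whether k counts only substitutions or also insertions and deletions. This sketch assumes substitutions only (k mismatches, i.e. Hamming distance), so it is one possible reading of the example. The name k_mismatch_match is hypothetical. It checks every alignment, like Brute Force, but tolerates up to k differing characters.

```python
def k_mismatch_match(text: str, pattern: str, k: int) -> list[tuple[int, int]]:
    """Return (position, mismatches) for every alignment of pattern in text
    that differs in at most k characters (substitutions only)."""
    m = len(pattern)
    matches = []
    for s in range(len(text) - m + 1):
        mismatches = 0
        for a, b in zip(text[s:s + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > k:  # this alignment can no longer qualify
                    break
        if mismatches <= k:
            matches.append((s, mismatches))
    return matches


print(k_mismatch_match("CAGATAAGAGAA", "GATAA", 1))  # [(2, 0), (7, 1)]
```

Under this reading, position 2 is an exact occurrence (GATAA). Position 7 (GAGAA) differs from x in one character.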