String Matching: Knuth-Morris

String Matching:
Knuth-Morris-Pratt algorithm
Heather Takeguchi
What is String Matching?
• Used by the "find" feature in documents, by
spell checkers, and by internet keyword
searches
• The basic problem: find an exact match of a string
• Real search tools are more complicated:
searching for ‘string’ also returns ‘String’
and ‘stringbean’
How do you match strings?
• Finite-state automata
• Brute force
• Knuth-Morris-Pratt (KMP)
• Visualization tool for brute force and KMP:
www.dcc.ufmg.br/~cassia/smaa/english/
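To make the comparison between the last two methods concrete, here is a minimal Python sketch of brute-force matching and KMP. The function names (brute_force_search, build_failure_table, kmp_search) are illustrative choices, not taken from the slides or from any library. Both searches return the 0-based start positions of every exact occurrence of the pattern.

```python
def brute_force_search(text, pattern):
    """Try every alignment of pattern against text, comparing left to right."""
    n, m = len(text), len(pattern)
    matches = []
    for start in range(n - m + 1):
        j = 0
        while j < m and text[start + j] == pattern[j]:
            j += 1
        if j == m:  # every character of the pattern matched
            matches.append(start)
    return matches


def build_failure_table(pattern):
    """table[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    table = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]  # fall back to a shorter border
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_search(text, pattern):
    """Scan text once; on a mismatch, reuse the failure table instead of
    re-reading text characters that were already matched."""
    if not pattern:
        return list(range(len(text) + 1))
    table = build_failure_table(pattern)
    matches = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = table[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = table[k - 1]  # keep going to find overlapping matches
    return matches
```

Both functions report the same positions; the difference is that KMP never moves backwards in the text, because the failure table tells it how much of the pattern is still matched after a mismatch.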
Virus Detection
• Detecting a virus amounts to searching for a
pattern string in a larger text.
1 ) Viral signature matching (search for the contagious segment)
2 ) Code enumeration (compare to an old known file)
3 ) Checksum methods (check the size of the file)
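As a rough illustration of signature matching, the sketch below scans a byte string for a set of known byte patterns. The function name scan_for_signatures, the signature names, and the byte values are all made up for this example; real scanners use far larger signature databases and faster multi-pattern algorithms.

```python
def scan_for_signatures(data, signatures):
    """Return {signature_name: offset} for each signature found in data.

    data is a bytes object (for example, the contents of a file) and
    signatures maps a hypothetical name to the byte pattern to look for.
    """
    hits = {}
    for name, pattern in signatures.items():
        offset = data.find(pattern)  # exact substring search; -1 if absent
        if offset != -1:
            hits[name] = offset
    return hits


# Hypothetical file contents and signatures, for illustration only.
sample = b"\x00\x01HEADER\xde\xad\xbe\xefpayload"
known = {"demo-virus": b"\xde\xad\xbe\xef", "other": b"\xca\xfe"}
print(scan_for_signatures(sample, known))  # {'demo-virus': 8}
```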
Variation-tolerant matching
• Fast substring matching
• Approximate string matching
– voice recognition
– DNA sequencing
Example: search for pattern x in text y, allowing at most k errors
x = GATAA and y = CAGATAAGAGAA and k = 1
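The slide does not say which kind of error k counts. The sketch below assumes the simplest reading, string matching with at most k mismatches (substitutions only, no insertions or deletions), and uses a straightforward check of every alignment. The function name k_mismatch_search is an illustrative choice.

```python
def k_mismatch_search(text, pattern, k):
    """Return the start positions where pattern matches text with at most
    k mismatched characters (Hamming distance <= k)."""
    n, m = len(text), len(pattern)
    matches = []
    for start in range(n - m + 1):
        mismatches = 0
        for j in range(m):
            if text[start + j] != pattern[j]:
                mismatches += 1
                if mismatches > k:
                    break  # too many errors at this alignment
        if mismatches <= k:
            matches.append(start)
    return matches


x = "GATAA"
y = "CAGATAAGAGAA"
print(k_mismatch_search(y, x, 1))  # [2, 7]
```

Under this assumption, position 2 is the exact occurrence GATAA and position 7 is GAGAA, which differs from x in one character.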
Summary
• Exact string matching is well suited to tools such as grep and sed
• String matching is used in word find and in
internet keyword searches
• KMP beats brute force in the worst case:
O(n + m) versus O(nm) for a text of length n and a
pattern of length m, though the gap is often small in practice
• Approximate string matching and fast
substring matching cover a wider range of
practical applications.
Acknowledgements
• Virus detection:
www.cse.uta.edu/~holder/courses/cse5311/lectures/applets/je/a24.html
• Speech recognition:
www.kom.e-technik.tu-darmstadt.de/pr/workshop/chair/ACMMM98/electronic_proceedings/robertson/
• Approximate string matching:
http://www-igm.univ-mlv.fr/~lecroq/seqcomp/node3.html
• Cormen, chapter 34