CST334 Tutorial Sheet 10
Chapter 9 Regular Expressions

Summary of Common Regular Expressions

Character           Name                      Meaning
.                   dot                       any one character
[…]                 character class           any character listed
[^…]                negated character class   any character not listed
^                   caret                     position at start of line
$                   dollar                    position at end of line
\<                  backslash less-than       position at beginning of word
\>                  backslash greater-than    position at end of word
?                   question mark             matches zero or one occurrence
*                   asterisk or star          matches zero or more occurrences
+                   plus sign                 matches one or more occurrences
\{m,n\} or \{m\}    repetition                matches m to n occurrences (\{m\}: exactly m)
|                   bar, or                   matches either expression it separates
(…)                 parentheses               limits scope of | or encloses subexpressions for backreferencing
\1, \2, …           backreference             matches text previously matched within the first, second, etc. set of parentheses

Examples

Variable names in C                 [a-zA-Z_][a-zA-Z_0-9]*
Dollar amount with optional cents   \$[0-9]+(\.[0-9][0-9])?
Time of day                         (1[012]|[1-9]):[0-5][0-9] (am|pm)

Hands On

Copy directory regexp to your home directory and go into it.

How many lines are in the entire file speech?

    cat speech | wc -l

What words or patterns do these match?

a) grep 'w' speech
b) grep '^w' speech
c) grep 'we' speech
d) grep '\<we\>' speech
e) grep 'we*' speech
f) grep 'w..[lk]' speech
g) grep '\([a-z]\)\1' speech
h) egrep '\<the +the\>' speech
i) egrep '\<([a-zA-Z]+) +\1\>' speech

Test a regular expression using echo:

    echo aa | grep '\([a-z]\)\1'

Also try out the examples in Lecture28.ppt and Lecture31.ppt.

Using sed to substitute an expression with another

    sed 's/searchPattern/replacePattern/g' fileToProcess > fileToStoreResults

You don't even need to use vi or emacs:

    sed 's/war/struggle/g' 2003-02-28.txt > 2003-02-28mod.txt

The next command uses sed to modify a series of hyperlink pathnames so that they refer to new locations (for example, if directory '~mpc01c..' and its subdirectories were moved to '~tom/public_html/csis1S06/mpc01c..'):
    sed 's/mpc01c../tom\/csis1S06\/&\/public_html/' WebPagesS06.htm > ArchWebS06.htm

Search regexp:     mpc01c..
Replace pattern:   tom\/csis1S06\/&\/public_html
& = matched pattern (like mpc01c13)

Another cool trick (not Reg Exp)

How to rename all *.html files to *.htm?

WRONG:   mv *.html *.htm
RIGHT:   rename .html .htm *.html
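
The mv version fails because the shell expands both globs before mv runs, so mv only sees a list of file names and treats the last one as the destination. The rename syntax shown above is the util-linux form; some systems ship a Perl-based rename with a different syntax. The sketch below is one way to check both tricks safely: it previews the sed substitution on a single line, then renames files with a plain loop that should work in any POSIX shell. The sample link text, the helper name rename_html, and the scratch directory are made up for illustration.

```shell
#!/bin/sh
# Preview the sed substitution on one sample line before running it on a file.
# The sample link is invented; & is replaced by whatever the regexp matched.
preview=$(echo 'href="mpc01c13/index.htm"' | sed 's/mpc01c../tom\/csis1S06\/&\/public_html/')
echo "$preview"   # prints href="tom/csis1S06/mpc01c13/public_html/index.htm"

# Portable alternative to rename: ${f%.html} strips the .html suffix.
# rename_html is a hypothetical helper name, not a standard command.
rename_html() {
    for f in "$1"/*.html; do
        [ -e "$f" ] || continue        # skip if the glob matched nothing
        mv -- "$f" "${f%.html}.htm"
    done
}

# Demo in a scratch directory so no real files are touched.
demo=$(mktemp -d)
touch "$demo/a.html" "$demo/b.html"
rename_html "$demo"
ls "$demo"
```

Piping one echoed line through sed first, in the same spirit as the echo test for grep, makes it easy to see what & expands to before a whole file is rewritten.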