Tuesday, October 25, 2005

Quick lesson on how to strip text out using RegEx

There's just something about RegEx that makes my ears bleed. Fortunately, other people get it. Here's an example of how to filter text out of an expression, courtesy of Jeff Atwood's Coding Horror:


For example, if the word fox was what I wanted to exclude, and the searched text was:


The quick brown fox jumped over the lazy dog.


... and I used a regular expression of [^"fox"] (which I know is incorrect) (why this doesn't work I don't understand; it would make life SO much easier), then the returned search results would be:


The quick brown jumped over the lazy dog.



Regular expressions are great at matching. It's easy to formulate a regex using what you want to match. Stating a regex in terms of what you don't want to match is a bit harder.
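As for why the bracket attempt in the question doesn't do what was hoped: square brackets define a character class, so [^"fox"] means "any single character that is not a double quote, f, o, or x", not "anything except the word fox". Here's a minimal sketch in Python (my choice of language for illustration; the original post doesn't use one) that shows what the pattern actually keeps:

```python
import re

text = "The quick brown fox jumped over the lazy dog."

# [^"fox"] is a negated character class: it matches ONE character at a time,
# as long as that character is not '"', 'f', 'o', or 'x'.
kept = "".join(re.findall(r'[^"fox"]', text))

# Every f, o, and x in the sentence is dropped, not just the word "fox".
print(kept)  # The quick brwn  jumped ver the lazy dg.
```

So the class strips individual letters everywhere in the sentence, which is why a different tool (lookaround) is needed to exclude a whole word.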


One easy way to exclude text from a match is negative lookbehind:



\w+\b(?<!\bfox)
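To try the lookbehind version out, here's a small sketch using Python's re module (an assumption on my part; any flavor with lookbehind should behave similarly). Python accepts this pattern because the lookbehind has a fixed width:

```python
import re

text = "The quick brown fox jumped over the lazy dog."

# \w+\b       : grab a whole word, ending at a word boundary
# (?<!\bfox)  : then reject it if the text just before this point is the word "fox"
words = re.findall(r"\w+\b(?<!\bfox)", text)

print(words)
# ['The', 'quick', 'brown', 'jumped', 'over', 'the', 'lazy', 'dog']

# Rejoining the matches gives the sentence without "fox"
# (punctuation is not matched by \w+, so the final period is dropped).
print(" ".join(words))
```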


But not all regex flavors support negative lookbehind, and those that do typically place severe restrictions on it, e.g., it must be a simple fixed-length expression. To avoid incompatibility, we can restate our solution using negative lookahead:



(?!fox\b)\b\w+
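Here's the lookahead version as a quick Python sketch (again, Python is just my pick for illustration). The second sample string is made up to show that the \b inside the lookahead limits the exclusion to the exact word "fox":

```python
import re

pattern = re.compile(r"(?!fox\b)\b\w+")

text = "The quick brown fox jumped over the lazy dog."

# (?!fox\b) : refuse to start a match where the upcoming text is the word "fox"
# \b\w+     : otherwise match a whole word starting at a word boundary
words = pattern.findall(text)
print(words)
# ['The', 'quick', 'brown', 'jumped', 'over', 'the', 'lazy', 'dog']

# Hypothetical extra sample: only the standalone word "fox" is skipped;
# longer words that merely contain "fox" still match.
others = pattern.findall("fox foxes firefox")
print(others)
# ['foxes', 'firefox']
```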


You can test this regex in the cool online JavaScript Regex evaluator. Unfortunately, JavaScript doesn't support negative lookbehind, so if you want to test that one, I recommend RegexBuddy. It's not free, but it's the best regex tool out there by far, and it keeps getting better with every incremental release.
