Today I put together the first draft of the regular expressions that will filter bad words out of comments in a future commenting system. The variations are drawn from more than three years of comments and bad words collected on The Denver Post’s article commenting system.
These are the highlights:
- (CHAI|TE?E?A?)[ -]?B.*(A|U).*G.*G?(E.*R|I.*N.*G?|E.*D)?.*S?
- d(i|1|\|)ck(less|head|wad|weed)?
- (m(o|u)th(a|er))?(F|PH)[aeiouv\-\.\*':@]+.*C?.*K(E?R?S?|I?N?G?|FACE|HEAD)
- (jack|bad|dumb|fat)?(a|@).*[\$sz8x].*[\$sz8x].*e?.*(\$|s|z)?
- (dip|dog|chicken)?[\$s]?.*h.*[i\|1!-@a]+.*t(ty|t|head|eating)?s?
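Since this is a first draft, one thing worth testing is how the unbounded `.*` runs behave on ordinary comments. The sketch below is in Python and copies the fourth pattern from the list verbatim. It assumes matching is case-insensitive and uses search rather than full-match, neither of which the post specifies. The names `LOOSE`, `TIGHT` and `flags_comment` are hypothetical, and `TIGHT` is only one possible way to narrow the pattern, not part of the original list.

```python
import re

# Fourth pattern from the list above, copied verbatim.
LOOSE = r"(jack|bad|dumb|fat)?(a|@).*[\$sz8x].*[\$sz8x].*e?.*(\$|s|z)?"

# Hypothetical tighter variant (an assumption, not from the original list):
# - (?<!\w) and (?!\w) require the match to stand alone as a word
# - the unbounded ".*" filler is replaced by at most two punctuation characters
TIGHT = (
    r"(?<!\w)(jack|bad|dumb|fat)?(a|@)"
    r"[^\w\s]{0,2}[\$sz8x][^\w\s]{0,2}[\$sz8x]"
    r"(e?[\$sz])?(?!\w)"
)


def flags_comment(pattern: str, text: str) -> bool:
    """Return True if the pattern is found anywhere in the comment text.

    Case-insensitive search is an assumption about how the filter runs.
    """
    return re.search(pattern, text, re.IGNORECASE) is not None


if __name__ == "__main__":
    samples = ["a business plan", "grass seeds", "pass the class", "@$$", "hello"]
    for text in samples:
        print(f"{text!r}: loose={flags_comment(LOOSE, text)} "
              f"tight={flags_comment(TIGHT, text)}")
```

Because `.*` can span any number of characters, the loose pattern fires on any comment that has an "a" followed somewhere by two characters from `[$sz8x]`, which covers a lot of innocent text. Bounding the filler and anchoring on word edges trades some recall on heavily obfuscated spellings for far fewer false positives. Where that balance should sit is a judgment call for whoever moderates the comments.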
Would you be willing to share or sell this list? We are a game development company and are in need of this sort of list.