Today I put together the first draft of the regular expressions that will filter bad words out of comments in a future commenting system. The variations are drawn from more than three years of comments, and the bad words found in them, on The Denver Post’s article commenting system.
These are the highlights:
- (CHAI|TE?E?A?)[ -]?B.*(A|U).*G.*G?(E.*R|I.*N.*G?|E.*D)?.*S?
- d(i|1|\|)ck(less|head|wad|weed)?
- (m(o|u)th(a|er))?(F|PH)[aeiouv\-\.\*':@]+.*C?.*K(E?R?S?|I?N?G?|FACE|HEAD)
- (jack|bad|dumb|fat)?(a|@).*[\$sz8x].*[\$sz8x].*e?.*(\$|s|z)?
- (dip|dog|chicken)?[\$s]?.*h.*[i\|1!-@a]+.*t(ty|t|head|eating)?s?
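As a rough sketch of how one of these patterns might be applied, here is a small Python example. It uses only the second pattern from the list, copied as written. Everything else is an assumption made for illustration and not part of the draft: the `mask_comment` name, the case-insensitive flag, the `\b` word-boundary anchors, and the choice to mask matches with asterisks.

```python
import re

# One pattern taken verbatim from the list above.
PATTERNS = [
    r"d(i|1|\|)ck(less|head|wad|weed)?",
]

# Assumption: wrap each pattern in word boundaries and ignore case,
# so that ordinary words containing the same letters are left alone.
COMPILED = [
    re.compile(r"\b(?:" + pattern + r")\b", re.IGNORECASE)
    for pattern in PATTERNS
]


def mask_comment(text, mask_char="*"):
    """Replace every match with mask characters of the same length."""
    for regex in COMPILED:
        text = regex.sub(lambda m: mask_char * len(m.group(0)), text)
    return text


if __name__ == "__main__":
    print(mask_comment("What a D1CKhead move"))
    print(mask_comment("Dickens wrote novels"))
```

With the word boundaries in place, the first comment is masked and the second is left untouched. The patterns that lean on `.*` would likely need more care than this, since a greedy `.*` can stretch a match across most of a comment.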
Would you be willing to share or sell this list? We are a game development company and are in need of this sort of list.