Anyone who has a website on which comments can be posted knows about comment spam. Comment spam is one of those things that no “the glass is half full” type could possibly have imagined until it bit them in the rump. Heck, I’m not even an optimist, and I was surprised when it first bit me. The fact is, as long as search engine rankings matter, spammers are going to do their best to get their links onto as many websites as possible.
My first line of defense was a CAPTCHA (you know, the image with numbers/letters that you type into a form field). A CAPTCHA is a great way to tell the difference between an automated spammer and a human with good vision using a browser capable of displaying images. Unfortunately, there are a lot of people who are neither, and those who think “meh… but blind people probably hate ‘hearing’ the internet anyway… so that’s not really important” are probably going to burn in hell… in a hell where Mark Pilgrim watches with glee as you burn and periodically spills something flammable in your general direction. Granted, you can leave a backup system such as, oh, e-mail for those who cannot view your CAPTCHA image, but that’s just another thing for you to check, and manually input, and it is just another hurdle for legitimate commenters to overcome.
Word blacklists are a good idea… a lot of spam contains words like “casino” and “viagra,” and while these lists can be ever expanded and improved, they will never catch 100% of spam. With all the effort you’ll spend updating the list, you might as well manually moderate all your comments.
So right to it, then. Here is my idea of an optimal commenting system that is highly resistant to spam, while still being fast for your users and accessible for those with physical or browser-related disabilities.
The first piece of this solution is an optional CAPTCHA. What this means is that if your commenter so chooses, they may enter in the short sequence and guarantee that their comment will not be put into the moderation queue but instead be instantly posted.
If the CAPTCHA is not filled out, a series of checks is made. If the commenter’s e-mail address matches that of a previously approved comment, the comment is approved instantly. If the commenter is unknown, and the comment is on an old entry (“old” is subjective here… but I think 2 weeks to 1 month is a good cutoff), it goes to moderation. If the commenter is unknown and the comment contains more than X number of hyperlinks, it goes to moderation. Finally, if the commenter is unknown and the comment contains any of the blacklisted words, it goes to moderation. Anything left after these checks is approved.
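For the code-minded, here is a rough Python sketch of the checks described above. It is only an illustration, not a finished plugin: the names (Comment, decide, approved_emails), the 2-week cutoff, the 3-link limit, the two-word blacklist and the way links are counted are all my own assumptions for the example.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta

# Illustrative settings only (assumptions, tune them for your own site).
OLD_POST_AGE = timedelta(weeks=2)   # posts older than this count as "old"
MAX_LINKS = 3                       # more links than this means moderation
BLACKLIST = {"casino", "viagra"}    # sample blacklisted words


@dataclass
class Comment:
    email: str
    body: str
    captcha_ok: bool  # True only if the optional CAPTCHA was filled in correctly


def count_links(body: str) -> int:
    """Crude link count: every occurrence of http:// or https://."""
    return len(re.findall(r"https?://", body, flags=re.IGNORECASE))


def has_blacklisted_word(body: str) -> bool:
    """True if any whole word in the comment is on the blacklist."""
    words = set(re.findall(r"[a-z0-9']+", body.lower()))
    return not words.isdisjoint(BLACKLIST)


def decide(comment: Comment, post_date: datetime, now: datetime,
           approved_emails: set) -> str:
    """Return "approve" or "moderate" for a newly submitted comment."""
    email = comment.email.lower()

    # A correctly solved CAPTCHA trumps everything else.
    # A previously approved commenter is also let straight through.
    if comment.captcha_ok or email in approved_emails:
        approved_emails.add(email)
        return "approve"

    # From here on the commenter is unknown and the CAPTCHA was left blank.
    if now - post_date > OLD_POST_AGE:
        return "moderate"
    if count_links(comment.body) > MAX_LINKS:
        return "moderate"
    if has_blacklisted_word(comment.body):
        return "moderate"

    # Nothing suspicious: approve and remember the commenter.
    approved_emails.add(email)
    return "approve"
```

A comment that lands in moderation and is later approved by hand would also have its e-mail address added to approved_emails; that manual step is outside this sketch.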
Let us consider a few scenarios. For our purposes, an “old” post is anything older than 2 weeks, and 3 hyperlinks is the most a comment can have without going to moderation.
Scenario 1. A comment is made on a 3 week old post. The commenter is unknown, and the CAPTCHA was left blank. This comment will go to moderation. If it is a legitimate comment, it will eventually be manually approved, and the commenter will be whitelisted henceforth.
Scenario 2. A comment is made on a 4 week old post. The commenter is unknown, but the CAPTCHA was filled in correctly. CAPTCHA trumps, so the comment is instantly approved and the commenter is whitelisted.
Scenario 3. A comment is made on a 1 week old post. The commenter is unknown, and the CAPTCHA is blank. This could very well be a blind person’s comment. Because it is a comment on a new entry, as long as it doesn’t contain more than 3 links or contain any blacklisted words, it’ll go live instantly.
Maybe it’s just better if I give you a flowchart. A really really ugly flowchart made in MS Paint. By a retarded dolphin (not me).
Comments? Fatal flaws? Praise? Ridicule? (please direct any MS Paint-themed ridicule to the dolphin)
I think this is a workable system. In fact, many people have most of this system up, sans the CAPTCHA and the recognition of previous commenters. Those two elements only improve commenters’ ability to prove their legitimacy, and the CAPTCHA remains optional.
Watcher says
Not a bad job for a retarded dolphin! 😉
I think this would do a good job of stopping the traditional comment spam… but what about the people who might be more interested in wasting your space and time than they are in linking back to themselves?
Mark says
Are there really that many of those? Pretty much, the objective of spam is to make money. Comment spam doesn’t give spammers money directly, but the links they place in the comment can help raise the search engine rankings of their websites, which either sell drugs, sell porn, or sell a chance to lose your shirt playing online blackjack.
Sure, it may be annoying to get a comment from a moron who is completely off topic or mind-numbingly boring, but I don’t think there’s any way to really stop that, unless you’re willing to do something like a worldwide Drano for Mountain Dew switch. You might be able to get away with just switching the labels. Remember: these aren’t the brightest people in the world.
uhoh4242 says
cool. where can I download this great, hopefully free, system for my wordpress blog?
🙂
Mark says
Right now, this is only a proposed system. I have elements of it in place. I am using a plugin/hack that automatically whitelists people who have had comments approved before. I could also, if the spam gets bad, make it put all first-time comments into moderation.
Propecia says
How can you prevent this? Propecia
Jenna says
This new Spam filter is great, it prevents all the spam!