So, what is a spam bot? In most ways it is similar to what’s usually called a spider or robot: a program, like the ones search engine companies run, that crawls the web for content. I’m not familiar with any particular spam bot, but some basic assumptions can be made. A bot reads the raw data of a web page, that is, the HTML code, not the page as rendered in a browser. And since its purpose is to find e-mail addresses, it will simply search the file for anything that matches an address pattern. It moves from site to site by following links or by using an existing index of sites (you can even use Google to find e-mail addresses).
In the pre-spam days, people just wrote their address as <a href="mailto:firstname.lastname@example.org">email@example.com</a> and everything was fine. But then the spam bots arrived. Anyone familiar with regular expressions knows that it is notoriously easy to match an e-mail address. And since the spam bot reads the HTML, there is no point in changing only the link text: the mailto: address in the href is the important part. Basically, if the e-mail address appears in clear text anywhere in the HTML, the spam bot will almost certainly find it.
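To show how little effort this takes, here is a minimal sketch in Python of the kind of search a harvester might run over raw HTML. The pattern is deliberately simple and the names `EMAIL_RE` and `harvest` are made up for this illustration. I am not claiming any real spam bot works exactly like this.

```python
import re

# Deliberately simple pattern: local part, "@", one or more domain labels,
# and a top-level domain of at least two letters.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)


def harvest(html: str) -> list[str]:
    """Return the unique addresses found in raw HTML, in order of appearance."""
    found = []
    for address in EMAIL_RE.findall(html):
        if address not in found:
            found.append(address)
    return found


# The link text hides the address, but the href still carries it in clear text.
page = '<p>Contact: <a href="mailto:jane.doe@example.org">write to me</a></p>'
print(harvest(page))
```

Note that the visible link text says only "write to me", yet the address is still picked up from the href.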
So what other options do we have? One is rewriting the address in a human-readable form that spam bots won’t recognize. The common one is user[at]domain[dot]com or variants thereof, such as user(at)domain(dot)com or just user at domain.com. The problem is that these are mere variations of the computer-readable e-mail address syntax, and it’s almost too easy to change a regular expression to accommodate them. People write the address in almost the same way everywhere, and that predictability is the key to making it computer readable. You can write an expression that matches user[any-symbol-goes-here]domain[any-symbol-goes-here]com. It can anchor on the top-level domain (.com, .net, etc.) to confirm that the string really is an e-mail address: there is a limited number of top-level domains, and people can hardly omit them without risking that the mail never reaches the right destination.
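As a rough illustration of how predictable these rewrites are, the following Python sketch undoes the common [at]/[dot] variants. It is narrower than the "any symbol" expression described above: it only accepts the words "at" and "dot", either bracketed or surrounded by spaces. The `TLDS` list is a tiny illustrative subset, and all names here are hypothetical.

```python
import re

# Illustrative subset of top-level domains; a real list would be much longer.
TLDS = ("com", "net", "org")

# "@" / "." written literally, as a bracketed word, or as a word between spaces.
AT = r"\s*(?:@|[\[\(\{]\s*at\s*[\]\)\}]|\s+at\s+)\s*"
DOT = r"\s*(?:\.|[\[\(\{]\s*dot\s*[\]\)\}]|\s+dot\s+)\s*"

OBFUSCATED_RE = re.compile(
    r"([A-Za-z0-9._%+-]+)" + AT + r"([A-Za-z0-9-]+)" + DOT
    + r"(" + "|".join(TLDS) + r")\b",
    re.IGNORECASE,
)


def deobfuscate(text: str) -> list[str]:
    """Rebuild normal addresses from obfuscated ones found in the text."""
    return [
        f"{user}@{domain}.{tld}".lower()
        for user, domain, tld in OBFUSCATED_RE.findall(text)
    ]


for sample in ("user[at]domain[dot]com", "user(at)domain(dot)com", "user at domain.com"):
    print(sample, "->", deobfuscate(sample))
```

A pattern this loose will also produce false positives (a phrase like "look at google.com" would match), but for someone collecting addresses in bulk that is a small price to pay.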
So, what options do we have left? There is a smart and simple solution: images. Write your e-mail address as an image (either a pre-made image file or one generated through the graphics library of PHP, for example). Of course, you can’t link it to your e-mail address, but you can link it to a pop-up contact form that makes it easy to send you a mail. However, beware: there is technology to recognize text in images, and you can bet some of the spammers can get hold of it. That’s why so many sites nowadays give you a confusing image with hardly readable text on a horrible background when you try to register. You could also render your e-mail address in that same distorted style, but then we have passed the point of user-friendliness, haven’t we? People don’t want to decode something just to write you an e-mail.
So there is no perfect solution? Nope. If lots of people start using a method, you can bet the spammers will find a way to interpret it, if possible. But there are very safe alternatives that make the probability quite low that the spammers will get your address. The basic rule is: either use something really smart, or just use something unique. Few people write their e-mail addresses backwards, so if you do, the spammers will probably not catch it. Just writing the address as user[blurb}domain.com will make it hard to match (notice that the brackets don’t even pair up). Another great way is to use a form on the site for sending mails, and if you want to make it really secure, add a validation image that isn’t computer readable (such as KittenAuth). When adding the form, make sure the e-mail address is not written anywhere in the HTML form; keep the address in the PHP code. Another method is to use Flash: if done well, it will look good, be user friendly and be very hard for spammers to interpret. In the end, however, I am sure the spammers will find new tools to dig up e-mail addresses, and we will have to find new ways of protecting ourselves.
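The contact form idea can be sketched as follows. I mention PHP above, but to keep the examples in one language this is Python, and it only builds the message rather than sending it. `RECIPIENT`, `FORM_HTML` and `build_contact_mail` are placeholder names for this illustration. The point is simply that the address lives in server-side code and never appears in the markup a bot can read.

```python
from email.message import EmailMessage

# Placeholder address: it exists only in server-side code, never in the HTML.
RECIPIENT = "owner@example.com"

# The form the visitor sees; note that it contains no e-mail address at all.
FORM_HTML = """
<form method="post" action="/contact">
  <input name="name"> <input name="reply_to"> <textarea name="body"></textarea>
  <button type="submit">Send</button>
</form>
"""


def build_contact_mail(name: str, reply_to: str, body: str) -> EmailMessage:
    """Build the mail for a form submission; the recipient is fixed server-side."""
    # Reject line breaks in values that end up in mail headers.
    for value in (name, reply_to):
        if "\r" in value or "\n" in value:
            raise ValueError("line breaks are not allowed in header fields")
    msg = EmailMessage()
    msg["To"] = RECIPIENT
    msg["Reply-To"] = reply_to
    msg["Subject"] = f"Contact form message from {name}"
    msg.set_content(body)
    return msg


mail = build_contact_mail("Jane", "jane@example.net", "Hello there")
print(mail["To"], "|", mail["Subject"])
```

Sending the message (for example through an SMTP server) and adding a validation image are left out here; they would sit on top of this without exposing the address.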