Myths about e-mail spam harvesting

user [at] domain [dot] com. Look familiar? We are all quite used to these nifty ways of rewriting an e-mail address into something that is supposed to look like Greek to the evil spam bots. There are many other alternatives, like using JavaScript or avoiding the mailto: link (as if that would make any difference...). However, I would say there are a lot of misconceptions and faulty assumptions about this. Maybe even myths. My aim here is to show, from a slightly technical perspective, that most of these methods won’t be very effective against a determined e-mail harvester, and to suggest some methods that have a better chance of actually protecting your inbox. One thing worth stating up front: I’m not a spammer and I’m not familiar with their practices. I’m merely guessing how spammers would do it.

So, what is a spam bot? In most ways it’s similar to what’s usually called a spider or robot: a program, like the ones search engine companies run, that crawls the web for content. I don’t know of any particular spam bot, but some basic assumptions can be made. A bot reads the raw data of a web page, that is, the HTML code, not the rendered content. And since its purpose is to find e-mail addresses, it will simply search the file for text that matches an address pattern. It moves from site to site by following links or by working through an existing index of sites (you can even use Google to find e-mail addresses).
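As a rough illustration of the "going from site to site using the links" part, here is a minimal Python sketch of how a crawler could pull the next pages to visit out of raw HTML. The names `LinkCollector` and `next_pages` are made up for this example, and this is only my guess at the general idea, not a description of any real spam bot.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the targets of <a href="..."> links found in raw HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # mailto: links are addresses, not pages to crawl, so skip them here.
            if name == "href" and value and not value.startswith("mailto:"):
                # Resolve relative links against the address of the current page.
                self.links.append(urljoin(self.base_url, value))


def next_pages(html, base_url):
    """Return the absolute URLs a crawler could visit next."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


page = '<a href="/about">About</a> <a href="mailto:yada@yada.net">Mail</a>'
print(next_pages(page, "http://example.com/"))  # ['http://example.com/about']
```

Note that nothing here renders the page. The bot only needs the markup.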

In the pre-spam days, people just wrote <a href="mailto:yada@yada.net">yada@yada.net</a> and everything was fine. But then the spam bots arrived. Anyone familiar with regular expressions knows that it is very easy to match a typical e-mail address. And since the spam bot is reading the HTML, there is no point in changing only the link text: the address in the mailto: link is the important part. Basically, if the e-mail address appears in clear text anywhere in the HTML, the spam bot will almost certainly find it.
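To show how little effort this takes, here is a small Python sketch. The pattern is deliberately simple and is my own assumption of what a harvester might use (real address syntax is looser than this), and `harvest` is a hypothetical helper name.

```python
import re

# Deliberately simple pattern: something, an @, a domain and a top-level domain.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def harvest(html):
    """Return every address-like string found in the raw HTML source."""
    return set(EMAIL_RE.findall(html))


# The link text is harmless, but the address still sits in the mailto: link.
page = '<a href="mailto:yada@yada.net">Contact me</a>'
print(harvest(page))  # {'yada@yada.net'}
```

Changing the visible link text to "Contact me" made no difference, since the search runs over the markup rather than over what the visitor sees.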

So what other options do we have? One is to rewrite the address in a human-readable form that spam bots won’t recognize. The common one is user[at]domain[dot]com or a variant of it, such as user(at)domain(dot).com or just user at domain.com. The problem is that these are mere variations on the machine-readable address syntax, and it’s almost too easy to extend a regular expression to cover them. People write these forms in almost the same way everywhere, and predictability is exactly what makes something computer-readable. You can write an expression that matches user[any-symbol-goes-here]domain[any-symbol-goes-here]com. It can use the top-level domain (.com, .net, etc.) to confirm that the string really is an e-mail address: there is a limited number of top-level domains, and people can hardly leave them out without risking that mail never reaches them.
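Here is a sketch, in Python, of how such an extended matcher could look. It first turns the common "at" and "dot" spellings back into real symbols and then uses a short, assumed list of top-level domains as the sanity check. The patterns and the names `deobfuscate` and `harvest` are my own illustrations, not anything taken from a real harvester.

```python
import re

# "[at]", "(at)", "{at}" with optional spaces, or the bare word " at ".
AT_TOKEN = re.compile(r"\s*[\[({]\s*at\s*[\])}]\s*|\s+at\s+", re.IGNORECASE)
# Same idea for "dot"; a stray "." after the bracket, as in "(dot).com", is swallowed.
DOT_TOKEN = re.compile(r"\s*[\[({]\s*dot\s*[\])}]\s*\.?\s*|\s+dot\s+", re.IGNORECASE)
# The top-level domain list is a short, assumed sample; a real list would be longer.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.(?:com|net|org|edu)\b",
    re.IGNORECASE,
)


def deobfuscate(text):
    """Replace the usual human-readable spellings with '@' and '.'."""
    text = AT_TOKEN.sub("@", text)
    return DOT_TOKEN.sub(".", text)


def harvest(text):
    """Find addresses in text after undoing the common obfuscations."""
    return set(EMAIL_RE.findall(deobfuscate(text)))


for sample in ("user[at]domain[dot]com", "user(at)domain(dot).com", "user at domain.com"):
    print(harvest(sample))  # {'user@domain.com'} each time
```

Ordinary sentences containing the word "at" get mangled too, but they fail the top-level domain check, so they cost the bot nothing.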

Of course, you can always invent more elaborate ways of writing your e-mail address, and you can probably fool the spam bots. But writing your address as “take my user name and put it before my domain name, the one in the address bar of your browser” is hardly convenient, and you can easily confuse people with “user [youknowwhattowritehere, right?] domain . [look in the address bar!]”. So what to do? There is always JavaScript.

The popular way of using JavaScript to obfuscate an e-mail address is to write its characters in another format. For example, you can write them as decimal ASCII codes, as Unicode escape sequences or in hexadecimal. In the HTML file it looks like gibberish, but when the string is passed through a JavaScript function at load time it renders as normal text on the screen. The problem is the same as before. There are only a few string conversions built into JavaScript, and it’s quite trivial for a spam bot to recognize ASCII codes and convert them to readable text. Still, with some thought, you can make it very hard for the spam bots. For example, adding a fixed value to each ASCII code makes any normal conversion produce garbage, while your own JavaScript function subtracts that value before converting. Of course, you will probably have to write the value somewhere in the code, but surely it’s a non-trivial task to build a spam bot that understands JavaScript code in that way. Or is it? Actually, it’s not hard to imagine a spam bot with JavaScript capabilities. If the bot can run JavaScript, it doesn’t matter how fancy your obfuscating function is: if the address is readable once the page is rendered, the bot will see it. The bot might even be able to render a web page as fully as a modern browser, which makes it even harder to find alternatives.
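For the added-value trick specifically, a bot may not even need to run or understand the JavaScript. The following Python sketch is my own guess at one shortcut: try every small offset until the decoded text looks like an address. `encode` stands in for what the page author does and `brute_force` for the bot, and both names are hypothetical.

```python
import re

# The whole decoded string must look like an address.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")


def encode(address, offset):
    """Page author's side: store each character code plus a secret offset."""
    return [ord(ch) + offset for ch in address]


def brute_force(codes, max_offset=256):
    """Bot's side: try offsets until the decoded text looks like an address."""
    for offset in range(max_offset):
        if min(codes) - offset < 0:
            break  # character codes cannot be negative, so stop here
        candidate = "".join(chr(code - offset) for code in codes)
        if EMAIL_RE.match(candidate):
            return offset, candidate
    return None


codes = encode("yada@yada.net", 7)
print(brute_force(codes))  # (7, 'yada@yada.net')
```

The bot still has to spot the list of numbers in the page in the first place, so this only covers the decoding step. A more elaborate transformation would defeat this particular shortcut, though not a bot that actually executes the script.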

So, what options do we have left? There is a smart and simple solution: images. Write your e-mail address as an image (either a pre-made image file or one generated with, for example, PHP’s graphics library). Of course, you can’t link it to your e-mail address, but you can link it to a pop-up contact form that makes it easy to send you a message. Beware, however: there is technology for recognizing text in images, and you can bet that some spammers can get hold of it. That’s why so many sites nowadays show you a confusing image with barely readable text on a horrible background when you try to register. You could write your e-mail address like that too, but then we have passed the point of user-friendliness, haven’t we? People don’t want to decode something just to write you an e-mail.

So there is no perfect solution? Nope. If lots of people start using a method, you can bet the spammers will find a way to interpret it, if that is possible. But there are alternatives that make it quite unlikely that spammers will get your address. The basic rule is: either use something really smart, or just use something unique. Few people write their e-mail addresses backwards; if you do, the spammers will probably not catch it. Just writing the address like user[blurb}domain.com will make it hard (notice that the brackets don’t match). Another great way is to use a form on the site for sending mail, and if you want to make it really secure, add a validation image that isn’t computer-readable (such as KittenAuth). When adding the form, make sure the e-mail address doesn’t appear anywhere in the HTML of the form; keep the address in the PHP code. Another method is to use Flash: done well, it looks good, is user-friendly and is very hard for spammers to interpret. In the end, however, I am sure the spammers will find new tools for digging up e-mail addresses, and we will have to find new ways of protecting ourselves.
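The contact form idea can be sketched like this. I mention PHP above, but to keep one language for the examples this sketch uses Python's standard `email` package; the principle is the same. The address `owner@example.com`, the `/contact` path and the name `build_message` are placeholders I made up, and actually sending the message is left out.

```python
from email.message import EmailMessage

# Placeholder recipient: it lives only in server-side code, never in the page.
RECIPIENT = "owner@example.com"

# What the visitor's browser (and any bot) receives: no address anywhere.
FORM_HTML = (
    '<form method="post" action="/contact">'
    '<input name="email"><input name="subject"><textarea name="body"></textarea>'
    "</form>"
)


def build_message(visitor_email, subject, body):
    """Turn submitted form fields into a message for the hidden recipient."""
    # Reject line breaks so a visitor cannot smuggle extra headers in.
    for value in (visitor_email, subject):
        if "\r" in value or "\n" in value:
            raise ValueError("line breaks are not allowed in header fields")
    msg = EmailMessage()
    msg["To"] = RECIPIENT
    msg["From"] = RECIPIENT
    msg["Reply-To"] = visitor_email
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


msg = build_message("visitor@example.org", "Hello", "Nice site!")
print(msg["To"])  # owner@example.com
```

A bot that reads `FORM_HTML` finds nothing to harvest. The trade-off is that the form itself can be abused to send you junk, which is where the validation image comes in.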
