Myths about e-mail spam harvesting

user [at] domain [dot] com. Look familiar? We are all quite used to those nifty ways of rewriting an e-mail address into something that looks like Greek to the evil spam bots. There are many other alternatives, like JavaScript, or not using the mailto: tag at all (as if that would make any difference…). However, I would say there are a lot of misconceptions and faulty assumptions about all this. Maybe even myths. My aim here is to show, from a slightly technical perspective, that most of these methods won’t be very effective against a determined e-mail harvester, and also to suggest some methods that have a better chance of actually protecting your inbox. It is worth stating that I’m not a spammer and I’m not familiar with their practices; I’m merely guessing how spammers would do it.

So, what is a spam bot? In most ways it is similar to what’s usually called a spider or robot: a program from a search engine company that crawls the web for content. I’m not familiar with any specific spam bot, but there are some basic assumptions to be made. A bot reads the raw data of a web page, that is, the HTML code, not the rendered content you see in the browser. And as its purpose is to find e-mail addresses, it will just search through the file for anything matching an address pattern. It moves from site to site by following links or by using an existing index of sites (you can even use Google to find e-mail addresses).

In the pre-spam days, people just wrote <a href="mailto:user@domain.com">user@domain.com</a> and everything was fine. But then the spam bots arrived. Anyone familiar with regular expressions will know that it is notoriously easy to match an e-mail address. And as the spam bot is reading the HTML, there is no point in changing the link text; the mailto: address is what matters. Basically, if the e-mail address appears in clear text anywhere in the HTML, the spam bot will almost certainly find it.
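To make this concrete, here is a minimal Python sketch of what the core of a harvester could look like, assuming it scans raw HTML with a single, deliberately simple regular expression (real spam bots are unknown to me and may use different patterns). Note that the link text is irrelevant; the address inside the mailto: attribute is enough.

```python
import re

# Deliberately simple address pattern: local part, "@", domain, dot, top-level domain.
# This is an illustrative assumption, not the pattern of any real spam bot.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def harvest(html: str) -> set[str]:
    """Return every plain-text address found anywhere in the raw HTML."""
    return set(EMAIL_RE.findall(html))


# The visible link text hides the address, but the href attribute does not.
page = '<p>Contact: <a href="mailto:user@domain.com">write to me</a></p>'
print(harvest(page))  # {'user@domain.com'}
```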

So what other options do we have? One is to rewrite the address in a human-readable form that spam bots won’t recognize. The most common is user[at]domain[dot]com or variants thereof, such as user(at)domain(dot)com or just user at domain.com. The problem is that these are mere variations of the computer-readable e-mail address syntax, and it’s almost too easy to adjust a regular expression to accommodate them. People write it in almost the same way everywhere, and that predictability is exactly what makes it computer-readable. You can write an expression that matches user[any-symbol-goes-here]domain[any-symbol-goes-here]com. It can anchor on the top-level domain (.com, .net, etc.) to make sure the string really is an e-mail address: there is a limited number of top-level domains, and people can hardly omit them without risking that their mail never arrives.
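One way a harvester might cope with these variants, sketched in Python under the assumption that it only needs to handle the bracketed, parenthesized and spelled-out forms mentioned above, is to normalize them back to “@” and “.” before applying an ordinary address pattern. The short list of top-level domains is an illustrative stand-in for the full list a real bot could use.

```python
import re

# Spelled-out "at"/"dot", optionally wrapped in brackets, parentheses or braces.
AT_RE = re.compile(r"\s*(?:[\[({]\s*at\s*[\])}]|\bat\b)\s*", re.IGNORECASE)
DOT_RE = re.compile(r"\s*(?:[\[({]\s*dot\s*[\])}]|\bdot\b)\s*", re.IGNORECASE)

# Anchor on a known top-level domain (a tiny illustrative subset).
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.(?:com|net|org|edu|info)\b",
    re.IGNORECASE,
)


def deobfuscate(text: str) -> str:
    """Turn 'user [at] domain [dot] com' style text back into 'user@domain.com'."""
    return DOT_RE.sub(".", AT_RE.sub("@", text))


def harvest(text: str) -> set[str]:
    """Normalize first, then match; results are lower-cased."""
    return {m.lower() for m in EMAIL_RE.findall(deobfuscate(text))}


for sample in ["user [at] domain [dot] com",
               "user(at)domain(dot)com",
               "user at domain.com"]:
    print(harvest(sample))  # {'user@domain.com'} for each sample
```

Ordinary sentences such as “look at this” are rewritten to “look@this” but still not reported, because nothing after the “@” ends in a known top-level domain.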

You can of course come up with more elaborate ways of writing your e-mail address, and you will probably fool the spam bots. But writing it as “take my user name and put it before my domain name, the one in the address bar in your browser” is hardly convenient, and you can easily confuse people by writing “user [ youknowwhattowritehere, right?] domain . [look in the address bar!]”. So what to do? There is always JavaScript.

The popular method of using JavaScript to obfuscate your e-mail address is mostly based on writing the characters in another format, for example as decimal ASCII codes, Unicode escapes or hexadecimal. In the HTML file it looks like gibberish, but when the string is put through a JavaScript function at load time, it all renders as nice text on the screen. The problem here is the same as before. There are only a few string conversions built into JavaScript, and it’s quite trivial for a spam bot to recognize ASCII codes and convert them to readable form. Still, given some thought, you can make it very hard for the spam bots. For example, adding a fixed value to every ASCII code makes any standard conversion produce garbage, while your own JavaScript function subtracts that value before converting. Of course, you will probably have to write the number somewhere in the code, but it’s certainly a non-trivial task to make a spam bot that understands JavaScript code in that way. Or is it? Actually, it’s not hard to imagine a spam bot with JavaScript capabilities. If the bot can run JavaScript, it doesn’t matter at all how fancy your obfuscating function is: if the address is readable once rendered, the bot will see it. The bot might even be able to render a web page fully, like a modern browser, which leaves even fewer alternatives.
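The offset trick, and why it is weaker than it looks, can be sketched in Python (standing in for the page’s JavaScript; the offset 7 is an arbitrary, hypothetical choice). A bot doesn’t even need to understand the decoding function: since the decoded text must look like an e-mail address, it can simply try every offset and keep the one that matches.

```python
from __future__ import annotations

import re

OFFSET = 7  # hypothetical secret value; the page's own script must contain it too
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def encode(address: str, offset: int = OFFSET) -> list[int]:
    """What the site author embeds in the HTML: character codes plus the offset."""
    return [ord(ch) + offset for ch in address]


def decode(codes: list[int], offset: int = OFFSET) -> str:
    """What the page's script does at load time: subtract, then convert."""
    return "".join(chr(c - offset) for c in codes)


def naive_decode(codes: list[int]) -> str:
    """A bot that only knows plain character-code conversion gets garbage."""
    return "".join(chr(c) for c in codes)


def brute_force(codes: list[int]) -> str | None:
    """A bot that ignores the script and tries every non-negative offset."""
    for offset in range(min(codes) + 1):
        candidate = "".join(chr(c - offset) for c in codes)
        if EMAIL_RE.fullmatch(candidate):
            return candidate
    return None


codes = encode("user@domain.com")
print(naive_decode(codes))  # |zlyGkvthpu5jvt
print(decode(codes))        # user@domain.com
print(brute_force(codes))   # user@domain.com
```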

So, what options do we have left? There is a smart and simple solution: images. Write your e-mail address as an image (either a pre-made image file or one generated on the fly, for example with PHP’s GD graphics library). Of course, you can’t link it to your e-mail address, but you can link it to a pop-up contact form that makes it easy to send you a mail. However, beware: there is technology for recognizing text in images (OCR), and you can bet some spammers have access to it. That’s why so many sites nowadays show you a confusing image with barely readable text on a horrible background when you try to register. You could render your e-mail address like that too, but then we have passed the point of user-friendliness, haven’t we? People don’t want to decode something just to write you an e-mail.

Is there a perfect solution? Nope. If lots of people start using a method, you can bet the spammers will find a way to interpret it, if possible. But there are very safe alternatives that make it quite unlikely that the spammers will get your address. The basic rule is: either use something really smart, or just use something unique. Few people write their e-mail addresses backwards; if you do, the spammers will probably not catch it. Simply writing the address as user[blurb}domain.com will make it hard (notice that the brackets don’t match). Another great way is to use a form for sending mail on the site, and if you want to make it really secure, add a validation image that isn’t computer-readable (such as KittenAuth). When adding the form, make sure the e-mail address is not written anywhere in the HTML form; keep the address in the PHP code instead. Another method is to use Flash: if done well, it will look good, be user-friendly and be very hard for spammers to interpret. In the end, however, I am sure the spammers will find new tools to dig out e-mail addresses, and we will have to find new ways of protecting ourselves.
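The form approach boils down to keeping the address on the server. As a rough Python stand-in for the PHP setup described in the paragraph above (the function names, field names and the address itself are hypothetical), the HTML sent to the browser contains only the form fields, and the recipient is filled in server-side when the message is built:

```python
from email.message import EmailMessage
from html import escape

# Hypothetical address. It lives only in server-side code (PHP in the article's setup).
RECIPIENT = "user@domain.com"


def render_contact_form(action: str = "/contact") -> str:
    """The HTML sent to the browser: fields only, the recipient never appears."""
    return (
        f'<form method="post" action="{escape(action)}">\n'
        '  <input name="from" type="email">\n'
        '  <textarea name="message"></textarea>\n'
        '  <button type="submit">Send</button>\n'
        "</form>"
    )


def build_message(sender: str, body: str) -> EmailMessage:
    """Runs on the server after submission; only here is the recipient added."""
    msg = EmailMessage()
    msg["To"] = RECIPIENT
    msg["Reply-To"] = sender
    msg["Subject"] = "Message from the contact form"
    msg.set_content(body)
    return msg


print(RECIPIENT in render_contact_form())  # False
```

Actually sending the message (for instance with smtplib) is left out here, since it depends on how your server is set up.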

Let’s play


Need I say more? Now, how am I going to find time for that paper I have to write this week…

UPDATE: My very first impressions:

  • It’s heavy, but good-looking. Prone to fingerprints.
  • Gorgeous interface and buttons.
  • Somewhat hard to navigate inside games; it’s not self-evident how to exit a game, for example. Note that I have no previous PlayStation or PSP experience.
  • Really good graphics, but my current TV is really crappy.
  • Using the controller for anything other than racing is impossible :) It was hilarious trying to fight my friend Max’s robot in Gundam; nobody could control them and we were all just fumbling around.
  • The packaged games were not really heart-stopping. I hope I can get my hands on Resistance: Fall of Man soon! Although I don’t look forward to a first-person shooter with hand control :p

Introducing Blogfront!

I have many unfinished hacks lying around on my web server(s), but this time I’m releasing something finished. Blogfront is a simple script that fetches, crops, caches and displays random images from your Flickr account, as demonstrated by this blog’s header image (try refreshing the page).

Download and installation

  • Version 0.7 (latest): blogfront.php (bugfixes, better default parameters, resizing)
  • Version 0.6: blogfront.php (first, corrected, release)

To install, copy the PHP file to your web server, preferably into its own directory. Make sure the script can write to that directory (for example with chmod 777 or similar). It needs write permission to save cached photos, which is necessary for decent fetch times. Edit the script file to add your Flickr account ID and your API key (currently you need your own API key… it’s beta, you know 😉).

Usage

The script fetches a list of photos with a specific tag from your Flickr account. One of these photos is picked at random, downloaded, cropped and resized to the desired size, cached locally and then displayed. All you have to do is add an <img> tag whose src attribute points to the blogfront.php script file on your server. The script takes some parameters for customization:

  • width – the width in pixels of the cropped image to be shown. Default is 640.
  • height – the height in pixels of the cropped image to be shown. Default is 200.
  • tag – the tag(s) to use when finding candidates among your Flickr photos. Default is “blogfront”, but you can change it to anything. If you have several scripts running, you might want a different tag for each so you can have different sets of images. Several tags can be given, separated by commas.
  • border – Boolean indicating whether the photo should have a black border. Default is “true”.
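Putting the parameters together, the <img> tag might be generated like this. This Python sketch assumes the script reads the parameters from the query string of its URL (which is how an <img src> request would pass them) and accepts “true”/“false” for border; the helper name blogfront_img_tag is hypothetical.

```python
from html import escape
from urllib.parse import urlencode


def blogfront_img_tag(script_url: str = "blogfront.php", width: int = 640,
                      height: int = 200, tags: tuple[str, ...] = ("blogfront",),
                      border: bool = True) -> str:
    """Build an <img> tag whose src points at the Blogfront script."""
    params = {
        "width": width,
        "height": height,
        "tag": ",".join(tags),                  # several tags, comma separated
        "border": "true" if border else "false",
    }
    src = f"{script_url}?{urlencode(params)}"
    return f'<img src="{escape(src)}" alt="">'  # escape turns & into &amp;


print(blogfront_img_tag(width=800, height=150, tags=("header", "sky")))
# <img src="blogfront.php?width=800&amp;height=150&amp;tag=header%2Csky&amp;border=true" alt="">
```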

You can also edit these parameters, and more, from within the script. Read it for more information.
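Under the hood, the flow described under Usage (find the tagged photos, pick one at random, reuse a cached copy when possible) could look roughly like the Python sketch below. Blogfront itself is PHP and its internals may well differ; the helper names, the tag_mode choice and the cache file naming are hypothetical, while flickr.photos.search with its api_key, user_id, tags, tag_mode, format and nojsoncallback arguments is part of Flickr’s public REST API.

```python
from __future__ import annotations

import random
from pathlib import Path
from urllib.parse import urlencode

FLICKR_REST = "https://api.flickr.com/services/rest/"


def search_url(api_key: str, user_id: str, tags: list[str]) -> str:
    """REST URL listing the account's photos that carry any of the given tags."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "user_id": user_id,
        "tags": ",".join(tags),
        "tag_mode": "any",      # assumption: match photos with at least one tag
        "format": "json",
        "nojsoncallback": 1,
    }
    return f"{FLICKR_REST}?{urlencode(params)}"


def pick_photo(photo_ids: list[str], rng: random.Random | None = None) -> str | None:
    """Choose one candidate at random; None when nothing matched the tag."""
    if not photo_ids:
        return None
    return (rng or random).choice(photo_ids)


def cache_path(cache_dir: str, photo_id: str, width: int, height: int, border: bool) -> Path:
    """One cached JPEG per photo and output size, so repeat requests skip the download."""
    suffix = "b" if border else "nb"
    return Path(cache_dir) / f"{photo_id}_{width}x{height}_{suffix}.jpg"


photo = pick_photo(["101", "202", "303"], random.Random(1))
target = cache_path(".", photo, 640, 200, True)
print(target, "cached" if target.exists() else "needs download, crop and resize")
```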

Even better, you can control how each photo is cropped. Go to your Flickr account and find the pictures you have tagged to show up. For each picture, do one of the following to create a custom crop (by default, the crop starts at the top left corner):

  1. Make a note (press the Add note icon on Flickr), and position and scale it roughly over the area you want cropped. The actual crop and resize will be fitted inside your note, with the same center point. Write “blogfront” in the note to identify it, and nothing else.
  2. Edit the “blogfront” (or equivalent) tag to add the location of the crop. The format is “blogfrontx1y2w3h4”. Replace “1” and “2” with the x and y coordinates, respectively, of the start of the crop, and “3” and “4” with the width and height of the crop. Any of the numbers can be left out. Remember, a note takes precedence over a tag.
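Both crop options can be expressed in a few lines. In this Python sketch (hypothetical helpers, not Blogfront’s actual code), parse_crop_tag reads the “blogfrontx1y2w3h4” format and treats every letter/number pair as optional, and fit_in_note is one reading of “fitted inside your note with the same center point”: the largest rectangle with the output’s aspect ratio that fits in the note, centered on it.

```python
from __future__ import annotations

import re


def parse_crop_tag(tag: str, base: str = "blogfront") -> dict[str, int] | None:
    """'blogfrontx1y2w3h4' -> {'x': 1, 'y': 2, 'w': 3, 'h': 4}; other tags -> None."""
    pattern = rf"{re.escape(base)}(?:x(\d*))?(?:y(\d*))?(?:w(\d*))?(?:h(\d*))?"
    match = re.fullmatch(pattern, tag)
    if match is None:
        return None
    # Skip values that were left out (group missing or empty).
    return {key: int(value) for key, value in zip("xywh", match.groups()) if value}


def fit_in_note(nx: float, ny: float, nw: float, nh: float,
                out_w: int, out_h: int) -> tuple[int, int, int, int]:
    """Largest out_w:out_h rectangle inside the note, sharing the note's center."""
    target = out_w / out_h
    if nw / nh > target:        # note is wider than needed: use its full height
        w, h = nh * target, nh
    else:                       # note is taller than needed: use its full width
        w, h = nw, nw / target
    cx, cy = nx + nw / 2, ny + nh / 2
    return round(cx - w / 2), round(cy - h / 2), round(w), round(h)


print(parse_crop_tag("blogfrontx1y2w3h4"))    # {'x': 1, 'y': 2, 'w': 3, 'h': 4}
print(parse_crop_tag("blogfrontx10h300"))     # {'x': 10, 'h': 300}
print(fit_in_note(0, 0, 640, 400, 640, 200))  # (0, 100, 640, 200)
```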

FAQ

  • A big note on my photo looks ugly; what can I do? In this version, the only alternative is to find the location of the note you posted and transfer those numbers to a tag instead, as described above. To find the coordinates of a note, view the source of the photo page and search for the id photo_notes.
  • I get a black image! What is wrong? This probably means no photo could be found. Make sure your tag is correct.
  • Something is wrong with the image that shows up! Try deleting the cache, i.e. all .jpg files in the script directory. Beware: this will increase load times considerably until all images have been cached again.
  • How can I debug the script? Unfortunately, at the moment the only way is to add echo statements manually inside the script.
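Two of the FAQ answers are easy to script. In this Python sketch (hypothetical helper names), note_to_tag turns note coordinates found in the page source into a crop tag in the format described above, and clear_cache deletes the cached .jpg files from the script directory. Make sure that directory holds no other .jpg files you want to keep.

```python
from pathlib import Path


def note_to_tag(x: int, y: int, w: int, h: int, base: str = "blogfront") -> str:
    """Coordinates of a Flickr note -> crop tag such as 'blogfrontx12y34w300h100'."""
    return f"{base}x{x}y{y}w{w}h{h}"


def clear_cache(script_dir: str) -> int:
    """Delete every cached .jpg in the script directory; return how many were removed."""
    removed = 0
    # Collect the matches first so the directory isn't modified while being scanned.
    for jpg in list(Path(script_dir).glob("*.jpg")):
        jpg.unlink()
        removed += 1
    return removed


print(note_to_tag(12, 34, 300, 100))  # blogfrontx12y34w300h100
```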

Disclaimer

The script is released under a Creative Commons Share-Alike license. I bear NO responsibility for the results of using this script. It is in beta and expected to have bugs. (Oh, and when you find the bugs, please let me know!)

Apple, please save me

I have been yapping a whole lot about Macs before, more specifically about the MacBook Pro. Now, with barely three weeks left in Hong Kong, I have never been closer to buying one, which puts me in a bit of a bind: will it be delivered in time, and what should I do with my existing laptop?

A few weeks ago I had more or less decided not to buy one, at least not until my current laptop had reached the end of its lifespan. But my laptop (a Z60m) seems to have made up its mind for me. For the last few weeks it has been causing trouble constantly, and this is on a system I reinstalled in August. Programs crash more often than not, things are always slow (I kind of wonder what is being done with all those megahertz), the processor load randomly goes up to 100%, the battery life is useless even though I hardly ever unplug it, and I sit here wishing I had programs like iPhoto.

An important factor is that my roommate finally bought one a few weeks ago (after nagging me to buy one for more than a year!). It seems to run very smoothly: while my laptop shuts down randomly when gaming and delivers decent but not wonderful performance, he is complaining that he can’t set the graphics settings any higher in the games. What causes such big differences between Windows and OS X baffles me, and I am starting to understand all those Mac evangelists.

I am ready to make the switch, but I am afraid it is more out of frustration with Windows and PCs in general than out of actual interest in OS X. Even though MacBooks are much cheaper here in HK than in Europe, they are still expensive, more expensive than most PC laptops. It is not economically sound to buy a new computer when I, by any standard, already have a new computer (one that I thought would solve all my problems when I bought it a little more than a year ago). Put differently, I am pouring a lot of money into electronics that never seem to fully please me!

I guess I have to make my decision in the coming days. What should it be?