Reverse Engineering JavaScript Encryption Functions To Scrape Email Addresses


Disclaimer: I do not condone spamming (sending unsolicited emails). The information here is provided purely for educational purposes, to highlight the weaknesses of techniques commonly used to stop email addresses being scraped.

Since obfuscating data in this way goes against both the W3C’s and Sir Tim Berners-Lee’s goals of creating a Semantic Web, I have no issues discussing how to un-obfuscate it, so we’ll have no discussions about ethics in the comments, thank you. [Updated] Ok, we’ll have some discussion of ethics if you want.


Oftentimes when developing a website, a webmaster is smart enough to take into account the fact that web bots and scrapers are going to come across their site looking for information. One piece of information that they really don’t want to be scraped is email addresses, because many web scrapers are looking to harvest email addresses to spam.

Despite this, it is often desirable to display email addresses for human visitors, so a workaround is required. There are a number of methods in common usage, but today we’ll be looking at encrypting email addresses with JavaScript functions for frontend display, and why this is not a viable defence against web scrapers.

For today’s example we’ll be looking at this wedding planning website, which goes to some lengths to try to protect its users’ email addresses from bots, whilst still displaying them for human visitors.

Reverse Engineering The Encryption Function

In our example the webmaster has gone to great lengths to hide its email addresses. However, as always, we can find a way around this.

To start off with, looking at the code we can see that there is a table with the class name of company-email. It would be logical then for our web bot to assume that any email data is going to be held and displayed within this table. And, if we look, we see the following:

<table class='company-email'><tr><td width='80px' valign='top'><script>shemword();</script></td><td><SCRIPT>sw('moc_yyz_rehpargotohpbytteb.ytteb.com');</SCRIPT></td></tr></table>

…two table cells with <script> </script> tags in them making calls to the functions shemword(); and sw();, with the second function passing a single parameter of moc_yyz_rehpargotohpbytteb.ytteb.com.

Now, although it may seem obvious that the second function is the one displaying the user’s email address, we need to think like a web bot here and try to reverse engineer both of these fields, since both could contain useful data that we may want to scrape and store.

First we’ll start with the shemword(); function, which when we go through the linked script files we find to be:

function shemword()
{
    document.write(String.fromCharCode(69,109,97,105,108));
}

This is writing something out into the document for display.

The function String.fromCharCode() is one of the first we should have in our reverse engineering class. It takes one or more numeric character codes (UTF-16 code units, which for the values used here are the same as ASCII codes) and returns the string made up of the corresponding characters.
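As a quick sanity check, the call inside shemword() can be evaluated on its own in any JavaScript console. The sketch below assumes Node.js, and the variable names are only for illustration:

```javascript
// Character codes copied from shemword() above.
const codes = [69, 109, 97, 105, 108];

// String.fromCharCode accepts any number of numeric arguments,
// so the array is spread into separate arguments.
const label = String.fromCharCode(...codes);

console.log(label); // prints "Email"
```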

There is no direct equivalent function in PHP which takes a whole list of character codes and converts them in one call, although there is chr(), which takes a single code and returns a single character. Using this function, we can split the comma-separated codes into an array and map chr() over each one.

Our reverse encryption method for this use case should look something like:


class ReverseEncrypt {

    public static function stringFromCharCode( $arr_char_codes ) {
        return implode( array_map( 'chr', $arr_char_codes ) );
    }

}


// Call this method on a comma separated string
$str_shemword_chars = '69,109,97,105,108';

$str_shemword_decrypted = ReverseEncrypt::stringFromCharCode( explode( ',', $str_shemword_chars ) );

echo $str_shemword_decrypted;   // Outputs "Email"

When we run the previous information through it we get the result of “Email”.

Having that returned as the data from the first cell of our table is great, because logic (and we should have this logic coded into our scraper somewhere) would have us assume that the other cell contains an actual email address.

Moving on to the second cell, we hit the call sw('moc_yyz_rehpargotohpbytteb.ytteb.com');. Looking up sw() in the linked scripts, we find:

function sw(t)
{
  t=t.substring(0,t.length-4)

  r=String.fromCharCode(95,121,121,122,95);

  while (r.indexOf("#") > -1)
    r=r.replace("#","");

  while (t.indexOf(".") > -1)
    t=t.replace(".","@");

  while (t.indexOf(r) > -1)
    t=t.replace(r,".");

  var s=""
  var l=t.length;

  for (i=0;i<=l;i=i+1)
  {
      s=s + t.charAt(l-i);
  }

  document.write('<a href=\'mailto:' + s + '?subject=Enquiry from theweddingplanner\'>' + s + '</a>');
}

This also writes something out to the page, and a quick read of that document.write() call reveals a mailto: link. Bingo, we have a winner!

We should now be able to add another method to our ReverseEncrypt class. Going through line-by-line:

Line 3

This line takes the length of the encrypted string passed to the function, subtracts 4, and keeps only that many characters from the start. In other words, it strips the last four characters (here, the trailing .com).

t=t.substring(0,t.length-4)
$t = substr( $t, 0, strlen( $t ) - 4 );

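Here we are converting charcodes again. The codes 95, 121, 121, 122, 95 spell _yyz_, the placeholder that stands in for the real dots in the address.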

r=String.fromCharCode(95,121,121,122,95);
$r = implode( array_map( 'chr', array('95', '121', '121', '122', '95') ) );

You could also use our previously defined method to accomplish this:

$r = self::stringFromCharCode( array('95', '121', '121', '122', '95') );

Here we have a while loop that removes any # characters from r, one at a time, for as long as one is present. Since r is _yyz_ and contains no #, the loop body never runs for this value.

while (r.indexOf("#") > -1)
    r=r.replace("#","");
while ( strpos( $r, '#' ) !== false ) {
    $r = str_replace( '#', '', $r );
}

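Here, as before, we have a while loop, this time checking whether a . is present in t and, if it is, replacing it with an @ symbol. PHP’s str_replace() replaces every occurrence in a single call, so the while wrapper in the port is redundant but harmless.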

while (t.indexOf(".") > -1)
    t=t.replace(".","@");
while ( strpos( $t, '.' ) !== false ) {
    $t = str_replace( '.', '@', $t );
}
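The reason the original script wraps each replace() in a while loop is that JavaScript’s replace(), when given a plain string as the pattern, only changes the first match. A minimal sketch of that behaviour, using a made-up input string and illustrative variable names:

```javascript
// Hypothetical input with more than one dot.
const input = "a.b.c";

// With a string pattern, replace() changes only the first match.
const once = input.replace(".", "@");

// Repeating until no match remains, as sw() does.
let looped = input;
while (looped.indexOf(".") > -1) {
  looped = looped.replace(".", "@");
}

console.log(once);   // prints "a@b.c"
console.log(looped); // prints "a@b@c"
```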

Here we have yet another while loop, this time checking whether the previously assigned r value (_yyz_) is present in t and replacing it with a dot (.).

while (t.indexOf(r) > -1)
    t=t.replace(r,".");
while ( strpos ( $t, $r ) !== false ) {
    $t = str_replace( $r, '.', $t );
}

This line sets the s variable to an empty string. In the JavaScript it is the accumulator that the reversal loop below appends to; in PHP it is not strictly needed, since strrev() does the whole job in one call.

var s=""
$s = '';

This section of code looks convoluted, but all it is doing is reversing a string. JavaScript has no built-in string reverse function, so a manual loop like this is one straightforward way to accomplish it. I’ll go through it, even though in PHP we just have to run a simple strrev() on the string in question.

First the length of the string is assigned to l. Then, for i from 0 up to and including l, the character at position l - i is appended to the new string s. On the first pass charAt(l) points one past the end of the string and returns an empty string, so the extra iteration is harmless:

var l=t.length;

for (i=0;i<=l;i=i+1)
  {
      s=s + t.charAt(l-i);
  }
$t = strrev( $t );
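To see the reversal loop in isolation, it can be run on the value t holds once the substitutions above have been applied to the sample address. This is a sketch with i and s declared locally (the original relies on implicit globals), alongside a shorter idiom that is commonly used for plain ASCII strings:

```javascript
// Value of t after the substitutions in the walkthrough.
const t = "moc.rehpargotohpbytteb@ytteb";
const l = t.length;

// The original loop, with i declared locally.
let s = "";
for (let i = 0; i <= l; i = i + 1) {
  // When i is 0, charAt(l) is past the end and returns "".
  s = s + t.charAt(l - i);
}

// A shorter idiom: split into characters, reverse the array, join again.
const viaArray = t.split("").reverse().join("");

console.log(s);              // prints "betty@bettybphotographer.com"
console.log(s === viaArray); // prints true
```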

Lastly, the JavaScript writes out a mailto: link for the address. Since we don’t need the link itself, at least at this point in time, we’ll just return the email address as a string. Then we can put it in a database or something.

document.write('<a href=\'mailto:' + s + '?subject=Enquiry from theweddingplanner\'>' + s + '</a>');
return ( $t );

This gives us a final class that should look something like this:


class ReverseEncrypt {

    public static function stringFromCharCode( $arr_char_codes ) {
        return implode( array_map( 'chr', $arr_char_codes ) );
    }

    public static function decryptSwtEmail( $t ) { // $t is the encrypted email string
        $t = substr( $t, 0, strlen( $t ) - 4 );

        $r = implode( array_map( 'chr', array('95', '121', '121', '122', '95') ) );

        while ( strpos( $r, '#' ) !== false ) {
            $r = str_replace( '#', '', $r );
        }

        while ( strpos( $t, '.' ) !== false ) {
            $t = str_replace( '.', '@', $t );
        }

        while ( strpos ( $t, $r ) !== false ) {
            $t = str_replace( $r, '.', $t );
        }

        $t = strrev( $t );

        return ( $t );
    }

}


// Call this method on an encrypted string

$str_swt_email_encrypted = 'moc_yyz_rehpargotohpbytteb.ytteb.com';

$str_swt_email_decrypted = ReverseEncrypt::decryptSwtEmail( $str_swt_email_encrypted );

echo $str_swt_email_decrypted; // Outputs betty@bettybphotographer.com

When we run a sample encrypted email address through the method, such as moc_yyz_rehpargotohpbytteb.ytteb.com, we get an output of betty@bettybphotographer.com. And if we were to run the scraper through the whole site (yes, I’ve done it) we now have over 4,000 decrypted email addresses that somebody didn’t want us to have.
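Part of why this is so easy to undo is that the scheme has no secret key: it is a fixed sequence of reversible string operations. The site’s own encoder is not visible in its pages, so the function below is only a hypothetical inverse of sw(), with an assumed name, written to show that the encoded value on the page can be regenerated from the plain address:

```javascript
// Hypothetical inverse of sw(); the site's real encoder is not shown.
function encodeForSw(email) {
  // Reverse the address.
  let t = email.split("").reverse().join("");
  // Hide the real dots behind the "_yyz_" placeholder first...
  t = t.split(".").join("_yyz_");
  // ...then turn the "@" into a dot.
  t = t.split("@").join(".");
  // sw() discards the last four characters, so any four would work;
  // ".com" matches the sample seen on the page.
  return t + ".com";
}

console.log(encodeForSw("betty@bettybphotographer.com"));
// prints "moc_yyz_rehpargotohpbytteb.ytteb.com"
```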


Taking the time to find more examples like this and extending our reverse engineering class even further can be a real time saver, and I think it’s worth looking into if you plan on automated scraping of various ‘unknown’ sources on a large scale. Next time I’ll be looking at decrypting the popular Enkoder Form by Hivelogic.


In conclusion, webmasters looking to stop web scrapers from harvesting email addresses need to come up with better solutions. As shown here, it’s easy for people like myself, or worse, spammers, to get this information if you don’t try harder to hide it.

As a side note, and referring back to the opening paragraph of this post, not displaying this information on your web page in plain text and semantically marking it as being an email address (<address>admin@jacobward.co.uk</address>) is a detriment to the internet as a whole. I say “either display it properly or don’t display it at all”. What do you think?