Scraping AOL WebMail for contacts

This is the third post in a short series on how I built an API to grab contact list information from Yahoo!, AOL, GMail and Hotmail. The first post reviewed the high-level approach to scraping sites. The second covered Yahoo!, which is by far the easiest of the four sites to scrape. This post covers AOL, which is much more challenging because it requires some cookie manipulation and some JavaScript emulation. The tips below aren’t necessarily the best way to do this, but they worked for me.

With AOL you need to work directly with the HttpClient and PostMethod objects from the Apache Commons HttpClient API. For every URL you post to, make sure to set the User-Agent header and the cookie policy:

post.getParams().setCookiePolicy(CookiePolicy.BROWSER_COMPATIBILITY);
post.setRequestHeader("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0; .NET CLR 2.0.50727)");


Also, for each post I set the Referer header to the previous URL. After you post to the first URL, you’ll need to collect all the hidden form variables in the response and add them to the next post. There was also a cookie that I seemed to need to add manually; to do so I used the following snippet of code:

// Copy every cookie the client has collected so far into one header value
Cookie[] c = client.getState().getCookies();
String cStr = "";
for (int i = 0; i < c.length; i++)
    cStr += c[i].getName() + "=" + c[i].getValue() + "; ";
// Append the cookie values that had to be added by hand
cStr += "s_cc=true; s_sq=aolsnssignin%2Caolsvc%3D%2526pid%253Dsso%252520%25253A%252520login%2526pidt%253D1%2526oid%253DSign%252520In%2526oidt%253D3%2526ot%253DSUBMIT%2526oi%253D97";
post.setRequestHeader("Cookie", cStr);

This second post should also contain the user name and password. This is the first part of the login. The response contains JavaScript that forwards to a new URL, which you need to extract dynamically. I used the following code:

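// Assumption: the forwarding call sits in an onLoad handler; adjust this marker to match the actual page
int onLoad = data.indexOf("onLoad");
int http = data.indexOf("http:", onLoad);
int endPos = data.indexOf('\'', http);
String newURL = data.substring(http, endPos);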
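
Stepping back to the hidden variables mentioned earlier: here is a minimal sketch of one way to collect them from a response, using only java.util.regex. The class name HiddenFields and the sample field names are made up for illustration, and this is not necessarily the code behind the importer. A regex pass like this assumes simple, quoted attributes; a real HTML parser would be sturdier.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: collects hidden form fields from a page of HTML. */
final class HiddenFields {
    // Match each whole <input ...> tag first, because attribute order varies.
    private static final Pattern INPUT =
            Pattern.compile("<input\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern TYPE_HIDDEN =
            Pattern.compile("\\btype\\s*=\\s*[\"']?hidden\\b", Pattern.CASE_INSENSITIVE);
    // Assumption: name and value are quoted and contain no quote characters.
    private static final Pattern NAME =
            Pattern.compile("\\bname\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);
    private static final Pattern VALUE =
            Pattern.compile("\\bvalue\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns name/value pairs of hidden inputs, in document order. */
    static Map<String, String> extract(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher input = INPUT.matcher(html);
        while (input.find()) {
            String tag = input.group();
            if (!TYPE_HIDDEN.matcher(tag).find()) {
                continue; // visible field, e.g. the user name box
            }
            Matcher name = NAME.matcher(tag);
            if (!name.find()) {
                continue; // unnamed inputs are not submitted
            }
            Matcher value = VALUE.matcher(tag);
            fields.put(name.group(1), value.find() ? value.group(1) : "");
        }
        return fields;
    }
}
```

Each entry in the returned map can then be added to the next PostMethod with post.addParameter(name, value).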

The resulting page also has some JavaScript that you need to emulate. I used the following code to find the new URL:

http = data.indexOf("gInitBasePath ");
int startPos = data.indexOf('\"', http);
endPos = data.indexOf('\"', startPos + 1);
newURL = "http://webmail.aol.com" + data.substring(startPos + 1, endPos);
newURL = newURL.replaceAll(" ", "%20");
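
Both extraction snippets above assume every indexOf call finds something. If AOL changes the page, indexOf returns -1 and substring throws. A slightly more defensive sketch of the same two lookups follows; the class name PageScan and its method names are hypothetical, and the markers you pass in must match whatever the real pages contain.

```java
import java.util.Optional;

/** Hypothetical helper: the two string lookups above, with -1 checks. */
final class PageScan {
    /** Text from the first "http:" after marker up to the next quote char. */
    static Optional<String> urlAfter(String page, String marker, char quote) {
        int from = page.indexOf(marker);
        if (from < 0) {
            return Optional.empty();
        }
        int start = page.indexOf("http:", from);
        if (start < 0) {
            return Optional.empty();
        }
        int end = page.indexOf(quote, start);
        if (end < 0) {
            return Optional.empty();
        }
        return Optional.of(page.substring(start, end));
    }

    /** Text between the first pair of quote chars after marker. */
    static Optional<String> quotedAfter(String page, String marker, char quote) {
        int from = page.indexOf(marker);
        if (from < 0) {
            return Optional.empty();
        }
        int start = page.indexOf(quote, from);
        if (start < 0) {
            return Optional.empty();
        }
        int end = page.indexOf(quote, start + 1);
        if (end < 0) {
            return Optional.empty();
        }
        return Optional.of(page.substring(start + 1, end));
    }
}
```

An empty result is a signal that the page layout changed or the login failed, which is easier to diagnose than a StringIndexOutOfBoundsException.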


You’re almost there! In the response to that last request, you need to find the uid returned in one of the cookies. Just grab all the cookies and parse out the “uid:” value. Last but not least, post to the address book URL (you can find it by using Fiddler) and pass in the uid as the value of the user attribute. At that point you can use JScrape to process the resulting page and parse out all the email addresses.
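
Parsing out the uid can be sketched as below. This is only an illustration: the class name UidFinder is made up, and the code assumes the uid value runs until the first '&', ';', ',' or whitespace, which may not match the actual cookie layout. Check a real cookie in Fiddler and adjust the pattern. If the cookie value is URL-encoded, it would need decoding first.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds a "uid:" value inside cookie values. */
final class UidFinder {
    // \b keeps longer keys such as "guid:" from matching.
    // Assumption: the value ends at '&', ';', ',' or whitespace.
    private static final Pattern UID = Pattern.compile("\\buid:([^&;,\\s]+)");

    /** Returns the first uid found in any of the given cookie values. */
    static Optional<String> find(List<String> cookieValues) {
        for (String value : cookieValues) {
            Matcher m = UID.matcher(value);
            if (m.find()) {
                return Optional.of(m.group(1));
            }
        }
        return Optional.empty();
    }
}
```

The list of values can be built by looping over client.getState().getCookies() and calling getValue() on each cookie.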

Hopefully these tips help you in creating your own contact importer.