Scraping Hotmail for Contacts using JScrape

As we’ve seen in my posts on scraping AOL, GMail and Yahoo, each site has its own “tricks” that make scraping contact information a challenge. The final site in this series is Hotmail, and it is one of the trickier ones. As in the previous posts, I’m going to outline the harder parts of scraping the site.

After posting to Hotmail.com you need to parse all the hidden parameters on the form and repost them along with the login and passwd for the user. You also need to pass a parameter, PwdPad, which is generated by removing X characters from the end of the string “IfYouAreReadingThisYouHaveTooMuchFreeTime”, where X is the length of the user’s password. To determine the URL for this post, parse the value of the JS variable g_DO["hotmail.com"] out of the page’s JavaScript.
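The steps in the paragraph above can be sketched roughly as follows. This is a minimal sketch, not the code from my importer: the class and method names are made up for illustration, the regular expressions assume quoted HTML attributes and a JS assignment of the form g_DO["hotmail.com"]="...", and returning an empty PwdPad for passwords longer than the pad string is my own assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
final class HotmailLoginSketch {

    private static final String PAD_SOURCE = "IfYouAreReadingThisYouHaveTooMuchFreeTime";

    // Assumed markup: <input ... type="hidden" ...> with quoted attributes.
    private static final Pattern HIDDEN_INPUT =
            Pattern.compile("<input\\b[^>]*type=[\"']hidden[\"'][^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern NAME_ATTR =
            Pattern.compile("\\bname=[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);
    private static final Pattern VALUE_ATTR =
            Pattern.compile("\\bvalue=[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    // Assumed JS shape: g_DO["hotmail.com"]="https://...";
    private static final Pattern POST_URL =
            Pattern.compile("g_DO\\[[\"']hotmail\\.com[\"']\\]\\s*=\\s*[\"']([^\"']+)[\"']");

    // Drops password.length() characters from the end of the pad string.
    // Assumption: a password longer than the pad yields an empty PwdPad.
    static String pwdPad(String password) {
        int keep = Math.max(0, PAD_SOURCE.length() - password.length());
        return PAD_SOURCE.substring(0, keep);
    }

    // Collects name/value pairs of every hidden input, in document order.
    static Map<String, String> hiddenFields(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher tag = HIDDEN_INPUT.matcher(html);
        while (tag.find()) {
            Matcher name = NAME_ATTR.matcher(tag.group());
            Matcher value = VALUE_ATTR.matcher(tag.group());
            if (name.find()) {
                fields.put(name.group(1), value.find() ? value.group(1) : "");
            }
        }
        return fields;
    }

    // Returns the value assigned to g_DO["hotmail.com"], or null if absent.
    static String postUrl(String html) {
        Matcher m = POST_URL.matcher(html);
        return m.find() ? m.group(1) : null;
    }
}
```

The hidden fields, plus login, passwd and PwdPad, would then be sent as the form body to whatever postUrl returns.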

After posting to the URL you will need to parse some more JS: find the window.location.replace call and use the URL passed to it as the target of your next post. In the response you will find a mailbox ID, which you can get by looking for ‘_UM=’ in the response and parsing out the value. From there you are home free: simply post to "http://"+host+"/cgi-bin/addresses?"+mbox (you can get the host by reading the Host request header with the following code: String host = get.getRequestHeader("Host").getValue(); ).
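Here is a rough sketch of that parsing, again with made-up class and method names. It assumes the redirect appears as window.location.replace("...") with a quoted URL, and that the mailbox value after ‘_UM=’ ends at the first quote, ampersand, whitespace or angle bracket; the real pages may differ. The last method builds the address book URL exactly as in the expression above and passes mbox through unchanged.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
final class HotmailAddressSketch {

    // Assumed JS shape: window.location.replace("http://...");
    private static final Pattern REDIRECT =
            Pattern.compile("window\\.location\\.replace\\(\\s*[\"']([^\"']+)[\"']\\s*\\)");

    // Assumption: the value ends at a quote, ampersand, whitespace or angle bracket.
    private static final Pattern MAILBOX =
            Pattern.compile("_UM=[\"']?([^\"'&\\s<>]+)");

    // Returns the URL passed to window.location.replace, or null if absent.
    static String redirectUrl(String body) {
        Matcher m = REDIRECT.matcher(body);
        return m.find() ? m.group(1) : null;
    }

    // Returns the first value that follows "_UM=", or null if absent.
    static String mailboxId(String body) {
        Matcher m = MAILBOX.matcher(body);
        return m.find() ? m.group(1) : null;
    }

    // Builds the address book URL from the host and the mailbox string.
    static String addressesUrl(String host, String mbox) {
        return "http://" + host + "/cgi-bin/addresses?" + mbox;
    }
}
```

With the host taken from the Host request header as described above, addressesUrl(host, mbox) gives the final URL to post to.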

Well, that’s about it. Hopefully that helps some people out. If you want to see this in action, sign up for an account at MyFriendSuggests.com and use my version of the contact importer (and while you’re there, try our site out and let us know what you think).