Avoiding Interstitials
For many reasons I've found the need to scrape article pages of The New York Times, either to prototype something or to gather sample data (though this should become obsolete once the APIs become public).
Automated scraping of anything is easy, but on nytimes.com any automated client is going to hit an interstitial advert at some point.
The easy way to avoid this is to append "no_interstitial" to the URL as a query argument:
http://www.nytimes.com/2008/07/30/business/30bags.html?no_interstitial
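If you are building these URLs in a script, something like the following might do. It is only a sketch: the helper name add_no_interstitial is made up for illustration, and it assumes the flag works the same way when it is joined to other query arguments with "&", which I have not verified beyond the bare form shown above.

```python
from urllib.parse import urlsplit, urlunsplit

# The bare flag described above; it takes no value.
FLAG = "no_interstitial"


def add_no_interstitial(url: str) -> str:
    """Return url with the no_interstitial flag in its query string.

    Hypothetical helper: keeps any existing query arguments and
    leaves the URL unchanged if the flag is already present.
    """
    parts = urlsplit(url)
    existing = parts.query.split("&") if parts.query else []
    if FLAG in existing:
        return url
    query = "&".join(existing + [FLAG])
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, query, parts.fragment)
    )


if __name__ == "__main__":
    print(add_no_interstitial(
        "http://www.nytimes.com/2008/07/30/business/30bags.html"
    ))
```

Pass the result to whatever HTTP client you already use for scraping; the helper itself makes no network requests.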
That was easy. One less thing to worry about.