
Estimate Domain Age Using PHP & Archive.org

I got a comment today asking how to get a domain's age using Archive.org and, as the comment was too flattering to resist ... here it is. It uses an XPath query, though it could have been done with RegExp alone. But ... it's a case study.

Keep in mind that this only estimates the age. It's not exact, because not every site gets crawled by the archive.org robot, and some sites block it. More accurate results can be achieved using Whois Domain Age.
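To give a rough idea of the approach, here is a minimal sketch rather than the exact script from the post. It assumes you have already downloaded an archive.org snapshot listing for the domain, and that the listing links to snapshots with hrefs shaped like `/web/YYYYMMDDhhmmss/http://...`. That markup is an assumption and may not match what archive.org serves today. The function names `earliestSnapshot` and `estimateDomainAgeYears` are made up for this example.

```php
<?php
// Hypothetical helper: returns the date of the oldest snapshot link found
// in an HTML listing, or null when no snapshot link is present.
function earliestSnapshot(string $html): ?DateTimeImmutable
{
    if (trim($html) === '') {
        return null;
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $earliest = null;

    // Assumption: snapshot links carry a 14-digit timestamp after /web/.
    foreach ($xpath->query('//a[contains(@href, "/web/")]/@href') as $href) {
        if (preg_match('~/web/(\d{14})/~', $href->nodeValue, $m)) {
            // Fixed-width timestamps sort correctly as plain strings.
            if ($earliest === null || strcmp($m[1], $earliest) < 0) {
                $earliest = $m[1];
            }
        }
    }

    if ($earliest === null) {
        return null;
    }

    $date = DateTimeImmutable::createFromFormat('YmdHis', $earliest, new DateTimeZone('UTC'));
    return $date ?: null;
}

// Hypothetical helper: whole years between the oldest snapshot and $now.
function estimateDomainAgeYears(string $html, DateTimeImmutable $now): ?int
{
    $first = earliestSnapshot($html);
    return $first === null ? null : $first->diff($now)->y;
}
```

Passing `$now` in explicitly keeps the function easy to test. The result is a lower bound on the real age: the domain may have existed for years before the first crawl.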

RSScraping | Scraping RSS With PHP, DOM and XPath Magic

I wrote a post on some XPath magic for all you evil scrapers out there. Now I will show you how to scrape RSS feeds. I used to do it the RegExp way, but I have now decided to move over to XML parsing and DOM processing. Being lazy, I looked for a ready-made version and found quite a good one, close to my needs but not exactly what I wanted. I took it, used and abused the source (I ended up changing it almost completely), and got the one I needed. The good thing about the RSS scraper using DOM XML + PHP is that it's way shorter and much more reliable than the RegExp version.
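For a taste of the DOM + XPath approach, here is a minimal sketch, not the scraper from the full post. It assumes a plain RSS 2.0 feed (`rss/channel/item` with `title`, `link`, `description` and `pubDate` children) already loaded into a string. The function name `parseRssItems` is made up for this example.

```php
<?php
// Hypothetical helper: turns an RSS 2.0 document into an array of items.
// Returns an empty array when the input is empty or not well-formed XML.
function parseRssItems(string $xml): array
{
    if (trim($xml) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Collect parser errors quietly instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return [];
    }

    $xpath = new DOMXPath($doc);
    $items = [];

    foreach ($xpath->query('/rss/channel/item') as $item) {
        // string(child) yields the text of the first matching child,
        // or an empty string when the child is missing.
        $items[] = [
            'title'       => trim($xpath->evaluate('string(title)', $item)),
            'link'        => trim($xpath->evaluate('string(link)', $item)),
            'description' => trim($xpath->evaluate('string(description)', $item)),
            'pubDate'     => trim($xpath->evaluate('string(pubDate)', $item)),
        ];
    }

    return $items;
}
```

Because the parser handles CDATA sections and entities for you, there is no need for the escaping gymnastics a RegExp version requires. Fetching the feed (for example with `file_get_contents`) is left out so the sketch stays focused on the parsing.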