Jun 3, 2009
Estimate Domain Age Using PHP & Archive.org
Got a comment today asking how to get domain age using Archive.org and, as the comment was too flattering to resist ... here it is. It uses an XPath query, though it could have been done with a regular expression alone. But ... it's a case study.
Keep in mind that this only estimates the age. It's not exact, because not every site gets crawled by the archive.org robot, and some block it. More accurate results can be achieved using Whois Domain Age.
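As a rough illustration of the idea (not the original listing), here is a minimal sketch. It assumes you have already fetched an Archive.org page listing snapshots of the domain, and that snapshot links contain a 14-digit `YYYYMMDDhhmmss` timestamp after `/web/`, as Wayback Machine URLs do. The function names `earliestSnapshot` and `estimateDomainAgeYears` are made up for this example.

```php
<?php
// Hypothetical helper: return the date of the earliest snapshot linked in $html.
function earliestSnapshot(string $html): ?DateTimeImmutable
{
    if ($html === '') {
        return null;
    }
    // Real-world markup is rarely well-formed; collect parser errors quietly.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }

    // XPath pulls every link target; the regex picks out the timestamp.
    $xpath = new DOMXPath($doc);
    $earliest = null;
    foreach ($xpath->query('//a/@href') as $attr) {
        if (preg_match('~/web/(\d{14})~', $attr->nodeValue, $m)) {
            // Fixed-width digit strings sort chronologically as plain strings.
            if ($earliest === null || strcmp($m[1], $earliest) < 0) {
                $earliest = $m[1];
            }
        }
    }
    if ($earliest === null) {
        return null;
    }
    $date = DateTimeImmutable::createFromFormat('YmdHis', $earliest, new DateTimeZone('UTC'));
    return $date ?: null;
}

// Hypothetical helper: whole years between the first snapshot and $now.
function estimateDomainAgeYears(string $html, DateTimeImmutable $now): ?int
{
    $first = earliestSnapshot($html);
    return $first === null ? null : $first->diff($now)->y;
}
```

The result is a lower bound on the real age: the domain existed at least since its first snapshot, but possibly much earlier.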
Enjoy!



That is awesome 5ubliminal.
I did not expect you to come back so quick! I hope you are not on speed. J/K
Anyhow, the only reason I am looking at calculating domain age from the Wayback Machine is that, as I understand it, most whois sites do not allow scraping etc. (I know it's bullshit), unless you know of a better way. Until then I have no choice but to approximate domain age. But it makes sense, because in order to calculate the time left until a domain expires, you would need whois data.
If you live in YeeHaw, TX, let's go for a beer someday.
Take care
You need to query whois servers directly, not sites.
whois.internic.net:43 … connect (fsockopen) and write a line with a .com domain … e.g.: 5ubliminal.com[enter]
See what you get in return.
You can even test with telnet:
Start - Run - telnet whois.internic.net 43
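In PHP, the same exchange might look roughly like the sketch below. The helper names `whoisQuery` and `parseCreationDate` are made up for this example, and the `Creation Date:` label is an assumption about what the server sends back; check the raw response before relying on it.

```php
<?php
// Hypothetical helper: send one domain to a whois server and return the raw reply.
function whoisQuery(string $domain, string $server = 'whois.internic.net', int $timeout = 10): ?string
{
    $sock = @fsockopen($server, 43, $errno, $errstr, $timeout);
    if ($sock === false) {
        return null; // connection failed; $errstr holds the reason
    }
    stream_set_timeout($sock, $timeout);
    // The whole request is the domain name followed by a line break.
    fwrite($sock, $domain . "\r\n");
    $response = stream_get_contents($sock);
    fclose($sock);
    return $response === false ? null : $response;
}

// Hypothetical helper: pull the creation date out of a raw whois reply.
// Assumes a "Creation Date: <value>" line; other servers label it differently.
function parseCreationDate(string $whois): ?DateTimeImmutable
{
    if (!preg_match('/^\s*Creation Date:\s*(\S+)/mi', $whois, $m)) {
        return null;
    }
    try {
        return new DateTimeImmutable($m[1]);
    } catch (Exception $e) {
        return null; // value was not in a format PHP can parse
    }
}

// Usage (needs network access, so it is left commented out):
// $raw = whoisQuery('5ubliminal.com');
// $created = $raw === null ? null : parseCreationDate($raw);
```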
PS: I had this function for a long time :) And I’m from Romania, in Eastern Europe.
Yeah, I found a PHP WHOIS class that apparently does the down and dirty. Thanks again. Romania, heh! That’s awesome. Sorry, didn’t mean to sound like an ethnocentrist.
You wouldn’t happen to have a function that gets the approximate number of backlinks of a site, would ya?
Sent you an email.
Thanks for this. I am going to use it in combination with whois, as I have found that different registrars use slightly different info formats and it's a pain to work out what all of them are!
That is the problem. They have no general formatting guidelines and you may end up parsing each whois server differently.
PS: I think they do this deliberately, so they can't be scraped easily :)
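One way to cope with the differing formats is to try a list of known labels in turn. The sketch below is only a starting point: the labels in `CREATION_LABELS` are assumed examples of the variants you may run into, not a complete list, and `findCreationDate` is a made-up name. It returns the raw value as a string, because the date formats differ between servers too.

```php
<?php
// Assumed label variants for the creation date; extend per whois server.
const CREATION_LABELS = ['Creation Date', 'Created On', 'Registered on', 'created'];

// Hypothetical helper: return the raw creation-date value, or null if no label matches.
function findCreationDate(string $whois, array $labels = CREATION_LABELS): ?string
{
    foreach ($labels as $label) {
        // Match "<label>: <value>" on its own line, ignoring case and padding.
        $pattern = '/^\s*' . preg_quote($label, '/') . ':\s*(.+?)\s*$/mi';
        if (preg_match($pattern, $whois, $m)) {
            return $m[1];
        }
    }
    return null;
}
```

Labels are tried in order, so put the most specific ones first if two of them could match the same response.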