Jan 24, 2009
elHttpClient Evolution - PHP + cURL + HTTP
My mildly wildly popular eHttpClient class, no longer available on my old blog, has evolved. It changed into elHttpClient, an all-new toy for fellow blackhats and PHP coders. elHttpClient is the one-stop solution for web downloading, combining the power of cURL with PHP to make your web downloading (scraping) needs dead easy. With basic PHP knowledge you can download pretty much any web content.
This class has been by far my most popular share and brought in most feedback. I've looked into the feedback trying to make things even easier and this was born. Oh wtf ... the truth is I've written this for myself and decided to share it! I didn't care about no feedback.
What changed compared to the old one?
Everything. There's virtually no such thing as backwards compatibility. It's a totally different thing, employing new techniques to make things easier while giving the coder more power. Things can be achieved with fewer lines of code. It's an OOP all-star and the initial class broke into three new classes ... plus a fourth one handling the curl_multi_get, but I'm not sharing that addon right now.
What's included in elHttpClient?
It includes 3 classes and a bunch of one-line helpers. It is PHP5+ only, as it now uses private functions for internal needs. To use it with older PHP versions you'll need to clean those out yourself. Below I detail the three classes that play together to do the magic.
elHttpHeaders
This handles HTTP headers. It is used for handling request headers and response headers. HTTP Headers are lines following this format: Name: Value[CRLF]. This class parses them into a user friendly array that can be accessed using simple functions.
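To make the "Name: Value[CRLF]" idea concrete, here is a minimal sketch of turning such a block into an array and back. It's plain PHP, not the elHttpHeaders internals; parse_header_block() and build_header_block() are hypothetical names, and repeated names such as Cookie are kept as a list of values.

```php
<?php
// Hypothetical helper, not part of elHttpClient: parses "Name: Value\r\n" lines
// into an array keyed by header name, keeping every value for repeated names.
function parse_header_block(string $raw): array
{
    $headers = [];
    foreach (preg_split('/\r?\n/', $raw) as $line) {
        // Skip blank lines and lines without a colon (e.g. a status line).
        $pos = strpos($line, ':');
        if ($pos === false) {
            continue;
        }
        // Split on the FIRST colon only, so "Referer: http://..." stays intact.
        $name  = trim(substr($line, 0, $pos));
        $value = trim(substr($line, $pos + 1));
        // Repeated names (Cookie, Set-Cookie) accumulate as a list.
        $headers[$name][] = $value;
    }
    return $headers;
}

// Hypothetical inverse: serialise the array back to CRLF-terminated lines.
function build_header_block(array $headers): string
{
    $out = '';
    foreach ($headers as $name => $values) {
        foreach ($values as $value) {
            $out .= $name . ': ' . $value . "\r\n";
        }
    }
    return $out;
}

$parsed = parse_header_block("User-Agent: MSIE\r\nCookie: a=1\r\nCookie: b=2\r\n");
print_r($parsed);
```

Header names are case-insensitive in HTTP, so a real implementation would probably normalise them before using them as array keys.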
elHttpResponse
This contains the response of a request: HTTP Request + Response + Body. It has the request headers, response headers, curl info and raw body + http status code.
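If you wonder how headers and body get separated in the first place, this is one way to do it with raw cURL output captured using CURLOPT_HEADER. It's a sketch, not the elHttpResponse code; split_raw_response() is a hypothetical name and the header size is assumed to come from curl_getinfo($ch, CURLINFO_HEADER_SIZE).

```php
<?php
// Hypothetical helper, not the elHttpResponse code: splits raw cURL output
// captured with CURLOPT_HEADER into status code, final header lines and body.
// $headerSize would normally come from curl_getinfo($ch, CURLINFO_HEADER_SIZE).
function split_raw_response(string $raw, int $headerSize): array
{
    $headerPart = substr($raw, 0, $headerSize);
    $body       = (string) substr($raw, $headerSize);

    // When redirects are followed, several header blocks are present,
    // separated by an empty line; the last one belongs to the final response.
    $blocks = preg_split('/\r?\n\r?\n/', trim($headerPart));
    $last   = end($blocks);
    $lines  = preg_split('/\r?\n/', $last);

    // The first line is the status line, e.g. "HTTP/1.1 200 OK".
    $status = 0;
    if (preg_match('#^HTTP/\S+\s+(\d{3})#', $lines[0], $m)) {
        $status = (int) $m[1];
    }

    return [
        'status'  => $status,
        'headers' => array_slice($lines, 1),
        'body'    => $body,
    ];
}
```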
elHttpClient
This is the main class that performs the requests. Each request returns an elHttpResponse. This is one major change from the old version: previously I returned the HTML (raw content) and accessing headers was more difficult, but now I return an elHttpResponse covering everything.
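Under the hood, a client like this boils down to a cURL handle. The sketch below is plain cURL, not elHttpClient's actual implementation; simple_get() and the keys of the returned array are hypothetical, and error handling is kept to the bare minimum.

```php
<?php
// Minimal sketch of what a client like elHttpClient wraps (NOT its actual code):
// perform a GET with cURL and return status, raw headers, body and curl info.
function simple_get(string $url, array $extraHeaders = []): array
{
    $ch = curl_init($url);
    if ($ch === false) {
        throw new RuntimeException('curl_init() failed');
    }
    // Turn ['Referer' => '...'] into ["Referer: ..."] as CURLOPT_HTTPHEADER expects.
    $lines = [];
    foreach ($extraHeaders as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the response instead of printing it
        CURLOPT_HEADER         => true,  // include response headers in the output
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_HTTPHEADER     => $lines,
        CURLOPT_TIMEOUT        => 30,
    ]);
    $raw = curl_exec($ch);
    if ($raw === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException('Request failed: ' . $error);
    }
    $info       = curl_getinfo($ch);
    $headerSize = $info['header_size'];
    curl_close($ch);

    return [
        'status'  => $info['http_code'],
        'url'     => $info['url'],        // effective URL after redirects
        'headers' => substr($raw, 0, $headerSize),
        'body'    => substr($raw, $headerSize),
        'info'    => $info,
    ];
}
```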
A picture is worth a thousand words, and so is an example
I tried to write a bunch of examples to cover most of the ways you would use these classes. I'm pretty sure most will figure out how to use it from these examples. The rest, God bless!
This is a taste of the elHttpHeaders class.
<?php
//---------------------------------------------------------
header('Content-Type: text/plain');
$httpHeaders = new elHttpHeaders();
// Import HTTP headers
$httpHeaders->fromHeaders("User-Agent: MSIE\r\nReferer: http://");
// Add elements
$httpHeaders->set('Content-Type', 'text/plain');
$httpHeaders->set('Content-Length', 1024);
$httpHeaders->set('Cookie', 'cookie1=cookie1_value');
$httpHeaders->set('Cookie', 'cookie2=cookie2_value');
echo $httpHeaders;
echo "\r\n-----".basename(__FILE__).':'.__LINE__."-----\r\n";
// Delete elements with name and defined value
$httpHeaders->del('Cookie','cookie1=cookie1_value');
echo $httpHeaders;
echo "\r\n-----".basename(__FILE__).':'.__LINE__."-----\r\n";
// Delete all elements with name
$httpHeaders->del('Cookie');
echo $httpHeaders;
echo "\r\n-----".basename(__FILE__).':'.__LINE__."-----\r\n";
// Add Cookie back
$httpHeaders->set('Cookie', 'cookie1=cookie1_value');
$httpHeaders->set('Cookie', 'cookie2=cookie2_value');
// Export as query string
$asqs = $httpHeaders->toQueryString();
echo $asqs;
echo "\r\n-----".basename(__FILE__).':'.__LINE__."-----\r\n";
$httpHeaders->clean(); // Now it's empty!
echo $httpHeaders;
echo "\r\n-----".basename(__FILE__).':'.__LINE__."-----\r\n";
// Import query string
$httpHeaders->fromQueryString($asqs);
echo $httpHeaders;
echo "\r\n-----".basename(__FILE__).':'.__LINE__."-----\r\n";
die('elHttpHeaders Magic - by 5ubliminal');
//---------------------------------------------------------
?>
And this is how you use the elHttpClient with a simple GET!
<?php
//---------------------------------------------------------
header('Content-Type: text/plain');
$httpClient = new elHttpClient();
$httpClient->setUserAgent('ff3');
$httpContainer = $httpClient->get('http://www.yahoo.com/',
array(
'Referer' => 'http://www.yahoo.com/',
'X-User-Agent' => 'elHttpClient by 5ubliminal',
)
);
echo "Raw Response Data Length: ".strlen($httpContainer->httpBody)."\r\n\r\n";
echo "HTTP Status Code: ".$httpContainer->getStatus()."\r\n\r\n";
// The URL you ask for
echo "Requested URL: ".$httpContainer->requestURL."\r\n\r\n";
// The URL you get if redirects are enabled
echo "Effective URL: ".$httpContainer->responseURL."\r\n\r\n";
$httpRequest = $httpContainer->getRequest(); // elHttpHeaders
$httpResponse = $httpContainer->getResponse(); // elHttpHeaders
$httpCookies = $httpContainer->getCookies(); // elHttpHeaders
echo "HTTP Request:\r\n"; echo($httpRequest); echo "\r\n\r\n";
echo "HTTP Response:\r\n"; echo($httpResponse); echo "\r\n\r\n";
echo "HTTP Cookies:\r\n"; echo($httpCookies); echo "\r\n\r\n";
print_r($httpContainer->getInfo());
die('elHttpClient Magic - by 5ubliminal');
//---------------------------------------------------------
?>
I want it!
It is free and comes as is with no guarantees but I do 'appreciate monetary appreciation' or linkbacks. Sure as hell cheaper and higher quality than RentACoder :) You are not allowed to publish it to other pages. It can be included in downloadable files but not publicly placed on pages. Copyright notice must stay intact. Get it below!
Registration is FREE, quick, painless and worth its weight in gold.
Help! F1 F1 F1
For those who 'don't get it', comment form is below. These classes are made FPBP (for professionals by professionals). Don't ask for help if you're just learning PHP and can't figure this out ... at all. It's like that! Books first ... my shit later.
If you find bugs, let me know their whereabouts or behavioral patterns and I'll squash them into oblivion. Subscribe to the comments feed for this post, as every time I change or add anything I'll post a comment. I don't do email notifications. RSS feeds own in their own special way.
One thing I beg of you. There will be a bunch of comments here. Keep it clean! Use the reply button to follow the conversation. If you post a comment, I will reply to it; if your next comment is related, reply to your own original comment. Not to my reply, and not as a new comment unless you're saying something new.
Comments that don't show decency and follow guidelines related to their positioning will be discarded furiously!
Updates:
- 07.02.2009 : Fixed HEAD, POST bug, added more functions to elHttpResponse and changed some variables to private.


I loved the original eHttpClient, mostly for its simplicity in functions like get() and post(). The new one has more functions, but it seems like it stayed true to the basics.
Thanks for sharing!
Indeed. Same ease of use yet more options easier to access.
It’s much simpler to access headers now than before.
Thanks for sharing this - I was wondering whether (hoping) you’d be making your code available again…
That is pretty freaking sweet. How about one that just returns the <head> part of a site? So that <meta> tag, title and description scraping is easier?
Communibus
You can’t return just part of the site. You get the HTML and then RegExp preg_match the desired sections.
You could use HTTP byte ranges and get just the first KB of the page, as 99% of the time this will include the title and metas.
I’ll cover this soon.
My comment didn’t come through correctly, I put the “meta” tag in by mistake.
What I was saying was, it would be cool to have a class that just sucks the “HEAD” section of a site and returns the “META” information like you did here.
I basically have been sucking about 2000-4000 bytes of information using a home-grown function and was more curious than anything. You do seem ahead of me in the programming department…
p.s. I hate RegExp and I f**king HATE RegExp when I have to match TAGS.
I’ll have a post tomorrow, if I have time, to show you (and others) how to scrape METAs + TITLE with minimum damage.
PS: Not many sites accept byte ranges.
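Until that post shows up, here's a rough regex-based sketch for pulling the TITLE and META name/content pairs out of the first few KB of HTML. scrape_title() and scrape_metas() are hypothetical names, and the meta pattern assumes name comes before content and that values contain no quotes, so treat it as a starting point, not a real parser.

```php
<?php
// Sketch only: extract <title> text, entity-decoded and trimmed.
function scrape_title(string $html): ?string
{
    if (preg_match('#<title[^>]*>(.*?)</title>#is', $html, $m)) {
        return trim(html_entity_decode($m[1], ENT_QUOTES, 'UTF-8'));
    }
    return null;
}

// Sketch only: collect <meta name="..." content="..."> pairs, keys lowercased.
// Assumes name= appears before content= inside the tag and values have no quotes.
function scrape_metas(string $html): array
{
    $metas = [];
    preg_match_all(
        '#<meta\s+[^>]*?name\s*=\s*["\']([^"\']+)["\'][^>]*?content\s*=\s*["\']([^"\']*)["\']#is',
        $html,
        $matches,
        PREG_SET_ORDER
    );
    foreach ($matches as $m) {
        $metas[strtolower($m[1])] = html_entity_decode($m[2], ENT_QUOTES, 'UTF-8');
    }
    return $metas;
}
```

To fetch only the start of a page with cURL you could set CURLOPT_RANGE to something like '0-4095', but as noted above many servers ignore byte ranges and send the whole page anyway.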
Good stuff 5ubliminal.
I’ve been a loyal user of eHttpClient for almost a year now, and it has never failed me. :)
Major changes in elHttpClient. Get your new copy!
Fixed a HEAD and POST bug, added a few more functions, changed some variables to private.
In a word: I simplified a lot of things, especially in elHttpResponse.
If you only want to get headers using cURL, you can just set a really low timeout once you’ve connected to the site, e.g. curl_setopt($ch, CURLOPT_TIMEOUT, 2), which only starts *once you’ve connected*. If you use curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'HEAD'), in most cases the server will give you the wrong headers (I was trying to find the end URL of a set of redirects).
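For reference, the usual cURL way to send a real HEAD request is CURLOPT_NOBODY: CURLOPT_CUSTOMREQUEST only swaps the method name, so cURL may still sit waiting for a body. Also, per the libcurl documentation, CURLOPT_TIMEOUT limits the whole transfer including the connect phase, while CURLOPT_CONNECTTIMEOUT limits only the connect. The sketch below follows a redirect chain that way; resolve_final_url() is a hypothetical name and this is plain cURL, not elHttpClient.

```php
<?php
// Sketch, plain cURL: send a real HEAD via CURLOPT_NOBODY, follow redirects
// and report where the chain ends.
function resolve_final_url(string $url, int $maxRedirects = 10): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,   // send HEAD, do not download a body
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HEADER         => true,   // return the header blocks as the result
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS      => $maxRedirects,
        CURLOPT_CONNECTTIMEOUT => 5,      // limit for the connect phase only
        CURLOPT_TIMEOUT        => 10,     // limit for the whole operation
    ]);
    $headers = curl_exec($ch);
    $result = [
        'ok'        => $headers !== false,
        'final_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
        'status'    => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'headers'   => $headers === false ? '' : $headers,
        'error'     => curl_error($ch),
    ];
    curl_close($ch);
    return $result;
}
```

Some servers answer HEAD differently from GET; if that bites you, fall back to a GET with a short timeout as suggested above.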
I see the following error.
“Supplied argument is not a valid resource handle” on lines 204, 207 and 118
This is not an error but a warning.
Use error_reporting(E_ALL^E_WARNING^E_NOTICE) to get rid of it and learn more on PHP error reporting.
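The mask in that call works because E_ALL already contains both bits, so XOR clears them. The more common spelling uses AND-NOT, which gives the same value here. A quick check:

```php
<?php
// XOR only clears E_WARNING and E_NOTICE because both bits are already set
// in E_ALL; AND-NOT clears them regardless of the starting mask.
$xorMask    = E_ALL ^ E_WARNING ^ E_NOTICE;
$andNotMask = E_ALL & ~E_WARNING & ~E_NOTICE;
var_dump($xorMask === $andNotMask);   // bool(true)

// Report everything except warnings and notices.
error_reporting($andNotMask);
```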
PS: I’m tired of getting warning complaints. If you see warning written before your error, don’t tell me.
It looks nice, but how can I access these classes? Can you send them to me?
It says click here to download and it’s red. If you don’t see it … :)
there is no download… really
There’s a red rectangle below the “I want it!” title. I’m not gonna post the link here.
If you can’t find it … I’m sure you can’t use it. I checked in both Firefox and Opera logged in and not logged in.
I think you’re making fun of me :)
I’m not making fun of you at all.
the rectangle appeared but after I posted the ‘there is no download..’ comment.
So you found it? To see it, you needed to be logged in and to refresh the page.
PS: Sorry for the tone, but someone else had just been bugging me that they couldn't find the download link. It's exasperating.
I had to change the variables from private to public to get it to work on lines 111-113. Otherwise I got the following error message:
Fatal error: Cannot access private property elHttpResponse::$httpStatus
Am I missing something here?
Use ->getStatus() instead.
Ah, I see… Makes sense. Thanks 5ub.
I use many private properties to make them read-only, and getter functions expose them.
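In other words, the classic read-only pattern. A minimal sketch with hypothetical names: ResponseSketch is not the real elHttpResponse, only getStatus() and getInfo() come from this thread.

```php
<?php
// Private properties set once in the constructor, exposed through getters,
// so callers can read but not assign them.
class ResponseSketch
{
    private $httpStatus;
    private $httpInfo;

    public function __construct(int $status, array $info)
    {
        $this->httpStatus = $status;
        $this->httpInfo   = $info;
    }

    public function getStatus(): int
    {
        return $this->httpStatus;
    }

    public function getInfo(): array
    {
        return $this->httpInfo;
    }
}

$response = new ResponseSketch(200, ['url' => 'http://www.yahoo.com/']);
echo $response->getStatus(), "\n";   // prints 200
// echo $response->httpStatus;       // Fatal error: Cannot access private property
```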
Anyone try authenticating through Shibboleth protected logons with cURL?
Get HTTPFox, intercept, analyze the calls and replicate. As long as there’s no JS involved or CAPTCHA, you should be able to do it.
That’s great, thanks for the link. I like HTTPFox better than the Live HTTP Headers plugin because it’s easier to track what’s going on.
Thanks for sharing.
Great job.
The most important part for me to modify is cookie handling. I need to run multiple crawlers by cron jobs, so cookies are stored in the DB for every crawler “session”. I think at least a deleteCookies() method, besides disableCookies(), would be useful; then cookies can be fetched from the DB and set.
It’s a pity you’re not publishing your free code under GPL :-/ but that’s your choice… Thanks for this nice piece of code though!
You can disable cookies by commenting out the cookie jar file config in the __construct somewhere.
This way cookies will not be saved anymore and you’ll need to handle them manually.
Totally easy.
PS: I’m not publishing my code under the Baphomet licenses. Sorry. I don’t dig horns.
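For the DB-backed crawler case described above, handling cookies by hand mostly means two small helpers: one that turns stored name/value pairs into a Cookie header value (usable with CURLOPT_COOKIE) and one that collects Set-Cookie values from a raw response header block. A sketch, assuming cookies are stored as name => value pairs; both function names are hypothetical and attributes like Path or Expires are ignored.

```php
<?php
// Build a Cookie request header value, e.g. for curl_setopt($ch, CURLOPT_COOKIE, ...).
function cookie_header(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}

// Collect name => value from Set-Cookie lines in a raw response header block,
// ignoring attributes such as Path, Expires or HttpOnly.
function cookies_from_headers(string $rawHeaders): array
{
    $cookies = [];
    if (preg_match_all('/^Set-Cookie:\s*([^=;\s]+)=([^;\r\n]*)/mi', $rawHeaders, $m, PREG_SET_ORDER)) {
        foreach ($m as $match) {
            $cookies[$match[1]] = $match[2];
        }
    }
    return $cookies;
}
```

The array returned by cookies_from_headers() is what you would save to the DB after each request, and cookie_header() rebuilds the header on the next run.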
I like your arrogant prick style. Of course I can comment out whatever I want and I’m able to do what I was talking about. This was just a suggestion.
Dig whatever you like ;) Peace!
Thanks :) I always make a good first impression. I fix things afterwards.
I missed diplomacy lessons in life. Time is short, less words … more meaning.
httpclient.zip was not found in /wp-files/1/biggies.
You’re right … sorry … renamed some folders today and messed it.
It should work now.
thanks 4 fixing it
I’m still showing it as not being found.
httpclient.zip was not found in /wp-files/jchiojmo/1/biggies.
I fixed it and I EVEN CHECKED it … again. It works now.
I’ve moved the blog database and, as I use my own custom uploader, a few things got lost on the way. Pffff…
Why do I get the following warnings/errors:
Warning: get_class() expects parameter 1 to be object, null given in C:\wamp\www\httpclient.php on line 203
Warning: get_class() expects parameter 1 to be object, null given in C:\wamp\www\httpclient.php on line 205
Warning: get_resource_type() expects parameter 1 to be resource, null given in C:\wamp\www\httpclient.php on line 207
Warning: get_resource_type() expects parameter 1 to be resource, null given in C:\wamp\www\httpclient.php on line 210
Warning: get_class() expects parameter 1 to be object, array given in C:\wamp\www\httpclient.php on line 273
Use error_reporting to disable warnings. It’s so easy ;)
Is it possible to download files (images and other stuff) with that library?
Yes but be aware: Download is done in memory. Don’t exceed several MB.
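If you need bigger files, one way around the memory limit with plain cURL is to hand it an open file handle via CURLOPT_FILE so the body is written straight to disk. A sketch; download_to_file() is a hypothetical helper, not part of elHttpClient.

```php
<?php
// Stream a download directly to disk so large files never sit in memory.
function download_to_file(string $url, string $path): bool
{
    $fp = fopen($path, 'wb');
    if ($fp === false) {
        return false;
    }
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_FILE           => $fp,   // write the body directly into $fp
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_FAILONERROR    => true,  // treat HTTP status >= 400 as failure
    ]);
    $ok = curl_exec($ch);
    curl_close($ch);
    fclose($fp);
    if (!$ok) {
        unlink($path);   // do not leave a partial file behind
    }
    return (bool) $ok;
}
```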
Can you provide a simple demonstration?
If you cannot understand / use the second code example, don’t try to use elHttpClient. It’s rather advanced.
After registration steps, no email received at all :-(
I’ve changed your password to your username.
Log in and change it to something you’ll remember.
hello, my english is not so good but I give my best :-)
Can you send me the old version of eHTTPClient? I would like to try the script that was available at http://www.tellinya.com/read/2007/09/07/100.html.
Sorry but I no longer have it. I don’t keep old versions. Bad habit but I only look to the future…
bad ass!!!
Any examples on how you use a proxy server with ehttp?
function setProxy($proxyHost, $proxyPort, $authUser = null, $authPass = null) …
Got it..sorry I asked that when I was at work and just was being lazy.
Worry not … I’m lazy too!
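For anyone curious what that boils down to, a proxy in plain cURL is just a few options. The sketch below assumes setProxy() maps onto CURLOPT_PROXY, CURLOPT_PROXYPORT and CURLOPT_PROXYUSERPWD; that is an assumption about its internals, not its actual code, and proxy_options() is a hypothetical name.

```php
<?php
// Hypothetical builder for the cURL proxy options a setProxy() call would need.
function proxy_options(string $host, int $port, ?string $user = null, ?string $pass = null): array
{
    $opts = [
        CURLOPT_PROXY     => $host,
        CURLOPT_PROXYPORT => $port,
    ];
    if ($user !== null) {
        // CURLOPT_PROXYUSERPWD expects "user:password".
        $opts[CURLOPT_PROXYUSERPWD] = $user . ':' . (string) $pass;
    }
    return $opts;
}

// Usage: curl_setopt_array($ch, proxy_options('127.0.0.1', 8080, 'me', 'secret'));
```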
hello .. am I doing something wrong? I’m trying the following:
$httpClient->setOpt('CURLOPT_INTERFACE', '1.1.1.1');
but it doesn’t seem to have any effect. Do you have any ideas?
Have no idea. Google it. Others seem to have problems but I never used this feature.
Hello,
Can you please tell me how I can find the rank of my website for a particular keyword?
I can't find the eHttpClient class.
Err… is it true that you have to change the following:
echo "HTTP Status Code: ".$httpContainer->httpStatus."\r\n\r\n";
into:
echo "HTTP Status Code: ".$httpContainer->getStatus()."\r\n\r\n";
and:
print_r($httpContainer->httpInfo);
into:
print_r($httpContainer->getInfo());
Yes. Or just make those properties public again in the class.
heh, sorry, as mentioned - I’m not a coder. turns out I was just calling it wrong, the following works:
$httpClient->setOpt(CURLOPT_INTERFACE, '1.1.1.1');
thank you so much!
That is how you’re supposed to call it :)
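The difference between the two calls: in quotes, 'CURLOPT_INTERFACE' is just a string, while cURL expects the integer constant CURLOPT_INTERFACE. A small illustration, with a hypothetical curl_opt_id() helper that accepts either form:

```php
<?php
// The unquoted constant is an integer option id; the quoted one is only text.
var_dump(is_int(CURLOPT_INTERFACE));      // bool(true)
var_dump(is_string('CURLOPT_INTERFACE')); // bool(true)

// Hypothetical helper: accept either the constant or its name, return the integer id.
function curl_opt_id($option): int
{
    if (is_int($option)) {
        return $option;
    }
    if (is_string($option) && strpos($option, 'CURLOPT_') === 0 && defined($option)) {
        return (int) constant($option);
    }
    throw new InvalidArgumentException('Unknown cURL option: ' . var_export($option, true));
}

// Usage with plain cURL:
// curl_setopt($ch, curl_opt_id('CURLOPT_INTERFACE'), '1.1.1.1');
```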
[...] pingback url for me … I stumbled uppon: http://www.tellinya.com/read/2007/10/02/173.html … And I altered the findXmlRpc function a bit. What it does is that it will read the headers [...]