
elHttpClient Evolution - PHP + cURL + HTTP

My mildly wildly popular eHttpClient class, no longer available on my old blog, has evolved. It changed into elHttpClient, an all-new toy for fellow blackhats and PHP coders. elHttpClient is the one-stop solution for web downloading: it combines the power of cURL with PHP to cover your web downloading (scraping) needs in a dead easy way. With basic knowledge you can do anything in terms of downloading web content.

This class has been by far my most popular share and brought in most feedback. I've looked into the feedback trying to make things even easier and this was born. Oh wtf ... the truth is I've written this for myself and decided to share it! I didn't care about no feedback.

What changed compared to the old one?

Everything. There's virtually no such thing as backwards compatibility. It's a totally different thing, employing new techniques to make things easier yet giving more power to the coder. Things can be achieved with fewer lines of code. It's an all OOP-star and the initial class broke into three new classes ... plus a fourth handling the curl_multi_get, but I'm not sharing that addon right now.

What's included in elHttpClient?

It includes 3 classes and a bunch of one-line helpers. It is PHP5+ only, as it now uses private functions for internal needs. You need to clean things up yourself to use it in older PHP versions. I will detail the three classes below that play together to do magic.

elHttpHeaders

This handles HTTP headers. It is used for handling request headers and response headers. HTTP Headers are lines following this format: Name: Value[CRLF]. This class parses them into a user friendly array that can be accessed using simple functions.
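
To make the Name: Value[CRLF] format concrete, here is a minimal sketch of that kind of parsing. It is not the elHttpHeaders source: parseHeaderLines() is a hypothetical helper written only for illustration, and it keeps header names exactly as given even though HTTP header names are case-insensitive.

```php
<?php
// Hypothetical helper, NOT part of elHttpHeaders: turn raw header lines
// into an array of name => list of values.
function parseHeaderLines($raw)
{
    $headers = array();
    foreach (preg_split('/\r\n|\n|\r/', trim($raw)) as $line) {
        $pos = strpos($line, ':');
        if ($pos === false) {
            continue; // no colon: a status line or a malformed entry
        }
        $name  = trim(substr($line, 0, $pos));
        $value = trim(substr($line, $pos + 1));
        if ($name === '') {
            continue;
        }
        // A name can repeat (Cookie, Set-Cookie), so every name maps to a list.
        $headers[$name][] = $value;
    }
    return $headers;
}

print_r(parseHeaderLines("User-Agent: MSIE\r\nCookie: a=1\r\nCookie: b=2"));
```

Storing a list per name is one way to keep repeated headers such as Cookie from overwriting each other.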

elHttpResponse

This contains the response of a request: HTTP Request + Response + Body. It has the request headers, response headers, curl info and raw body + http status code.

elHttpClient

This is the main class that performs the requests. Each request returns an elHttpResponse. This is one major change from the old version. Before, I used to return HTML (raw content) and accessing headers was more difficult, but now I return an elHttpResponse covering everything.

A photo means a thousand words, so does an example

I tried to write a bunch of examples to cover most of the ways you would use these classes. I'm pretty sure most will figure out how to use it from these examples. The rest, God bless!

This is a taste of the elHttpHeaders class.

<?php
//---------------------------------------------------------
header('Content-Type: text/plain');
$httpHeaders = new elHttpHeaders();
// Import HTTP headers
$httpHeaders->fromHeaders("User-Agent: MSIE\r\nReferer: http://");
// Add elements
$httpHeaders->set('Content-Type', 'text/plain');
$httpHeaders->set('Content-Length', 1024);
$httpHeaders->set('Cookie', 'cookie1=cookie1_value');
$httpHeaders->set('Cookie', 'cookie2=cookie2_value');
echo $httpHeaders;
echo "\r\n-----".basename(__FILE__).':'.__LINE__."-----\r\n";
// Delete elements with name and defined value
$httpHeaders->del('Cookie','cookie1=cookie1_value');
echo $httpHeaders;
echo "\r\n-----".basename(__FILE__).':'.__LINE__."-----\r\n";
// Delete all elements with name
$httpHeaders->del('Cookie');
echo $httpHeaders;
echo "\r\n-----".basename(__FILE__).':'.__LINE__."-----\r\n";
// Add Cookie back
$httpHeaders->set('Cookie', 'cookie1=cookie1_value');
$httpHeaders->set('Cookie', 'cookie2=cookie2_value');
// Export as query string
$asqs = $httpHeaders->toQueryString();
echo $asqs;
echo "\r\n-----".basename(__FILE__).':'.__LINE__."-----\r\n";
$httpHeaders->clean();
// Now it's empty!
echo $httpHeaders;
echo "\r\n-----".basename(__FILE__).':'.__LINE__."-----\r\n";
// Import query string
$httpHeaders->fromQueryString($asqs);
echo $httpHeaders;
echo "\r\n-----".basename(__FILE__).':'.__LINE__."-----\r\n";
die('elHttpHeaders Magic - by 5ubliminal');
//---------------------------------------------------------
?>

And this is how you use the elHttpClient with a simple GET!

<?php
//---------------------------------------------------------
header('Content-Type: text/plain');
$httpClient = new elHttpClient();
$httpClient->setUserAgent('ff3');
$httpContainer = $httpClient->get('http://www.yahoo.com/', array(
    'Referer' => 'http://www.yahoo.com/',
    'X-User-Agent' => 'elHttpClient by 5ubliminal',
    )
);
echo "Raw Response Data Length: ".strlen($httpContainer->httpBody)."\r\n\r\n";
echo "HTTP Status Code: ".$httpContainer->httpStatus."\r\n\r\n";
// The URL you ask for
echo "Requested URL: ".$httpContainer->requestURL."\r\n\r\n";
// The URL you get if redirects are enabled
echo "Effective URL: ".$httpContainer->responseURL."\r\n\r\n";
$httpRequest = $httpContainer->getRequest(); // elHttpHeaders
$httpResponse = $httpContainer->getResponse(); // elHttpHeaders
$httpCookies = $httpContainer->getCookies(); // elHttpHeaders
echo "HTTP Request:\r\n";
echo($httpRequest);
echo "\r\n\r\n";
echo "HTTP Response:\r\n";
echo($httpResponse);
echo "\r\n\r\n";
echo "HTTP Cookies:\r\n";
echo($httpCookies);
echo "\r\n\r\n";
print_r($httpContainer->httpInfo);
die('elHttpClient Magic - by 5ubliminal');
//---------------------------------------------------------
?>

I want it!

It is free and comes as is with no guarantees but I do 'appreciate monetary appreciation' or linkbacks. Sure as hell cheaper and higher quality than RentACoder :) You are not allowed to publish it to other pages. It can be included in downloadable files but not publicly placed on pages. Copyright notice must stay intact. Get it below!

Zone unavailable to unregistered users.
Registration is FREE, quick, painless and worth its weight in gold.

Help! F1 F1 F1

For those who 'don't get it', the comment form is below. These classes are made FPBP (for professionals by professionals). Don't ask for help if you're just learning PHP and can't figure this out ... at all. It's like that! Books first ... my shit later.

If you find bugs, lemme know their whereabouts or behavioral patterns and I'll squash them into oblivion. Subscribe to the comments feed for this post, as every time I change/add anything I'll post a comment. I don't do email notifications. RSS Feeds own in their own special way.

One thing I beg of you. There will be a bunch of comments here. Keep it clean! Use the reply button to follow conversation. If you post a comment I will reply to it and you will also reply to it if your new comment is related to it! Not to my reply and not a new comment unless you say something new.

Comments that don't show decency and follow guidelines related to their positioning will be discarded furiously!

Updates:

  • 07.02.2009 : Fixed HEAD, POST bug, added more functions to elHttpResponse and changed some variables to private.

Category: PHP

58 Responses

  1. I loved the original eHttpClient, mostly for its simplicity in functions like get() and post(). The new one has more functions, but it seems like it stayed true to the basics.

    Thanks for sharing!

    • $@5ubliminal24:357 — #1 says:

      Indeed. Same ease of use yet more options easier to access.
      It’s much simpler to access headers now than before.

  2. +Peter1:1 — #90 says:

    Thanks for sharing this - I was wondering whether (hoping) you’d be making your code available again…

  3. That is pretty freaking sweet. How about one that just returns the part of a site? So that tag, title and description scraping is easier?

    Communibus

    • $@5ubliminal30:357 — #1 says:

      You can’t return just part of the site. You get the HTML and then RegExp preg_match the desired sections.
      You could use HTTP byte ranges and get just the first KB of the page, as 99% of the time this will include the title and metas.

      I’ll cover this soon.
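
      A rough sketch of that idea with plain cURL, not with elHttpClient itself. Both fetchFirstBytes() and extractTitle() are hypothetical names made up for this illustration; CURLOPT_RANGE is the standard cURL option for asking for a byte range.

```php
<?php
// Hypothetical helper: ask the server for only the first $bytes of a page.
function fetchFirstBytes($url, $bytes = 1024)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);        // return the body as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);        // follow redirects
    curl_setopt($ch, CURLOPT_RANGE, '0-' . ($bytes - 1));  // sends "Range: bytes=0-1023"
    $html = curl_exec($ch);
    return $html === false ? '' : $html;
}

// Hypothetical helper: preg_match the <title> out of whatever HTML came back.
function extractTitle($html)
{
    if (preg_match('~<title\b[^>]*>(.*?)</title>~is', $html, $m)) {
        return trim(html_entity_decode($m[1], ENT_QUOTES, 'UTF-8'));
    }
    return null; // no complete <title> in this chunk
}

// Works on any HTML string, so it can be tried without a network call.
echo extractTitle('<html><head><title> Example &amp; Co </title></head>');
```

      A server that ignores the Range header answers with 200 and the full page instead of 206, so the regexp step has to cope with either a partial or a complete document.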

  4. My comment didn’t come through correctly, I put the “meta” tag in by mistake.

    What I was saying was, it would be cool to have a class that just sucks the “HEAD” section of a site and returns the “META” information like you did here.

    I basically have been sucking about 2000-4000 bytes of information using a home grown function and was more curious than anything. You do seem ahead of me in the programming department…

    p.s. I hate RegExp and I f**king HATE RegExp when I have to match TAGS.

    • $@5ubliminal33:357 — #1 says:

      I’ll have a post tomorrow if I got time to show you (and others) how to scrape METAs + TITLE with minimum damage.

      PS: Not many sites accept byte ranges.

  5. +nogenius1:1 — #92 says:

    Good stuff 5ubliminal.

    I’ve been a loyal user of eHttpClient for almost a year now, and it has never failed me. :)

  6. $@5ubliminal44:357 — #1 says:

    Major changes in elHttpClient. Get your new copy!
    Fixed a HEAD and POST bug, added a few more functions, changed some variables to private.

    In a word simplified a lot of things especially in elHttpResponse.

  7. If you only want to get headers using cURL you can just set a really low timeout once you’ve connected to the site, e.g. curl_setopt($ch, CURLOPT_TIMEOUT, 2), which only starts *once you’ve connected*. If you use curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'HEAD') in most cases the server will give you the wrong headers (I was trying to find the end URL off a set of redirects).

  8. +wyuguy1:1 — #98 says:

    I see the following error.
    “Supplied argument is not a valid resource handle” on line 204 207 118

    • $@5ubliminal53:357 — #1 says:

      This is not an error but a warning.
      Use error_reporting(E_ALL^E_WARNING^E_NOTICE) to get rid of it and learn more on PHP error reporting.

      PS: I’m tired of getting warning complaints. If you see warning written before your error, don’t tell me.

  9. +rizwan_4161:1 — #117 says:

    it looks nice . but how i can access these classes . can you send me

  10. +Mar1:2 — #48 says:

    there is no download… really

    • $@5ubliminal89:357 — #1 says:

      There’s a red rectangle below the “I want it!” title. I’m not gonna post the link here.
      If you can’t find it … I’m sure you can’t use it. I checked in both Firefox and Opera logged in and not logged in.

      I think you’re making fun of me :)

  11. +smartmedia2:2 — #48 says:

    I’m not making fun of you at all.
    the rectangle appeared but after I posted the ‘there is no download..’ comment.

    • $@5ubliminal90:357 — #1 says:

      So you found it? To see it, you needed to be logged in and to refresh the page.

      PS: Sorry for the tone, but someone else was already bugging me about not finding the download link. It is exasperating.

  12. +parksobong4:9 — #12 says:

    I had to change the variables from private to public to get it to work on lines 111-113. Otherwise I got the following error message:

    Fatal error: Cannot access private property elHttpResponse::$httpStatus

    Am I missing something here?

  13. +parksobong6:9 — #12 says:

    Anyone try authenticating through Shibboleth protected logons with cURL?

    • $@5ubliminal96:357 — #1 says:

      Get HTTPFox, intercept, analyze the calls and replicate. As long as there’s no JS involved or CAPTCHA, you should be able to do it.

      • +parksobong7:9 — #12 says:

        That’s great, thanks for the link. I like HTTPFox better than the Live HTTP Headers plugin because it’s easier to track what’s going on.

  14. Thanks for sharing.
    Great job.

  15. +schnizZzla1:2 — #54 says:

    The most important part for me to modify is cookie handling. I need to run multiple crawlers by cron jobs, so cookies are stored in the DB for every crawler “session”. I think at least a deleteCookies() method, besides disableCookies(), would be useful, so that cookies can be fetched from the DB and set.

    It’s a pity you’re not publishing your free code under GPL :-/ but that’s your choice… Thanks for this nice piece of code though!

    • $@5ubliminal145:357 — #1 says:

      You can disable cookies by commenting out the cookie jar file config in the __construct somewhere.
      This way cookies will not be saved anymore and you’ll need to handle them manually.

      Totally easy.

      PS: I’m not publishing my code under the Baphomet licenses. Sorry. I don’t dig horns.

      • +schnizZzla2:2 — #54 says:

        I like your arrogant prick style. Of course I can comment out whatever I want and I’m able to do what I was talking about. This was just a suggestion.

        Dig whatever you like ;) Peace!

        • $@5ubliminal147:357 — #1 says:

          Thanks :) I always make a good first impression. I fix things afterwards.
          I missed diplomacy lessons in life. Time is short, less words … more meaning.

  16. +anon105001:2 — #56 says:

    httpclient.zip was not found in /wp-files/1/biggies.

    • $@5ubliminal158:357 — #1 says:

      You’re right … sorry … renamed some folders today and messed it up.
      It should work now.

      • +anon105002:2 — #56 says:

        thanks 4 fixing it

        • +Poptarts1:1 — #138 says:

          I’m still showing it as not being found.

          httpclient.zip was not found in /wp-files/jchiojmo/1/biggies.

          • $@5ubliminal161:357 — #1 says:

            I fixed it and I EVEN CHECKED it … again. It works now.
            I’ve moved the blog database and as I use my own custom uploader few things got lost on the way. Pffff…

  17. +kundi1:5 — #23 says:

    Why do I get the following warnings/errors:

    Warning: get_class() expects parameter 1 to be object, null given in C:\wamp\www\httpclient.php on line 203

    Warning: get_class() expects parameter 1 to be object, null given in C:\wamp\www\httpclient.php on line 205

    Warning: get_resource_type() expects parameter 1 to be resource, null given in C:\wamp\www\httpclient.php on line 207

    Warning: get_resource_type() expects parameter 1 to be resource, null given in C:\wamp\www\httpclient.php on line 210

    Warning: get_class() expects parameter 1 to be object, array given in C:\wamp\www\httpclient.php on line 273

  18. +kundi4:5 — #23 says:

    Is it possible to download files (images and other stuff) with that library?

  19. +kundi5:5 — #23 says:

    Can you provide a simple demonstration?

  20. After registration steps, no email received at all :-(

  21. +maik1:1 — #172 says:

    hello, my english is not so good but I give my best :-)
    Can you send me the old version of eHTTPClient? I would like the script which is available at http://www.tellinya.com/read/2007/09/07/100.html to try.

  22. +twola5:12 — #6 says:

    bad ass!!!

  23. +twola7:12 — #6 says:

    Any examples on how to use a proxy server with ehttp?

  24. +koopsta1:2 — #76 says:

    hello .. am I doing something wrong? I’m trying the following:

    $httpClient->setOpt('CURLOPT_INTERFACE', '1.1.1.1');

    but it doesn’t seem to have any effect. Do you have any ideas?

  25. +anjali1:1 — #199 says:

    Hello,
    can you pls tell how can i find the rank of my website for a particular keyword.
    i am not finding ehttpClient class.

  26. +raise1:1 — #203 says:

    Err… is it true that you have to change the following:
    echo "HTTP Status Code: ".$httpContainer->httpStatus."\r\n\r\n";

    into:
    echo "HTTP Status Code: ".$httpContainer->getStatus()."\r\n\r\n";

    and:
    print_r($httpContainer->httpInfo);

    into:
    print_r($httpContainer->getInfo());

  27. +koopsta2:2 — #76 says:

    heh, sorry, as mentioned - I’m not a coder. turns out I was just calling it wrong, the following works:

    $httpClient->setOpt(CURLOPT_INTERFACE, "1.1.1.1")

    thank you so much!

  28. That is how you’re supposed to call it :)
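
    The difference between the two calls in comments 24 and 27: CURLOPT_INTERFACE without quotes is an integer constant, which is what curl_setopt() expects, while the quoted version is just a string that happens to spell the constant's name. Here is a minimal sketch of a wrapper that tolerates both. resolveOption() is a hypothetical helper, not something elHttpClient is known to do, and a built-in constant is used so the sketch runs even without the cURL extension.

```php
<?php
// Hypothetical helper, NOT part of elHttpClient: accept an option either as
// the constant itself or as the constant's name, and return the constant value.
function resolveOption($option)
{
    if (is_string($option) && defined($option)) {
        return constant($option); // 'E_ALL' (string) becomes E_ALL (integer)
    }
    return $option; // already a constant value, pass it through
}

var_dump(resolveOption('E_ALL') === E_ALL); // name given as a string
var_dump(resolveOption(E_ALL) === E_ALL);   // constant given directly
```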

  29. [...] pingback url for me … I stumbled uppon: http://www.tellinya.com/read/2007/10/02/173.html … And I altered the findXmlRpc function a bit. What it does is that it will read the headers [...]
