Tuesday, May 27, 2014

Using cURL for Remote Requests

If you’re a Linux user then you’ve probably used cURL. It’s a powerful tool used for everything from sending mail to downloading the latest My Little Pony subtitles. In this article I’ll explain how to use the cURL extension in PHP. The extension offers the same functionality as the console utility, in the comfortable world of PHP. I’ll discuss sending GET and POST requests, handling login cookies, and FTP functionality.
Before we begin, make sure you have the extension (and the libcurl library) installed. It’s not installed by default. In most cases it can be installed using your system’s package manager; failing that, you can find instructions in the PHP manual.
How Does it Work?
All cURL requests follow the same basic pattern:
1.      First we initialize the cURL resource (often abbreviated as ch for “cURL handle”) by calling the curl_init() function.
2.      Next we set various options, such as the URL, request method, payload data, etc. Options can be set individually with curl_setopt(), or we can pass an array of options to curl_setopt_array().
3.      Then we execute the request by calling curl_exec().
4.      Finally, we free the resource with curl_close() to clear out memory.
So, the boilerplate code for making a request looks something like this:
<?php
// init the resource
$ch = curl_init();

// set a single option...
curl_setopt($ch, OPTION, $value);
// ... or an array of options
curl_setopt_array($ch, array(
    OPTION1 => $value1,
    OPTION2 => $value2
));

// execute
$output = curl_exec($ch);

// free
curl_close($ch);
The only thing that changes for the request is what options are set, which of course depends on what you’re doing with cURL.
Retrieve a Web Page
The most basic example of using cURL that I can think of is simply fetching the contents of a web page. So, let’s fetch the homepage of the BBC as an example.
<?php
// init the resource first, as in the boilerplate above
$ch = curl_init();
curl_setopt_array($ch, array(
    CURLOPT_URL => 'http://www.bbc.co.uk/',
    CURLOPT_RETURNTRANSFER => true
));

$output = curl_exec($ch);
curl_close($ch);
echo $output;
Check the output in your browser and you should see the BBC website displayed. We’re lucky here: the page renders correctly only because it links to its stylesheets and images with absolute URLs.
The options we just used were:
·         CURLOPT_URL – specifies the URL for the request
·         CURLOPT_RETURNTRANSFER – when set to false, curl_exec() prints the response directly and returns true or false depending on the success of the request. When set to true, curl_exec() returns the contents of the response instead.
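The four steps can also be wrapped in a small function. This is only a minimal sketch: fetchUrl() is a name I made up for illustration, and it throws an exception when curl_exec() returns false.

```php
<?php
// Hypothetical helper (not part of the cURL extension):
// fetch a URL and return the response body as a string.
function fetchUrl(string $url): string
{
    $ch = curl_init();                    // 1. initialize
    curl_setopt_array($ch, array(         // 2. set options
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
    ));
    $body = curl_exec($ch);               // 3. execute
    $error = curl_error($ch);             // read the error before freeing
    curl_close($ch);                      // 4. free

    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . $error);
    }
    return $body;
}
```

With this in place, fetching the BBC homepage would be a single call: echo fetchUrl('http://www.bbc.co.uk/');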
Log in to a Website
cURL executed a GET request to retrieve the BBC page, but cURL can also use other methods, such as POST and PUT. For this example, let’s simulate logging into a WordPress-powered website. Logging in is done by sending a POST request to http://example.com/wp-login.php with the following details:
·         log – the username
·         pwd – the password
·         redirect_to – the URL we want to go to after logging in
·         testcookie – should be set to 1 (this is just for WordPress)
Of course these parameters are specific to each site. You should always check the input names for yourself, which is easily done by viewing the source of the HTML page in your browser.
<?php
$postData = array(
    'log' => 'acogneau',
    'pwd' => 'secretpassword',
    'redirect_to' => 'http://example.com',
    'testcookie' => '1'
);

$ch = curl_init();
curl_setopt_array($ch, array(
    CURLOPT_URL => 'http://example.com/wp-login.php',
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POST => true,
    CURLOPT_POSTFIELDS => $postData,
    CURLOPT_FOLLOWLOCATION => true
));

$output = curl_exec($ch);
echo $output;
The new options are:
·         CURLOPT_POST – set this true if you want to send a POST request
·         CURLOPT_POSTFIELDS – the data that will be sent in the body of the request
·         CURLOPT_FOLLOWLOCATION – if set true, cURL will follow redirects
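One detail worth knowing about CURLOPT_POSTFIELDS: when it is given an array, cURL sends the body as multipart/form-data, and when it is given a string, the body is sent as application/x-www-form-urlencoded. The sketch below uses made-up field names (user, note) to show how http_build_query() produces such a string.

```php
<?php
// Hypothetical form fields, not taken from any real site.
$fields = array('user' => 'alice', 'note' => 'a b&c');

// Encode the array as a URL-encoded string: spaces become "+",
// and "&" inside a value becomes "%26".
$encoded = http_build_query($fields);

$ch = curl_init();
curl_setopt_array($ch, array(
    CURLOPT_POST => true,
    CURLOPT_POSTFIELDS => $encoded, // string => application/x-www-form-urlencoded
));
curl_close($ch);

echo $encoded; // user=alice&note=a+b%26c
```

Some servers only accept one of the two encodings, so if a login fails unexpectedly this is a good thing to check.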
Uh oh! If you test the above, however, you’ll see an error message: “ERROR: Cookies are blocked or not supported by your browser. You must enable cookies to use WordPress.” This is normal, because cookies must be enabled for sessions to work. We enable them by adding two more options.
<?php
curl_setopt_array($ch, array(
    CURLOPT_URL => 'http://example.com/wp-login.php',
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POST => true,
    CURLOPT_POSTFIELDS => $postData,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_COOKIESESSION => true,
    CURLOPT_COOKIEJAR => 'cookie.txt'
));
The new options are:
·         CURLOPT_COOKIESESSION – if set to true, cURL will start a new cookie session and ignore any previous cookies
·         CURLOPT_COOKIEJAR – this is the name of the file where cURL should save cookie information. Make sure you have the correct permissions to write to the file!
Now that we’re logged in, we only need to reference the cookie file for subsequent requests.
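Referencing the cookie file is done with the CURLOPT_COOKIEFILE option, which tells cURL where to read cookies from. Here is a minimal sketch; sessionOptions() is a hypothetical helper name and the wp-admin URL is only a placeholder.

```php
<?php
// Hypothetical helper: build the options for a request that
// reuses the cookies saved by an earlier login.
function sessionOptions(string $url, string $cookieFile): array
{
    return array(
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_COOKIEFILE => $cookieFile, // read cookies saved earlier
        CURLOPT_COOKIEJAR => $cookieFile,  // write any updates back on close
    );
}

// Placeholder URL; pass the result to curl_setopt_array($ch, $options)
// on a fresh handle before calling curl_exec().
$options = sessionOptions('http://example.com/wp-admin/', 'cookie.txt');
```

Note that CURLOPT_COOKIESESSION is left out here on purpose: setting it to true would make cURL ignore the session cookies we just saved.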
Working with FTP
Using cURL to download and upload files via FTP is easy as well. Let’s look at downloading a file:
<?php
curl_setopt_array($ch, array(
    CURLOPT_URL => 'ftp://ftp.example.com/test.txt',
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_USERPWD => 'username:password'
));

$output = curl_exec($ch);
echo $output;
Note that, for security reasons, there aren’t many public FTP servers that allow anonymous uploads and downloads, so the URL and credentials above are just placeholders.
This is almost the same as sending an HTTP request, with only a couple of minor differences:
·         CURLOPT_URL – the URL of the file; note the use of “ftp://” instead of “http://”
·         CURLOPT_USERPWD – the login credentials for the FTP server
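For larger files you may not want the whole download held in a string. One possible approach, sketched below, is the CURLOPT_FILE option, which writes the response to an open stream instead. The downloadToFile() helper and its parameters are my own invention for illustration.

```php
<?php
// Hypothetical helper: stream a remote file straight to disk.
function downloadToFile(string $url, string $destination, ?string $userPwd = null): bool
{
    $fp = fopen($destination, 'w');
    if ($fp === false) {
        return false;
    }

    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_URL => $url,
        CURLOPT_FILE => $fp, // write the response body to this stream
    ));
    if ($userPwd !== null) {
        // credentials in "username:password" form
        curl_setopt($ch, CURLOPT_USERPWD, $userPwd);
    }

    $ok = curl_exec($ch); // true on success, false on failure
    curl_close($ch);
    fclose($fp);

    return $ok === true;
}
```

A call might look like downloadToFile('ftp://ftp.example.com/test.txt', 'test.txt', 'username:password'), using the same placeholder URL and credentials as above.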
Uploading a file via FTP is slightly more complex, but still manageable. It looks like this:
<?php
$fp = fopen('test.txt', 'r');
$ch = curl_init();
curl_setopt_array($ch, array(
    CURLOPT_URL => 'ftp://ftp.example.com/test.txt',
    CURLOPT_USERPWD => 'username:password',
    CURLOPT_UPLOAD => true,
    CURLOPT_INFILE => $fp,
    CURLOPT_INFILESIZE => filesize('test.txt')
));
curl_exec($ch);

fclose($fp);
curl_close($ch);
The important options here are:
·         CURLOPT_UPLOAD – set this to true to tell cURL we are uploading
·         CURLOPT_INFILE – a readable stream for the file we want to upload
·         CURLOPT_INFILESIZE – the size of the file we want to upload in bytes
Sending Multiple Requests
Imagine we have to perform five requests to retrieve all of the necessary data. Keep in mind that some things will be beyond our control, such as network latency and the response speed of the target servers. It should be obvious then that any delays when issuing five consecutive calls can really add up! One way to mitigate this problem is to issue the requests asynchronously.
Asynchronous techniques are more common in the JavaScript and Node.js communities. Briefly: instead of waiting for a time-consuming task to complete, we hand the task off (for example to a different thread or process) and continue to do other things in the meantime. When the task is complete we come back for its result. The important thing is that we haven’t wasted time waiting for a result; we spent it executing other code independently.
The approach for performing multiple asynchronous cURL requests is a bit different from before. We start out the same – we initiate each channel and then set the options – but then we initiate a multihandler using curl_multi_init() and add our channels to it with curl_multi_add_handle(). We execute the handlers by looping through them and checking their status. In the end we get each response’s content with curl_multi_getcontent().
<?php
// URLs we want to retrieve
$urls = array(
    'http://www.bing.com',
);

// initialize the multihandler
$mh = curl_multi_init();

$channels = array();
foreach ($urls as $key => $url) {
    // initiate individual channel
    $channels[$key] = curl_init();
    curl_setopt_array($channels[$key], array(
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true
    ));

    // add channel to multihandler
    curl_multi_add_handle($mh, $channels[$key]);
}

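// execute - if there is an active connection then keep looping
$active = null;
do {
    $status = curl_multi_exec($mh, $active);
    if ($active) {
        // wait for activity on any connection instead of busy-looping
        curl_multi_select($mh);
    }
}
while ($active && $status == CURLM_OK);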

// echo the content, remove the handlers, then close them
foreach ($channels as $chan) {
    echo curl_multi_getcontent($chan);
    curl_multi_remove_handle($mh, $chan);
    curl_close($chan);
}

// close the multihandler
curl_multi_close($mh);
The above code took around 1,100 ms to execute on my laptop. Performing the requests sequentially, without the multi interface, took around 2,000 ms. Imagine what your gain will be if you are sending hundreds of requests!
Multiple projects exist that abstract and wrap the multi interface. Discussing them is beyond the scope of this article, but if you’re planning to issue multiple requests asynchronously then I recommend you take a look at them.
Troubleshooting
If you’re using cURL then you are probably sending your requests to third-party servers. You can’t control them and much can go wrong: servers can go offline, directory structures can change, etc. We need an efficient way to find out what’s wrong when something doesn’t work, and luckily cURL offers two functions for this: curl_getinfo() and curl_error().
curl_getinfo() returns an array with all of the information regarding the channel, so if you want to check if everything is all right you can use:
<?php
var_dump(curl_getinfo($ch));
If an error pops up, you can check it out with curl_error():
<?php
if (curl_exec($ch) === false) {
    // curl_exec() returned false, so the request failed
    echo 'An error has occurred: ' . curl_error($ch);
}
else {
    echo 'everything was successful';
}
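The two functions can be combined into a small diagnostic report. The sketch below is only one way to do it: diagnose() is a hypothetical helper, and it also uses curl_errno(), which returns the numeric error code that goes with the curl_error() message.

```php
<?php
// Hypothetical helper: run a request and collect diagnostic details.
function diagnose(string $url): array
{
    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
    ));

    $body = curl_exec($ch);
    $report = array(
        'ok'         => $body !== false,
        'errno'      => curl_errno($ch),  // 0 means no error
        'error'      => curl_error($ch),  // empty string means no error
        'http_code'  => curl_getinfo($ch, CURLINFO_HTTP_CODE),  // last response code received
        'total_time' => curl_getinfo($ch, CURLINFO_TOTAL_TIME), // seconds
    );
    curl_close($ch);

    return $report;
}
```

Keep in mind that a request can succeed at the cURL level and still come back with a 404 or 500 status, so it is worth checking both 'ok' and 'http_code'.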
Conclusion
cURL offers a powerful and efficient way to make remote calls, so if you’re ever in need of a crawler or something to access an external API, cURL is a great tool for the job. It provides us with a nice interface and a relatively easy way to execute requests.
