If you’re a Linux user then you’ve probably used cURL. It’s a powerful tool
used for everything from sending mail to downloading the latest My Little Pony
subtitles. In this article I’ll explain how to use the cURL extension in PHP.
The extension offers the same functionality as the console utility, in the
comfortable world of PHP. I’ll discuss sending GET and POST requests, handling
login cookies, and FTP functionality.
Before we begin, make sure you have the extension (and the libcurl library)
installed. It’s not installed by default. In most cases it can be installed
using your system’s package manager, but failing that you can find
instructions in the PHP manual.
How Does it Work?
All cURL requests
follow the same basic pattern:
1. First we initialize the cURL resource (often abbreviated as ch for “cURL
handle”) by calling the curl_init() function.
2. Next we set various options, such as the URL, request method, payload data,
etc. Options can be set individually with curl_setopt(), or we can pass an
array of options to curl_setopt_array().
3. Then we execute the request by calling curl_exec().
4. Finally, we free the resource with curl_close() to clear out memory.
So, the boilerplate
code for making a request looks something like this:
<?php
// init the resource
$ch = curl_init();

// set a single option...
curl_setopt($ch, OPTION, $value);

// ... or an array of options
curl_setopt_array($ch, array(
    OPTION1 => $value1,
    OPTION2 => $value2
));

// execute
$output = curl_exec($ch);

// free
curl_close($ch);
The only thing that
changes for the request is what options are set, which of course depends on
what you’re doing with cURL.
Retrieve a Web Page
The most basic
example of using cURL that I can think of is simply fetching the contents of a
web page. So, let’s fetch the homepage of the BBC as an example.
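<?php
$ch = curl_init();

curl_setopt_array($ch, array(
    // the BBC homepage
    CURLOPT_URL => 'https://www.bbc.co.uk/',
    // return the response instead of printing it
    CURLOPT_RETURNTRANSFER => true
));

$output = curl_exec($ch);
echo $output;

curl_close($ch);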
Check the output in your browser and you should see the BBC website displayed.
We’re lucky that the site displays correctly; it does so only because it links
to its stylesheets and images with absolute URLs.
The options we just
used were:
· CURLOPT_URL – specifies the URL for the request
· CURLOPT_RETURNTRANSFER – when set to false, curl_exec() prints the response
directly and returns true or false depending on the success of the request.
When set to true, curl_exec() returns the contents of the response (or false
on failure).
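A GET request can also carry parameters in the query string of CURLOPT_URL. As
a minimal sketch, the helpers below build such a URL with PHP’s
http_build_query() and fetch it using the two options just described. Note that
buildGetUrl() and fetchUrl() are hypothetical names chosen for illustration,
and the example.com URL and its parameters are placeholders.

```php
<?php
// Hypothetical helper: append URL-encoded query parameters to a base URL.
function buildGetUrl(string $baseUrl, array $params): string
{
    // http_build_query() URL-encodes keys and values; '&' is passed
    // explicitly so the result does not depend on php.ini settings
    return $baseUrl . '?' . http_build_query($params, '', '&');
}

// Hypothetical helper: send a GET request and return the response body,
// or false if the request failed.
function fetchUrl(string $url)
{
    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true
    ));
    $output = curl_exec($ch);
    curl_close($ch);

    return $output;
}

// Placeholder URL and parameters, for illustration only
$url = buildGetUrl('http://example.com/search', array(
    'q' => 'my little pony',
    'page' => 2
));

// Uncomment to perform the request:
// echo fetchUrl($url);
```

Building the query string with http_build_query() rather than by hand means
spaces and special characters in the values are encoded for us.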
Log in to a Website
cURL executed a GET
request to retrieve the BBC page, but cURL can also use other methods, such as
POST and PUT. For this example, let’s simulate logging into a WordPress-powered
website. Logging in is done by sending a POST request to http://example.com/wp-login.php with
the following details:
· login – the username
· pwd – the password
· redirect_to – the URL we want to go to after logging in
· testcookie – should be set to 1 (this is just for WordPress)
Of course these
parameters are specific to each site. You should always check the input names
for yourself, something that can easily be done by viewing the source of an
HTML page in your browser.
<?php
$postData = array(
    'login' => 'acogneau',
    'pwd' => 'secretpassword',
    // add 'redirect_to' here with the URL to open after logging in
    'testcookie' => '1'
);

$ch = curl_init();

curl_setopt_array($ch, array(
    CURLOPT_URL => 'http://example.com/wp-login.php',
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POST => true,
    CURLOPT_POSTFIELDS => $postData,
    CURLOPT_FOLLOWLOCATION => true
));

$output = curl_exec($ch);
echo $output;
The new options are:
· CURLOPT_POST – set this to true if you want to send a POST request
· CURLOPT_POSTFIELDS – the data that will be sent in the body of the request
· CURLOPT_FOLLOWLOCATION – if set to true, cURL will follow redirects
Uh oh! If you test the above, however, you’ll see an error message: “ERROR:
Cookies are blocked or not supported by your browser. You must enable cookies
to use WordPress.” This is normal, because cookies must be enabled for
sessions to work. We enable them by adding two more options.
<?php
curl_setopt_array($ch, array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POST => true,
    CURLOPT_POSTFIELDS => $postData,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_COOKIESESSION => true,
    CURLOPT_COOKIEJAR => 'cookie.txt'
));
The new options are:
· CURLOPT_COOKIESESSION – if set to true, cURL will start a new cookie session
and ignore any previous cookies
· CURLOPT_COOKIEJAR – this is the name of the file where cURL should save
cookie information. Make sure you have the correct permissions to write to the
file!
Now that we’re logged in, we only need to reference the cookie file, using the
CURLOPT_COOKIEFILE option, for subsequent requests.
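To illustrate, a later request could load the saved cookies with
CURLOPT_COOKIEFILE. This is only a sketch: cookieOptions() and
fetchWithCookies() are hypothetical helper names, and the wp-admin URL in the
usage comment is a placeholder that assumes the login request succeeded and
wrote cookie.txt.

```php
<?php
// Hypothetical helper: options for a request that reuses a saved cookie file.
function cookieOptions(string $url, string $cookieFile): array
{
    return array(
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        // read the cookies saved by the earlier login request
        CURLOPT_COOKIEFILE => $cookieFile,
        // keep saving any cookies the server sets or updates
        CURLOPT_COOKIEJAR => $cookieFile
    );
}

// Hypothetical helper: perform the request and return the response body,
// or false if the request failed.
function fetchWithCookies(string $url, string $cookieFile)
{
    $ch = curl_init();
    curl_setopt_array($ch, cookieOptions($url, $cookieFile));
    $output = curl_exec($ch);
    // the cookie jar is written once the handle is released
    curl_close($ch);

    return $output;
}

// Usage (placeholder URL, assumes cookie.txt exists):
// echo fetchWithCookies('http://example.com/wp-admin/', 'cookie.txt');
```

Keep in mind that cURL only writes the cookie jar when a handle is released, so
the handle used for logging in has to be closed (or go out of scope) before a
different handle can read cookie.txt.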
Working with FTP
Using cURL to
download and upload files via FTP is easy as well. Let’s look at downloading a
file:
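<?php
$ch = curl_init();

curl_setopt_array($ch, array(
    // placeholder URL; point this at a file on your own FTP server
    CURLOPT_URL => 'ftp://ftp.example.com/test.txt',
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_USERPWD => 'username:password'
));

$output = curl_exec($ch);
echo $output;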
Note that there aren’t many public FTP servers that allow anonymous uploads
and downloads for security reasons, so the URL and credentials above are just
placeholders.
This is almost the same as sending an HTTP request, with only a couple of
minor differences:
· CURLOPT_URL – the URL of the file; note the use of “ftp://” instead of
“http://”
· CURLOPT_USERPWD – the login credentials for the FTP server
Uploading a file via FTP is slightly more complex, but still manageable. It
looks like this:
<?php
$fp = fopen('test.txt', 'r');

$ch = curl_init();

curl_setopt_array($ch, array(
    // placeholder URL; the remote path the file should be uploaded to
    CURLOPT_URL => 'ftp://ftp.example.com/test.txt',
    CURLOPT_USERPWD => 'username:password',
    CURLOPT_UPLOAD => true,
    CURLOPT_INFILE => $fp,
    CURLOPT_INFILESIZE => filesize('test.txt')
));

curl_exec($ch);

fclose($fp);
curl_close($ch);
The important options
here are:
· CURLOPT_UPLOAD – a boolean; set it to true to upload a file
· CURLOPT_INFILE – a readable stream for the file we want to upload
· CURLOPT_INFILESIZE – the size of the file we want to upload in bytes
Sending Multiple Requests
Imagine we have to
perform five requests to retrieve all of the necessary data. Keep in mind that
some things will be beyond our control, such as network latency and the
response speed of the target servers. It should be obvious then that any delays
when issuing five consecutive calls can really add up! One way to mitigate this
problem is to issue the requests asynchronously.
Asynchronous techniques are more common in the JavaScript and Node.js
communities. In brief, instead of waiting for a time-consuming task to
complete, we assign the task to a different thread or process and continue to
do other things in the meantime. When the task is complete we come back for
its result. The important thing is that we haven’t wasted time waiting for a
result; we spent it executing other code independently.
The approach for performing multiple asynchronous cURL requests is a bit
different from before. We start out the same – we initiate each channel and
then set the options – but then we initiate a multihandler using
curl_multi_init() and add our channels to it with curl_multi_add_handle(). We
execute the handlers by calling curl_multi_exec() in a loop and checking the
status. In the end we get each response’s content with
curl_multi_getcontent().
<?php
// URLs we want to retrieve
$urls = array(
    // list the URLs to fetch here
);

// initialize the multihandler
$mh = curl_multi_init();

$channels = array();
foreach ($urls as $key => $url) {
    // initiate individual channel
    $channels[$key] = curl_init();
    curl_setopt_array($channels[$key], array(
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true
    ));

    // add channel to multihandler
    curl_multi_add_handle($mh, $channels[$key]);
}

// execute - if there is an active connection then keep looping
$active = null;
do {
    $status = curl_multi_exec($mh, $active);
}
while ($active && $status == CURLM_OK);

// echo the content, remove the handlers, then close them
foreach ($channels as $chan) {
    echo curl_multi_getcontent($chan);
    curl_multi_remove_handle($mh, $chan);
    curl_close($chan);
}

// close the multihandler
curl_multi_close($mh);
The above code took around 1,100 ms to execute on my laptop. Performing the
same requests sequentially, without the multi interface, took around 2,000 ms.
Imagine what your gain will be if you are sending hundreds of requests!
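One caveat: the loop above calls curl_multi_exec() continuously, which keeps
the CPU busy while the transfers are in flight. A possible refinement is to
wait with curl_multi_select() between calls. The sketch below assumes the
responses are wanted as an array keyed like the input; fetchAll() is a
hypothetical helper name, not part of the extension.

```php
<?php
// Hypothetical helper: fetch several URLs in parallel and return the
// response bodies, keyed like the input array.
function fetchAll(array $urls): array
{
    $mh = curl_multi_init();

    $channels = array();
    foreach ($urls as $key => $url) {
        $channels[$key] = curl_init();
        curl_setopt_array($channels[$key], array(
            CURLOPT_URL => $url,
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_FOLLOWLOCATION => true
        ));
        curl_multi_add_handle($mh, $channels[$key]);
    }

    $active = 0;
    do {
        $status = curl_multi_exec($mh, $active);
        if ($active && $status === CURLM_OK) {
            // wait up to one second for activity on any connection
            // instead of spinning in a tight loop
            curl_multi_select($mh, 1.0);
        }
    } while ($active && $status === CURLM_OK);

    $results = array();
    foreach ($channels as $key => $chan) {
        $results[$key] = curl_multi_getcontent($chan);
        curl_multi_remove_handle($mh, $chan);
        curl_close($chan);
    }
    curl_multi_close($mh);

    return $results;
}

// Usage (placeholder URLs):
// $pages = fetchAll(array('http://example.com/a', 'http://example.com/b'));
```

The transfers still run in parallel; the only difference is that the script
sleeps until cURL reports activity rather than polling as fast as it can.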
Multiple projects exist that abstract and wrap the multi interface. Discussing
them is beyond the scope of the article, but if you’re planning to issue
multiple requests asynchronously then I recommend you take a look at them.
Troubleshooting
If you’re using cURL then you are probably sending your requests to
third-party servers. You can’t control them and much can go wrong: servers can
go offline, directory structures can change, etc. We need an efficient way to
find out what’s wrong when something doesn’t work, and luckily cURL offers two
functions for this: curl_getinfo() and curl_error().
curl_getinfo() returns an array with all of the information regarding the
channel, so if you want to check if everything is all right you can use:
<?php
var_dump(curl_getinfo($ch));
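The array is fairly large, so in practice you’ll usually pick out a few
fields. As a sketch, the hypothetical describeTransfer() helper below reads the
url, http_code and total_time keys that curl_getinfo() returns; the sample
array is made-up data standing in for a real transfer.

```php
<?php
// Hypothetical helper: summarize the array returned by curl_getinfo().
function describeTransfer(array $info): string
{
    // treat any 2xx status as success; http_code is 0 when no HTTP
    // response was received at all
    $ok = $info['http_code'] >= 200 && $info['http_code'] < 300;

    return sprintf(
        '%s %s: HTTP %d in %.2f s',
        $ok ? 'OK' : 'FAILED',
        $info['url'],
        $info['http_code'],
        $info['total_time']
    );
}

// Made-up sample standing in for curl_getinfo($ch) after curl_exec($ch)
$sample = array(
    'url' => 'http://example.com/',
    'http_code' => 404,
    'total_time' => 0.25
);

echo describeTransfer($sample);
```

If you only need one value, you can also request it directly, for example
with curl_getinfo($ch, CURLINFO_HTTP_CODE).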
If an error pops up,
you can check it out with curl_error():
<?php
if (curl_exec($ch) === false) {
    // curl_exec() returned false and thus failed
    echo 'An error has occurred: ' . curl_error($ch);
}
else {
    echo 'everything was successful';
}
Conclusion
cURL offers a powerful and efficient way to make remote calls, so if you’re
ever in need of a crawler or something to access an external API, cURL is a
great tool for the job. It provides us with a nice interface and a relatively
easy way to execute requests.