LWP Quick Reference Guide
The simplest way to fetch a web page is from the shell, using the GET command, which is typically installed in the same directory as perl. The next easiest is LWP::Simple. For the most features, use LWP::UserAgent or LWP::RobotUA. It is a very good idea to put delays in your programs so that you do not overwhelm a web server with requests.
From the shell, just run:
GET "url"
Don't forget the quotes; they protect the URL from the shell, which would otherwise interpret characters such as & and ?.
If you are behind a firewall and you have a proxy, use:
GET -p"proxyurl" "url"
LWP::Simple may be easier than the rest of the LWP examples on this page.
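As a minimal sketch of the LWP::Simple approach: get() returns the page body, or undef if the fetch failed. The helper name fetch_page is made up for this illustration, and the URL is supplied on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get);

# fetch_page is a hypothetical helper, not part of LWP.
# get() returns undef on any failure, so that is all we can check here.
sub fetch_page {
    my ($url) = @_;
    my $content = get($url);
    die "Could not get $url\n" unless defined $content;
    return $content;
}

# Only fetch when a URL is given on the command line.
print fetch_page($ARGV[0]) if @ARGV;
```

For example, saved under a name of your choice, it could be run as: perl fetch.pl "http://mycorp.com/". LWP::Simple gives no access to headers or status details; use LWP::UserAgent when you need those.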
#!/usr/bin/perl
# Fetch a page with LWP::UserAgent and keep its body in $content.
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

my $content;
my $ua = LWP::UserAgent->new;

# Various enhancement possibilities:
# $ua->max_size(100000);                               # 100k byte limit
# $ua->timeout(3);                                     # 3 sec timeout (the default is 180 sec)
# $ua->proxy(['http'], 'http://myproxy.mycorp.com/');  # set proxy
# $ua->env_proxy();                                    # load proxy info from environment variables
# $ua->no_proxy('localhost', 'mycorp.com');            # no proxy for local machines
$ua->agent("Mozilla/6.0");    # Or something equally mysterious

my $req = HTTP::Request->new(GET => 'http://mycorp.com/');
my $res = $ua->request($req);
if ($res->is_success)
{
    $content = $res->content;
}
else
{
    # status_line gives the code and message, e.g. "404 Not Found"
    die "Could not get content: " . $res->status_line . "\n";
}
#!/usr/bin/perl
# List the links on a page that stay within the same site.
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use HTML::LinkExtor;
use URI;
use Getopt::Std;

# -u URL selects the start page; otherwise fall back to the default.
# (getopts needs a hash here: it cannot set a lexical $opt_u under strict.)
my %opts;
getopts("u:", \%opts);
my $url = defined $opts{u} ? $opts{u} : "http://mycorp.com/";

my $ua = LWP::UserAgent->new;

# Set up a callback that collects links
my @links = ();
my @abs_links;
sub callback
{
    my ($tag, %attr) = @_;
    return if $tag ne 'a';                  # we only look closer at <a ...>
    return unless defined $attr{href};
    return if $attr{href} =~ /^mailto:/;    # ignore mailto:
    push(@links, $attr{href});
}

# Make the parser. Unfortunately, we don't know the base yet
# (it might be different from $url)
my $p = HTML::LinkExtor->new(\&callback);

# Request document and parse it as it arrives
my $res = $ua->request(HTTP::Request->new(GET => $url),
                       sub { $p->parse($_[0]) });
die "Could not get $url: " . $res->status_line . "\n" unless $res->is_success;

# Expand all URLs to absolute ones
my $base = $res->base;
for my $link (@links)
{
    my $abs = URI->new_abs($link, $base)->as_string;
    # don't go outside this site
    push @abs_links, $abs if $abs =~ /^\Q$url\E/;
}

# Print them out
print join("\n", @abs_links), "\n";
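When the base URL is already known, HTML::LinkExtor can absolutize links itself if the base is passed to new(). The following is a small offline sketch of that variant; the helper name extract_links, the sample HTML and the base URL are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::LinkExtor;

# extract_links is a hypothetical helper: it returns the absolute
# href of every <a> tag found in $html, resolved against $base.
sub extract_links {
    my ($html, $base) = @_;
    # No callback given, so the parser collects links internally.
    my $p = HTML::LinkExtor->new(undef, $base);
    $p->parse($html);
    $p->eof;
    my @found;
    for my $link ($p->links) {
        my ($tag, %attr) = @$link;
        next unless $tag eq 'a' && defined $attr{href};
        push @found, "$attr{href}";    # stringify the URI object
    }
    return @found;
}

# Sample input, invented for this example.
my $html = '<a href="/about.html">About</a> '
         . '<a href="news/today.html">News</a> '
         . '<img src="logo.png">';
print "$_\n" for extract_links($html, 'http://mycorp.com/docs/');
```

The img tag is skipped because only a tags are kept; relative links are resolved against the given base.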
require LWP::RobotUA;
$ua = LWP::RobotUA->new('my-robot/0.1', 'me@foo.com');
$ua->delay(10); # be very nice: the delay is in minutes, so at most one hit every ten minutes
...
# then use it just like a normal LWP::UserAgent
$res = $ua->request($req);
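A slightly fuller sketch of a polite robot, under some assumptions: make_robot and fetch_all are hypothetical helper names, the robot name and address are the placeholders from the fragment above, and URLs are taken from the command line. The delay is given in minutes, so 0.5 means 30 seconds between hits on the same host.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::RobotUA;
use HTTP::Request;

# Build a robot user agent that waits $delay_minutes between
# requests to the same host and honours robots.txt.
sub make_robot {
    my ($delay_minutes) = @_;
    my $ua = LWP::RobotUA->new('my-robot/0.1', 'me@foo.com');
    $ua->delay($delay_minutes);    # minutes, not seconds
    $ua->use_sleep(1);             # sleep until the next hit is allowed
    return $ua;
}

# Fetch each URL in turn and return a hash of URL => status line.
sub fetch_all {
    my ($ua, @urls) = @_;
    my %status;
    for my $url (@urls) {
        my $res = $ua->request(HTTP::Request->new(GET => $url));
        $status{$url} = $res->status_line;
    }
    return %status;
}

if (@ARGV) {
    my %status = fetch_all(make_robot(0.5), @ARGV);
    print "$_: $status{$_}\n" for sort keys %status;
}
```

URLs disallowed by the site's robots.txt come back as error responses rather than being fetched.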
HTTP Error codes are found in the module HTTP::Status.
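As a short illustration, HTTP::Status can translate a numeric code into its standard message and classify it. The helper name describe_status is made up for this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Status qw(status_message is_success is_redirect is_error);

# describe_status is a hypothetical helper that combines the
# standard message with the class of the status code.
sub describe_status {
    my ($code) = @_;
    my $class = is_success($code)  ? 'success'
              : is_redirect($code) ? 'redirect'
              : is_error($code)    ? 'error'
              :                      'other';
    my $msg = status_message($code);
    $msg = 'Unknown' unless defined $msg;    # unrecognised codes
    return "$code $msg ($class)";
}

print describe_status($_), "\n" for (200, 301, 404, 500);
```

In a real program the code usually comes from $res->code on an HTTP::Response object.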
LWP::MemberMixin -- Access to member variables of Perl5 classes
LWP::UserAgent -- WWW user agent class
LWP::RobotUA -- For developing robot applications
LWP::Protocol -- Interface to various protocol schemes
LWP::Protocol::http -- http:// access
LWP::Protocol::file -- file:// access
LWP::Protocol::ftp -- ftp:// access
...
LWP::Authen::Basic -- Handle 401 and 407 responses
LWP::Authen::Digest
HTTP::Headers -- MIME/RFC822 style header (used by HTTP::Message)
HTTP::Message -- HTTP style message
HTTP::Request -- HTTP request
HTTP::Response -- HTTP response
HTTP::Daemon -- An HTTP server class
WWW::RobotRules -- Parse robots.txt files
WWW::RobotRules::AnyDBM_File -- Persistent RobotRules
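As one example from this list, WWW::RobotRules can be used on its own to check URLs against a robots.txt file. This is an offline sketch: the robots.txt text and the URLs are invented for illustration, and the robot name is a placeholder.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::RobotRules;

# The argument is the name of our robot, as it would appear
# in a User-agent line of robots.txt.
my $rules = WWW::RobotRules->new('my-robot/0.1');

# Normally this text would be fetched from the server;
# here it is a made-up sample.
my $robots_url = 'http://mycorp.com/robots.txt';
my $robots_txt = <<'END';
User-agent: *
Disallow: /private/
END

$rules->parse($robots_url, $robots_txt);

for my $url ('http://mycorp.com/index.html',
             'http://mycorp.com/private/plans.html') {
    printf "%-40s %s\n", $url,
        $rules->allowed($url) ? 'allowed' : 'disallowed';
}
```

LWP::RobotUA performs this check automatically; using WWW::RobotRules directly is only needed when you manage requests yourself.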
The following modules provide various functions and definitions.
LWP -- Library version number and documentation
LWP::MediaTypes -- MIME types configuration (text/html etc.)
LWP::Debug -- Debug logging module
LWP::Simple -- Simplified procedural interface for common functions
HTTP::Status -- HTTP status code (200 OK etc)
HTTP::Date -- Date parsing module for HTTP date formats
HTTP::Negotiate -- HTTP content negotiation calculation
File::Listing -- Parse directory listings
HTTP::Request::Common -- Construct common HTTP::Request objects
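Two of these helper modules in action, as a small offline sketch: HTTP::Request::Common builds request objects with less typing, and HTTP::Date converts between epoch seconds and HTTP date strings. The URL and form fields are placeholders invented for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Request::Common qw(GET POST);
use HTTP::Date qw(time2str str2time);

# Build requests without calling HTTP::Request->new by hand.
my $get  = GET 'http://mycorp.com/';
my $post = POST 'http://mycorp.com/search',
                [ q => 'perl', lang => 'en' ];    # form fields

print $get->method,  " ", $get->uri,  "\n";
print $post->method, " ", $post->uri, "\n";
print "Body: ", $post->content, "\n";             # URL-encoded form data
print "Type: ", $post->content_type, "\n";

# Convert epoch seconds to an HTTP date string and back.
my $stamp = time2str(0);
my $epoch = str2time($stamp);
print "Date: $stamp\n";
print "Epoch: $epoch\n";
```

Either request object can be passed to $ua->request() exactly like one built with HTTP::Request->new.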