LWP Quick Reference Guide
The simplest way to fetch a web page is from the shell, using the GET command, which is typically installed in the same directory as perl. The next easiest is LWP::Simple. For the most features, use LWP::UserAgent or LWP::RobotUA. It is a very good idea to put delays in your programs so that you do not overwhelm a web server with requests.
The simplest way is to just use the shell command:
GET "url"
Don't forget the quotes; they protect the URL from the shell, which would otherwise interpret characters such as & and ?.
If you are behind a firewall and you have a proxy, use:
GET -p"proxyurl" "url"
LWP::Simple may be easier than the rest of the LWP examples on this page.
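A minimal LWP::Simple sketch, assuming the URLs to fetch are given on the command line. The helper names fetch_page and save_page are illustrative only and not part of LWP; get returns the page body (or undef on failure) and getstore returns the HTTP status code.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get getstore is_success);

# Fetch a page and return its body, or undef on any failure.
sub fetch_page {
    my ($url) = @_;
    return get($url);
}

# Save a page to a local file; returns true if the status code is 2xx.
sub save_page {
    my ($url, $file) = @_;
    return is_success(getstore($url, $file));
}

# Fetch every URL named on the command line, pausing between requests.
for my $url (@ARGV) {
    my $content = fetch_page($url);
    if (defined $content) {
        printf "%s: %d bytes\n", $url, length $content;
    } else {
        warn "Could not get $url\n";
    }
    sleep 1;    # be polite to the server
}
```

LWP::Simple gives no access to headers or detailed error information, which is the main reason to move on to LWP::UserAgent.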
    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;

    my $content;
    my $ua = LWP::UserAgent->new;

    # Various enhancement possibilities:
    # $ua->max_size(100000);                               # 100k byte limit
    # $ua->timeout(3);                                     # 3 sec timeout (the default is 180 sec)
    # $ua->proxy(['http'], 'http://myproxy.mycorp.com/');  # set proxy
    # $ua->env_proxy();                                    # load proxy info from environment variables
    # $ua->no_proxy('localhost', 'mycorp.com');            # no proxy for local machines
    $ua->agent("Mozilla/6.0");                             # or something equally mysterious

    my $req = HTTP::Request->new(GET => 'http://mycorp.com/');
    my $res = $ua->request($req);
    if ($res->is_success) {
        $content = $res->content;
    } else {
        die "Could not get content: ", $res->status_line, "\n";
    }
    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTML::LinkExtor;
    use URI::URL;
    use Getopt::Std;

    # getopts() sets the package variable $opt_u, so it must be
    # declared with "our" rather than "my"
    our $opt_u;
    getopts("u:");
    my $url = $opt_u ? $opt_u : "http://mycorp.com/";

    my $ua = LWP::UserAgent->new;

    # Set up a callback that collects links
    my @links = ();
    my @abs_links;
    sub callback {
        my ($tag, %attr) = @_;
        return if $tag ne 'a';                  # we only look closer at <a ...>
        return if $attr{href} =~ /^mailto:/;    # ignore mailto:
        push(@links, values %attr);
    }

    # Make the parser. Unfortunately, we don't know the base yet
    # (it might be different from $url)
    my $p = HTML::LinkExtor->new(\&callback);

    # Request document and parse it as it arrives
    my $res = $ua->request(HTTP::Request->new(GET => $url),
                           sub { $p->parse($_[0]) });

    # Expand all URLs to absolute ones, keeping only those on this site
    my $base = $res->base;
    for my $link (@links) {
        my $abs = url($link, $base)->abs;
        push @abs_links, $abs if $abs =~ /^\Q$url\E/;    # don't go outside this site
    }

    # Print them out
    print join("\n", @abs_links), "\n";
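When the base URL is already known, HTML::LinkExtor can be given it up front and will make the links absolute itself. The following is a small offline sketch of that variant; the helper name extract_links and the sample HTML are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::LinkExtor;

# Return the absolute href of every <a> tag in $html, skipping mailto: links.
sub extract_links {
    my ($html, $base) = @_;

    # No callback: links are collected internally and read back with links().
    # Because a base URL is supplied, each link comes back as an absolute URI.
    my $parser = HTML::LinkExtor->new(undef, $base);
    $parser->parse($html);
    $parser->eof;

    my @found;
    for my $link ($parser->links) {
        my ($tag, %attr) = @$link;
        next unless $tag eq 'a' && defined $attr{href};
        next if $attr{href} =~ /^mailto:/;
        push @found, "$attr{href}";    # stringify the URI object
    }
    return @found;
}

# Hypothetical sample document
my $sample = <<'HTML';
<a href="/about.html">About</a>
<a href="mailto:me@foo.com">Mail</a>
<img src="logo.png">
<a href="news/today.html">News</a>
HTML

print "$_\n" for extract_links($sample, 'http://mycorp.com/docs/');
```

With the sample above, this prints http://mycorp.com/about.html and http://mycorp.com/docs/news/today.html; the mailto: link and the image are skipped.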
    require LWP::RobotUA;
    $ua = LWP::RobotUA->new('my-robot/0.1', 'me@foo.com');
    $ua->delay(10);    # be very nice, go slowly (the delay is given in minutes)
    ...
    # then use it just like a normal LWP::UserAgent
    $res = $ua->request($req);
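Because delay() takes minutes, a shorter pause has to be given as a fraction. Here is a sketch of a complete robot script under that assumption; make_robot is a hypothetical helper, and the URLs are expected on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::RobotUA;
use HTTP::Request;

# Build a robot user agent that waits $seconds between hits on the same host.
sub make_robot {
    my ($seconds) = @_;
    my $ua = LWP::RobotUA->new('my-robot/0.1', 'me@foo.com');
    $ua->delay($seconds / 60);    # delay() takes minutes, not seconds
    $ua->use_sleep(1);            # sleep until the next request is allowed
    return $ua;
}

my $ua = make_robot(10);

# Fetch each URL given on the command line; robots.txt is consulted
# automatically and disallowed URLs come back as error responses.
for my $url (@ARGV) {
    my $res = $ua->request(HTTP::Request->new(GET => $url));
    print "$url: ", $res->status_line, "\n";
}
```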
HTTP Error codes are found in the module HTTP::Status.
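As a small illustration, HTTP::Status can turn a numeric code into its standard message and classify it. The helper describe_status below is a made-up name for this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Status qw(status_message is_success is_redirect is_error);

# Describe a numeric HTTP status code, e.g. "404 Not Found (error)".
sub describe_status {
    my ($code) = @_;
    my $text = status_message($code);
    $text = 'Unknown' unless defined $text;
    my $kind = is_success($code)  ? 'success'
             : is_redirect($code) ? 'redirect'
             : is_error($code)    ? 'error'
             :                      'other';
    return "$code $text ($kind)";
}

print describe_status($_), "\n" for 200, 301, 404, 500;
```

The same predicates are available as methods on a response object, for example $res->is_success and $res->is_error.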
    LWP::MemberMixin             -- Access to member variables of Perl5 classes
    LWP::UserAgent               -- WWW user agent class
    LWP::RobotUA                 -- When developing a robot application
    LWP::Protocol                -- Interface to various protocol schemes
    LWP::Protocol::http          -- http:// access
    LWP::Protocol::file          -- file:// access
    LWP::Protocol::ftp           -- ftp:// access
    ...
    LWP::Authen::Basic           -- Handle 401 and 407 responses
    LWP::Authen::Digest
    HTTP::Headers                -- MIME/RFC822 style header (used by HTTP::Message)
    HTTP::Message                -- HTTP style message
    HTTP::Request                -- HTTP request
    HTTP::Response               -- HTTP response
    HTTP::Daemon                 -- An HTTP server class
    WWW::RobotRules              -- Parse robots.txt files
    WWW::RobotRules::AnyDBM_File -- Persistent RobotRules

The following modules provide various functions and definitions.

    LWP                          -- This file. Library version number and documentation.
    LWP::MediaTypes              -- MIME types configuration (text/html etc.)
    LWP::Debug                   -- Debug logging module
    LWP::Simple                  -- Simplified procedural interface for common functions
    HTTP::Status                 -- HTTP status code (200 OK etc)
    HTTP::Date                   -- Date parsing module for HTTP date formats
    HTTP::Negotiate              -- HTTP content negotiation calculation
    File::Listing                -- Parse directory listings
    HTTP::Request::Common        -- Construct common HTTP::Request objects
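As one example from the list, HTTP::Request::Common builds request objects with less typing than calling HTTP::Request->new directly. This sketch only constructs the requests and sends nothing; the /search path and the form field names are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Request::Common qw(GET POST);

# A GET request with an extra header
my $get = GET 'http://mycorp.com/', 'Accept' => 'text/html';

# A form POST; the array ref is encoded as application/x-www-form-urlencoded
# (hypothetical endpoint and field names)
my $post = POST 'http://mycorp.com/search', [ q => 'perl', max => 10 ];

# Show what would be sent
print $get->as_string, "\n";
print $post->as_string, "\n";
```

Either object can then be passed to $ua->request($req) on an LWP::UserAgent or LWP::RobotUA.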