
Perl and LWP: Fetching Web Pages, Parsing HTML, Writing Spiders and More

ISBN-10: 0596001789

ISBN-13: 9780596001780

Edition: 2001

Authors: Sean M. Burke, Gisle Aas

List price: $39.99

Description:

Perl soared to popularity as a language for creating and managing web content, but with LWP (Library for WWW in Perl), Perl is equally adept at consuming information on the Web. LWP is a suite of modules for fetching and processing web pages. The Web is a vast data source that contains everything from stock prices to movie credits, and with LWP all that data is just a few lines of code away. Anything you do on the Web, whether it's buying or selling, reading or writing, uploading or downloading, from news to e-commerce, can be controlled with Perl and LWP. You can automate Web-based purchase orders as easily as you can set up a program to download MP3 files from a web site. "Perl & LWP…

Book details

Copyright year: 2001
Publisher: O'Reilly Media, Incorporated
Publication date: 6/30/2002
Binding: Paperback
Pages: 262
Size: 7.00" wide x 9.19" long x 0.68" tall
Weight: 0.880

Table of contents

Foreword
Preface
Introduction to Web Automation
  The Web as Data Source
  History of LWP
  Installing LWP
  Words of Caution
  LWP in Action
Web Basics
  URLs
  An HTTP Transaction
  LWP::Simple
  Fetching Documents Without LWP::Simple
  Example: AltaVista
  HTTP POST
  Example: Babelfish
The LWP Class Model
  The Basic Classes
  Programming with LWP Classes
  Inside the do_GET and do_POST Functions
  User Agents
  HTTP::Response Objects
  LWP Classes: Behind the Scenes
URLs
  Parsing URLs
  Relative URLs
  Converting Absolute URLs to Relative
  Converting Relative URLs to Absolute
Forms
  Elements of an HTML Form
  LWP and GET Requests
  Automating Form Analysis
  Idiosyncrasies of HTML Forms
  POST Example: License Plates
  POST Example: ABEBooks.com
  File Uploads
  Limits on Forms
Simple HTML Processing with Regular Expressions
  Automating Data Extraction
  Regular Expression Techniques
  Troubleshooting
  When Regular Expressions Aren't Enough
  Example: Extracting Links from a Bookmark File
  Example: Extracting Links from Arbitrary HTML
  Example: Extracting Temperatures from Weather Underground
HTML Processing with Tokens
  HTML as Tokens
  Basic HTML::TokeParser Use
  Individual Tokens
  Token Sequences
  More HTML::TokeParser Methods
  Using Extracted Text
Tokenizing Walkthrough
  The Problem
  Getting the Data
  Inspecting the HTML
  First Code
  Narrowing In
  Rewrite for Features
  Alternatives
HTML Processing with Trees
  Introduction to Trees
  HTML::TreeBuilder
  Processing
  Example: BBC News
  Example: Fresh Air
Modifying HTML with Trees
  Changing Attributes
  Deleting Images
  Detaching and Reattaching
  Attaching in Another Tree
  Creating New Elements
Cookies, Authentication, and Advanced Requests
  Cookies
  Adding Extra Request Header Lines
  Authentication
  An HTTP Authentication Example: The Unicode Mailing Archive
Spiders
  Types of Web-Querying Programs
  A User Agent for Robots
  Example: A Link-Checking Spider
  Ideas for Further Expansion
LWP Modules
HTTP Status Codes
Common MIME Types
Language Tags
Common Content Encodings
ASCII Table
User's View of Object-Oriented Modules
Index