Foreword

Preface

Introduction to Web Automation
  The Web as Data Source
  History of LWP
  Installing LWP
  Words of Caution
  LWP in Action

Web Basics
  URLs
  An HTTP Transaction
  LWP::Simple
  Fetching Documents Without LWP::Simple
  Example: AltaVista
  HTTP POST
  Example: Babelfish

The LWP Class Model
  The Basic Classes
  Programming with LWP Classes
  Inside the do_GET and do_POST Functions
  User Agents
  HTTP::Response Objects
  LWP Classes: Behind the Scenes

URLs
  Parsing URLs
  Relative URLs
  Converting Absolute URLs to Relative
  Converting Relative URLs to Absolute

Forms
  Elements of an HTML Form
  LWP and GET Requests
  Automating Form Analysis
  Idiosyncrasies of HTML Forms
  POST Example: License Plates
  POST Example: ABEBooks.com
  File Uploads
  Limits on Forms

Simple HTML Processing with Regular Expressions
  Automating Data Extraction
  Regular Expression Techniques
  Troubleshooting
  When Regular Expressions Aren't Enough
  Example: Extracting Links from a Bookmark File
  Example: Extracting Links from Arbitrary HTML
  Example: Extracting Temperatures from Weather Underground

HTML Processing with Tokens
  HTML as Tokens
  Basic HTML::TokeParser Use
  Individual Tokens
  Token Sequences
  More HTML::TokeParser Methods
  Using Extracted Text

Tokenizing Walkthrough
  The Problem
  Getting the Data
  Inspecting the HTML
  First Code
  Narrowing In
  Rewrite for Features
  Alternatives

HTML Processing with Trees
  Introduction to Trees
  HTML::TreeBuilder
  Processing
  Example: BBC News
  Example: Fresh Air

Modifying HTML with Trees
  Changing Attributes
  Deleting Images
  Detaching and Reattaching
  Attaching in Another Tree
  Creating New Elements

Cookies, Authentication, and Advanced Requests
  Cookies
  Adding Extra Request Header Lines
  Authentication
  An HTTP Authentication Example: The Unicode Mailing Archive

Spiders
  Types of Web-Querying Programs
  A User Agent for Robots
  Example: A Link-Checking Spider
  Ideas for Further Expansion

LWP Modules

HTTP Status Codes

Common MIME Types

Language Tags

Common Content Encodings

ASCII Table

User's View of Object-Oriented Modules

Index