| |
| |
Preface | |
| |
| |
| |
Introduction to Regular Expressions | |
| |
| |
Solving Real Problems | |
| |
| |
Regular Expressions as a Language | |
| |
| |
The Filename Analogy | |
| |
| |
The Language Analogy | |
| |
| |
The Regular-Expression Frame of Mind | |
| |
| |
If You Have Some Regular-Expression Experience | |
| |
| |
Searching Text Files: Egrep | |
| |
| |
Egrep Metacharacters | |
| |
| |
Start and End of the Line | |
| |
| |
Character Classes | |
| |
| |
Matching Any Character with Dot | |
| |
| |
Alternation | |
| |
| |
Ignoring Differences in Capitalization | |
| |
| |
Word Boundaries | |
| |
| |
In a Nutshell | |
| |
| |
Optional Items | |
| |
| |
Other Quantifiers: Repetition | |
| |
| |
Parentheses and Backreferences | |
| |
| |
The Great Escape | |
| |
| |
Expanding the Foundation | |
| |
| |
Linguistic Diversification | |
| |
| |
The Goal of a Regular Expression | |
| |
| |
A Few More Examples | |
| |
| |
Regular Expression Nomenclature | |
| |
| |
Improving on the Status Quo | |
| |
| |
Summary | |
| |
| |
Personal Glimpses | |
| |
| |
| |
Extended Introductory Examples | |
| |
| |
About the Examples | |
| |
| |
A Short Introduction to Perl | |
| |
| |
Matching Text with Regular Expressions | |
| |
| |
Toward a More Real-World Example | |
| |
| |
Side Effects of a Successful Match | |
| |
| |
Intertwined Regular Expressions | |
| |
| |
Intermission | |
| |
| |
Modifying Text with Regular Expressions | |
| |
| |
Example: Form Letter | |
| |
| |
Example: Prettifying a Stock Price | |
| |
| |
Automated Editing | |
| |
| |
A Small Mail Utility | |
| |
| |
Adding Commas to a Number with Lookaround | |
| |
| |
Text-to-HTML Conversion | |
| |
| |
That Doubled-Word Thing | |
| |
| |
| |
Overview of Regular Expression Features and Flavors | |
| |
| |
A Casual Stroll Across the Regex Landscape | |
| |
| |
The Origins of Regular Expressions | |
| |
| |
At a Glance | |
| |
| |
Care and Handling of Regular Expressions | |
| |
| |
Integrated Handling | |
| |
| |
Procedural and Object-Oriented Handling | |
| |
| |
A Search-and-Replace Example | |
| |
| |
Search and Replace in Other Languages | |
| |
| |
Care and Handling: Summary | |
| |
| |
Strings, Character Encodings, and Modes | |
| |
| |
Strings as Regular Expressions | |
| |
| |
Character-Encoding Issues | |
| |
| |
Regex Modes and Match Modes | |
| |
| |
Common Metacharacters and Features | |
| |
| |
Character Representations | |
| |
| |
Character Classes and Class-Like Constructs | |
| |
| |
Anchors and Other "Zero-Width Assertions" | |
| |
| |
Comments and Mode Modifiers | |
| |
| |
Grouping, Capturing, Conditionals, and Control | |
| |
| |
Guide to the Advanced Chapters | |
| |
| |
| |
The Mechanics of Expression Processing | |
| |
| |
Start Your Engines! | |
| |
| |
Two Kinds of Engines | |
| |
| |
New Standards | |
| |
| |
Regex Engine Types | |
| |
| |
From the Department of Redundancy Department | |
| |
| |
Testing the Engine Type | |
| |
| |
Match Basics | |
| |
| |
About the Examples | |
| |
| |
| |
The Match That Begins Earliest Wins | |
| |
| |
Engine Pieces and Parts | |
| |
| |
| |
The Standard Quantifiers Are Greedy | |
| |
| |
Regex-Directed Versus Text-Directed | |
| |
| |
NFA Engine: Regex-Directed | |
| |
| |
DFA Engine: Text-Directed | |
| |
| |
First Thoughts: NFA and DFA in Comparison | |
| |
| |
Backtracking | |
| |
| |
A Really Crummy Analogy | |
| |
| |
Two Important Points on Backtracking | |
| |
| |
Saved States | |
| |
| |
Backtracking and Greediness | |
| |
| |
More About Greediness and Backtracking | |
| |
| |
Problems of Greediness | |
| |
| |
Multi-Character "Quotes" | |
| |
| |
Using Lazy Quantifiers | |
| |
| |
Greediness and Laziness Always Favor a Match | |
| |
| |
The Essence of Greediness, Laziness, and Backtracking | |
| |
| |
Possessive Quantifiers and Atomic Grouping | |
| |
| |
Possessive Quantifiers, ?+, *+, ++, and {m,n}+ | |
| |
| |
The Backtracking of Lookaround | |
| |
| |
Is Alternation Greedy? | |
| |
| |
Taking Advantage of Ordered Alternation | |
| |
| |
NFA, DFA, and POSIX | |
| |
| |
"The Longest-Leftmost" | |
| |
| |
POSIX and the Longest-Leftmost Rule | |
| |
| |
Speed and Efficiency | |
| |
| |
Summary: NFA and DFA in Comparison | |
| |
| |
Summary | |
| |
| |
| |
Practical Regex Techniques | |
| |
| |
Regex Balancing Act | |
| |
| |
A Few Short Examples | |
| |
| |
Continuing with Continuation Lines | |
| |
| |
Matching an IP Address | |
| |
| |
Working with Filenames | |
| |
| |
Matching Balanced Sets of Parentheses | |
| |
| |
Watching Out for Unwanted Matches | |
| |
| |
Matching Delimited Text | |
| |
| |
Knowing Your Data and Making Assumptions | |
| |
| |
Stripping Leading and Trailing Whitespace | |
| |
| |
HTML-Related Examples | |
| |
| |
Matching an HTML Tag | |
| |
| |
Matching an HTML Link | |
| |
| |
Examining an HTTP URL | |
| |
| |
Validating a Hostname | |
| |
| |
Plucking Out a URL in the Real World | |
| |
| |
Extended Examples | |
| |
| |
Keeping in Sync with Your Data | |
| |
| |
Parsing CSV Files | |
| |
| |
| |
Crafting an Efficient Expression | |
| |
| |
A Sobering Example | |
| |
| |
A Simple Change--Placing Your Best Foot Forward | |
| |
| |
Efficiency Versus Correctness | |
| |
| |
Advancing Further--Localizing the Greediness | |
| |
| |
Reality Check | |
| |
| |
A Global View of Backtracking | |
| |
| |
More Work for a POSIX NFA | |
| |
| |
Work Required During a Non-Match | |
| |
| |
Being More Specific | |
| |
| |
Alternation Can Be Expensive | |
| |
| |
Benchmarking | |
| |
| |
Know What You're Measuring | |
| |
| |
Benchmarking with Java | |
| |
| |
Benchmarking with VB.NET | |
| |
| |
Benchmarking with Python | |
| |
| |
Benchmarking with Ruby | |
| |
| |
Benchmarking with Tcl | |
| |
| |
Common Optimizations | |
| |
| |
No Free Lunch | |
| |
| |
Everyone's Lunch is Different | |
| |
| |
The Mechanics of Regex Application | |
| |
| |
Pre-Application Optimizations | |
| |
| |
Optimizations with the Transmission | |
| |
| |
Optimizations of the Regex Itself | |
| |
| |
Techniques for Faster Expressions | |
| |
| |
Common Sense Techniques | |
| |
| |
Expose Literal Text | |
| |
| |
Expose Anchors | |
| |
| |
Lazy Versus Greedy: Be Specific | |
| |
| |
Split Into Multiple Regular Expressions | |
| |
| |
Mimic Initial-Character Discrimination | |
| |
| |
Use Atomic Grouping and Possessive Quantifiers | |
| |
| |
Lead the Engine to a Match | |
| |
| |
Unrolling the Loop | |
| |
| |
| |
Building a Regex From Past Experiences | |
| |
| |
The Real "Unrolling-the-Loop" Pattern | |
| |
| |
| |
A Top-Down View | |
| |
| |
| |
An Internet Hostname | |
| |
| |
Observations | |
| |
| |
Using Atomic Grouping and Possessive Quantifiers | |
| |
| |
Short Unrolling Examples | |
| |
| |
Unrolling C Comments | |
| |
| |
The Freeflowing Regex | |
| |
| |
A Helping Hand to Guide the Match | |
| |
| |
A Well-Guided Regex is a Fast Regex | |
| |
| |
Wrapup | |
| |
| |
In Summary: Think! | |
| |
| |
| |
Perl | |
| |
| |
Regular Expressions as a Language Component | |
| |
| |
Perl's Greatest Strength | |
| |
| |
Perl's Greatest Weakness | |
| |
| |
Perl's Regex Flavor | |
| |
| |
Regex Operands and Regex Literals | |
| |
| |
How Regex Literals Are Parsed | |
| |
| |
Regex Modifiers | |
| |
| |
Regex-Related Perlisms | |
| |
| |
Expression Context | |
| |
| |
Dynamic Scope and Regex Match Effects | |
| |
| |
Special Variables Modified by a Match | |
| |
| |
The qr/.../ Operator and Regex Objects | |
| |
| |
Building and Using Regex Objects | |
| |
| |
Viewing Regex Objects | |
| |
| |
Using Regex Objects for Efficiency | |
| |
| |
The Match Operator | |
| |
| |
Match's Regex Operand | |
| |
| |
Specifying the Match Target Operand | |
| |
| |
Different Uses of the Match Operator | |
| |
| |
Iterative Matching: Scalar Context, with /g | |
| |
| |
The Match Operator's Environmental Relations | |
| |
| |
The Substitution Operator | |
| |
| |
The Replacement Operand | |
| |
| |
The /e Modifier | |
| |
| |
Context and Return Value | |
| |
| |
The Split Operator | |
| |
| |
Basic Split | |
| |
| |
Returning Empty Elements | |
| |
| |
Split's Special Regex Operands | |
| |
| |
Split's Match Operand with Capturing Parentheses | |
| |
| |
Fun with Perl Enhancements | |
| |
| |
Using a Dynamic Regex to Match Nested Pairs | |
| |
| |
Using the Embedded-Code Construct | |
| |
| |
Using local in an Embedded-Code Construct | |
| |
| |
A Warning About Embedded Code and my Variables | |
| |
| |
Matching Nested Constructs with Embedded Code | |
| |
| |
Overloading Regex Literals | |
| |
| |
Problems with Regex-Literal Overloading | |
| |
| |
Mimicking Named Capture | |
| |
| |
Perl Efficiency Issues | |
| |
| |
"There's More Than One Way to Do It" | |
| |
| |
Regex Compilation, the /o Modifier, qr/.../, and Efficiency | |
| |
| |
Understanding the "Pre-Match" Copy | |
| |
| |
The Study Function | |
| |
| |
Benchmarking | |
| |
| |
Regex Debugging Information | |
| |
| |
Final Comments | |
| |
| |
| |
Java | |
| |
| |
Judging a Regex Package | |
| |
| |
Technical Issues | |
| |
| |
Social and Political Issues | |
| |
| |
Object Models | |
| |
| |
A Few Abstract Object Models | |
| |
| |
Growing Complexity | |
| |
| |
Packages, Packages, Packages | |
| |
| |
Why So Many "Perl5" Flavors? | |
| |
| |
Lies, Damn Lies, and Benchmarks | |
| |
| |
Recommendations | |
| |
| |
Sun's Regex Package | |
| |
| |
Regex Flavor | |
| |
| |
Using java.util.regex | |
| |
| |
The Pattern.compile() Factory | |
| |
| |
The Matcher Object | |
| |
| |
Other Pattern Methods | |
| |
| |
A Quick Look at Jakarta-ORO | |
| |
| |
ORO's Perl5Util | |
| |
| |
A Mini Perl5Util Reference | |
| |
| |
Using ORO's Underlying Classes | |
| |
| |
| |
.NET | |
| |
| |
.NET's Regex Flavor | |
| |
| |
Additional Comments on the Flavor | |
| |
| |
Using .NET Regular Expressions | |
| |
| |
Regex Quickstart | |
| |
| |
Package Overview | |
| |
| |
Core Object Overview | |
| |
| |
Core Object Details | |
| |
| |
Creating Regex Objects | |
| |
| |
Using Regex Objects | |
| |
| |
Using Match Objects | |
| |
| |
Using Group Objects | |
| |
| |
Static "Convenience" Functions | |
| |
| |
Regex Caching | |
| |
| |
Support Functions | |
| |
| |
Advanced .NET | |
| |
| |
Regex Assemblies | |
| |
| |
Matching Nested Constructs | |
| |
| |
Capture Objects | |
| |
| |
Index | |