Acknowledgments
About the Authors
Introduction

Requirements, Realities, and Architecture

Surrounding the Requirements
Requirements
Architecture
The Mission of the Data Warehouse
The Mission of the ETL Team

ETL Data Structures
To Stage or Not to Stage
Designing the Staging Area
Data Structures in the ETL System
Planning and Design Standards
Summary

Data Flow

Extracting

The Logical Data Map
Inside the Logical Data Map
Building the Logical Data Map
Integrating Heterogeneous Data Sources

The Challenge of Extracting from Disparate Platforms
Mainframe Sources
Flat Files
XML Sources
Web Log Sources
ERP System Sources

Extracting Changed Data
Summary

Cleaning and Conforming
Defining Data Quality
Assumptions

Design Objectives

Cleaning Deliverables

Screens and Their Measurements

Conforming Deliverables
Summary

Delivering Dimension Tables
The Basic Structure of a Dimension
The Grain of a Dimension
The Basic Load Plan for a Dimension
Flat Dimensions and Snowflaked Dimensions
Date and Time Dimensions
Big Dimensions
Small Dimensions
One Dimension or Two
Dimensional Roles
Dimensions as Subdimensions of Another Dimension
Degenerate Dimensions
Slowly Changing Dimensions
Type 1 Slowly Changing Dimension (Overwrite)
Type 2 Slowly Changing Dimension (Partitioning History)
Precise Time Stamping of a Type 2 Slowly Changing Dimension
Type 3 Slowly Changing Dimension (Alternate Realities)
Hybrid Slowly Changing Dimensions
Late-Arriving Dimension Records and Correcting Bad Data
Multivalued Dimensions and Bridge Tables
Ragged Hierarchies and Bridge Tables
Technical Note: Populating Hierarchy Bridge Tables
Using Positional Attributes in a Dimension to Represent Text Facts
Summary

Delivering Fact Tables
The Basic Structure of a Fact Table
Guaranteeing Referential Integrity
Surrogate Key Pipeline
Fundamental Grains
Preparing for Loading Fact Tables
Factless Fact Tables
Augmenting a Type 1 Fact Table with Type 2 History
Graceful Modifications
Multiple Units of Measure in a Fact Table
Collecting Revenue in Multiple Currencies
Late Arriving Facts
Aggregations
Delivering Dimensional Data to OLAP Cubes
Summary

Implementation and Operations

Development
Current Marketplace ETL Tool Suite Offerings
Current Scripting Languages
Time Is of the Essence
Using Database Bulk Loader Utilities to Speed Inserts
Managing Database Features to Improve Performance
Troubleshooting Performance Problems
Increasing ETL Throughput
Summary

Operations
Scheduling and Support
Migrating to Production
Achieving Optimal ETL Performance
Purging Historic Data
Monitoring the ETL System
Tuning ETL Processes
ETL System Security
Short-Term Archiving and Recovery
Long-Term Archiving and Recovery
Summary

Metadata
Defining Metadata
Business Metadata
Technical Metadata
ETL-Generated Metadata
Metadata Standards and Practices
Impact Analysis
Summary

Responsibilities
Planning and Leadership
Managing the Project
Summary

Real Time Streaming ETL Systems

Real-Time ETL Systems
Why Real-Time ETL?
Defining Real-Time ETL
Challenges and Opportunities of Real-Time Data Warehousing
Real-Time Data Warehousing Review
Categorizing the Requirement
Real-Time ETL Approaches
Summary

Conclusions
Deepening the Definition of ETL
The Future of Data Warehousing and ETL in Particular
Index