Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data

ISBN-10: 0764567578

ISBN-13: 9780764567575

Edition: 2004

Authors: Ralph Kimball, Joe Caserta

List price: $48.00

Description:

This title delivers real-world solutions for the most time- and labour-intensive portion of data warehousing. It offers proven, time-saving ETL techniques, comprehensive guidance on building dimensional structures, and crucial advice on ensuring data quality.

Book details

Copyright year: 2004
Publisher: John Wiley & Sons, Incorporated
Publication date: 10/1/2004
Binding: Paperback
Pages: 528
Size: 7.50" wide x 9.30" long x 1.10" tall
Weight: 2.222
Language: English

Acknowledgments
About the Authors
Introduction
Requirements, Realities, and Architecture
Surrounding the Requirements
Requirements
Architecture
The Mission of the Data Warehouse
The Mission of the ETL Team
ETL Data Structures
To Stage or Not to Stage
Designing the Staging Area
Data Structures in the ETL System
Planning and Design Standards
Summary
Data Flow
Extracting
The Logical Data Map
Inside the Logical Data Map
Building the Logical Data Map
Integrating Heterogeneous Data Sources
The Challenge of Extracting from Disparate Platforms
Mainframe Sources
Flat Files
XML Sources
Web Log Sources
ERP System Sources
Extracting Changed Data
Summary
Cleaning and Conforming
Defining Data Quality
Assumptions
Design Objectives
Cleaning Deliverables
Screens and Their Measurements
Conforming Deliverables
Summary
Delivering Dimension Tables
The Basic Structure of a Dimension
The Grain of a Dimension
The Basic Load Plan for a Dimension
Flat Dimensions and Snowflaked Dimensions
Date and Time Dimensions
Big Dimensions
Small Dimensions
One Dimension or Two
Dimensional Roles
Dimensions as Subdimensions of Another Dimension
Degenerate Dimensions
Slowly Changing Dimensions
Type 1 Slowly Changing Dimension (Overwrite)
Type 2 Slowly Changing Dimension (Partitioning History)
Precise Time Stamping of a Type 2 Slowly Changing Dimension
Type 3 Slowly Changing Dimension (Alternate Realities)
Hybrid Slowly Changing Dimensions
Late-Arriving Dimension Records and Correcting Bad Data
Multivalued Dimensions and Bridge Tables
Ragged Hierarchies and Bridge Tables
Technical Note: Populating Hierarchy Bridge Tables
Using Positional Attributes in a Dimension to Represent Text Facts
Summary
Delivering Fact Tables
The Basic Structure of a Fact Table
Guaranteeing Referential Integrity
Surrogate Key Pipeline
Fundamental Grains
Preparing for Loading Fact Tables
Factless Fact Tables
Augmenting a Type 1 Fact Table with Type 2 History
Graceful Modifications
Multiple Units of Measure in a Fact Table
Collecting Revenue in Multiple Currencies
Late Arriving Facts
Aggregations
Delivering Dimensional Data to OLAP Cubes
Summary
Implementation and Operations
Development
Current Marketplace ETL Tool Suite Offerings
Current Scripting Languages
Time Is of the Essence
Using Database Bulk Loader Utilities to Speed Inserts
Managing Database Features to Improve Performance
Troubleshooting Performance Problems
Increasing ETL Throughput
Summary
Operations
Scheduling and Support
Migrating to Production
Achieving Optimal ETL Performance
Purging Historic Data
Monitoring the ETL System
Tuning ETL Processes
ETL System Security
Short-Term Archiving and Recovery
Long-Term Archiving and Recovery
Summary
Metadata
Defining Metadata
Business Metadata
Technical Metadata
ETL-Generated Metadata
Metadata Standards and Practices
Impact Analysis
Summary
Responsibilities
Planning and Leadership
Managing the Project
Summary
Real Time Streaming ETL Systems
Real-Time ETL Systems
Why Real-Time ETL?
Defining Real-Time ETL
Challenges and Opportunities of Real-Time Data Warehousing
Real-Time Data Warehousing Review
Categorizing the Requirement
Real-Time ETL Approaches
Summary
Conclusions
Deepening the Definition of ETL
The Future of Data Warehousing and ETL in Particular
Index