| |
| |
Preface | |
| |
| |
Acknowledgments | |
| |
| |
About the Author | |
| |
| |
Introduction: Why Is Data Integration Important? | |
| |
| |
| |
Overview of Data Integration | |
| |
| |
| |
Types of Data Integration | |
| |
| |
Data Integration Architectural Patterns | |
| |
| |
Enterprise Application Integration (EAI) | |
| |
| |
Service-Oriented Architecture (SOA) | |
| |
| |
Federation | |
| |
| |
Extract, Transform, Load (ETL) | |
| |
| |
Common Data Integration Functionality | |
| |
| |
Summary | |
| |
| |
End-of-Chapter Questions | |
| |
| |
| |
An Architecture for Data Integration | |
| |
| |
What Is Reference Architecture? | |
| |
| |
Reference Architecture for Data Integration | |
| |
| |
Objectives of the Data Integration Reference Architecture | |
| |
| |
The Data Subject Area-Based Component Design Approach | |
| |
| |
A Scalable Architecture | |
| |
| |
Purposes of the Data Integration Reference Architecture | |
| |
| |
The Layers of the Data Integration Architecture | |
| |
| |
Extract/Subscribe Processes | |
| |
| |
Data Integration Guiding Principle: "Read Once, Write Many" | |
| |
| |
Data Integration Guiding Principle: "Grab Everything" | |
| |
| |
Initial Staging Landing Zone | |
| |
| |
Data Quality Processes | |
| |
| |
What Is Data Quality? | |
| |
| |
Causes of Poor Data Quality | |
| |
| |
Data Quality Check Points | |
| |
| |
Where to Perform a Data Quality Check | |
| |
| |
Clean Staging Landing Zone | |
| |
| |
Transform Processes | |
| |
| |
Conforming Transform Types | |
| |
| |
Calculations and Splits Transform Types | |
| |
| |
Processing and Enrichment Transform Types | |
| |
| |
Target Filters Transform Types | |
| |
| |
Load-Ready Publish Landing Zone | |
| |
| |
Load/Publish Processes | |
| |
| |
Physical Load Architectures | |
| |
| |
An Overall Data Architecture | |
| |
| |
Summary | |
| |
| |
End-of-Chapter Questions | |
| |
| |
| |
A Design Technique: Data Integration Modeling | |
| |
| |
The Business Case for a New Design Process | |
| |
| |
Improving the Development Process | |
| |
| |
Leveraging Process Modeling for Data Integration | |
| |
| |
Overview of Data Integration Modeling | |
| |
| |
Modeling to the Data Integration Architecture | |
| |
| |
Data Integration Models within the SDLC | |
| |
| |
Structuring Models on the Reference Architecture | |
| |
| |
Conceptual Data Integration Models | |
| |
| |
Logical Data Integration Models | |
| |
| |
High-Level Logical Data Integration Model | |
| |
| |
Logical Extraction Data Integration Models | |
| |
| |
Logical Data Quality Data Integration Models | |
| |
| |
Logical Transform Data Integration Models | |
| |
| |
Logical Load Data Integration Models | |
| |
| |
Physical Data Integration Models | |
| |
| |
Converting Logical Data Integration Models to Physical Data Integration Models | |
| |
| |
Target-Based Data Integration Design Technique Overview | |
| |
| |
Physical Source System Data Integration Models | |
| |
| |
Physical Common Component Data Integration Models | |
| |
| |
Physical Subject Area Load Data Integration Models | |
| |
| |
Logical Versus Physical Data Integration Models | |
| |
| |
Tools for Developing Data Integration Models | |
| |
| |
Industry-Based Data Integration Models | |
| |
| |
Summary | |
| |
| |
End-of-Chapter Questions | |
| |
| |
| |
Case Study: Customer Loan Data Warehouse Project | |
| |
| |
Case Study Overview | |
| |
| |
Step 1: Build a Conceptual Data Integration Model | |
| |
| |
Step 2: Build a High-Level Logical Data Integration Model | |
| |
| |
Step 3: Build the Logical Extract DI Models | |
| |
| |
Confirm the Subject Area Focus from the Data Mapping Document | |
| |
| |
Review Whether the Existing Data Integration Environment Can Fulfill the Requirements | |
| |
| |
Determine the Business Extraction Rules | |
| |
| |
Control File Check Processing | |
| |
| |
Complete the Logical Extract Data Integration Models | |
| |
| |
Final Thoughts on Designing a Logical Extract DI Model | |
| |
| |
Step 4: Define a Logical Data Quality DI Model | |
| |
| |
Design a Logical Data Quality Data Integration Model | |
| |
| |
Identify Technical and Business Data Quality Criteria | |
| |
| |
Determine Absolute and Optional Data Quality Criteria | |
| |
| |
Step 5: Define the Logical Transform DI Model | |
| |
| |
Step 6: Define the Logical Load DI Model | |
| |
| |
Step 7: Determine the Physicalization Strategy | |
| |
| |
Step 8: Convert the Logical Extract Models into Physical Source System Extract DI Models | |
| |
| |
Step 9: Refine the Logical Load Models into Physical Source System Subject Area Load DI Models | |
| |
| |
Step 10: Package the Enterprise Business Rules into Common Component Models | |
| |
| |
Step 11: Sequence the Physical DI Models | |
| |
| |
Summary | |
| |
| |
| |
The Data Integration Systems Development Life Cycle | |
| |
| |
| |
Data Integration Analysis | |
| |
| |
Analyzing Data Integration Requirements | |
| |
| |
Building a Conceptual Data Integration Model | |
| |
| |
Key Conceptual Data Integration Modeling Task Steps | |
| |
| |
Why Is Source System Data Discovery So Difficult? | |
| |
| |
Performing Source System Data Profiling | |
| |
| |
Overview of Data Profiling | |
| |
| |
Key Source System Data Profiling Task Steps | |
| |
| |
Reviewing/Assessing Source Data Quality | |
| |
| |
Validation Checks to Assess the Data | |
| |
| |
Key Review/Assess Source Data Quality Task Steps | |
| |
| |
Performing Source/Target Data Mappings | |
| |
| |
Overview of Data Mapping | |
| |
| |
Types of Data Mapping | |
| |
| |
Key Source/Target Data Mapping Task Steps | |
| |
| |
Summary | |
| |
| |
End-of-Chapter Questions | |
| |
| |
| |
Data Integration Analysis Case Study | |
| |
| |
Case Study Overview | |
| |
| |
Envisioned Wheeler Data Warehouse Environment | |
| |
| |
Aggregations in a Data Warehouse Environment | |
| |
| |
Data Integration Analysis Phase | |
| |
| |
Step 1: Build a Conceptual Data Integration Model | |
| |
| |
Step 2: Perform Source System Data Profiling | |
| |
| |
Step 3: Review/Assess Source Data Quality | |
| |
| |
Step 4: Perform Source/Target Data Mappings | |
| |
| |
Summary | |
| |
| |
| |
Data Integration Logical Design | |
| |
| |
Determining High-Level Data Volumetrics | |
| |
| |
Extract Sizing | |
| |
| |
Disk Space Sizing | |
| |
| |
File Size Impacts Component Design | |
| |
| |
Key Data Integration Volumetrics Task Steps | |
| |
| |
Establishing a Data Integration Architecture | |
| |
| |
Identifying Data Quality Criteria | |
| |
| |
Examples of Data Quality Criteria from a Target | |
| |
| |
Key Data Quality Criteria Identification Task Steps | |
| |
| |
Creating Logical Data Integration Models | |
| |
| |
Key Logical Data Integration Model Task Steps | |
| |
| |
Defining One-Time Data Conversion Load Logical Design | |
| |
| |
Designing a History Conversion | |
| |
| |
One-Time History Data Conversion Task Steps | |
| |
| |
Summary | |
| |
| |
End-of-Chapter Questions | |
| |
| |
| |
Data Integration Logical Design Case Study | |
| |
| |
Step 1: Determine High-Level Data Volumetrics | |
| |
| |
Step 2: Establish the Data Integration Architecture | |
| |
| |
Step 3: Identify Data Quality Criteria | |
| |
| |
Step 4: Create Logical Data Integration Models | |
| |
| |
Define the High-Level Logical Data Integration Model | |
| |
| |
Define the Logical Extraction Data Integration Model | |
| |
| |
Define the Logical Data Quality Data Integration Model | |
| |
| |
Define Logical Transform Data Integration Model | |
| |
| |
Define Logical Load Data Integration Model | |
| |
| |
Define Logical Data Mart Data Integration Model | |
| |
| |
Develop the History Conversion Design | |
| |
| |
Summary | |
| |
| |
| |
Data Integration Physical Design | |
| |
| |
Creating Component-Based Physical Designs | |
| |
| |
Reviewing the Rationale for a Component-Based Design | |
| |
| |
Modularity Design Principles | |
| |
| |
Key Component-Based Physical Designs Creation Task Steps | |
| |
| |
Preparing the DI Development Environment | |
| |
| |
Key Data Integration Development Environment Preparation Task Steps | |
| |
| |
Creating Physical Data Integration Models | |
| |
| |
Point-to-Point Application Development--The Evolution of Data Integration Development | |
| |
| |
The High-Level Logical Data Integration Model in Physical Design | |
| |
| |
Design Physical Common Components Data Integration Models | |
| |
| |
Design Physical Source System Extract Data Integration Models | |
| |
| |
Design Physical Subject Area Load Data Integration Models | |
| |
| |
Designing Parallelism into the Data Integration Models | |
| |
| |
Types of Data Integration Parallel Processing | |
| |
| |
Other Parallel Processing Design Considerations | |
| |
| |
Parallel Processing Pitfalls | |
| |
| |
Key Parallelism Design Task Steps | |
| |
| |
Designing Change Data Capture | |
| |
| |
Append Change Data Capture Design Complexities | |
| |
| |
Key Change Data Capture Design Task Steps | |
| |
| |
Finalizing the History Conversion Design | |
| |
| |
From Hypothesis to Fact | |
| |
| |
Finalize History Data Conversion Design Task Steps | |
| |
| |
Defining Data Integration Operational Requirements | |
| |
| |
Determining a Job Schedule for the Data Integration Jobs | |
| |
| |
Determining a Production Support Team | |
| |
| |
Key Data Integration Operational Requirements Task Steps | |
| |
| |
Designing Data Integration Components for SOA | |
| |
| |
Leveraging Traditional Data Integration Processes as SOA Services | |
| |
| |
Appropriate Data Integration Job Types | |
| |
| |
Key Data Integration Design for SOA Task Steps | |
| |
| |
Summary | |
| |
| |
End-of-Chapter Questions | |
| |
| |
| |
Data Integration Physical Design Case Study | |
| |
| |
Step 1: Create Physical Data Integration Models | |
| |
| |
Instantiating the Logical Data Integration Models into a Data Integration Package | |
| |
| |
Step 2: Find Opportunities to Tune through Parallel Processing | |
| |
| |
Step 3: Complete Wheeler History Conversion Design | |
| |
| |
Step 4: Define Data Integration Operational Requirements | |
| |
| |
Developing a Job Schedule for Wheeler | |
| |
| |
The Wheeler Monthly Job Schedule | |
| |
| |
The Wheeler Monthly Job Flow | |
| |
| |
Process Step 1: Preparation for the EDW Load Processing | |
| |
| |
Process Step 2: Source System to Subject Area File Processing | |
| |
| |
Process Step 3: Subject Area Files to EDW Load Processing | |
| |
| |
Process Step 4: EDW-to-Product Line Profitability Data Mart Load Processing | |
| |
| |
Production Support Staffing | |
| |
| |
Summary | |
| |
| |
| |
Data Integration Development Cycle | |
| |
| |
Performing General Data Integration Development Activities | |
| |
| |
Data Integration Development Standards | |
| |
| |
Error-Handling Requirements | |
| |
| |
Naming Standards | |
| |
| |
Key General Development Task Steps | |
| |
| |
Prototyping a Set of Data Integration Functionality | |
| |
| |
The Rationale for Prototyping | |
| |
| |
Benefits of Prototyping | |
| |
| |
Prototyping Example | |
| |
| |
Key Data Integration Prototyping Task Steps | |
| |
| |
Completing/Extending Data Integration Job Code | |
| |
| |
Complete/Extend Common Component Data Integration Jobs | |
| |
| |
Complete/Extend the Source System Extract Data Integration Jobs | |
| |
| |
Complete/Extend the Subject Area Load Data Integration Jobs | |
| |
| |
Performing Data Integration Testing | |
| |
| |
Data Warehousing Testing Overview | |
| |
| |
Types of Data Warehousing Testing | |
| |
| |
Perform Data Warehouse Unit Testing | |
| |
| |
Perform Data Warehouse Integration Testing | |
| |
| |
Perform Data Warehouse System and Performance Testing | |
| |
| |
Perform Data Warehouse User Acceptance Testing | |
| |
| |
The Role of Configuration Management in Data Integration | |
| |
| |
What Is Configuration Management? | |
| |
| |
Data Integration Version Control | |
| |
| |
Data Integration Software Promotion Life Cycle | |
| |
| |
Summary | |
| |
| |
End-of-Chapter Questions | |
| |
| |
| |
Data Integration Development Cycle Case Study | |
| |
| |
Step 1: Prototype the Common Customer Key | |
| |
| |
Step 2: Develop User Test Cases | |
| |
| |
Domestic OM Source System Extract Job Unit Test Case | |
| |
| |
Summary | |
| |
| |
| |
Data Integration with Other Information Management Disciplines | |
| |
| |
| |
Data Integration and Data Governance | |
| |
| |
What Is Data Governance? | |
| |
| |
Why Is Data Governance Important? | |
| |
| |
Components of Data Governance | |
| |
| |
Foundational Data Governance Processes | |
| |
| |
Data Governance Organizational Structure | |
| |
| |
Data Stewardship Processes | |
| |
| |
Data Governance Functions in Data Warehousing | |
| |
| |
Compliance in Data Governance | |
| |
| |
Data Governance Change Management | |
| |
| |
Summary | |
| |
| |
End-of-Chapter Questions | |
| |
| |
| |
Metadata | |
| |
| |
What Is Metadata? | |
| |
| |
The Role of Metadata in Data Integration | |
| |
| |
Categories of Metadata | |
| |
| |
Business Metadata | |
| |
| |
Structural Metadata | |
| |
| |
Navigational Metadata | |
| |
| |
Analytic Metadata | |
| |
| |
Operational Metadata | |
| |
| |
Metadata as Part of a Reference Architecture | |
| |
| |
Metadata Users | |
| |
| |
Managing Metadata | |
| |
| |
The Importance of Metadata Management in Data Governance | |
| |
| |
Metadata Environment Current State | |
| |
| |
Metadata Management Plan | |
| |
| |
Metadata Management Life Cycle | |
| |
| |
Summary | |
| |
| |
End-of-Chapter Questions | |
| |
| |
| |
Data Quality | |
| |
| |
The Data Quality Framework | |
| |
| |
Key Data Quality Elements | |
| |
| |
The Technical Data Quality Dimension | |
| |
| |
The Business-Process Data Quality Dimension | |
| |
| |
Types of Data Quality Processes | |
| |
| |
The Data Quality Life Cycle | |
| |
| |
The Define Phase | |
| |
| |
Defining the Data Quality Scope | |
| |
| |
Identifying/Defining the Data Quality Elements | |
| |
| |
Developing Preventive Data Quality Processes | |
| |
| |
The Audit Phase | |
| |
| |
Developing a Data Quality Measurement Process | |
| |
| |
Developing Data Quality Reports | |
| |
| |
Auditing Data Quality by LOB or Subject Area | |
| |
| |
The Renovate Phase | |
| |
| |
Data Quality Assessment and Remediation Projects | |
| |
| |
Data Quality SWAT Renovation Projects | |
| |
| |
Data Quality Programs | |
| |
| |
Final Thoughts on Data Quality | |
| |
| |
Summary | |
| |
| |
End-of-Chapter Questions | |
| |
| |
| |
Exercise Answers | |
| |
| |
| |
Data Integration Guiding Principles | |
| |
| |
Read Once, Write Many | |
| |
| |
Grab Everything | |
| |
| |
Data Quality before Transforms | |
| |
| |
Transformation Componentization | |
| |
| |
Where to Perform Aggregations and Calculations | |
| |
| |
Data Integration Environment Volumetric Sizing | |
| |
| |
Subject Area Volumetric Sizing | |
| |
| |
| |
Glossary | |
| |
| |
| |
Case Study Models | |
| |
| |
| |
The Case Study Models appendix is online-only. Print-book readers can download it at www.ibmpressbooks.com/title/9780137084937. In eBook editions, the appendix is included in the book. | |
| |
| |
Index | |