Data Integration Blueprint and Modeling Techniques for a Scalable and Sustainable Architecture

ISBN-10: 0137084935

ISBN-13: 9780137084937

Edition: 2011

Authors: Anthony David Giordano

List price: $59.99

Making Data Integration Work: How to Systematically Reduce Cost, Improve Quality, and Enhance Effectiveness

Today's enterprises are investing massive resources in data integration. Many possess thousands of point-to-point data integration applications that are costly, undocumented, and difficult to maintain. Data integration now accounts for a major part of the expense and risk of typical data warehousing and business intelligence projects, and as businesses increasingly rely on analytics, the need for a data integration blueprint is greater than ever. This book presents the solution: a clear, consistent approach to defining, designing, and building data integration…

Book details

Copyright year: 2011
Publisher: Pearson Education
Publication date: 12/27/2010
Binding: Hardcover
Pages: 416
Size: 6.00" wide x 9.00" long x 1.25" tall
Weight: 1.694
Language: English

Preface
Acknowledgments
About the Author
Introduction: Why Is Data Integration Important?
Overview of Data Integration
Types of Data Integration
Data Integration Architectural Patterns
Enterprise Application Integration (EAI)
Service-Oriented Architecture (SOA)
Federation
Extract, Transform, Load (ETL)
Common Data Integration Functionality
Summary
End-of-Chapter Questions
An Architecture for Data Integration
What Is Reference Architecture?
Reference Architecture for Data Integration
Objectives of the Data Integration Reference Architecture
The Data Subject Area-Based Component Design Approach
A Scalable Architecture
Purposes of the Data Integration Reference Architecture
The Layers of the Data Integration Architecture
Extract/Subscribe Processes
Data Integration Guiding Principle: "Read Once, Write Many"
Data Integration Guiding Principle: "Grab Everything"
Initial Staging Landing Zone
Data Quality Processes
What Is Data Quality?
Causes of Poor Data Quality
Data Quality Check Points
Where to Perform a Data Quality Check
Clean Staging Landing Zone
Transform Processes
Conforming Transform Types
Calculations and Splits Transform Types
Processing and Enrichment Transform Types
Target Filters Transform Types
Load-Ready Publish Landing Zone
Load/Publish Processes
Physical Load Architectures
An Overall Data Architecture
Summary
End-of-Chapter Questions
A Design Technique: Data Integration Modeling
The Business Case for a New Design Process
Improving the Development Process
Leveraging Process Modeling for Data Integration
Overview of Data Integration Modeling
Modeling to the Data Integration Architecture
Data Integration Models within the SDLC
Structuring Models on the Reference Architecture
Conceptual Data Integration Models
Logical Data Integration Models
High-Level Logical Data Integration Model
Logical Extraction Data Integration Models
Logical Data Quality Data Integration Models
Logical Transform Data Integration Models
Logical Load Data Integration Models
Physical Data Integration Models
Converting Logical Data Integration Models to Physical Data Integration Models
Target-Based Data Integration Design Technique Overview
Physical Source System Data Integration Models
Physical Common Component Data Integration Models
Physical Subject Area Load Data Integration Models
Logical Versus Physical Data Integration Models
Tools for Developing Data Integration Models
Industry-Based Data Integration Models
Summary
End-of-Chapter Questions
Case Study: Customer Loan Data Warehouse Project
Case Study Overview
Step 1: Build a Conceptual Data Integration Model
Step 2: Build a High-Level Logical Model Data Integration Model
Step 3: Build the Logical Extract DI Models
Confirm the Subject Area Focus from the Data Mapping Document
Review Whether the Existing Data Integration Environment Can Fulfill the Requirements
Determine the Business Extraction Rules
Control File Check Processing
Complete the Logical Extract Data Integration Models
Final Thoughts on Designing a Logical Extract DI Model
Step 4: Define a Logical Data Quality DI Model
Design a Logical Data Quality Data Integration Model
Identify Technical and Business Data Quality Criteria
Determine Absolute and Optional Data Quality Criteria
Step 5: Define the Logical Transform DI Model
Step 6: Define the Logical Load DI Model
Step 7: Determine the Physicalization Strategy
Step 8: Convert the Logical Extract Models into Physical Source System Extract DI Models
Step 9: Refine the Logical Load Models into Physical Source System Subject Area Load DI Models
Step 10: Package the Enterprise Business Rules into Common Component Models
Step 11: Sequence the Physical DI Models
Summary
The Data Integration Systems Development Life Cycle
Data Integration Analysis
Analyzing Data Integration Requirements
Building a Conceptual Data Integration Model
Key Conceptual Data Integration Modeling Task Steps
Why Is Source System Data Discovery So Difficult?
Performing Source System Data Profiling
Overview of Data Profiling
Key Source System Data Profiling Task Steps
Reviewing/Assessing Source Data Quality
Validation Checks to Assess the Data
Key Review/Assess Source Data Quality Task Steps
Performing Source\Target Data Mappings
Overview of Data Mapping
Types of Data Mapping
Key Source\Target Data Mapping Task Steps
Summary
End-of-Chapter Questions
Data Integration Analysis Case Study
Case Study Overview
Envisioned Wheeler Data Warehouse Environment
Aggregations in a Data Warehouse Environment
Data Integration Analysis Phase
Step 1: Build a Conceptual Data Integration Model
Step 2: Perform Source System Data Profiling
Step 3: Review/Assess Source Data Quality
Step 4: Perform Source\Target Data Mappings
Summary
Data Integration Logical Design
Determining High-Level Data Volumetrics
Extract Sizing
Disk Space Sizing
File Size Impacts Component Design
Key Data Integration Volumetrics Task Steps
Establishing a Data Integration Architecture
Identifying Data Quality Criteria
Examples of Data Quality Criteria from a Target
Key Data Quality Criteria Identification Task Steps
Creating Logical Data Integration Models
Key Logical Data Integration Model Task Steps
Defining One-Time Data Conversion Load Logical Design
Designing a History Conversion
One-Time History Data Conversion Task Steps
Summary
End-of-Chapter Questions
Data Integration Logical Design Case Study
Step 1: Determine High-Level Data Volumetrics
Step 2: Establish the Data Integration Architecture
Step 3: Identify Data Quality Criteria
Step 4: Create Logical Data Integration Models
Define the High-Level Logical Data Integration Model
Define the Logical Extraction Data Integration Model
Define the Logical Data Quality Data Integration Model
Define Logical Transform Data Integration Model
Define Logical Load Data Integration Model
Define Logical Data Mart Data Integration Model
Develop the History Conversion Design
Summary
Data Integration Physical Design
Creating Component-Based Physical Designs
Reviewing the Rationale for a Component-Based Design
Modularity Design Principles
Key Component-Based Physical Designs Creation Task Steps
Preparing the DI Development Environment
Key Data Integration Development Environment Preparation Task Steps
Creating Physical Data Integration Models
Point-to-Point Application Development: The Evolution of Data Integration Development
The High-Level Logical Data Integration Model in Physical Design
Design Physical Common Components Data Integration Models
Design Physical Source System Extract Data Integration Models
Design Physical Subject Area Load Data Integration Models
Designing Parallelism into the Data Integration Models
Types of Data Integration Parallel Processing
Other Parallel Processing Design Considerations
Parallel Processing Pitfalls
Key Parallelism Design Task Steps
Designing Change Data Capture
Append Change Data Capture Design Complexities
Key Change Data Capture Design Task Steps
Finalizing the History Conversion Design
From Hypothesis to Fact
Finalize History Data Conversion Design Task Steps
Defining Data Integration Operational Requirements
Determining a Job Schedule for the Data Integration Jobs
Determining a Production Support Team
Key Data Integration Operational Requirements Task Steps
Designing Data Integration Components for SOA
Leveraging Traditional Data Integration Processes as SOA Services
Appropriate Data Integration Job Types
Key Data Integration Design for SOA Task Steps
Summary
End-of-Chapter Questions
Data Integration Physical Design Case Study
Step 1: Create Physical Data Integration Models
Instantiating the Logical Data Integration Models into a Data Integration Package
Step 2: Find Opportunities to Tune through Parallel Processing
Step 3: Complete Wheeler History Conversion Design
Step 4: Define Data Integration Operational Requirements
Developing a Job Schedule for Wheeler
The Wheeler Monthly Job Schedule
The Wheeler Monthly Job Flow
Process Step 1: Preparation for the EDW Load Processing
Process Step 2: Source System to Subject Area File Processing
Process Step 3: Subject Area Files to EDW Load Processing
Process Step 4: EDW-to-Product Line Profitability Data Mart Load Processing
Production Support Staffing
Summary
Data Integration Development Cycle
Performing General Data Integration Development Activities
Data Integration Development Standards
Error-Handling Requirements
Naming Standards
Key General Development Task Steps
Prototyping a Set of Data Integration Functionality
The Rationale for Prototyping
Benefits of Prototyping
Prototyping Example
Key Data Integration Prototyping Task Steps
Completing/Extending Data Integration Job Code
Complete/Extend Common Component Data Integration Jobs
Complete/Extend the Source System Extract Data Integration Jobs
Complete/Extend the Subject Area Load Data Integration Jobs
Performing Data Integration Testing
Data Warehousing Testing Overview
Types of Data Warehousing Testing
Perform Data Warehouse Unit Testing
Perform Data Warehouse Integration Testing
Perform Data Warehouse System and Performance Testing
Perform Data Warehouse User Acceptance Testing
The Role of Configuration Management in Data Integration
What Is Configuration Management?
Data Integration Version Control
Data Integration Software Promotion Life Cycle
Summary
End-of-Chapter Questions
Data Integration Development Cycle Case Study
Step 1: Prototype the Common Customer Key
Step 2: Develop User Test Cases
Domestic OM Source System Extract Job Unit Test Case
Summary
Data Integration with Other Information Management Disciplines
Data Integration and Data Governance
What Is Data Governance?
Why Is Data Governance Important?
Components of Data Governance
Foundational Data Governance Processes
Data Governance Organizational Structure
Data Stewardship Processes
Data Governance Functions in Data Warehousing
Compliance in Data Governance
Data Governance Change Management
Summary
End-of-Chapter Questions
Metadata
What Is Metadata?
The Role of Metadata in Data Integration
Categories of Metadata
Business Metadata
Structural Metadata
Navigational Metadata
Analytic Metadata
Operational Metadata
Metadata as Part of a Reference Architecture
Metadata Users
Managing Metadata
The Importance of Metadata Management in Data Governance
Metadata Environment Current State
Metadata Management Plan
Metadata Management Life Cycle
Summary
End-of-Chapter Questions
Data Quality
The Data Quality Framework
Key Data Quality Elements
The Technical Data Quality Dimension
The Business-Process Data Quality Dimension
Types of Data Quality Processes
The Data Quality Life Cycle
The Define Phase
Defining the Data Quality Scope
Identifying/Defining the Data Quality Elements
Developing Preventive Data Quality Processes
The Audit Phase
Developing a Data Quality Measurement Process
Developing Data Quality Reports
Auditing Data Quality by LOB or Subject Area
The Renovate Phase
Data Quality Assessment and Remediation Projects
Data Quality SWAT Renovation Projects
Data Quality Programs
Final Thoughts on Data Quality
Summary
End-of-Chapter Questions
Exercise Answers
Data Integration Guiding Principles
Write Once, Read Many
Grab Everything
Data Quality before Transforms
Transformation Componentization
Where to Perform Aggregations and Calculations
Data Integration Environment Volumetric Sizing
Subject Area Volumetric Sizing
Glossary
Case Study Models (online-only appendix). Print-book readers can download the appendix at www.ibmpressbooks.com/title/9780137084937. For eBook editions, the appendix is included in the book.
Index