Document Warehousing and Text Mining: Techniques for Improving Business Operations, Marketing and Sales

ISBN-10: 0471399590

ISBN-13: 9780471399599

Edition: 2001

Authors: Dan Sullivan

List price: $55.00

"This book combines a thorough introduction to document warehousing with an in-depth technical tutorial for implementation. Dan Sullivan truly leaves no stone unturned. This book is my de-facto document warehousing resource!" (Jill Dyche, Baseline Consulting Group)

Most business information isn't neatly stored in databases. It's more likely found in a swirl of millions of Web pages, e-mails, and free-form text documents. To capture and tame this flood of information for decision making, businesses are now turning to document warehousing and text mining techniques. This book provides database and data warehouse developers and managers with complete guidance on how to build and manage a…

Book details

Copyright year: 2001
Publisher: John Wiley & Sons, Incorporated
Publication date: 3/7/2001
Binding: Paperback
Pages: 560
Size: 7.50" wide x 9.00" long x 1.00" tall
Weight: 2.090
Language: English

Acknowledgments
Preface
Text Analysis for Business Intelligence
Expanding the Scope of Business Intelligence
The Need to Deal with Text
Growth of Textual Information--The Good News
Growth of Textual Information--The Bad News
Finding Information: It's Not as Easy as It Used to Be
Beware What You Wish for: Finding Too Much Information
The Document Warehousing Approach to the Information Glut
Supporting Business Intelligence with Text
Defining the Document Warehouse
The Role of Text Mining in Document Warehousing
Building the Document Warehouse
Benefits of Document Warehousing
Conclusions
Understanding the Structure of Text: The Foundation of Text-Based Business Intelligence
The Myth of Unstructured Texts
Natural Structures: It's All in Your Head
The Building Blocks of Language
Working with Statistical Techniques
Macrostructures: Introducing Artificial Structures in Documents
Hierarchical Conventions from Words to Documents
The Jewel in the Crown: Markup Languages for Arbitrary Structure
It Isn't So Linear After All: Hypertext
Conclusions
Exploiting the Structure of Text
Text-Oriented Business Intelligence Operations
Summarizing Documents
Classifying and Routing Documents
Answering Questions
Searching and Browsing by Topic and Theme
Searching by Example
Text-Oriented Business Intelligence Techniques
Full Text Searching: Text Processing 101
Undirected Summarization
Document Clustering
Integration with the Data Warehouse
Dimensional Models: A Quick Refresher
Integration with the World Wide Web
Adapting to Changing Users' Interests
Conclusions
Document Warehousing
Overview of Document Warehousing
Meeting Business Intelligence Requirements
Who Are the End Users?
What Information Is Needed?
When Is It Needed?
Where Is the Information Found?
The Role of the Document Warehouse in Business Intelligence
The Architecture of the Document Warehouse
Document Sources
Text Processing Servers
Text Databases and Other Storage Options
Metadata Repositories
User Profiling
The Process of Document Warehousing
Identifying Document Sources
Document Retrieval
Preprocessing Operations
Text Analysis Operations
Managing the Document Warehouse
Supporting End-User Operations
Conclusions
Meeting Business Intelligence Requirements: More Than Just Numbers
A Variety of Problems to Choose From
Intelligent Document Management
Historical Reporting and Trend Analysis
Market Monitoring
Competitive Intelligence
Defining the Business Objectives
Getting What You Want from Your Text
Answering the Right Business Questions
Determining Who Will Use the System
Extracting the Right Information for Future Processing and Searching
Classifying Documents for Browsing
Setting the Scope
Time Requirements
Space Requirements and Sizing the Document Warehouse
Creating the Document Warehouse Project Plan
Design and Development
Conclusions
Designing the Document Warehouse Architecture
Document Sources
File Servers
Document Management Systems
Internet Resources
From Document Sources to Text Analysis
Text Processing Servers
Using Crawlers and Agents to Retrieve Documents
Text Analysis Services
Document Warehouse Storage Options
Database Options
The Metadata Repository and Document Data Model
Document Content Metadata
Search and Retrieval Metadata
Text Mining Metadata
Storage Metadata
Document Data Model
User Profiles and End-User Support
End-User Profiles
Data Warehouse and Data Mart Integration
Linking Numbers and Text
Integration Heuristics
Conclusions
Finding and Retrieving Relevant Text
Manual Retrieval Methods
Search Tools
Automatic Retrieval Methods
Data-Driven Searching
Searching Internal Networks
Configuring Crawlers
Batch versus Interactive Retrieval
Retrieving from Document Processing Systems
Tradeoffs between Manual and Automatic Retrieval
Precision
Recall
Cost
Effectiveness
Text Management Issues
Avoiding Duplication
Accommodating Document Revisions and Versioning
Assessing the Reliability of a Source
Improving Performance
Representing Users' Areas of Interest
Data Store for Interest Specifications
Creating Interest Specifications
Interest Specifications Drive Searching
Prototype-Driven Searching
Conclusions
Loading and Transforming Documents
Internationalization and Character Set Issues
Coded Character Sets
Translating Documents
Indexing Text
Full Text Indexing
Thematic Indexing
Document Classification
Labeling
Multidimensional Taxonomies
Document Clustering
Binary Relational Clustering
Hierarchical Clustering
Self-Organizing Map Clustering
Summarizing Text
Basic Summarization Methods
Dealing with Large Documents
Conclusions
Managing Document Warehouse Metadata
Metadata Standards
Common Warehouse Model
Knowledge Management Based on the Open Information Model
Dublin Core
Adapting Metadata Standards to Document Warehousing
Content Metadata
Technical Metadata
Controlling Document Loads in the Warehouse
Prioritizing Items in Multiple Processing Queues
Summarizing Documents
Business Metadata
Quality: Timeliness and Reliability
Access Control
Versioning
Conclusions
Ensuring Document Warehouse Integrity
Information Stewardship and Quality Control
Document Search and Retrieval
Text Analysis
Content Validation
Security
File System Security
Database Roles and Privileges
Programmatic Access Control
Virtual Database Security
Privacy
Contracts between Document Owners and the Warehouse
Is Privacy the Third Rail of Business Intelligence? Protecting Individuals and Organizations
Conclusions
Choosing Tools for Building the Document Warehouse
Choosing Text Analysis Tools
Statistical/Heuristic Approach
The Knowledge-Based Approach
Neural Network Approach: Megaputer's TextAnalyst
There Is More Than One Way to Mine Text
Choosing Supplemental Tools
Choosing Web Document Retrieval Tools
Conclusions
Developing a Document Warehouse: A Checklist
What Should Be Stored?
Understanding User Needs
Defining Document Sources
Metadata
User Profiles
Integration with the Data Warehouse
Where Should It Be Stored?
What Text Mining Services Should Be Used?
Indexing Services
Feature Extraction
Summarization
Document Clustering
Question Answering
Classification and Routing
Building Taxonomies and Thesauri
How Should the Warehouse Be Populated?
Crawlers
Searching
How Should the Warehouse Be Maintained?
Conclusions
Text Mining
What Is Text Mining?
Defining Text Mining
Foundations of Text Mining
Information Retrieval
Computational Linguistics and Natural Language Processing
Discovering Knowledge in Text: Example Cases
Text Mining Methodology: Using the Cross-Industry Process Model for Data Mining
Business Understanding
Data Understanding
Data Preparation
Modeling
Evaluation
Deployment
Adapting the CRISP-DM to Text Mining
Text Mining Applications
Knowing Your Business
Knowing Your Customer
Knowing Your Competition and Market
Conclusions
Know Thyself: Using Text Mining for Operational Management
Operations and Projects: Understanding the Distinct Needs of Each
Enterprise Document Management Systems
Benefits of Enterprise Document Management Systems
Limits of Enterprise Document Management Systems
Integrating Document Management with Document Warehousing
Document Extraction
Steps to Effective Text Mining for Operational Management
Specifying a Process for Extracting Information
Meeting Wide-Ranging Organizational Needs
Conclusions
Knowing Your Business-to-Business Customer: Text Mining for Customer Relationship Management
Understanding Your Customer's Market
Developing a Customer Intelligence Profile
Sample Case of B-to-B Customer Relationship Management
Getting the Information I: Internal Sources
Getting the Information II: External Documents
Collecting External Documents
Preliminary Document Analysis
Conclusions
Text Mining for Competitive Intelligence
Competitive Intelligence versus Business Intelligence
Competitive Intelligence Profiles
Identifying Information Sources
XML Text Processing Operations
XML Interface Models
Getting Financial Information from XBRL Documents
The Practice of Competitive Intelligence
Competitive Intelligence in Health Care: Patent Analysis
Competitive Intelligence in Manufacturing: Financial Analysis
Competitive Intelligence in Financial Services Market: Market Issue Analysis
Conclusions
Text Mining Tools
Criteria for Choosing Tools
Preprocessing Tools
Text Mining Tool Selection
Still Looking for a Silver Bullet: The Limits of Text Mining
Discourse Analysis
Semantic Models
Conclusions
Conclusions
Changes in Business Intelligence
Business Intelligence and the Dynamics of Organizations
Changing Decision Makers
Changing Technologies
Changing Strategies
Meeting BI Needs with the Document Warehouse and Text Mining
The Process of Document Warehousing
Text Mining for Decision Support
Text Mining and Operational Management
Text Mining and Customer Relationship Management
Text Mining and Competitive Intelligence
Shifting Emphasis of BI
Text, Not Just Numbers
Heuristics, Not Just Algorithms
Distributed Intelligence, Centralized Management
Next Steps: Where Do We Go from Here?
Conclusions
Templates
Tools and Resources
Basic Document Warehouse Data Model
Bibliography
Glossary
Index