Acknowledgments
Preface

Text Analysis for Business Intelligence

Expanding the Scope of Business Intelligence
The Need to Deal with Text
Growth of Textual Information--The Good News
Growth of Textual Information--The Bad News
Finding Information: It's Not as Easy as It Used to Be
Beware What You Wish for: Finding Too Much Information
The Document Warehousing Approach to the Information Glut
Supporting Business Intelligence with Text
Defining the Document Warehouse
The Role of Text Mining in Document Warehousing
Building the Document Warehouse
Benefits of Document Warehousing
Conclusions

Understanding the Structure of Text: The Foundation of Text-Based Business Intelligence
The Myth of Unstructured Texts
Natural Structures: It's All in Your Head
The Building Blocks of Language
Working with Statistical Techniques
Macrostructures: Introducing Artificial Structures in Documents
Hierarchical Conventions from Words to Documents
The Jewel in the Crown: Markup Languages for Arbitrary Structure
It Isn't So Linear After All: Hypertext
Conclusions

Exploiting the Structure of Text
Text-Oriented Business Intelligence Operations
Summarizing Documents
Classifying and Routing Documents
Answering Questions
Searching and Browsing by Topic and Theme
Searching by Example
Text-Oriented Business Intelligence Techniques
Full Text Searching: Text Processing 101
Undirected Summarization
Document Clustering
Integration with the Data Warehouse
Dimensional Models: A Quick Refresher
Integration with the World Wide Web
Adapting to Changing Users' Interests
Conclusions

Document Warehousing

Overview of Document Warehousing
Meeting Business Intelligence Requirements
Who Are the End Users?
What Information Is Needed?
When Is It Needed?
Where Is the Information Found?
The Role of the Document Warehouse in Business Intelligence
The Architecture of the Document Warehouse
Document Sources
Text Processing Servers
Text Databases and Other Storage Options
Metadata Repositories
User Profiling
The Process of Document Warehousing
Identifying Document Sources
Document Retrieval
Preprocessing Operations
Text Analysis Operations
Managing the Document Warehouse
Supporting End-User Operations
Conclusions

Meeting Business Intelligence Requirements: More Than Just Numbers
A Variety of Problems to Choose From
Intelligent Document Management
Historical Reporting and Trend Analysis
Market Monitoring
Competitive Intelligence
Defining the Business Objectives
Getting What You Want from Your Text
Answering the Right Business Questions
Determining Who Will Use the System
Extracting the Right Information for Future Processing and Searching
Classifying Documents for Browsing
Setting the Scope
Time Requirements
Space Requirements and Sizing the Document Warehouse
Creating the Document Warehouse Project Plan
Design and Development
Conclusions

Designing the Document Warehouse Architecture
Document Sources
File Servers
Document Management Systems
Internet Resources
From Document Sources to Text Analysis
Text Processing Servers
Using Crawlers and Agents to Retrieve Documents
Text Analysis Services
Document Warehouse Storage Options
Database Options
The Metadata Repository and Document Data Model
Document Content Metadata
Search and Retrieval Metadata
Text Mining Metadata
Storage Metadata
Document Data Model
User Profiles and End-User Support
End-User Profiles
Data Warehouse and Data Mart Integration
Linking Numbers and Text
Integration Heuristics
Conclusions

Finding and Retrieving Relevant Text
Manual Retrieval Methods
Search Tools
Automatic Retrieval Methods
Data-Driven Searching
Searching Internal Networks
Configuring Crawlers
Batch versus Interactive Retrieval
Retrieving from Document Processing Systems
Tradeoffs between Manual and Automatic Retrieval
Precision
Recall
Cost
Effectiveness
Text Management Issues
Avoiding Duplication
Accommodating Document Revisions and Versioning
Assessing the Reliability of a Source
Improving Performance
Representing Users' Areas of Interest
Data Store for Interest Specifications
Creating Interest Specifications
Interest Specifications Drive Searching
Prototype-Driven Searching
Conclusions

Loading and Transforming Documents
Internationalization and Character Set Issues
Coded Character Sets
Translating Documents
Indexing Text
Full Text Indexing
Thematic Indexing
Document Classification
Labeling
Multidimensional Taxonomies
Document Clustering
Binary Relational Clustering
Hierarchical Clustering
Self-Organizing Map Clustering
Summarizing Text
Basic Summarization Methods
Dealing with Large Documents
Conclusions

Managing Document Warehouse Metadata
Metadata Standards
Common Warehouse Metamodel
Knowledge Management Based on the Open Information Model
Dublin Core
Adapting Metadata Standards to Document Warehousing
Content Metadata
Technical Metadata
Controlling Document Loads in the Warehouse
Prioritizing Items in Multiple Processing Queues
Summarizing Documents
Business Metadata
Quality: Timeliness and Reliability
Access Control
Versioning
Conclusions

Ensuring Document Warehouse Integrity
Information Stewardship and Quality Control
Document Search and Retrieval
Text Analysis
Content Validation
Security
File System Security
Database Roles and Privileges
Programmatic Access Control
Virtual Database Security
Privacy
Contracts between Document Owners and the Warehouse
Is Privacy the Third Rail of Business Intelligence? Protecting Individuals and Organizations
Conclusions

Choosing Tools for Building the Document Warehouse
Choosing Text Analysis Tools
Statistical/Heuristic Approach
The Knowledge-Based Approach
Neural Network Approach: Megaputer's TextAnalyst
There Is More Than One Way to Mine Text
Choosing Supplemental Tools
Choosing Web Document Retrieval Tools
Conclusions

Developing a Document Warehouse: A Checklist
What Should Be Stored?
Understanding User Needs
Defining Document Sources
Metadata
User Profiles
Integration with the Data Warehouse
Where Should It Be Stored?
What Text Mining Services Should Be Used?
Indexing Services
Feature Extraction
Summarization
Document Clustering
Question Answering
Classification and Routing
Building Taxonomies and Thesauri
How Should the Warehouse Be Populated?
Crawlers
Searching
How Should the Warehouse Be Maintained?
Conclusions

Text Mining

What Is Text Mining?
Defining Text Mining
Foundations of Text Mining
Information Retrieval
Computational Linguistics and Natural Language Processing
Discovering Knowledge in Text: Example Cases
Text Mining Methodology: Using the Cross-Industry Standard Process for Data Mining
Business Understanding
Data Understanding
Data Preparation
Modeling
Evaluation
Deployment
Adapting the CRISP-DM to Text Mining
Text Mining Applications
Knowing Your Business
Knowing Your Customer
Knowing Your Competition and Market
Conclusions

Know Thyself: Using Text Mining for Operational Management
Operations and Projects: Understanding the Distinct Needs of Each
Enterprise Document Management Systems
Benefits of Enterprise Document Management Systems
Limits of Enterprise Document Management Systems
Integrating Document Management with Document Warehousing
Document Extraction
Steps to Effective Text Mining for Operational Management
Specifying a Process for Extracting Information
Meeting Wide-Ranging Organizational Needs
Conclusions

Knowing Your Business-to-Business Customer: Text Mining for Customer Relationship Management
Understanding Your Customer's Market
Developing a Customer Intelligence Profile
Sample Case of B-to-B Customer Relationship Management
Getting the Information I: Internal Sources
Getting the Information II: External Documents
Collecting External Documents
Preliminary Document Analysis
Conclusions

Text Mining for Competitive Intelligence
Competitive Intelligence versus Business Intelligence
Competitive Intelligence Profiles
Identifying Information Sources
XML Text Processing Operations
XML Interface Models
Getting Financial Information from XBRL Documents
The Practice of Competitive Intelligence
Competitive Intelligence in Health Care: Patent Analysis
Competitive Intelligence in Manufacturing: Financial Analysis
Competitive Intelligence in Financial Services Market: Market Issue Analysis
Conclusions

Text Mining Tools
Criteria for Choosing Tools
Preprocessing Tools
Text Mining Tool Selection
Still Looking for a Silver Bullet: The Limits of Text Mining
Discourse Analysis
Semantic Models
Conclusions

Conclusions
Changes in Business Intelligence
Business Intelligence and the Dynamics of Organizations
Changing Decision Makers
Changing Technologies
Changing Strategies
Meeting BI Needs with the Document Warehouse and Text Mining
The Process of Document Warehousing
Text Mining for Decision Support
Text Mining and Operational Management
Text Mining and Customer Relationship Management
Text Mining and Competitive Intelligence
Shifting Emphasis of BI
Text, Not Just Numbers
Heuristics, Not Just Algorithms
Distributed Intelligence, Centralized Management
Next Steps: Where Do We Go from Here?
Conclusions

Templates
Tools and Resources
Basic Document Warehouse Data Model
Bibliography
Glossary
Index