Preface

Setup

Theory
Agile Big Data
Big Words Defined
Agile Big Data Teams
Recognizing the Opportunity and Problem
Adapting to Change
Agile Big Data Process
Code Review and Pair Programming
Agile Environments: Engineering Productivity
Collaboration Space
Private Space
Personal Space
Realizing Ideas with Large-Format Printing

Data
Email
Working with Raw Data
Raw Email
Structured Versus Semistructured Data
SQL
NoSQL
Serialization
Extracting and Exposing Features in Evolving Schemas
Data Pipelines
Data Perspectives
Networks
Time Series
Natural Language
Probability
Conclusion

Agile Tools
Scalability = Simplicity
Agile Big Data Processing
Setting Up a Virtual Environment for Python
Serializing Events with Avro
Avro for Python
Collecting Data
Data Processing with Pig
Installing Pig
Publishing Data with MongoDB
Installing MongoDB
Installing MongoDB's Java Driver
Installing mongo-hadoop
Pushing Data to MongoDB from Pig
Searching Data with ElasticSearch
Installation
ElasticSearch and Pig with Wonderdog
Reflecting on Our Workflow
Lightweight Web Applications | |
| |
| |
Python and Flask | |
| |
| |
Presenting Our Data | |
| |
| |
Installing Bootstrap | |
| |
| |
Booting Bootstrap
Visualizing Data with D3.js and nvd3.js
Conclusion

To the Cloud!
Introduction
GitHub
dotCloud
Echo on dotCloud
Python Workers
Amazon Web Services
Simple Storage Service
Elastic MapReduce
MongoDB as a Service
Instrumentation
Google Analytics
Mortar Data

Climbing the Pyramid

Collecting and Displaying Records
Putting It All Together
Collect and Serialize Our Inbox
Process and Publish Our Emails
Presenting Emails in a Browser
Serving Emails with Flask and pymongo
Rendering HTML5 with Jinja2
Agile Checkpoint
Listing Emails
Listing Emails with MongoDB
Anatomy of a Presentation
Searching Our Email
Indexing Our Email with Pig, ElasticSearch, and Wonderdog
Searching Our Email on the Web
Conclusion

Visualizing Data with Charts
Good Charts
Extracting Entities: Email Addresses
Extracting Emails
Visualizing Time
Conclusion

Exploring Data with Reports
Building Reports with Multiple Charts
Linking Records
Extracting Keywords from Emails with TF-IDF
Conclusion

Making Predictions
Predicting Response Rates to Emails
Personalization
Conclusion

Driving Actions
Properties of Successful Emails
Better Predictions with Naive Bayes
P(Reply | From & To)
P(Reply | Token)
Making Predictions in Real Time
Logging Events
Conclusion

Index