Agile Data Science: Building Data Analytics Applications with Hadoop

ISBN-10: 1449326269

ISBN-13: 9781449326265

Edition: 2012

Authors: Russell Jurney


Description:

Mining data requires a deep investment in people and time. How can you be sure you’re building the right models? What tools help you connect with the customer’s needs? With this hands-on book, you’ll learn a flexible toolset and methodology for building effective analytics applications. Agile Data shows you how to create an environment for exploring data, using lightweight tools such as Ruby, Python, Apache Pig, and the D3.js (Data-Driven Documents) JavaScript library. You’ll learn an iterative approach that lets you quickly change the kind of analysis you’re doing as you discover what the data is telling you. All the example code in this book is available as working Heroku apps. Build…

Book details

List price: $39.99
Copyright year: 2012
Publisher: O'Reilly Media, Incorporated
Publication date: 11/12/2013
Binding: Paperback
Pages: 178
Size: 7.00" wide x 9.19" long x 0.38" tall
Weight: 0.660 lb
Language: English

Russell Jurney cut his data teeth in casino gaming, building web apps to analyze the performance of slot machines in the US and Mexico. After dabbling in entrepreneurship, interactive media, and journalism, he moved to Silicon Valley to build analytics applications at scale at Ning and LinkedIn. He lives on the ocean in Pacifica, California, with his wife Kate and two fuzzy dogs.



Preface



Setup



Theory


Agile Big Data


Big Words Defined


Agile Big Data Teams


Recognizing the Opportunity and Problem


Adapting to Change


Agile Big Data Process


Code Review and Pair Programming


Agile Environments: Engineering Productivity


Collaboration Space


Private Space


Personal Space


Realizing Ideas with Large-Format Printing



Data


Email


Working with Raw Data


Raw Email


Structured Versus Semistructured Data


SQL


NoSQL


Serialization


Extracting and Exposing Features in Evolving Schemas


Data Pipelines


Data Perspectives


Networks


Time Series


Natural Language


Probability


Conclusion



Agile Tools


Scalability = Simplicity


Agile Big Data Processing


Setting Up a Virtual Environment for Python


Serializing Events with Avro


Avro for Python


Collecting Data


Data Processing with Pig


Installing Pig


Publishing Data with MongoDB


Installing MongoDB


Installing MongoDB's Java Driver


Installing mongo-hadoop


Pushing Data to MongoDB from Pig


Searching Data with ElasticSearch


Installation


ElasticSearch and Pig with Wonderdog


Reflecting on Our Workflow


Lightweight Web Applications


Python and Flask


Presenting Our Data


Installing Bootstrap


Booting Bootstrap


Visualizing Data with D3.js and nvd3.js


Conclusion



To the Cloud!


Introduction


GitHub


dotCloud


Echo on dotCloud


Python Workers


Amazon Web Services


Simple Storage Service


Elastic MapReduce


MongoDB as a Service


Instrumentation


Google Analytics


Mortar Data



Climbing the Pyramid



Collecting and Displaying Records


Putting It All Together


Collect and Serialize Our Inbox


Process and Publish Our Emails


Presenting Emails in a Browser


Serving Emails with Flask and pymongo


Rendering HTML5 with Jinja2


Agile Checkpoint


Listing Emails


Listing Emails with MongoDB


Anatomy of a Presentation


Searching Our Email


Indexing Our Email with Pig, ElasticSearch, and Wonderdog


Searching Our Email on the Web


Conclusion



Visualizing Data with Charts


Good Charts


Extracting Entities: Email Addresses


Extracting Emails


Visualizing Time


Conclusion



Exploring Data with Reports


Building Reports with Multiple Charts


Linking Records


Extracting Keywords from Emails with TF-IDF


Conclusion



Making Predictions


Predicting Response Rates to Emails


Personalization


Conclusion



Driving Actions


Properties of Successful Emails


Better Predictions with Naive Bayes


P(Reply | From & To)


P(Reply | Token)


Making Predictions in Real Time


Logging Events


Conclusion


Index