Hadoop Operations: A Guide for Developers and Administrators

ISBN-10: 1449327052

ISBN-13: 9781449327057

Edition: 2012

Authors: Eric Sammer

List price: $39.99

If you’ve been tasked with the job of maintaining large and complex Hadoop clusters, or are about to be, this book is a must. You’ll learn the particulars of Hadoop operations, from planning, installing, and configuring the system to providing ongoing maintenance. Hadoop is being adopted by more and more Fortune 500 companies, and the demand for operations-specific material has skyrocketed. This book, written by Eric Sammer, Principal Solution Architect at Cloudera, is the definitive operations guide for administrators. Developers who want to improve MapReduce jobs by learning how Hadoop works in large production environments will also benefit. Application administrators responsible for the…
Book details

Copyright year: 2012
Publisher: O'Reilly Media, Incorporated
Publication date: 10/23/2012
Binding: Paperback
Pages: 298
Size: 7.00" wide x 9.00" long x 0.75" tall
Weight: 1.034
Language: English

Eric Sammer is currently a Principal Solution Architect at Cloudera where he helps customers plan, deploy, develop for, and use Hadoop and the related projects at scale. His background is in the development and operations of distributed, highly concurrent, data ingest and processing systems. He's been involved in the open source community and has contributed to a large number of projects over the last decade.

Goals and Motivation
Reading and Writing Data
The Read Path
The Write Path
Managing Filesystem Metadata
Namenode High Availability
Namenode Federation
Access and Integration
Command-Line Tools
REST Support
The Stages of MapReduce
Introducing Hadoop MapReduce
When It All Goes Wrong
Planning a Hadoop Cluster
Picking a Distribution and Version of Hadoop
Apache Hadoop
Cloudera's Distribution Including Apache Hadoop
Versions and Features
What Should I Use?
Hardware Selection
Master Hardware Selection
Worker Hardware Selection
Cluster Sizing
Blades, SANs, and Virtualization
Operating System Selection and Preparation
Deployment Layout
Hostnames, DNS, and Identification
Users, Groups, and Privileges
Kernel Tuning
Disk Configuration
Choosing a Filesystem
Mount Options
Network Design
Network Usage in Hadoop: A Review
1 Gb versus 10 Gb Networks
Typical Network Topologies
Installation and Configuration
Installing Hadoop
Apache Hadoop
Configuration: An Overview
The Hadoop XML Configuration Files
Environment Variables and Shell Scripts
Logging Configuration
Identification and Location
Optimization and Tuning
Formatting the Namenode
Creating a /tmp Directory
Namenode High Availability
Fencing Options
Basic Configuration
Automatic Failover Configuration
Format and Bootstrap the Namenodes
Namenode Federation
Identification and Location
Optimization and Tuning
Rack Topology
Identity, Authentication, and Authorization
Kerberos and Hadoop
Kerberos: A Refresher
Kerberos Support in Hadoop
Other Tools and Systems
Tying It Together
Resource Management
What Is Resource Management?
HDFS Quotas
MapReduce Schedulers
The FIFO Scheduler
The Fair Scheduler
The Capacity Scheduler
The Future
Cluster Maintenance
Managing Hadoop Processes
Starting and Stopping Processes with Init Scripts
Starting and Stopping Processes Manually
HDFS Maintenance Tasks
Adding a Datanode
Decommissioning a Datanode
Checking Filesystem Integrity with fsck
Balancing HDFS Block Data
Dealing with a Failed Disk
MapReduce Maintenance Tasks
Adding a Tasktracker
Decommissioning a Tasktracker
Killing a MapReduce Job
Killing a MapReduce Task
Dealing with a Blacklisted Tasktracker
Differential Diagnosis Applied to Systems
Common Failures and Problems
Humans (You)
Hardware Failure
Resource Exhaustion
Host Identification and Naming
Network Partitions
"Is the Computer Plugged In?"
Treatment and Care
War Stories
A Mystery Bottleneck
There's No Place Like
An Overview
Hadoop Metrics
Apache Hadoop 0.20.0 and CDH3 (metrics1)
Apache Hadoop 0.20.203 and Later, and CDH4 (metrics2)
What about SNMP?
Health Monitoring
Host-Level Checks
All Hadoop Processes
HDFS Checks
MapReduce Checks
Backup and Recovery
Data Backup
Distributed Copy (distcp)
Parallel Data Ingestion
Namenode Metadata
Appendix: Deprecated Configuration Properties