Hadoop Operations: A Guide for Developers and Administrators

ISBN-10: 1449327052

ISBN-13: 9781449327057

Edition: 2012

Author: Eric Sammer

List price: $39.99

Description:

If you’ve been tasked with the job of maintaining large and complex Hadoop clusters, or are about to be, this book is a must. You’ll learn the particulars of Hadoop operations, from planning, installing, and configuring the system to providing ongoing maintenance.

Hadoop is being adopted by more and more Fortune 500 companies, and the demand for operations-specific material has skyrocketed. This book, written by Eric Sammer, Principal Solution Architect at Cloudera, is the definitive operations guide for administrators.

Developers who want to improve MapReduce jobs by learning how Hadoop works in large production environments will also benefit. Application administrators responsible for the…

Book details

Copyright year: 2012
Publisher: O'Reilly Media, Incorporated
Publication date: 10/12/2012
Binding: Paperback
Pages: 298
Size: 6.97" wide x 9.17" long x 0.71" tall
Weight: 1.034
Language: English

Eric Sammer is currently a Principal Solution Architect at Cloudera, where he helps customers plan, deploy, develop for, and use Hadoop and its related projects at scale. His background is in the development and operations of distributed, highly concurrent data ingest and processing systems. He has been involved in the open source community and has contributed to a large number of projects over the last decade.

Table of contents

Preface
Introduction
HDFS
  Goals and Motivation
  Design
  Daemons
  Reading and Writing Data
    The Read Path
    The Write Path
  Managing Filesystem Metadata
  Namenode High Availability
  Namenode Federation
  Access and Integration
    Command-Line Tools
    FUSE
    REST Support
MapReduce
  The Stages of MapReduce
  Introducing Hadoop MapReduce
    Daemons
    When It All Goes Wrong
  YARN
Planning a Hadoop Cluster
  Picking a Distribution and Version of Hadoop
    Apache Hadoop
    Cloudera's Distribution Including Apache Hadoop
    Versions and Features
    What Should I Use?
  Hardware Selection
    Master Hardware Selection
    Worker Hardware Selection
    Cluster Sizing
    Blades, SANs, and Virtualization
  Operating System Selection and Preparation
    Deployment Layout
    Software
    Hostnames, DNS, and Identification
    Users, Groups, and Privileges
  Kernel Tuning
    vm.swappiness
    vm.overcommit_memory
  Disk Configuration
    Choosing a Filesystem
    Mount Options
  Network Design
    Network Usage in Hadoop: A Review
    1 Gb versus 10 Gb Networks
    Typical Network Topologies
Installation and Configuration
  Installing Hadoop
    Apache Hadoop
    CDH
  Configuration: An Overview
    The Hadoop XML Configuration Files
    Environment Variables and Shell Scripts
    Logging Configuration
  HDFS
    Identification and Location
    Optimization and Tuning
    Formatting the Namenode
    Creating a /tmp Directory
    Namenode High Availability
      Fencing Options
      Basic Configuration
      Automatic Failover Configuration
      Format and Bootstrap the Namenodes
    Namenode Federation
  MapReduce
    Identification and Location
    Optimization and Tuning
  Rack Topology
  Security
Identity, Authentication, and Authorization
  Identity
  Kerberos and Hadoop
    Kerberos: A Refresher
    Kerberos Support in Hadoop
  Authorization
    HDFS
    MapReduce
    Other Tools and Systems
  Tying It Together
Resource Management
  What Is Resource Management?
  HDFS Quotas
  MapReduce Schedulers
    The FIFO Scheduler
    The Fair Scheduler
    The Capacity Scheduler
    The Future
Cluster Maintenance
  Managing Hadoop Processes
    Starting and Stopping Processes with Init Scripts
    Starting and Stopping Processes Manually
  HDFS Maintenance Tasks
    Adding a Datanode
    Decommissioning a Datanode
    Checking Filesystem Integrity with fsck
    Balancing HDFS Block Data
    Dealing with a Failed Disk
  MapReduce Maintenance Tasks
    Adding a Tasktracker
    Decommissioning a Tasktracker
    Killing a MapReduce Job
    Killing a MapReduce Task
    Dealing with a Blacklisted Tasktracker
Troubleshooting
  Differential Diagnosis Applied to Systems
  Common Failures and Problems
    Humans (You)
    Misconfiguration
    Hardware Failure
    Resource Exhaustion
    Host Identification and Naming
    Network Partitions
  "Is the Computer Plugged In?"
  E-SPORE
  Treatment and Care
  War Stories
    A Mystery Bottleneck
    There's No Place Like 127.0.0.1
Monitoring
  An Overview
  Hadoop Metrics
    Apache Hadoop 0.20.0 and CDH3 (metrics1)
    Apache Hadoop 0.20.203 and Later, and CDH4 (metrics2)
    What about SNMP?
  Health Monitoring
    Host-Level Checks
    All Hadoop Processes
    HDFS Checks
    MapReduce Checks
Backup and Recovery
  Data Backup
    Distributed Copy (distcp)
    Parallel Data Ingestion
  Namenode Metadata
Appendix: Deprecated Configuration Properties
Index