At a glance
Course number: H6C60S
Length: 3 days
Delivery methods: Virtual Instructor-Led Training (VILT), Instructor-led training (ILT), Onsite dedicated training (OST)
Price: USD $2,400 / CAD $2,640
Note: Courses are supported in the delivery formats above but are not necessarily scheduled in every format listed. Check the published course schedule to see which delivery formats are currently available.

Course overview

This course covers the essentials of deploying and managing an Apache™ Hadoop® cluster. The course is lab intensive: each participant builds their own Hadoop cluster using either the CDH (Cloudera's Distribution Including Apache Hadoop) or Hortonworks Data Platform stack. Core Hadoop services are explored in depth, with emphasis on troubleshooting and recovering from common cluster failures. The fundamentals of related services such as Ambari, ZooKeeper, Pig, Hive, HBase, Sqoop, Flume, and Oozie are also covered. The course is approximately 60% lecture and 40% labs.


Prerequisites

  • Participants should be comfortable with Linux commands and have some systems administration experience; no previous Hadoop experience is required

Audience

  • Systems administrators who will be responsible for deploying and managing Hadoop clusters

Benefits to you

  • Hands-on coverage of Hadoop gives systems administrators the skills they need to properly deploy, manage, and maintain Hadoop clusters

Course outline

"Big Data", the big picture

  • Distributed processing and data locality
  • Hadoop core architecture:
    • HDFS
    • MapReduce
  • Hadoop distributions:
    • Cloudera, MapR, Hortonworks
  • Hadoop ecosystem:
    • Ambari, Pig, Hive, ZooKeeper, HBase, Sqoop, Flume, Oozie

HDFS

  • Design and operation:
    • NameNode and Secondary NameNode
    • Metadata storage and updates
    • Data storage and flows
  • Planning and creation:
    • Performance considerations
    • Loading and managing data files
    • Tuning and maintenance
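
To make the HDFS topics above concrete, here is a minimal, hypothetical Java sketch (not taken from the course materials) that loads a local file into HDFS and then inspects replication and block size, two of the tuning parameters discussed under performance considerations. The NameNode address and file paths are placeholders.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsLoadExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder NameNode address; a real client normally reads
            // this from core-site.xml on the cluster.
            conf.set("fs.default.name", "hdfs://namenode.example.com:8020");
            FileSystem fs = FileSystem.get(conf);

            // Load a local data file into HDFS (placeholder paths).
            fs.copyFromLocalFile(new Path("/tmp/weblogs.txt"),
                                 new Path("/data/weblogs.txt"));

            // Inspect replication and block size for everything in /data.
            for (FileStatus s : fs.listStatus(new Path("/data"))) {
                System.out.printf("%s len=%d repl=%d blocksize=%d%n",
                        s.getPath(), s.getLen(), s.getReplication(),
                        s.getBlockSize());
            }
            fs.close();
        }
    }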

MapReduce

  • History and theory of operation
  • Apache Hadoop implementation:
    • JobTracker
    • TaskTrackers
    • DataNodes
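
As a sketch of how the pieces above fit together: a job is submitted to the JobTracker, which schedules map and reduce tasks on TaskTrackers, ideally on the same nodes (DataNodes) that hold the input blocks. The classic word-count job below is a minimal illustration, not course material.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
        // Map phase: runs on TaskTrackers, close to the input blocks.
        public static class TokenMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();
            @Override
            protected void map(LongWritable key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                for (String token : value.toString().split("\\s+")) {
                    if (token.isEmpty()) continue;
                    word.set(token);
                    ctx.write(word, ONE);
                }
            }
        }

        // Reduce phase: sums the counts emitted for each word.
        public static class SumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values,
                    Context ctx) throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) sum += v.get();
                ctx.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenMapper.class);
            job.setCombinerClass(SumReducer.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }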

Authentication and authorization

  • Hadoop users
  • HDFS:
    • File ownership and permissions
    • Quotas
    • Kerberos
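
For illustration, a short hypothetical Java sketch of the ownership and permission model above. The user, group, and path are invented; quotas are shown in a comment because they are set with the dfsadmin tool rather than through the FileSystem API.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;

    public class HdfsPermissions {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path project = new Path("/data/projects/alpha"); // placeholder

            // Restrict the directory to its owner and group (mode 750),
            // the HDFS analogue of POSIX file permissions.
            fs.setPermission(project, new FsPermission((short) 0750));

            // Changing ownership requires HDFS superuser privileges,
            // just as chown does on a local filesystem.
            fs.setOwner(project, "alice", "analysts"); // placeholder names

            // Name and space quotas are set by an administrator from the
            // command line, e.g.:
            //   hadoop dfsadmin -setQuota 10000 /data/projects/alpha
            fs.close();
        }
    }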

MapReduce schedulers

  • FIFO
  • Fair
  • Capacity
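
As a small illustration of how a Hadoop 1.x cluster selects among these schedulers: the JobTracker reads a single configuration property, normally set in mapred-site.xml. The sketch below simply reports what the local configuration resolves to; the property and scheduler class names are from the Apache Hadoop 1.x distribution.

    import org.apache.hadoop.mapred.JobConf;

    public class SchedulerCheck {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // mapred.jobtracker.taskScheduler selects the scheduler:
            //   org.apache.hadoop.mapred.JobQueueTaskScheduler (FIFO, default)
            //   org.apache.hadoop.mapred.FairScheduler         (Fair)
            //   org.apache.hadoop.mapred.CapacityTaskScheduler (Capacity)
            String scheduler = conf.get("mapred.jobtracker.taskScheduler",
                    "org.apache.hadoop.mapred.JobQueueTaskScheduler");
            System.out.println("Configured scheduler: " + scheduler);
        }
    }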

Cluster monitoring and maintenance

  • Adding and removing DataNodes
  • Monitoring and balancing HDFS storage
  • JobTracker and TaskTracker status
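
A minimal, hypothetical sketch of the storage-monitoring theme above: it sums HDFS usage with the getContentSummary call and notes, in a comment, the balancer command an administrator would run when DataNodes become unevenly loaded.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.ContentSummary;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsUsageReport {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());

            // Space consumed under the root directory, including
            // replication, comparable to the cluster-wide figures that
            // "hadoop dfsadmin -report" prints.
            ContentSummary summary = fs.getContentSummary(new Path("/"));
            System.out.printf(
                    "files=%d dirs=%d bytes=%d consumed(with replication)=%d%n",
                    summary.getFileCount(), summary.getDirectoryCount(),
                    summary.getLength(), summary.getSpaceConsumed());

            // When storage becomes skewed across DataNodes, an
            // administrator rebalances it from the command line:
            //   hadoop balancer -threshold 10
            fs.close();
        }
    }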

Troubleshooting

  • Slow or long-running jobs
  • Location and use of Hadoop job and log files
  • NameNode failure and recovery
  • Cluster re-balancing
  • Other common failure scenarios
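
For the slow-job topic above, a small hypothetical Java sketch using the Hadoop 1.x JobClient API: it lists unfinished jobs with their map and reduce progress, a common first step before digging into the task logs.

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobStatus;

    public class RunningJobsReport {
        public static void main(String[] args) throws Exception {
            // Connects to the JobTracker named in the local configuration.
            JobClient client = new JobClient(new JobConf());

            // A job stuck at the same percentage for a long time is a
            // candidate for closer inspection of its task logs.
            for (JobStatus status : client.jobsToComplete()) {
                System.out.printf("%s map=%.0f%% reduce=%.0f%%%n",
                        status.getJobID(),
                        100 * status.mapProgress(),
                        100 * status.reduceProgress());
            }
            client.close();
        }
    }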

Appendix


H6C60S - A.00