Advanced Hadoop for Developers

Apache Hadoop is one of the most popular frameworks for processing Big Data on clusters of servers. This course covers data management in HDFS and advanced use of Pig, Hive, and HBase. These advanced programming techniques will benefit experienced Hadoop developers.


This class is designed specifically for developers.


Students should be familiar with the Java programming language (most programming exercises are in Java) and comfortable in a Linux environment (i.e., able to navigate the Linux command line and edit files using vi or nano). We recommend taking the Hadoop for Developers course first or having equivalent knowledge and experience.

Course Outline:
  • Data Management in HDFS
    • Various Data Formats (JSON / Avro / Parquet)
    • Compression Schemes
    • Data Masking
    • Labs


  • Advanced Pig
    • User-defined Functions
    • Introduction to Pig Libraries (Elephant Bird / DataFu)
    • Loading Complex Structured Data using Pig
    • Pig Tuning
    • Labs


  • Advanced Hive
    • User-defined Functions
    • Compressed Tables
    • Hive Performance Tuning
    • Labs


  • Advanced HBase
    • Advanced Schema Modelling
    • Compression
    • Bulk Data Ingest
    • Wide-table / Tall-table comparison
    • HBase and Pig
    • HBase and Hive
    • HBase Performance Tuning
    • Labs
