Data Science with Hadoop for Statistics and Text Analysis

Course Number:


The course teaches data science principles and applications through lectures and hands-on labs. Students learn the strengths of each tool and how to select the right one for the job, while gaining practical experience building working systems.


This course is intended for architects, software developers, analysts and data scientists who need to understand how to apply data science to large datasets with Hadoop.

Course Duration:
3 days


Students must have basic computer skills, a basic knowledge of statistics and a basic understanding of programming or scripting. Prior experience with Hadoop, Mahout, R or Python is helpful but not required.

Course Objectives:
  • Understand the foundations of data science
  • Understand the principles of machine learning
  • Learn about Hadoop and its interaction with data science
  • Learn to program in R and to use it for statistical analysis
  • Analyze text with Python's NLTK
  • Understand recommender systems
  • Compare recommender implementations in R and Mahout
Course Outline:
  • Lab Content
    • Set Up Development Environment
    • Defining the Problem
    • Programming in R
    • Analyzing Data with R
    • Creating the User/Item Matrix
    • Recommender Lab with R
    • Recommender Lab with Mahout
