The course teaches data science principles and applications through lectures and hands-on labs. Students will learn the strengths of each tool and how to select the right one for the job, while gaining practical experience building working systems.

Audience: This course is intended for architects, software developers, analysts and data scientists who need to understand how to apply data science to large datasets with Hadoop.
Course Duration: 3 days
Prerequisites:

Students must have basic computer skills, basic knowledge of statistics, and a basic understanding of programming or scripting. Prior experience with Hadoop, Mahout, R or Python is helpful but not required.

Course Objectives:
  • Understand the foundations of data science
  • Understand the principles of machine learning
  • Learn about Hadoop and its interaction with data science
  • Learn to program in R and to use it for statistical analysis
  • Analyze text with Python and NLTK
  • Understand recommender systems
  • Compare recommender implementations in R and Mahout
Course Outline:
  • Lab Content
    • Set Up Development Environment
    • Defining the Problem
    • Programming in R
    • Analyzing Data with R
    • Creating the User/Item Matrix
    • Recommender Lab with R
    • Recommender Lab with Mahout