This course teaches data science principles and applications through lectures and hands-on labs. Students learn the strengths of each tool and how to select the right one for the job, while gaining practical experience building working systems.
Audience: This course is intended for architects, software developers, analysts and data scientists who need to understand how to apply data science to large datasets with Hadoop.
Course Duration: 3 days
Prerequisites:
Students must have basic computer skills, basic knowledge of statistics and a basic understanding of programming or scripting. Prior experience with Hadoop, Mahout, R or Python is helpful but not required.
Course Objectives:
- Understand the foundations of data science
- Understand the principles of machine learning
- Learn about Hadoop and its interaction with data science
- Learn to program in R and to use it for statistical analysis
- Analyze texts with Python NLTK
- Understand recommender systems
- Compare implementing a recommender with R and Mahout
Course Outline:
- Lab Content
- Set Up Development Environment
- Defining the Problem
- Programming in R
- Analyzing Data with R
- Creating the User/Item Matrix
- Recommender Lab with R
- Recommender Lab with Mahout