Data Science with Hadoop for Statistics and Text Analysis

Course Number:

N/A

This course teaches data science principles and applications through lectures and hands-on labs. Students learn the strengths of each tool and how to select the right one for the job, while gaining practical experience building working systems.

Audience:

This course is intended for architects, software developers, analysts and data scientists who need to understand how to apply data science to large datasets with Hadoop.

Course Duration:

3 days

Prerequisites:

Students must have basic computer skills, a basic knowledge of statistics and a basic understanding of programming or scripting. Prior experience with Hadoop, Mahout, R or Python is helpful but not required.

Course Objectives:
  • Understand the foundations of data science
  • Understand the principles of machine learning
  • Learn about Hadoop and its interaction with data science
  • Learn to program in R and to use it for statistical analysis
  • Analyze texts with Python NLTK
  • Understand recommender systems
  • Compare recommender implementations in R and Mahout

Course Outline:
  • Lab Content
    • Set Up Development Environment
    • Defining the Problem
    • Programming in R
    • Analyzing Data with R
    • Creating the User/Item Matrix
    • Recommender Lab with R
    • Recommender Lab with Mahout
