R is a popular open-source environment for statistical computing, data analytics, and graphics. This course introduces the R programming language and covers language fundamentals, libraries, and advanced concepts. It also includes advanced data analytics and graphing with real-world data.
Audience: This course is designed for developers and data analysts.
Course Duration: 3 days
Prerequisites:
A basic programming background is preferred.
Hardware and Software Requirements:
Participants will need a modern laptop with current versions of R and RStudio installed.
Course Outline:
- Language Basics (one day)
  - Course Introduction
  - About Data Science
    - Data Science Definition
    - Process of Doing Data Science
  - Introducing R Language
  - Variables and Types
  - Control Structures (Loops / Conditionals)
  - R Scalars, Vectors and Matrices
    - Defining R Vectors
    - Matrices
  - String and Text Manipulation
    - Character Data Type
    - File I/O
  - Lists
  - Functions
    - Introducing Functions
    - Closures
    - lapply/sapply Functions
  - DataFrames
  - Labs
- Intermediate R Programming (one day)
  - DataFrames and File I/O
    - Reading Data from Files
    - Data Preparation
    - Built-In Datasets
  - Visualization
    - Graphics Package
    - plot() / barplot() / hist() / boxplot() / scatter plot
    - Heat Map
    - ggplot2 package (qplot(), ggplot())
  - Exploration with dplyr
  - Labs
- Advanced Programming with R (one day)
  - Statistical Modeling with R
    - Statistical Functions
    - Dealing with NA
    - Distributions (Binomial, Poisson, Normal)
  - Regression
    - Introducing Linear Regressions
  - Recommendations
  - Text Processing (tm package / wordcloud)
  - Clustering
    - Introduction to Clustering
    - K-Means
  - Classification
    - Introduction to Classification
    - Naive Bayes
    - Decision Trees
    - Training using the caret Package
    - Evaluating Algorithms
  - R and Big Data
    - Hadoop
    - Big Data Ecosystem
    - RHadoop
  - Labs