Chapter 1 Syllabus

1.1 In a nutshell

R is an open-source, freely available and widely used programming language and interactive environment for data analysis. One of its main attractions is that it gives easy access to powerful methods and graphical techniques for understanding data. R benefits from a large community of passionate contributors, including prominent researchers, who implement the latest developments and newest results in statistics, data computing, and applied statistics. This makes R one of the most popular tools among data scientists, and its use is booming in academia, business, and machine learning challenges.

The aim of this course is to give students a solid foundation for mastering the capabilities of R for statistics and graphics. We begin by examining the fundamental principles and essential programming needed for any data analysis. Next, we turn to data exploration and present classical statistical models and learning methods, including some recently introduced techniques. We will analyse and model data, with particular emphasis on the visualization and interactive features of R, which are key to communicating results.

1.2 Outline

- Getting started with R
  - Creating, displaying, types of objects
  - Preparing data: reading / exporting data from file
  - Data manipulation
  - Programming: loops, predefined functions, creating functions
  - Packages
  - Automated statistical reports with R Markdown
- Data visualization - exploratory data analysis
  - Summarizing data
  - Conventional graphical functions / customizing graphs
  - Graphics with ggplot2
  - Classical visualization methods (histograms, barplots, boxplots)
  - Advanced / interactive graphs (plotly, D3js)
  - Spatial statistics (R googleMaps)
- Dimensionality reduction methods
  - Principal component analysis
  - Correspondence analysis - Multiple correspondence analysis
- Unsupervised clustering
  - Ascending hierarchical clustering
  - The k-means methods
- Missing values
- Extending …
  - Performance / Interfacing R
  - Text mining / Analysis of web data (web scraping, Rfacebook)
  - Shiny

1.3 Practical information

Lecture time: Tuesday 1:30-3:30 pm, Amphi Monge

Lab time: Tuesday 4:00-6:00 pm, PC 16-17-18

Lecture 1: 26/09/2017, Lecture 2: 03/10/2017, Lecture 3: 17/10/2017, Lecture 4: 24/10/2017, Lecture 5: 07/11/2017, Lecture 6: 14/11/2017, Lecture 7: 21/11/2017, Lecture 8: 28/11/2017, Lecture 9: 05/12/2017

Lectures consist of brief presentations of statistical methods and their implementation in R. Students will practice during the labs. At the end of this course, students will be able to:
- get, clean and preprocess data in different formats
- perform a statistical analysis, from descriptive statistics, to visualization, to modelling, to communication
- apply many unsupervised statistical methods
- program the main statistical algorithms
- hopefully, develop a critical approach

Grades:
- 60% Homework (3 assignments). A PDF and a .Rmd file should be submitted to ensure reproducible results.
- 40% Project / case study. The aim is to answer a specific question using the appropriate tools and to write a short (up to 4 pages) reproducible report (plus 2 pages of supplementary material). It is also possible to focus on specific R tools. More details will be given in class. Each group will have 20 minutes to present its work.

R for Statistics

Julie Josse

2017-11-07
