Chapter 1 Syllabus

1.1 In a nutshell

R is an open-source, freely available, and widely used programming language and interactive environment for data analysis. One of its main attractions is that it gives easy access to powerful methods and graphical techniques for understanding data. R benefits from a large community of passionate contributors, including prominent researchers, who implement the latest developments in statistics, data computing, and applied statistics. This makes R one of the most popular tools among data scientists, and its use is growing rapidly in academia, business, and machine learning challenges.

The aim of this course is to give students a solid foundation for mastering the capabilities of R for statistics and graphics. We begin by examining the fundamental principles and essential programming needed for any data analysis. Next, we turn to data exploration and present classical statistical models and learning methods, including some recently introduced techniques. We will analyse and model data, with particular emphasis on the visualization and interactive features of R, which are key to communicating results.

1.2 Outline

  1. Getting started with R
    Creating, displaying, types of objects
    Preparing data: reading / exporting data from file
    Data manipulation
    Programming: loops, predefined/creating functions
    Packages
    Automated statistical reports with R Markdown

  2. Data visualization - exploratory data analysis
    Summarizing data
    Conventional graphical functions / customizing graphs
    Graphics with ggplot2
    Classical visualization methods (histograms, barplots, boxplots)
    Advanced / Interactive graphs (plotly, D3js)
    Spatial statistics (RgoogleMaps)

  3. Dimensionality reduction methods
    Principal component analysis
    Correspondence analysis - Multiple correspondence analysis

  4. Unsupervised clustering
    Ascending hierarchical clustering
    The k-means methods

  5. Missing values

  6. Extending …
    Performance / Interfacing R
    Text mining / Analysis of web data (web scraping, Rfacebook)
    Shiny

1.3 Practical information

Time: Tuesday, 1:30-3:30 pm, Amphi Monge

Lab Time: Tuesday, 4:00-6:00 pm, PC 16-17-18

Lecture 1: 26/09/2017, Lecture 2: 03/10/2017, Lecture 3: 17/10/2017, Lecture 4: 24/10/2017, Lecture 5: 07/11/2017, Lecture 6: 14/11/2017, Lecture 7: 21/11/2017, Lecture 8: 28/11/2017, Lecture 9: 05/12/2017

Lectures consist of brief presentations of statistical methods and their implementation in R. Students will practice during the labs. At the end of this course, students will be able to:
- get, clean, and preprocess data in different formats
- perform a statistical analysis, from descriptive statistics to visualization, modeling, and communication of results
- apply many unsupervised statistical methods
- program the main statistical algorithms
- hopefully, develop a critical approach

Grades:
- 60% Homework (3 assignments). A pdf and a .Rmd file should be submitted to ensure reproducible results.
- 40% Project / Case study. The aim is to answer a specific question using the appropriate tools and to write a short (up to 4 pages) reproducible report (plus 2 pages of supplementary materials). It is also possible to focus on specific R tools. More details will be given in class. Each group will have 20 minutes to present its work.