# R for Statistics

*Julie Josse*

*2017-11-07*

# Chapter 1 Syllabus

## 1.1 In a nutshell

R is an open-source, freely available, and widely used programming language and interactive environment for data analysis. One of its main attractions is easy access to powerful methods and graphical techniques for understanding data. R benefits from a large community of passionate contributors, including prominent researchers, who implement the latest developments and newest results in statistics, data computing, and applied statistics. This makes R one of the most popular tools among data scientists, and its use is booming in academia, business, and machine learning challenges.

The aim of this course is to give students a solid foundation for mastering R's capabilities for statistics and graphics. We begin with the fundamental principles and essential programming needed for any data analysis. Next, we turn to data exploration and present classical statistical models and learning methods, including some recently introduced techniques. Throughout, we analyse and model data, with particular emphasis on visualization and the interactive features of R, which are key to communicating results.

## 1.2 Outline

- Getting started with R
  - Creating, displaying, types of objects
  - Preparing data: reading / exporting data from file
  - Data manipulation
  - Programming: loops, predefined functions, creating functions
  - Packages
  - Automated statistical reports with R Markdown

- Data visualization - exploratory data analysis
  - Summarizing data
  - Conventional graphical functions / customizing graphs
  - Graphics with ggplot2
  - Classical visualization methods (histograms, barplots, boxplots, etc.)
  - Advanced / interactive graphs (plotly, D3js)
  - Spatial statistics (R googleMaps)

- Dimensionality reduction methods
  - Principal component analysis
  - Correspondence analysis - Multiple correspondence analysis

- Unsupervised clustering
  - Ascending hierarchical clustering
  - The k-means methods

- Missing values

- Extending …
  - Performance / interfacing R
  - Text mining / analysis of web data (web scraping, Rfacebook)
  - Shiny

## 1.3 Practical information

Time: Tuesday 1:30-3:30 pm, Amphi Monge

Lab time: Tuesday 4:00-6:00 pm, PC 16-17-18

Lecture 1: 26/09/2017, Lecture 2: 03/10/2017, Lecture 3: 17/10/2017, Lecture 4: 24/10/2017, Lecture 5: 07/11/2017, Lecture 6: 14/11/2017, Lecture 7: 21/11/2017, Lecture 8: 28/11/2017, Lecture 9: 05/12/2017

Lectures consist of brief presentations of statistical methods and their implementation in R. Students will practice during the labs. At the end of this course, students will be able to:

- get, clean and preprocess data of different formats

- perform a statistical analysis, from descriptive statistics to visualization, modelling, and communication

- apply many unsupervised statistical methods

- program the main statistical algorithms

- hopefully, develop a critical approach

**Grades**:

- 60% Homework (3 assignments). A PDF and a .Rmd file should be submitted to ensure reproducible results.

- 40% Project / case study. The aim is to answer a specific question using the appropriate tools and to write a short (up to 4 pages) reproducible report (plus 2 pages of supplementary materials). It is also possible to focus on specific R tools. More details will be given in class. Each group will have 20 minutes to present its work.