Statistical Learning

Introduction

This GitHub repository provides materials (slides, scripts, datasets, etc.) for the second part of the Statistical Learning course at the UPC-UB MSc in Statistics and Operations Research (MESIO).

A previous version of this repository can be found at https://github.com/ASPteaching/introstatlearning, but I decided to re-create it because that repository had become too heavy due to caches and Git history that I don't know how to clean up.

The second part of the course has two blocks, each with two parts:

1. Tree-based methods
   1.1 Decision trees
   1.2 Ensemble methods
2. Artificial neural networks
   2.1 Artificial neural networks
   2.2 Introduction to deep learning

Class material

All class materials are available from the course website https://aspteaching.github.io/Introduction2StatisticalLearning/.

On this page you will find links to the HTML versions of the slides and other documents, as well as to datasets, references and other resources.

Course presentation

Decision Trees

Decision trees are a type of non-parametric classifier that has been very successful because of its interpretability, flexibility and fairly good accuracy.
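To make the idea of a split concrete, here is a minimal sketch (not taken from the course labs) of how a classification tree, in the CART style, might choose a single split on one numeric feature: it tries every candidate threshold and keeps the one with the lowest weighted Gini impurity. The function names `gini` and `best_split` and the toy data are hypothetical; a full tree would repeat this search over all features and apply it recursively to each child node.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(x, y):
    """Return (threshold, weighted impurity) of the best split x <= t on one feature."""
    n = len(y)
    best = (None, gini(y))  # start from "no split": impurity of the parent node
    # Candidate thresholds are the observed values; the largest would leave the right child empty.
    for t in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        # Impurity of the two children, weighted by their sizes
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best

# Made-up toy data: class "a" for small x, class "b" for large x
x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = ["a", "a", "a", "b", "b", "b"]
print(best_split(x, y))  # (3.0, 0.0): a perfect split between 3 and 10
```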

Slides
Notes
R-lab
Python-labs
- Introduction to Python (from ISL, Ch. 2)
- Decision Trees lab (from ISL, Ch. 8)

Ensemble methods

The term “ensemble” (French for “together”) refers to a family of approaches that build predictors by combining multiple models.

Ensembles have proved effective at addressing some limitations of single trees: they improve accuracy and robustness, reduce overfitting and can capture complex relationships.
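Bagging (bootstrap aggregation) is one of these ensemble approaches; random forests and boosting are others. The sketch below assumes a single numeric feature and uses hypothetical helper names (`fit_stump`, `bagging`): it fits one-split trees (stumps) on bootstrap resamples of the data and combines them by majority vote. It only illustrates the idea and is not the implementation used in the labs.

```python
import random
from collections import Counter

def fit_stump(x, y):
    """Fit a one-split tree on one feature by minimising the number of misclassifications."""
    best = None
    for t in sorted(set(x)):
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        left_lab = Counter(left).most_common(1)[0][0]
        # If the right child is empty, predict the left label everywhere
        right_lab = Counter(right).most_common(1)[0][0] if right else left_lab
        errors = sum(yi != left_lab for yi in left) + sum(yi != right_lab for yi in right)
        if best is None or errors < best[0]:
            best = (errors, t, left_lab, right_lab)
    _, t, left_lab, right_lab = best
    return lambda v: left_lab if v <= t else right_lab

def bagging(x, y, n_models=25, seed=0):
    """Fit n_models stumps on bootstrap resamples and return a majority-vote predictor."""
    rng = random.Random(seed)
    n = len(x)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: n indices drawn with replacement
        models.append(fit_stump([x[i] for i in idx], [y[i] for i in idx]))

    def predict(v):
        votes = Counter(m(v) for m in models)
        return votes.most_common(1)[0][0]

    return predict

# Made-up toy data
x = [1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0]
y = ["a", "a", "a", "a", "b", "b", "b", "b"]
predict = bagging(x, y)
print(predict(0.5), predict(15.0))  # a b
```

Averaging many trees fitted on resampled data mainly reduces variance, which is one reason why such ensembles tend to overfit less than a single deep tree.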

Artificial Neural Networks

These are traditional machine learning models, inspired by the brain, that simulate the behavior of neurons: they receive an input, process it and produce an output prediction.
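A single artificial neuron can be sketched in a few lines. The version below assumes the common formulation in which the neuron computes a weighted sum of its inputs plus a bias and passes the result through a sigmoid activation; the function name and the numbers are illustrative only.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of the inputs plus a bias, passed through a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative values: z = 0.5*1.0 + (-0.25)*2.0 + 0.0 = 0.0, and sigmoid(0) = 0.5
print(neuron([1.0, 2.0], [0.5, -0.25], 0.0))  # 0.5
```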

For a long time the applicability of ANNs was relatively restricted to a few fields or problems, mainly because their “black box” functioning made them hard to interpret.

This situation has changed completely with the advent of deep neural networks, which are at the basis of many powerful applications of artificial intelligence.

Deep learning

Essentially, these are ANNs with multiple hidden layers, which allow them to overcome many of the limitations of traditional networks. They can be tuned in a much more automatic way and have been applied to many complex tasks such as computer vision, natural language processing or recommender systems.
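The following sketch, in plain Python and with hypothetical names (`dense`, `relu`, `forward`), shows what “multiple hidden layers” means computationally: the input is passed through a stack of fully connected layers, each followed by a ReLU activation, before a final linear output layer. The weights are set by hand for illustration; in practice they are learned, for instance by gradient descent with backpropagation, which is not shown here.

```python
def relu(v):
    """ReLU activation applied element-wise."""
    return [max(0.0, a) for a in v]

def dense(v, W, b):
    """Fully connected layer: returns W v + b, with W given as a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) + bi for row, bi in zip(W, b)]

def forward(x, layers):
    """Pass x through the hidden layers (dense + ReLU), then a linear output layer."""
    h = x
    for W, b in layers[:-1]:
        h = relu(dense(h, W, b))
    W_out, b_out = layers[-1]
    return dense(h, W_out, b_out)

# Hand-picked weights (illustrative only): two hidden layers of width 2, one output
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # hidden layer 1: 2 -> 2
    ([[1.0, 1.0], [-1.0, 1.0]], [0.0, 1.0]),  # hidden layer 2: 2 -> 2
    ([[1.0, 2.0]], [0.5]),                    # output layer: 2 -> 1
]
print(forward([2.0, 1.0], layers))  # [6.0]
```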

References and resources

References for tree-based methods

Breiman, L., Friedman, J., Stone, C. J., & Olshen, R. A. (1984). Classification and regression trees. CRC press.
Greenwell, B. M. (202). Tree-based methods for statistical learning in R (1st ed.). Chapman and Hall/CRC. DOI: https://doi.org/10.1201/9781003089032. Web site
Efron, B., & Hastie, T. (2016). Computer age statistical inference. Cambridge University Press. Web site
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction. Springer.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning (Vol. 112). Springer.

References for deep neural networks

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning (Vol. 1). MIT Press. Web site
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
Chollet, F. (2018). Deep learning with Python. Manning Publications.
Chollet, F. (2023). Deep learning with R (2nd ed.). Manning Publications.

Some interesting online resources

- Decision Trees free course (9 videos), by Analytics Vidhya