COURSE DESCRIPTION |
This course develops an introduction to statistical thinking and concepts, beginning with the mathematical (probabilistic) description of random variables. Emphasis is placed on understanding why and when certain probabilistic models may be used in applications, and how they are elicited from context and data analysis. The course concludes with selected statistical methods useful for exploring and describing vector-valued data, a common setup in modern data analysis applications.

Prerequisites: Common requirements for the Semester in Mathematical Tools for Data Science (Spring). |
COURSE GOALS |
On completion of the course, students will:
|
COURSE CONTENTS |
1. Introduction (0.5 week)
   1.1. Statistical thinking, role of data, stochasticity, and uncertainty.
2. Random variables and random vectors (3 weeks)
   2.1. Discrete and continuous univariate and multivariate densities.
3. Notable probability models (2 weeks)
   3.1. Discrete families: Bernoulli, binomial, geometric, and Poisson densities.
4. Graphical methods for exploring univariate and multivariate data (1.5 weeks)
   4.1. Graphical tools for multivariate descriptions (matrix plots, parallel plots, icon plots, etc.).
5. Statistical inference (4.5 weeks)
   5.1. Parametric estimation via likelihood methods.
6. Regression models (3 weeks)
   6.1. Linear regression and logistic regression.

Grading: Course evaluation consists of homework assignments (20%) submitted via the Moodle site, two term exams (25% each), and one final exam (30%). Homework will be assigned approximately once every 1–2 weeks.

Support Sessions: 1.5 hours a week with a teaching assistant.

References: Baron, M. (2014). Probability and statistics for computer scientists (2nd ed.). CRC Press. |