# Probability and Statistics

## Course Description

This course is an introduction to statistical thinking and concepts, beginning with basic probability theory. The course concludes with selected statistical methods useful for data exploration and description of vector-valued data, a common setup in modern data analysis applications. Python and/or R will be used for practical implementation of all numerical and graphical procedures, including simulations.

## Prerequisites

Common requirements for the Semester in Mathematical Tools for Data Science.
## Course Goals

On completion of the course, students will:

- understand basic statistical concepts and methods, including uncertainty and the role of probabilistic reasoning in data analysis;
- master the presentation and use of mathematical concepts in probability theory;
- know selected methods for addressing statistical problems, such as multiple linear regression and logistic regression, as applied to inference about data structure, prediction, and classification;
- implement methods and graphical procedures in Python, using meaningful datasets.
## Course Contents

- Introduction (0.5 week): Statistical thinking, role of data, stochasticity, and uncertainty.
- Probability Theory (2 weeks): Sample space and events. Basic properties of probability. Probability laws. Conditional probability and independence. Bayes' theorem.
- Random Variables (2.5 weeks): Mean and variance. Discrete families: Bernoulli, binomial, geometric, and Poisson distributions. Continuous families: exponential and normal densities. Multivariate normal distribution.
- Graphical Methods for Exploring Univariate and Multivariate Data (1.5 weeks): Graphical tools for multivariate descriptions (matrix plots, parallel plots, icon plots, etc.).
- Statistical Inference (4.5 weeks): Likelihood. Asymptotic normality of maximum likelihood estimators. Bootstrap. Bayesian inference. Elements of Bayesian inference via MCMC (Markov chain Monte Carlo).
- Regression Models (3 weeks): Linear regression and logistic regression. Prediction and classification.

## Bibliography

- Baron, Michael (2014). Probability and Statistics for Computer Scientists, 2nd Edition, CRC Press.
- DeGroot, Morris H.; Schervish, Mark J. (2012). Probability and Statistics, Addison-Wesley. [Main textbook]
- Wasserman, Larry (2004). All of Statistics: A Concise Course on Statistical Inference, Springer.
- Cook, Dianne; Swayne, Deborah F. (2007). Interactive and Dynamic Graphics for Data Analysis: With R and GGobi, Springer.

## Support Sessions

2 hours a week with a teaching assistant.

## Grading

Two midterm exams (25% each), homework (20%), and a final project (30%).