COURSE DESCRIPTION

Course Description

In this course, we will review the fundamentals in the design and rigorous analysis of algorithms oriented toward the solution of data science problems. The first part of the course is devoted to reviewing elementary data structures and algorithm paradigms, with the focus then turning to algorithms on graphs, matching problems, searching problems, applications to data mining and on computational geometry.

 

Prerequisites

Common requirements for the Semester in Mathematical Tools for Data Science and some knowledge of discrete mathematics.

COURSE GOALS

On completion of the course, students will be able to:

  • employ the fundamental concepts necessary for analyzing algorithms in terms of complexity;
  • be able to read and reproduce proofs of correctness of algorithms and understand the different methodologies to implement them;
  • master the essential data structures and main algorithms for dealing with graphs, recursive problems and computational geometry;
  • understand the algorithmic nature of several problems involved in data science;
  • be able to implement rigorously and methodically an algorithm in a high-level programming language.
COURSE CONTENT

Course Content 

  1. General tools for studying algorithms (3 weeks)
    1. Algorithm/problem complexity. Evaluation of algorithm complexity
    2. Elementary data structures (stacks, queues, binary trees, heaps, priority queues, disjoint sets)
  2. Algorithms on data sequences (3 weeks)
    1. Divide and conquer and dynamic programming, and applications to data science
    2. Sequence alignment problems
  3. Algorithms on graphs (5 weeks)
    1. Graphs in data science. Data structures for graphs, traversal algorithms and applications
    2. Algorithms on graphs: shortest paths; Bellman-Ford; Dijkstra; A*
    3. Algorithms on graphs: flow networks; applications to bipartite graph matching
    4. Applications to Data Science 
  4. Handling large datasets (3 weeks)
    1. Data mapping and dictionaries; hashing
    2. Similarity search
    3. Randomized and memory-free methods

Bibliography

  1. Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2009). Introduction to Algorithms, 3rd Edition, MIT Press. [Main textbook]
  2. Cormen, Thomas H. (2013).  Algorithms Unlocked, MIT Press.
  3. Kleinberg, Jon; Tardos, Eva (2005). Algorithm Design, Addison-Wesley.
  4. Leskovec, Jure; Rajaraman, Anand; Ullman, Jeff (2014). Mining Massive Datasets, Cambridge University Press.
  5. Hromkovic, Jurj (2005). Design and Analysis of Randomized Algorithms, Springer.

Support Sessions

2 hours a week with a teaching assistant.

Grading

Two midterm exams (10% each), homework assignments (40%), final exam (20%), integrative project (20%).