Dive into data engineering.

Students will mine and model data from a variety of sources over the 12-week DS12 course, which culminates in a capstone project.

Dive into machine learning.

PROGRAM TIMELINE

| Week | DATA MINING | FUNCTIONAL PROGRAMMING | TOOLS WORKSHOP | LAB |
| --- | --- | --- | --- | --- |
| 1-2 | Machine learning models | Type classes and combinators | SBT and AWS deployment | Logging |
| 3-4 | Frequent itemset mining | Monoids and monads | SparkSQL | Churn modeling |
| 5-6 | Locality-sensitive hashing | Applicative and traversable functors | Tuning and debugging | Entity resolution |
| 7-8 | Clustering and recommender systems | Stream processing and external effects | Spark streaming | Recommender systems |
| 9-12 | Capstone project | Capstone project | Capstone project | Capstone project |

COURSE DESCRIPTIONS

Models

We'll introduce you to the models used in modern machine learning applications, with an emphasis on implementation and extension. Each class starts with a mathematical discussion at a whiteboard and evolves into a live coding session. We’ll dive into the source code and APIs you’ll need to know to do your work.

Methods

This course focuses on algorithmic and computational methods for mining large datasets. After we introduce you to a new concept, we’ll place it in the context of a library and a terascale data problem, using raw datasets from real DataScience clients.

Functional Programming

You will study the elements of modern functional programming and their application to scalable data manipulation using the Scala Collections library and MapReduce frameworks like Scalding and Spark. We’ll tackle purely functional data structures, combinators, monads, and more.

Lab

This is where everything you’re learning comes together. You’ll work with your classmates on real-world projects alongside our instructors and TAs. Midway through the program, you will begin to take on longer projects that require production coding skills.

Data Mining

This course focuses on algorithmic and computational methods for mining large datasets. We'll introduce you to the models used in modern machine learning applications, with an emphasis on implementation and extension. After we introduce you to a new concept, we'll place it in the context of a library and a terascale data problem, using raw datasets from real DataScience clients.

Functional Programming

You will study the elements of modern functional programming and their application to scalable data manipulation using the Scala Collections and Typesafe libraries. We’ll tackle parsing, property-based testing, thread-based parallelism, purely functional data structures, combinators, monoids, monads, applicative functors, and more.

Tools Workshop

Learn to use modern JVM-based engineering tools such as Spark, Hadoop, Hive, Scalding, Algebird, Elasticsearch, and Neo4j, as well as engineering best practices such as unit testing and continuous deployment. This workshop is taught with a focus on distributed environments.

Lab

This is where everything you’re learning comes together. You’ll work with your classmates on real-world projects alongside our instructors and TAs. Midway through the program, you will begin to take on longer projects that require production-level coding skills.