Courses of Study 2021-2022 [ARCHIVED CATALOG]


CS 5304 - Data Science in the Wild

(crosslisted) INFO 5304

Spring. 3 credits. Letter grades only.

Enrollment limited to: Cornell Tech students. Offered in New York City at Cornell Tech.

R. Nandakumar.

Companies and organizations collect massive amounts of data, and the task of a data scientist is to extract actionable knowledge from it: for scientific needs, to improve public health, to promote businesses, for social studies, and for various other purposes. This course focuses on the practical aspects of the field and aims to provide a comprehensive set of tools for extracting knowledge from data.

The course covers the topics needed to solve data-science problems: problem formulation (business understanding), data preparation (collection, sampling, integration, cleaning), data modeling (characterization, model selection, and analysis), implementation (large-scale data processing, feedback loops, QA), and communication (data presentation, visualization). Advanced topics such as causal inference and streaming-data processing will also be presented.

Throughout the course, students will carry out a data-science mission that includes all the required steps, from problem formulation to result presentation.


