Courses of Study 2018-2019 [ARCHIVED CATALOG]
Apr 24, 2024


ORIE 4741 - Learning with Big Messy Data


     
Fall. Next Offered: 2019-2020. 4 credits. Letter grades only.

Prerequisite: linear algebra (MATH 2940 or equivalent), probability theory (ENGRD 2700 or equivalent), programming (ENGRD 2110/CS 2110 or equivalent), and discrete math (CS 2800 or equivalent recommended).

Staff.

Modern data sets, whether collected by scientists, engineers, medical researchers, government, financial firms, social networks, or software companies, are often big, messy, and extremely useful. This course addresses scalable, robust methods for learning from big messy data. We will cover techniques for learning with data that is messy (consisting of real numbers, integers, booleans, categoricals, ordinals, graphs, text, sets, and more, with missing entries and with outliers) and that is big (which means we can only use algorithms whose complexity scales linearly in the size of the data). Topics include data cleaning, supervised and unsupervised learning, finding similar items, model validation, and feature engineering. The course culminates in a final project in which students extract useful information from a big messy data set.


