May 16, 2025
ORIE 3741 - Learning with Big Messy Data Fall, Spring. 4 credits. Letter grades only (no audit).
Forbidden Overlap: due to an overlap in content, students will not receive credit for both CS 3780 (and the former CS 4780/5780), ECE 3200 (and the former ECE 4200), ORIE 3741 (and the former ORIE 4741/5741), and STSCI 3740 (and the former STSCI 4740/5740). Prerequisite: MATH 2940, ENGRD 2700, ENGRD 2110/CS 2110, CS 2800, or equivalents. Co-meets with ORIE 5741.
Staff.
Modern data sets, whether collected by scientists, engineers, medical researchers, government, financial firms, social networks, or software companies, are often big, messy, and extremely useful. This course addresses scalable, robust methods for learning from big, messy data. We’ll cover techniques for learning with data that is messy (consisting of real numbers, integers, booleans, categoricals, ordinals, graphs, text, sets, and more, with missing entries and outliers) and big, which means we can only use algorithms whose complexity scales linearly in the size of the data. We will cover techniques for cleaning data, supervised and unsupervised learning, finding similar items, model validation, and feature engineering.