Mathematics of Big Data and Machine Learning

Many arrays of numbers 0 and 1 in a dark blue background

"Big Data" refers to a technological phenomenon that has emerged since the mid-1980s. As computers have improved in capacity and speed, the greater storage and processing possibilities have also created new challenges. New analytical tools, including the ones introduced in this course, have since been developed to meet the challenges of managing these very large data sets. (Image source: DARPA/public domain.)


Course Description

This course introduces the Dynamic Distributed Dimensional Data Model (D4M), a breakthrough in computer programming that combines graph theory, linear algebra, and databases to address problems associated with Big Data. Search, social media, ad placement, mapping, tracking, spam filtering, fraud detection, wireless communication, drug discovery, and bioinformatics all attempt to find items of interest in vast quantities of data. This course teaches a signal processing approach to these problems by combining linear algebraic graph algorithms, group theory, and database design. This approach has been implemented in software. The class will begin with a number of practical problems, introduce the appropriate theory, and then apply the theory to these problems. Students will apply these ideas in a final project of their choosing. The course will also include a number of smaller assignments that equip students with the software infrastructure they need to complete their final projects.
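To give a rough sense of what a "linear algebraic graph algorithm" can look like, here is a minimal Python sketch. It is not D4M's API or the course's own code: the toy graph, the `edges` dictionary, and the `vec_mat` helper are all hypothetical names chosen for illustration. The idea shown is that a graph can be stored as a sparse adjacency matrix, and multiplying a vector by that matrix moves one step along the graph's edges.

```python
# Hypothetical toy graph stored as a sparse adjacency matrix:
# edges[src][dst] = weight of the edge from src to dst.
edges = {
    "alice": {"bob": 1, "carol": 1},
    "bob": {"dave": 1},
    "carol": {"dave": 1},
    "dave": {},
}


def vec_mat(vector, matrix):
    """Sparse row-vector times matrix: result[j] = sum over i of vector[i] * matrix[i][j]."""
    result = {}
    for i, value in vector.items():
        for j, weight in matrix.get(i, {}).items():
            result[j] = result.get(j, 0) + value * weight
    return result


# Start with all of the "mass" on a single vertex.
start = {"alice": 1}

# One multiplication reaches the direct neighbors of the start vertex.
one_hop = vec_mat(start, edges)

# A second multiplication reaches vertices two steps away; each entry
# counts the number of distinct two-step paths that end at that vertex.
two_hop = vec_mat(one_hop, edges)

print(one_hop)
print(two_hop)
```

Repeating the multiplication is one way to express a breadth-first style search purely in terms of matrix operations, which is the kind of connection between graphs and linear algebra the description refers to.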

Jeremy Kepner and Vijay Gadepally. RES.LL-005 Mathematics of Big Data and Machine Learning. January IAP 2020. Massachusetts Institute of Technology: MIT OpenCourseWare, License: Creative Commons BY-NC-SA.

For more information about using these materials and the Creative Commons license, see our Terms of Use.