In this course we dive into cloud technologies that allow organizations to tap into potentially thousands of computers at the click of a button, at little upfront cost. We also explain the software used to access and program such compute clusters, so that they can be used to address Big Data problems.
Every week there is a two-hour lecture with a matching practicum that is carried out in the cloud on Amazon Web Services. The practicum starts in the lab sessions (two hours per week), where you work on your own laptop and can ask questions. Solutions must be submitted via Blackboard. To contact the TAs, post a question on the Canvas discussion forum.
During the course, students form teams with people from a variety of backgrounds to solve a (big) data science problem at all its layers (i.e., from raw data to final visualisation of the results). The project should result in a presentation and a short report; to support its development, there are weekly meetings with Ana Varbanescu.
There is a weekly seminar (1 hour per week), featuring invited speakers at first and, later in the course, student presentations on the student projects.
There is a theoretical exam at the end of the course. This exam counts for 25% of the final grade and must be passed. The average score on the lab session exercises also counts for 25%. The student project counts for 50%.
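As an illustration of how these weights combine, here is a minimal sketch in Python. The function name, the 1-10 grading scale, and the pass mark of 5.5 for the exam are assumptions made for the example; the course description only states the weights and that the exam must be passed.

```python
def final_grade(exam, labs, project, exam_pass_mark=5.5):
    """Combine the three components using the stated weights.

    exam, labs, project: scores on an assumed 1-10 scale.
    exam_pass_mark: hypothetical threshold; the actual pass mark
    is not given in the course description.
    Returns (weighted_grade, exam_passed).
    """
    # Weights from the course description: 25% exam, 25% labs, 50% project.
    weighted = 0.25 * exam + 0.25 * labs + 0.50 * project
    # The exam must be passed regardless of the weighted result.
    exam_passed = exam >= exam_pass_mark
    return weighted, exam_passed


grade, exam_passed = final_grade(exam=6.0, labs=8.0, project=7.0)
print(f"weighted grade: {grade:.2f}, exam passed: {exam_passed}")
```

Note that a high weighted grade does not compensate for a failed exam, which is why the sketch reports the exam result separately.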
The books below give background information on the hardware and software aspects, respectively, of Big Data Infrastructures and Technologies:
The Lecture Pages (in the right menu) provide a short summary of the main points of each lecture. Always read this summary! These pages also give access to some lighter extra material and to recommended related presentations on the web (YouTube, SlideShare).
Some course overview information is available in the course outline (mostly in Dutch).
The origins of this material are in the Large Scale Data Engineering MSc course (LSDE).
The Big Data course in the Data Science master at UvA and VU was developed by Peter Boncz and Hannes Mühleisen from the Database Architectures research group of CWI, specifically for the Amsterdam Data Science initiative.
The lecture slides for this course are adapted from those used in the Extreme Computing course, which were graciously provided by Dr. Stratis Viglas of the University of Edinburgh.