Big Data Infrastructures and Technologies

BADS Module: Big Data Infrastructures & Technologies

Lecture: Data Streams

PDF slides: Data Streams

Practicum

No practicum for this topic is provided.

Technical Literature

For technical background material, there are four papers:

Twitter Heron: Stream Processing at Scale (and why it is replacing the Storm system)
Discretized Streams: Fault-Tolerant Streaming Computation at Scale (Spark Streaming)
The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing (Google Cloud DataFlow)
Approximate Frequency Counts over Data Streams (Manku and Motwani; the Lossy Counting algorithm)

Related Presentations

Blog post by the Google DataFlow developers comparing it with Spark (Streaming).
Note: Google DataFlow is being open-sourced as Apache Beam.

Nathan Marz introducing his Lambda Architecture:

Slidedeck on Spark Streaming:

Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17 from spark-project

The introduction of Google DataFlow:

Finally, a presentation on Storm, although it has mostly been superseded by now:

Hadoop Summit Europe 2014: Apache Storm Architecture from P. Taylor Goetz

Extra Material

Peter Boncz · Hannes Mühleisen