
How Flink Works

Source: Flink Doc

Architecture Components

Anatomy of a Flink Cluster

JobManager

Coordinates the distributed execution of Flink applications; it consists of three components:

  • ResourceManager
  • Dispatcher
  • JobMaster

TaskManager

  • Task slot: the unit of resource scheduling in a TaskManager; the number of slots indicates how many tasks a TaskManager can process concurrently
  • Each task is executed by one thread

Tasks and Operator Chains

  • Flink chains operator subtasks together into tasks.
  • Chaining operators together into tasks is a useful optimization: it reduces the overhead of thread-to-thread handover and buffering, and increases overall throughput while decreasing latency.
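The chaining optimization can be sketched in plain Python. This is an illustration of the idea only, not Flink code; the functions and values below are made up:

```python
# Illustrative sketch (not Flink code): why fusing map/filter subtasks
# into one chained task avoids per-record thread handover.

def source():
    yield from range(5)

# Unchained, each operator would run in its own thread and hand records
# over through a buffer/queue (serialization + context switches).
# Chained, a record flows through all operators in a single thread
# as plain function calls.

def chained_task(records):
    out = []
    for r in records:          # one thread drives the whole chain
        r = r * 2              # subtask 1: map
        if r > 2:              # subtask 2: filter
            out.append(r)      # chained sink
    return out

print(chained_task(source()))  # [4, 6, 8]
```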

Task Slots and Resources

  • By adjusting the number of task slots, users can define how subtasks are isolated from each other.
  • Note that no CPU isolation happens here; currently slots only separate the managed memory of tasks.
  • Having one slot per TaskManager means that each task group runs in a separate JVM (which can be started in a separate container, for example).
  • Having multiple slots means more subtasks share the same JVM. Tasks in the same JVM share TCP connections (via multiplexing) and heartbeat messages. They may also share data sets and data structures, thus reducing the per-task overhead.
  • By default, Flink allows subtasks to share slots even if they are subtasks of different tasks, so long as they are from the same job.
  • Two benefits of slot sharing:
    • A Flink cluster needs exactly as many task slots as the highest parallelism used in the job.
    • It is easier to get better resource utilization.
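The first benefit can be made concrete with a small calculation. The operator names and parallelisms below are invented for illustration:

```python
# Illustrative calculation: with default slot sharing, a job needs only as
# many task slots as its highest operator parallelism, not the sum of all
# operator parallelisms. Operator names/parallelisms here are made up.

operator_parallelism = {"source": 2, "map": 6, "window": 6, "sink": 1}

subtasks_total = sum(operator_parallelism.values())      # 15 subtasks in the job
slots_with_sharing = max(operator_parallelism.values())  # 6 slots suffice

print(subtasks_total, slots_with_sharing)  # 15 6
```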

Streaming Processing

Streaming Processing (figure, Source: Flink Doc)
Program to Dataflow (figure, Source: Flink Doc)
  • Flink Application Cluster: a dedicated cluster that executes jobs from a single Flink application
  • Flink Session Cluster: a long-running cluster that accepts multiple job submissions, which share its resources

APIs

Flink APIs

From high to low level:

  • SQL: High-level Language
  • Table API: Declarative DSL
  • DataStream API: Core API
  • Stateful and timely stream processing: low-level building blocks, used via the Process Function of the DataStream API
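The lowest layer exposes per-element processing with keyed state. The sketch below imitates that idea in plain Python; it is not the PyFlink API, and the class and method names are made up:

```python
# Illustrative sketch (not the PyFlink API): the "stateful and timely"
# layer gives per-record hooks with access to keyed state, which the
# higher-level APIs (DataStream, Table, SQL) build on.

class CountPerKeyProcessFunction:
    """Keeps one piece of state (a running count) per key."""

    def __init__(self):
        self.state = {}  # keyed state: key -> count

    def process_element(self, key, value):
        # Low-level hook: called once per record, may read/update state.
        self.state[key] = self.state.get(key, 0) + 1
        return key, self.state[key]

fn = CountPerKeyProcessFunction()
events = [("a", 1), ("b", 1), ("a", 1)]
print([fn.process_element(k, v) for k, v in events])
# [('a', 1), ('b', 1), ('a', 2)]
```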