How Flink Works

Architecture Components
JobManager
The JobManager coordinates the distributed execution of Flink applications. It consists of three components:
- ResourceManager
- Dispatcher
- JobMaster
TaskManager
- Task Slot
- For distributed execution, Flink chains operator subtasks together into tasks.
- Each task is executed by one thread.
Tasks and Operator Chains
- Flink chains operator subtasks together into tasks.
- Chaining operators together into tasks is a useful optimization: it reduces the overhead of thread-to-thread handover and buffering, and increases overall throughput while decreasing latency.
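The grouping of operators into tasks can be illustrated with a small, self-contained model. This is a sketch, not Flink code: the `Operator` class, the `chain_operators` function, and the edge labels are hypothetical names, and the chaining rule is a deliberate simplification (neighbours are chained only when they are connected by a forward edge and have the same parallelism). The real decision in Flink depends on more conditions than this.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    name: str
    parallelism: int


def chain_operators(operators, edges):
    """Group a linear pipeline of operators into tasks.

    edges[i] is the partitioning between operators[i] and operators[i + 1].
    Simplifying assumption: two neighbours are chained only when the edge is
    "forward" and both operators have the same parallelism.
    Expects at least one operator.
    """
    tasks = [[operators[0]]]
    for prev, curr, edge in zip(operators, operators[1:], edges):
        if edge == "forward" and prev.parallelism == curr.parallelism:
            tasks[-1].append(curr)   # same task: no thread-to-thread handover
        else:
            tasks.append([curr])     # new task: records cross a task boundary
    return tasks


# Hypothetical pipeline: source -> map -> window -> sink
pipeline = [
    Operator("source", 2),
    Operator("map", 2),
    Operator("window", 2),
    Operator("sink", 1),
]
edges = ["forward", "hash", "rebalance"]

tasks = chain_operators(pipeline, edges)
task_names = [[op.name for op in task] for task in tasks]

# Each parallel subtask of a task is executed by one thread.
threads_with_chaining = sum(task[0].parallelism for task in tasks)
threads_without_chaining = sum(op.parallelism for op in pipeline)
```

In this model, `source` and `map` end up in the same task, so the pipeline runs with five threads instead of seven, and records passing from `source` to `map` never leave their thread.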
Task Slots and Resources
- Each task slot represents a fixed subset of a TaskManager's resources. No CPU isolation happens between slots; currently slots only separate the managed memory of tasks.
- By adjusting the number of task slots, users can define how subtasks are isolated from each other.
- Having one slot per TaskManager means that each task group runs in a separate JVM (which can be started in a separate container, for example).
- Having multiple slots means more subtasks share the same JVM. Tasks in the same JVM share TCP connections (via multiplexing) and heartbeat messages. They may also share data sets and data structures, thus reducing the per-task overhead.
- By default, Flink allows subtasks to share slots even if they are subtasks of different tasks, so long as they are from the same job.
- Slot sharing has two benefits:
  - A Flink cluster needs exactly as many task slots as the highest parallelism used in the job.
  - It is easier to get better resource utilization.
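The slot arithmetic in this section can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, managed memory is assumed to be split evenly across the slots of a TaskManager, and without slot sharing every subtask is assumed to occupy a slot of its own.

```python
import math


def slots_required(task_parallelism, slot_sharing=True):
    """Number of task slots a job needs, given the parallelism of each task."""
    if slot_sharing:
        # One slot can hold one subtask of every task in the job,
        # so the highest parallelism determines the slot count.
        return max(task_parallelism)
    # Assumption: without slot sharing, each subtask occupies its own slot.
    return sum(task_parallelism)


def managed_memory_per_slot(total_managed_mb, slots_per_taskmanager):
    """Managed memory available to each slot, assuming an even split."""
    return total_managed_mb / slots_per_taskmanager


def taskmanagers_required(slots, slots_per_taskmanager):
    """Minimum number of TaskManagers that can provide the given slots."""
    return math.ceil(slots / slots_per_taskmanager)


# Hypothetical job with three tasks of parallelism 2, 2 and 1.
parallelism = [2, 2, 1]
shared = slots_required(parallelism)                         # 2
unshared = slots_required(parallelism, slot_sharing=False)   # 5
```

With three slots per TaskManager and 3072 MB of managed memory, `managed_memory_per_slot(3072, 3)` gives 1024 MB per slot, and `taskmanagers_required(shared, 3)` shows that a single TaskManager is enough for this job when slots are shared.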
Streaming Processing


Flink Application Execution
- Flink Application Cluster
- Flink Session Cluster
APIs
From high to low level:
- SQL: High-level Language
- Table API: Declarative DSL
- DataStream API: Core API
- Stateful and timely stream processing: low-level building blocks, used through the Process Function of the DataStream API
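To make "stateful and timely" concrete, here is a toy model of what the lowest level offers: per-key state plus timers driven by event time. It is plain Python and does not use the Flink API; the class `ToyKeyedProcessor` and its methods are hypothetical names chosen for this illustration only.

```python
from collections import defaultdict


class ToyKeyedProcessor:
    """Toy model of a keyed process function: per-key state plus event-time timers."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.counts = defaultdict(int)  # keyed state: events seen per key
        self.timers = {}                # key -> event time at which its timer fires
        self.output = []                # emitted (key, count, fire_time) records

    def process_element(self, key, timestamp):
        # Stateful: update the state that belongs to this key.
        self.counts[key] += 1
        # Timely: (re)register a timer `timeout` units after this event.
        self.timers[key] = timestamp + self.timeout

    def advance_watermark(self, watermark):
        # Fire every timer whose time is at or before the watermark, in time order.
        due = sorted((t, k) for k, t in self.timers.items() if t <= watermark)
        for fire_time, key in due:
            self.output.append((key, self.counts.pop(key), fire_time))
            del self.timers[key]


processor = ToyKeyedProcessor(timeout=10)
processor.process_element("a", 1)
processor.process_element("b", 2)
processor.process_element("a", 5)   # moves the timer for "a" from 11 to 15
processor.advance_watermark(12)     # only the timer for "b" (at 12) is due
first_output = list(processor.output)
processor.advance_watermark(20)     # now the timer for "a" (at 15) fires
```

The model emits one record per key once no new event has arrived for that key within `timeout` units of event time, which is the kind of logic the higher-level APIs do not express directly.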