August 2025¶

Highlight of the Month¶

Summarize my biggest breakthrough, project, or insight this month:

This month I mainly focused on two things. First, I dove deep into Apache Iceberg performance tuning techniques, learning how to optimize both write and read operations, manage table compactions, and implement effective partitioning strategies. This knowledge is crucial for maintaining efficient data lakes and ensuring high query performance. Second, I dedicated time to preparing for technical interviews, specifically for a data lakehouse engineer role.

What I Built, Published, or Experimented with¶

Published Best Practices for Optimizing Apache Iceberg Workloads in AWS
Published Deep Dive into Kafka Connect Iceberg Sink Connector
Published Exactly Once Semantics in Kafka
Published What's New in Apache Airflow 3
Published 5 Practical Ways to Speed Up Your Apache Spark Queries
Experimented with Pyrefly
Experimented with colima as a replacement for Docker Desktop on Mac (and it was great!)
Experimented with Claude Code

What I Learned¶

Short reflections on what I actually learned or became more confident in:

Learned the performance tuning techniques for maintaining Iceberg tables: write optimization, read optimization, compaction, and partitioning strategies.
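As a reminder to myself of what two of these maintenance tasks look like in practice, here is a minimal sketch that builds the Spark SQL statements for Iceberg's `rewrite_data_files` (compaction) and `expire_snapshots` procedures. The catalog name `my_catalog`, the table `db.events`, the helper function names, and the chosen defaults (512 MB target file size, keeping the last 5 snapshots) are all assumptions for illustration, not recommendations for any particular workload.

```python
# Hypothetical helpers: they only build SQL strings; running them requires
# a Spark session configured with an Iceberg catalog (not shown here).

def compaction_sql(catalog: str, table: str,
                   target_file_size_bytes: int = 512 * 1024 * 1024) -> str:
    """Build a CALL statement for Iceberg's rewrite_data_files procedure,
    which compacts small data files toward the target file size."""
    return (
        f"CALL {catalog}.system.rewrite_data_files("
        f"table => '{table}', "
        f"options => map('target-file-size-bytes', '{target_file_size_bytes}'))"
    )


def expire_snapshots_sql(catalog: str, table: str, retain_last: int = 5) -> str:
    """Build a CALL statement for Iceberg's expire_snapshots procedure,
    which removes old snapshots while keeping the most recent ones."""
    return (
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{table}', retain_last => {retain_last})"
    )


if __name__ == "__main__":
    # Assumed names, for illustration only.
    print(compaction_sql("my_catalog", "db.events"))
    print(expire_snapshots_sql("my_catalog", "db.events"))
```

With a configured session, each string would be passed to `spark.sql(...)`; how often to run them depends on the write pattern of the table.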

Reflections – Beyond Just Tech¶

Soft-skill insights or workflow/communication/process reflections:

After interviewing several times for data engineering roles and finally getting a job offer, I realized that:

Architecture diagrams are super important for communicating my design ideas effectively, and they catch interviewers' attention right away.
Preparing some materials based on the interviewer's introduction to the team and company after the first-round interview is very helpful for the follow-up interviews, as it shows my enthusiasm and interest in the role and company.
Side projects are definitely a plus, as they demonstrate my passion and commitment to learning and growing in the field.

What I Consumed¶

A list of articles, papers, courses, or videos I read/watched/completed:

Performance Tuning & Optimization¶

Real-time Data Processing (RisingWave, Debezium, Flink CDC)¶

RisingWave vs. Apache Flink: Which one to choose?

SQLMesh¶

Airflow¶

Iceberg & Hive¶

Goals for Next Month¶

Set 2–3 simple goals to stay focused and accountable:

Explore observability and SRE practices, getting hands-on with OpenTelemetry, Prometheus, and Grafana
Get more familiar with Trino in terms of performance tuning