
June 2025

Highlight of the Month

Summarize my biggest breakthrough, project, or insight this month:

This month, I mainly focused on two side projects: "Retail Lakehouse with Debezium, Kafka, Iceberg, and Trino" and "A Unified SQL-based Data Pipeline". Besides implementing these projects, I also watched videos and read articles about best practices for deploying, managing, and maintaining Trino, Iceberg, and Kafka clusters.

What I Created or Tried

What I built, experimented with, or implemented:

What I Learned

Short reflections on what I actually learned or became more confident in:

  • Focus on OUTPUT, not INPUT. Information overload is everywhere nowadays, and it's easy to get lost in the sea of content. Instead of just consuming more information, I should focus on creating something meaningful with what I learn (talk is cheap, show me the code).

Reflections – Beyond Just Tech

Soft-skill insights or workflow/communication/process reflections:

  • Making small talk with speakers at social gatherings is a great way to build connections and learn from their experiences. In the past, I focused too much on the technical side; now I realize that soft skills and networking are just as important in the tech industry.
  • My dad decided to give up active cancer treatment this month and to focus on quality of life instead. After this month of reflection, I realized that there are still many things I want to do and achieve, not only in my career but also in my personal life. Life is too short to focus only on work.

What I Consumed

A list of articles, papers, courses, or videos I read/watched/completed:

Read

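  • How to Build a One-of-a-Kind GitHub Profile! Plus Three Cool Designs and Applications 🚀
  • Databricks Acquires Tabular in a Bid to Improve Data Compatibility
  • Databricks to Acquire Open-Source Cloud Database Startup Neon for US$1 Billion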
  • What Is a Lakebase?
    • Openness
    • Separation of storage and compute (the most important feature imo)
    • Serverless
    • Modern development workflow
    • Built for AI agents
    • Lakehouse integration
  • What Is a Lakehouse? | Databricks Blog
  • 愛好 AI Engineer Newsletter 🚀 Model Context Protocol (MCP) Application Development #27
    • I really liked how the author described two different ways of building agents: one that relies on a customizable framework, and another that's more lightweight and built using just the core features of the programming language. It instantly reminded me of the old debates between TensorFlow 1.0 and PyTorch.
    • After reading this article, I realized that the strength of senior engineers lies in their ability to quickly pick up new technologies and analyze different approaches logically with their own keen insights. This is a skill that I aspire to develop.
  • How Agoda manages 1.8 trillion Events per day on Kafka
    • 2-step logging approach.
    • Multiple smaller Kafka clusters per data center instead of one large cluster.
    • Agoda employs a robust Kafka auditing system by aggregating message counts via background threads in client libraries, routing audits to a dedicated Kafka cluster, and implementing monitoring and alerting mechanisms for audit messages.
    • Agoda calculates cluster capacity by comparing each resource’s usage against its upper limit and taking the highest percentage to represent the dominant constraint at that moment.
    • Agoda attributes costs back to teams, which transformed team mindsets and drove proactive cost management and accountability across the company.
    • The new auth system lets the Kafka team control access, manage credentials, and protect sensitive data through fine-grained ACLs.
    • Operational scalability is ensured through automated tooling that streamlines and simplifies system management.
  • Scaling Kafka to Support PayPal’s Data Growth
    • Cluster Management: Kafka Config Service, ACLs, PayPal Kafka Libraries, QA Environment
    • Monitoring and Alerting
    • Configuration Management
    • Enhancements and Automation: Patching Security Vulnerabilities, Security Enhancements, Topic Onboarding, MirrorMaker Onboarding, Repartition Assignment Enhancements
  • A Getting-Started Guide to Pyright: A New Choice for Python Type Checking
  • DuckLake: SQL as a Lakehouse Format
    • It simplifies lakehouses by using a standard SQL database for all metadata while still storing data in open formats like Parquet, just like BigQuery with Spanner and Snowflake with FoundationDB.
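
The Agoda capacity rule (compare each resource's usage against its upper limit and keep the highest percentage) can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the function name, resource names, and numbers are hypothetical and made up for illustration; Agoda's real metrics, limits, and tooling are not covered in these notes.

```python
from typing import Dict, Tuple


def dominant_utilization(usage: Dict[str, float],
                         limits: Dict[str, float]) -> Tuple[str, float]:
    """Return the most constrained resource and its utilization in percent."""
    # Utilization of each resource relative to its own upper limit.
    ratios = {name: usage[name] / limits[name] * 100 for name in limits}
    # The highest percentage is the dominant constraint right now.
    bottleneck = max(ratios, key=lambda name: ratios[name])
    return bottleneck, ratios[bottleneck]


# Made-up numbers for one hypothetical cluster, not Agoda's real figures.
usage = {"cpu_cores": 96.0, "disk_tb": 180.0, "network_gbps": 30.0}
limits = {"cpu_cores": 160.0, "disk_tb": 240.0, "network_gbps": 50.0}

resource, pct = dominant_utilization(usage, limits)
print(f"{resource}: {pct:.1f}% of capacity used")  # disk_tb: 75.0% of capacity used
```

Taking the maximum rather than an average means a cluster that is nearly out of disk is reported as nearly full even while CPU and network sit at 60%, which is exactly the "dominant constraint" idea.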

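The DuckLake idea of keeping all metadata in a plain SQL database can be illustrated with Python's built-in sqlite3 module. This is a minimal sketch under assumptions: the two-table schema (`snapshot`, `data_file`) and the helper names `commit` and `files_at` are hypothetical and far simpler than DuckLake's real catalog, and the Parquet paths are placeholders (no data files are written). It shows why a SQL catalog is attractive: a commit is one database transaction, and time travel is just a query over snapshot IDs.

```python
import sqlite3
from typing import Iterable, List

# Hypothetical, heavily simplified catalog schema (NOT DuckLake's real schema).
# Each commit creates a snapshot row; each data file remembers the snapshot that
# added it and, once it is replaced, the snapshot that removed it.
SCHEMA = """
CREATE TABLE snapshot (
    snapshot_id  INTEGER PRIMARY KEY,
    committed_at TEXT NOT NULL
);
CREATE TABLE data_file (
    path           TEXT PRIMARY KEY,  -- placeholder Parquet path on object storage
    table_name     TEXT NOT NULL,
    begin_snapshot INTEGER NOT NULL,
    end_snapshot   INTEGER            -- NULL while the file is still live
);
"""


def commit(conn: sqlite3.Connection, table_name: str,
           added: Iterable[str] = (), removed: Iterable[str] = ()) -> int:
    """Create a new snapshot that adds and removes files in one SQL transaction."""
    with conn:  # commits on success, rolls back everything on any exception
        cur = conn.execute(
            "INSERT INTO snapshot (committed_at) VALUES (datetime('now'))")
        snap = cur.lastrowid
        assert snap is not None
        conn.executemany(
            "INSERT INTO data_file (path, table_name, begin_snapshot) VALUES (?, ?, ?)",
            [(path, table_name, snap) for path in added])
        conn.executemany(
            "UPDATE data_file SET end_snapshot = ? "
            "WHERE path = ? AND end_snapshot IS NULL",
            [(snap, path) for path in removed])
    return snap


def files_at(conn: sqlite3.Connection, table_name: str, snap: int) -> List[str]:
    """Return the data files a reader sees at a given snapshot (time travel)."""
    rows = conn.execute(
        "SELECT path FROM data_file "
        "WHERE table_name = ? AND begin_snapshot <= ? "
        "AND (end_snapshot IS NULL OR end_snapshot > ?) ORDER BY path",
        (table_name, snap, snap))
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
s1 = commit(conn, "orders", added=["orders/part-0001.parquet"])
s2 = commit(conn, "orders", added=["orders/part-0002.parquet"])
# A compaction-style commit: one new file replaces the two older ones.
s3 = commit(conn, "orders", added=["orders/part-0003.parquet"],
            removed=["orders/part-0001.parquet", "orders/part-0002.parquet"])

print(files_at(conn, "orders", s2))  # ['orders/part-0001.parquet', 'orders/part-0002.parquet']
print(files_at(conn, "orders", s3))  # ['orders/part-0003.parquet']
```

Formats such as Iceberg instead track this state in metadata files on object storage; in this sketch, atomicity and concurrency control come from the database itself.
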
Watched

Completed Courses

Goals for Next Month

Set 2–3 simple goals to stay focused and accountable:

  • I have started submitting my resume to companies, so I need to prepare for interviews.
  • Publish a series of blog posts on best practices for deploying, managing, and maintaining Trino and Kafka clusters, and on how large companies run them in production.
  • Focus on verbal output, not just written output.
  • Host a series of mock interviews with people in my community to practice my soft skills and grow my network.