Over the past few weeks I took time to refresh my experience with Apache Airflow after two years away from using it in production. I attended Beyond Analytics by Astronomer and revisited Marc Lamberti’s Airflow Fundamentals course, earning the certification as a formal checkpoint of the refresh.

Coming back to Airflow was less about learning it from scratch and more about re-aligning with newer patterns, APIs, and operational practices introduced in the last couple of years.

Highlights from Beyond Analytics

Sessions focused on DAG design at scale, observability and metrics for workflows, safe deployment practices, and tighter integration patterns with modern data stacks. These are areas where I plan to make practical improvements to our DAGs and operational playbooks.

What the certification review refreshed

Having used Airflow in the past, I found this a valuable opportunity to sharpen concepts, from DAGs, operators, and sensors to scheduling, retries, and SLA handling. The certification review also refreshed best practices around structuring DAGs, using the TaskFlow API, and designing idempotent tasks that are testable and easy to operate.
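As a concrete refresher on the retry side, here is a minimal sketch of setting retry behaviour through default_args at the DAG level and overriding it per task. The DAG name, task name, and values below are illustrative only, not recommendations.

from datetime import datetime, timedelta

from airflow.decorators import dag, task

# Illustrative defaults; pick values that match your own failure modes.
default_args = {
    "retries": 3,                         # rerun a failed task up to 3 times
    "retry_delay": timedelta(minutes=5),  # wait between attempts
}

@dag(start_date=datetime(2025, 9, 1), schedule=None, catchup=False,
     default_args=default_args)
def retry_demo():
    @task(retries=5)  # a task-level setting overrides the DAG-level default
    def flaky_extract():
        ...

    flaky_extract()

retry_demo()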

Practical next steps I'll apply

  • Audit DAGs for idempotence and stable retry semantics.
  • Improve metrics and alerts (task duration, failures, and backfill health).
  • Standardize DAG templates and local testing patterns for quicker onboarding (a small test sketch follows this list).
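
A minimal sketch of the local-testing piece, assuming DAG files live in a dags/ folder and pytest is the test runner; the folder path and test name are placeholders.

from airflow.models import DagBag

def test_dags_import_cleanly():
    # Parse every DAG file outside a running scheduler; import_errors maps
    # file paths to tracebacks, so an empty dict means all DAGs loaded.
    dag_bag = DagBag(dag_folder="dags/", include_examples=False)
    assert dag_bag.import_errors == {}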

TaskFlow — quick example

A minimal TaskFlow snippet illustrating decorator-based tasks and automatic XCom handling.

from datetime import datetime

from airflow.decorators import dag, task

@dag(
    start_date=datetime(2025, 9, 1),
    schedule="@daily",  # "schedule" replaces the deprecated schedule_interval argument
    catchup=False,
)
def example_pipeline():
    @task()
    def extract():
        # The return value is pushed to XCom automatically.
        return {"rows": 10}

    @task()
    def transform(data):
        # "data" arrives via XCom; no explicit xcom_pull needed.
        return data["rows"] * 2

    @task()
    def load(result):
        print("loaded", result)

    # Calling the task functions wires up extract >> transform >> load.
    load(transform(extract()))

# Instantiating the DAG at module level registers it with the scheduler.
example_pipeline_dag = example_pipeline()
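
If you are on Airflow 2.5 or later, you can also run the pipeline in a single process with example_pipeline_dag.test(), which is handy for debugging task logic locally before deploying.
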
Published by Karthigai Selvan — Lead Data Engineer