Automating Workflows with Apache Airflow: A Practical Guide

Ready to explore the power of Apache Airflow? This step-by-step guide will walk you through setting up a weather data pipeline that fetches data from an API and stores it in PostgreSQL. Whether you're new to Airflow or looking to automate your workflows, this tutorial will help you build a fully functional pipeline in no time!

Apache Airflow is a powerful workflow orchestration tool that enables data engineers to automate data pipelines, schedule tasks, and monitor execution flows. In this blog post, we will explore Apache Airflow’s capabilities by building a weather data pipeline that:

  1. Fetches weather data from an API (e.g., weatherstack.com), and
  2. Stores the data in a PostgreSQL database for analysis.

We will use Docker Compose to set up Apache Airflow and PostgreSQL.


 

Why Use Apache Airflow?

 

Apache Airflow provides several advantages for workflow orchestration:

Dynamic Workflows – Define workflows as Python code using Directed Acyclic Graphs (DAGs).

Scalability – Run tasks in parallel using distributed execution.

Monitoring & UI – Track execution progress and logs via a web interface.

Extensibility – Integrate with APIs, databases, and cloud platforms using built-in and custom operators.

For this tutorial, we will set up a weather data pipeline with Airflow and PostgreSQL using Docker Compose.


 

Project Setup

 

1. Prerequisites

Before we start, make sure you have:

  • Docker and Docker Compose installed.
  • An API key for a weather provider such as weatherstack.com.
  • Basic familiarity with Python and SQL.

2. Directory Structure

A well-structured project is essential for maintainability. Here's how we'll organize our files:

airflow-weather-pipeline/
├── Makefile
├── .env.example            # Example environment variables
├── README.md               # Project documentation
├── dags                    # DAG scripts
│   ├── weather_pipeline.py # The weather DAG code
│   └── weather_utils/      # Additional modules
├── docker-compose.yml      # Docker setup
└── scripts                 # Config, fetch, and store scripts
    ├── init.sh             # The script that runs on Airflow initialization
    ├── schema.sql          # Database schema
    └── schema_version.txt  # Current schema version

 

3. Setting Up Apache Airflow and PostgreSQL with Docker Compose

Create a docker-compose.yml file to define the services:

services:
  postgres:
    image: postgres:17
    container_name: airflow-postgres
    ...

  airflow-init:
    image: apache/airflow:2.10.5
    container_name: airflow-init
    ...

  airflow-scheduler:
    image: apache/airflow:2.10.5
    container_name: airflow-scheduler
    ...

  airflow-webserver:
    image: apache/airflow:2.10.5
    container_name: airflow-webserver
    ...

volumes:
  ...

networks:
  ...

This setup:

  • Runs PostgreSQL as Airflow’s metadata database.
  • Starts Airflow Webserver on http://localhost:8080/.
  • Creates an Airflow admin user automatically.
  • Enables the Airflow scheduler to manage DAG execution.
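
The airflow-init service's command is elided above; in a typical setup it migrates the metadata database and creates the admin user. A sketch of what that command might run (the admin credentials here are placeholders, not part of the original setup):

airflow db migrate && airflow users create \
    --username admin --password admin \
    --firstname Admin --lastname User \
    --role Admin --email admin@example.com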

To start the services, run:

make up
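
The Makefile itself isn't shown in this post; a minimal sketch of the targets this guide relies on, assuming Docker Compose v2 (docker compose), might look like this:

# Makefile (sketch)
up:
	docker compose up -d

down:
	docker compose down

logs:
	docker compose logs -f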

 

4. Initializing the Database

Before we run our DAG, let's create a PostgreSQL table for storing weather data.

Create a script scripts/init.sh:

#!/bin/bash
...
PGPASSWORD="$POSTGRES_PASSWORD" psql \
    -h "$POSTGRES_HOST" \
    -U "$POSTGRES_USER" \
    -d "$POSTGRES_DB" \
    -f /opt/airflow/scripts/schema.sql
...
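
The script takes its connection settings from environment variables, which is what the .env.example file in the directory structure is for. A sketch of what it might contain (the password value and the WEATHER_API_KEY entry are assumptions; the user and database names match the psql command used at the end of this guide):

# .env.example (sketch; replace the placeholder values)
POSTGRES_HOST=airflow-postgres
POSTGRES_DB=airflow
POSTGRES_USER=airflow
POSTGRES_PASSWORD=airflow
WEATHER_API_KEY=your_weatherstack_api_key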

Create a script scripts/schema.sql:

CREATE TABLE IF NOT EXISTS weather_data (
    id              SERIAL PRIMARY KEY,
    city            VARCHAR(50)             NOT NULL,
    temperature     FLOAT                   NOT NULL,
    humidity        INT                     NOT NULL,
    weather         VARCHAR(50)             NOT NULL,
    ...             ...                     ...
    timestamp       TIMESTAMP               DEFAULT CURRENT_TIMESTAMP
);

 

5. Creating the Weather Data Pipeline DAG

Now, let's create the DAG to fetch and store weather data.

Edit dags/weather_pipeline.py:

from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime, timedelta
from weather_utils.fetch_weather import fetch_weather_data
from weather_utils.db_operations import store_weather_data


default_args = {
    "owner": "airflow",
    "depends_on_past": False,
    "start_date": datetime(2025, 3, 24),
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

dag = DAG(
    "weather_pipeline",
    default_args=default_args,
    schedule_interval="@hourly",
    catchup=False,
    description="Fetches weather data and stores it in PostgreSQL",
)

fetch_weather_task = PythonOperator(
    task_id="fetch_weather_data",
    python_callable=fetch_weather_data,
    dag=dag,
)

store_weather_task = PythonOperator(
    task_id="store_weather_data",
    python_callable=store_weather_data,
    dag=dag,
)

fetch_weather_task >> store_weather_task
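
The two callables imported at the top of the DAG live in dags/weather_utils/, which isn't shown in this post. A minimal sketch of what they might look like follows; the WEATHER_API_KEY and WEATHER_CITY environment variables, the requests/psycopg2 dependencies, and the exact weatherstack response fields are assumptions rather than part of the original project.

# dags/weather_utils/fetch_weather.py (sketch)
import os

import requests


def fetch_weather_data(**context):
    """Call the weatherstack API and return the fields we want to store."""
    api_key = os.environ["WEATHER_API_KEY"]          # assumed env var
    city = os.environ.get("WEATHER_CITY", "Berlin")  # assumed default city
    response = requests.get(
        "http://api.weatherstack.com/current",
        params={"access_key": api_key, "query": city},
        timeout=10,
    )
    response.raise_for_status()
    current = response.json()["current"]
    # Returning a value from a PythonOperator callable pushes it to XCom automatically.
    return {
        "city": city,
        "temperature": current["temperature"],
        "humidity": current["humidity"],
        "weather": current["weather_descriptions"][0],
    }


# dags/weather_utils/db_operations.py (sketch)
import os

import psycopg2


def store_weather_data(**context):
    """Pull the fetched record from XCom and insert it into weather_data."""
    record = context["ti"].xcom_pull(task_ids="fetch_weather_data")
    conn = psycopg2.connect(
        host=os.environ["POSTGRES_HOST"],
        dbname=os.environ["POSTGRES_DB"],
        user=os.environ["POSTGRES_USER"],
        password=os.environ["POSTGRES_PASSWORD"],
    )
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO weather_data (city, temperature, humidity, weather) "
            "VALUES (%s, %s, %s, %s)",
            (record["city"], record["temperature"], record["humidity"], record["weather"]),
        )
    conn.close()

Passing the record between tasks via XCom keeps the two steps decoupled; for anything larger than a small dictionary you would typically write to intermediate storage instead.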

 

6. Running the Pipeline

  1. Restart the Airflow services so the new DAG is loaded:

make up

  2. Log in to the Airflow UI at http://localhost:8080/.
  3. Enable the weather_pipeline DAG.
  4. Trigger a manual run and check PostgreSQL for new records:

docker exec -it airflow-postgres psql -U airflow -d airflow -c "SELECT * FROM weather_data;"
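
If you prefer the command line over the UI, the DAG can also be triggered from inside the scheduler container (using the container name defined in docker-compose.yml):

docker exec -it airflow-scheduler airflow dags trigger weather_pipeline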

 


🎯 You just built a fully functional Apache Airflow pipeline! 🚀

✅ Fetches real-time weather data.

✅ Stores it in PostgreSQL.

✅ Runs automatically on an hourly schedule.

From here, you can extend the pipeline by:

  • Adding trend analysis to detect weather anomalies.
  • Integrating SMS, email, or other alerts for extreme conditions.
  • Expanding to multiple cities for comparative analysis.
