December 25, 2024·6 min read

Creating a DAG to Schedule a Report to Google Sheets via Airflow

A step-by-step guide

Airflow · Python · Google Sheets · Redshift · Astronomer

This tutorial walks through building an Airflow DAG that automates report generation by querying Amazon Redshift, combining data, and uploading results to Google Sheets — using Astronomer, Docker, and Airflow running locally.

What You'll Learn

  • Setting up Docker and Astronomer locally
  • Configuring Airflow to query Amazon Redshift
  • Setting up Google Cloud Console for Sheets API access
  • Writing and scheduling an Airflow DAG to automate report generation

Prerequisites

  • Docker installed
  • Astronomer CLI installed
  • Access to an Amazon Redshift cluster with read, write, and create table privileges
  • Google account with Google Sheets access
  • Basic Python and SQL familiarity

Step 1: Setting Up Amazon Redshift

Create a schema for staging tables, then prepare three SQL queries:

  • company_metrics_WTD.sql — Generates Week-to-Date metrics
  • company_metrics_MTD.sql — Generates Month-to-Date metrics
  • company_metrics_Union.sql — Unions WTD and MTD results into a single table

Test all queries in your Redshift cluster before proceeding.
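The exact SQL depends on your schema, but the union step might look something like this (the schema, table, and column names are illustrative, not from the original post):

```sql
-- company_metrics_Union.sql (illustrative sketch)
DROP TABLE IF EXISTS staging.company_metrics_report;

CREATE TABLE staging.company_metrics_report AS
SELECT 'WTD' AS period, metric_name, metric_value
FROM staging.company_metrics_wtd
UNION ALL
SELECT 'MTD' AS period, metric_name, metric_value
FROM staging.company_metrics_mtd;
```

Materializing the union into a staging table keeps the final Sheets upload to a simple SELECT.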

Step 2: Setting Up Google Cloud Console

  1. Access Google Cloud Console
  2. Create or select a project
  3. Enable the Google Sheets API
  4. Generate a Service Account Key and save the JSON file locally
  5. Share your Google Sheet with the Service Account email
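Step 5 needs the service account's email address, which lives in the downloaded key file under client_email. A quick stdlib-only helper to pull it out (the file path in the usage comment is a hypothetical example; point it at wherever you saved the key):

```python
import json

def service_account_email(key_path: str) -> str:
    """Return the client_email field from a downloaded service account key file."""
    with open(key_path) as f:
        return json.load(f)["client_email"]

# Example usage (hypothetical path):
# print(service_account_email("keys/my-project-key.json"))
```

Share the target Google Sheet with this address exactly as you would with a human collaborator; the Sheets API calls will fail with a permission error otherwise.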

Step 3: Installing Docker and Astronomer Locally

Download Docker from Docker's website and verify your installation with:

docker --version

Install the Astronomer CLI via Homebrew:

brew install astro

Verify with astro version.

Step 4: Setting Up Airflow Locally

Initialize an Astronomer project, which scaffolds the essential directories and files:

astro dev init

This creates dags/, include/, plugins/, and requirements.txt.

Add the following to requirements.txt:

pandas
apache-airflow-providers-amazon
google-api-python-client
google-auth
apache-airflow-providers-google

Start Airflow (Docker must be running):

astro dev start

Access the Airflow UI at http://localhost:8080 (local Astronomer projects default to the admin / admin login).

Then create two connections in the Airflow UI:

  • redshift_default — your Redshift cluster details
  • google_sheets_default — path to your Service Account JSON
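If you prefer code over clicking, Airflow also picks up connections from environment variables named AIRFLOW_CONN_<CONN_ID>, which in an Astronomer project can live in the .env file. A sketch with placeholder values (host, credentials, and key path are not real; the exact extra fields for the Google connection vary by provider version):

```
# .env -- placeholder values, substitute your own
AIRFLOW_CONN_REDSHIFT_DEFAULT='redshift://awsuser:secret@my-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev'
AIRFLOW_CONN_GOOGLE_SHEETS_DEFAULT='google-cloud-platform://?key_path=%2Fusr%2Flocal%2Fairflow%2Finclude%2Fservice_account.json'
```

Connections defined this way won't appear in the UI's connection list, but operators resolve them the same way.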

Step 5: Writing the DAG

Create a Python script in the dags/ folder defining four tasks:

  1. Load WTD metrics via SQLExecuteQueryOperator
  2. Load MTD metrics via SQLExecuteQueryOperator
  3. Union results via SQLExecuteQueryOperator
  4. Upload to Google Sheets via SQLToGoogleSheetsOperator

Schedule runs using a CRON expression. Note that Airflow schedules in UTC, so account for your timezone offset when setting the schedule.
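For example, a report due at 8:00 AM US Eastern (UTC-5 in winter) needs its cron hour shifted into UTC. A tiny stdlib helper makes the arithmetic explicit (the fixed offset is an assumption; it deliberately ignores daylight-saving changes):

```python
def cron_for_local_hour(local_hour: int, utc_offset: int, minute: int = 0) -> str:
    """Build a daily cron expression for a local wall-clock hour.

    utc_offset is the zone's offset from UTC in hours (e.g. -5 for
    US Eastern in winter). Daylight-saving shifts are ignored.
    """
    utc_hour = (local_hour - utc_offset) % 24
    return f"{minute} {utc_hour} * * *"

print(cron_for_local_hour(8, -5))  # 8 AM Eastern -> "0 13 * * *"
```

If you'd rather not do the offset math at all, newer Airflow versions accept a timezone-aware start_date (e.g. via pendulum) and handle the conversion for you.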

Step 6: Running the DAG

  1. Upload your DAG script to the dags/ folder
  2. Restart Astronomer: astro dev restart
  3. Enable the DAG in the Airflow UI
  4. Trigger manually to test, or await the scheduled execution

Conclusion

That's it — a complete local Airflow environment automating a data pipeline from Redshift to Google Sheets. Once the DAG is running on a schedule, your reports update themselves. No more manual exports, no more stale spreadsheets.

Originally published on Medium →