Wikipedia word counter using Airflow and Spark — Part 1 — Airflow DAG

In this series, you will learn to use Airflow and Spark to build a simple word counter application for Wikipedia articles. The same approach can also be applied to real-world ‘big-data’ problems such as:

1. Ingesting data from multiple sources into a staging area, and

2. Performing scheduled ETL (Extract, Transform and Load) to create curated datasets for analysis.

This article, the first part of the series, walks you step by step through creating the Airflow DAG for this application.