Wikipedia word counter using Airflow and Spark — Part 1 — Airflow DAG
In this series, you will learn to use Airflow and Spark to build a simple word counter application for Wikipedia articles. The same approach can also be applied to real-world ‘big data’ problems such as:
1. Ingesting data from multiple sources to a staging area, and
2. Performing scheduled ETL (Extract, Transform and Load) to create curated datasets for analysis.
This article walks you through building the application step by step.