Joining Data
Overview
The join block combines two datasets by matching rows on shared columns, the blockr equivalent of SQL joins or dplyr::left_join().
How it works
A join block has two input ports (unlike most blocks which have one). Connect one dataset to each port, then configure:
- Join columns: which columns to match on. If the column names differ between datasets, you can map them explicitly.
- Join type: how to handle non-matching rows:
- Left join: keep all rows from the left dataset, fill
NAfor non-matches from the right - Right join: keep all rows from the right dataset
- Inner join: keep only rows that match in both datasets
- Full join: keep all rows from both datasets
- Left join: keep all rows from the left dataset, fill
When to use joins
Joins are essential when your data lives in multiple tables. Common scenarios:
- Enriching a transaction table with customer details from a lookup table
- Combining measurements from different instruments on a shared sample ID
- Merging survey responses with demographic data
The bind rows and bind columns blocks handle simpler cases where you just need to stack datasets vertically or horizontally.
Next steps
- Visualising data: chart your joined results
- Block reference: Wrangling: full reference for all wrangling blocks