Joining Data

Overview

The join block combines two datasets by matching rows on shared columns, the blockr equivalent of SQL joins or dplyr::left_join().

A join block has two input ports (unlike most blocks which have one). Connect one dataset to each port, then configure:

Join columns: which columns to match on. If the column names differ between datasets, you can map them explicitly.
Join type: how to handle non-matching rows:
- Left join: keep all rows from the left dataset, fill NA for non-matches from the right
- Right join: keep all rows from the right dataset
- Inner join: keep only rows that match in both datasets
- Full join: keep all rows from both datasets

Joins are essential when your data lives in multiple tables. Common scenarios:

The bind rows and bind columns blocks handle simpler cases where you just need to stack datasets vertically or horizontally.