Skip to content

Joining Data

Overview

The join block combines two datasets by matching rows on shared columns, the blockr equivalent of SQL joins or dplyr::left_join().

How it works

A join block has two input ports (unlike most blocks which have one). Connect one dataset to each port, then configure:

  1. Join columns: which columns to match on. If the column names differ between datasets, you can map them explicitly.
  2. Join type: how to handle non-matching rows:
    • Left join: keep all rows from the left dataset, fill NA for non-matches from the right
    • Right join: keep all rows from the right dataset
    • Inner join: keep only rows that match in both datasets
    • Full join: keep all rows from both datasets

When to use joins

Joins are essential when your data lives in multiple tables. Common scenarios:

  • Enriching a transaction table with customer details from a lookup table
  • Combining measurements from different instruments on a shared sample ID
  • Merging survey responses with demographic data

The bind rows and bind columns blocks handle simpler cases where you just need to stack datasets vertically or horizontally.

Next steps

Developed by cynkra and BMS