Find out why your keys don't match.
You ran a left join and lost 40% of your rows. dplyr says "many-to-many relationship." joinspy says 12 keys have trailing spaces, 8 differ only by case, and 3 contain invisible Unicode characters. Then it fixes them.
library(joinspy)
join_spy(orders, customers, by = "customer_id")
repaired <- join_repair(orders, customers, by = "customer_id")
suggest_repairs(join_spy(orders, customers, by = "customer_id"))Most join failures come down to string-level problems in keys:
"Alice"vs"Alice "(trailing space, invisible)"NYC"vs"nyc"(case)- Zero-width spaces, BOMs, non-breaking spaces that look like regular spaces but aren't
"Johansson"vs"Johannson"(one character off)- Empty strings matching each other but not
NA
R won't warn you about any of these. join_spy() catches them before the join runs.
join_spy() examines keys before the join:
orders <- data.frame(
id = c("A", "B ", "c", "D"),
amount = c(100, 200, 300, 400),
stringsAsFactors = FALSE
)
customers <- data.frame(
id = c("A", "B", "C", "E"),
name = c("Alice", "Bob", "Carol", "Eve"),
stringsAsFactors = FALSE
)
join_spy(orders, customers, by = "id")
#> -- Join Diagnostic Report --
#> Match rate (left): 25%
#>
#> Issues Detected:
#> ! "B " has trailing whitespace (would match "B")
#> ! "c" vs "C" — case mismatch
#> x "D" has no match in right tablejoin_repair() fixes the issues, or previews what it would change with dry_run = TRUE. suggest_repairs() prints the R code instead of running it.
join_repair(orders, customers, by = "id", dry_run = TRUE)
repaired <- join_repair(orders, customers, by = "id",
standardize_case = "upper")
suggest_repairs(join_spy(orders, customers, by = "id"))
#> x$id <- trimws(x$id)
#> x$id <- toupper(x$id)
#> y$id <- toupper(y$id)join_spy() also estimates result size for each join type:
report <- join_spy(orders, customers, by = "id")
report$expected_rows
#> inner_join: 1
#> left_join: 4
#> right_join: 4
#> full_join: 7join_explain() works after the join, on the result:
result <- merge(orders, customers, by = "id", all.x = TRUE)
join_explain(result, orders, customers, by = "id", type = "left")
#> Result has same row count as left table
#> ! 3 left key(s) have no match in right tablejoin_strict() performs the join and enforces the cardinality you declare. If the data has a 1:n relationship and you said 1:1, it errors before producing a result. The output row count is determined by the constraint, not discovered after the fact.
# Declare 1:1 — errors if any key is duplicated on either side
join_strict(orders, customers, by = "id", expect = "1:1")
# Declare 1:n — left keys must be unique, right may repeat
join_strict(products, line_items, by = "product_id", expect = "1:n")The four levels are "1:1", "1:n", "n:1", and "n:m". detect_cardinality() reports the actual relationship if you need to check first.
Join wrappers (left_join_spy(), inner_join_spy(), etc.) run diagnostics before joining and attach the report as an attribute. check_cartesian() flags many-to-many keys that would multiply your row count. analyze_join_chain() handles multi-step A-B-C sequences. All joins work with tibbles, data.tables, and plain data frames.
# Install from CRAN
install.packages("joinspy")
# Or install development version from GitHub
# install.packages("pak")
pak::pak("gcol33/joinspy")- Getting Started
- Why Your Keys Don't Match
- Common Join Issues
- Joins in Production
- Working with Backends
- Function Reference
| Package | Focus |
|---|---|
| dplyr 1.1+ | Cardinality checks via relationship argument |
| powerjoin | 12-level configurable checks, key preprocessing |
| joyn | Match-status reporting variable per row |
| tidylog | Logs row count changes after joins |
joinspy focuses on string-level key diagnostics: whitespace, case, encoding, typos, and type mismatches. It identifies which specific keys failed, why, and can fix them automatically.
"Software is like sex: it's better when it's free." — Linus Torvalds
I'm a PhD student who builds R packages in my free time because I believe good tools should be free and open. I started these projects for my own work and figured others might find them useful too.
If this package saved you some time, buying me a coffee is a nice way to say thanks.
MIT (see the LICENSE.md file)
@software{joinspy,
author = {Colling, Gilles},
title = {joinspy: Diagnostic Tools for Data Frame Joins},
year = {2025},
url = {https://github.com/gcol33/joinspy}
}