industtry is a toolkit for industrial-style exploitation of
structured datasets (primarily data frames), with an emphasis on
set-level operations: applying the same procedure to many inputs
as long as they share an identified structure.
A large part of the package is built around two pragmatic ideas:
- “Collections first”: many workflows fail to scale because they are designed for a single dataset; industtry pushes common tasks to the collection level (importing, inspecting, detecting schema differences, batch conversion, etc.).
- “Operational tooling”: lightweight helpers for day-to-day work (paths, duplicates, join checks, string replacements, etc.) that tend to recur in production data pipelines.

Some features integrate tightly with RStudio (background jobs / interactive workflows). Those parts degrade gracefully when RStudio is not available (examples are marked accordingly).
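
As an illustration of what "degrading gracefully" can look like on the user side, the sketch below guards RStudio-only calls behind a check. It relies on `rstudioapi::isAvailable()` from the rstudioapi package; the variable `in_rstudio` is a hypothetical name, and this is not necessarily how industtry performs the check internally.

```r
# TRUE only when the rstudioapi package is installed AND the session
# is actually running inside RStudio (the && short-circuits otherwise).
in_rstudio <- requireNamespace("rstudioapi", quietly = TRUE) &&
  rstudioapi::isAvailable()

if (in_rstudio) {
  message("RStudio detected: RStudio-only helpers can be used.")
} else {
  message("RStudio not detected: skip the RStudio-only helpers.")
}
```
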
You can install the development version from GitHub:

    # install.packages("devtools")
    devtools::install_github("danielrak/industtry")

The exported surface is intentionally broad; the table below is a functional map of the main user-facing tools.

| Area | Main intent | Key functions |
|---|---|---|
| Import collections | Load multiple datasets into the Global Environment (serialized or parallelized). | serial_import(), parallel_import() |
| Batch conversion / renaming | Operate through Excel masks to convert file formats or rename files at scale. | mask_convert_r(), convert_r(), mask_rename_r(), rename_r() |
| Inspection & profiling | Inspect one dataset or a whole folder of datasets; export diagnostics to Excel. | inspect(), inspect_write(), inspect_vars() |
| Schema detection / consistency | Detect variables across datasets and compare classes/structures. | vars_detect*(), vars_compclasses*(), chars_structure*(), detect_chars_structure*() |
| Data hygiene helpers | Duplicate diagnostics, join checks, proportions, etc. | dupl_show(), dupl_sources(), ljoin_checks(), table_prop() |
| Paths & filesystem | Replicate folder structures / move files. | folder_structure_replicate(), path_move() |
| Utilities | String replacement, global assignment, script location, etc. | replace_multiple(), assign_to_global(), current_script_location() |
A lot of analysis pipelines begin with import. When you have many datasets, importing them “one by one” (and keeping names consistent) becomes error-prone.
industtry provides:

- serial_import() to import a set of datasets sequentially.
- parallel_import() to import a set of datasets in parallel (RStudio only).


Example with the .rds files shipped with the package:

    library(industtry)
    library(magrittr)  # provides %>%

    yourdir <- system.file("permadir_examples_and_tests/importations", package = "industtry")

    # Keep only the .rds files
    lfiles <- list.files(yourdir, full.names = TRUE) %>%
      purrr::keep(stringr::str_detect(., "\\.rds$"))
    lfiles

    # Import the files one after the other
    serial_import(lfiles)

    # RStudio only:
    parallel_import(lfiles)

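
Conceptually, importing a collection amounts to reading each file and binding the result to a name derived from the file name. The base-R sketch below illustrates that general pattern on two temporary .rds files. It is an assumption about the idea, not the implementation of serial_import(); the helper `import_all()` and the use of a dedicated environment (rather than the Global Environment) are hypothetical choices made for the example.

```r
# Hypothetical helper: read every .rds file in `files` and assign each
# object into `envir` under the file's base name (extension removed).
import_all <- function(files, envir = new.env()) {
  for (f in files) {
    obj_name <- tools::file_path_sans_ext(basename(f))
    assign(obj_name, readRDS(f), envir = envir)
  }
  envir
}

# Self-contained demo on temporary files
demo_dir <- file.path(tempdir(), "import_sketch")
dir.create(demo_dir, showWarnings = FALSE)
saveRDS(cars, file.path(demo_dir, "cars.rds"))
saveRDS(mtcars, file.path(demo_dir, "mtcars.rds"))

files <- list.files(demo_dir, pattern = "\\.rds$", full.names = TRUE)
collection <- import_all(files)

# Names of the imported objects
sort(ls(collection))
```

Deriving object names from file names keeps naming consistent across the whole collection, which is the part that tends to go wrong when files are imported one by one.
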
When datasets come from heterogeneous sources (different producers, different time windows, different exports), a fast “schema + variable” diagnostic helps you converge quickly.
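
To make the idea of a "schema + variable" diagnostic concrete, here is a minimal base-R sketch that lists, for each variable found in a set of data frames, its class in each data frame. The helper `compare_classes()` and the toy data frames `a` and `b` are hypothetical and only illustrate the kind of comparison involved; they are not part of industtry and say nothing about the output format of its own functions.

```r
# Hypothetical helper: one row per variable, one column per dataset,
# each cell holding the variable's first class (NA when it is absent).
compare_classes <- function(datasets) {
  all_vars <- sort(unique(unlist(lapply(datasets, names))))
  cols <- lapply(datasets, function(df) {
    vapply(all_vars, function(v) {
      if (v %in% names(df)) class(df[[v]])[1] else NA_character_
    }, character(1), USE.NAMES = FALSE)
  })
  data.frame(variable = all_vars, cols, stringsAsFactors = FALSE)
}

# Two toy datasets with a type conflict on `id` and non-shared variables
a <- data.frame(id = 1:3, value = c(1.5, 2, 3))
b <- data.frame(id = c("1", "2"), label = c("x", "y"))

res <- compare_classes(list(a = a, b = b))
res
```

A table like this makes class conflicts (here, `id`) and missing variables visible at a glance before any attempt to stack or join the datasets.
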
The examples below inspect a single data frame, then a folder of datasets:

    # Built-in dataset as a simple example
    data(cars)
    out_dir <- tempdir()

    # Export diagnostics for a single data frame
    inspect_write(
      data_frame_name = "cars",
      output_path = out_dir,
      output_label = "cars"
    )
    list.files(out_dir)

    # Inspect the variables of every .rds file in a folder
    mydir <- file.path(tempdir(), "inspect_vars_readme_example")
    dir.create(mydir, showWarnings = FALSE)
    saveRDS(cars, file.path(mydir, "cars1.rds"))
    saveRDS(mtcars, file.path(mydir, "cars2.rds"))
    inspect_vars(
      input_path = mydir,
      output_path = mydir,
      output_label = "cardata",
      considered_extensions = "rds"
    )
    list.files(mydir)

convert_r() is designed for batch conversion of dataset file formats using an Excel mask: a deterministic, auditable interface to define what to convert and how to name outputs.
High-level pattern:

- Create a mask with mask_convert_r().
- Fill the mask columns.
- Run convert_r(mask_filepath, output_path) (RStudio only).

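
The mask-driven pattern above can be sketched in base R with an in-memory data frame standing in for the Excel mask. Everything here is illustrative: the column names `input_file`, `convert` and `output_file` are hypothetical and do not reflect the real layout of the mask produced by mask_convert_r(), and the conversion shown (.rds to .csv) is just one possible case.

```r
# Demo folder with two .rds files
demo_dir <- file.path(tempdir(), "convert_sketch")
dir.create(demo_dir, showWarnings = FALSE)
saveRDS(cars, file.path(demo_dir, "cars.rds"))
saveRDS(mtcars, file.path(demo_dir, "mtcars.rds"))

# Hypothetical mask: one row per input file. The column names are
# illustrative only and are NOT the real Excel mask columns.
mask <- data.frame(
  input_file  = c("cars.rds", "mtcars.rds"),
  convert     = c(TRUE, FALSE),
  output_file = c("cars.csv", NA),
  stringsAsFactors = FALSE
)

# Apply the mask: convert only the rows flagged for conversion
todo <- mask[mask$convert, ]
for (i in seq_len(nrow(todo))) {
  dat <- readRDS(file.path(demo_dir, todo$input_file[i]))
  write.csv(dat, file.path(demo_dir, todo$output_file[i]), row.names = FALSE)
}

list.files(demo_dir)
```

Because the mask is plain tabular data, it doubles as a record of what was converted and under which name, which is what makes the approach auditable.
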

Example:

    mydir <- system.file("permadir_examples_and_tests/convert_r", package = "industtry")

    # Create the Excel mask in mydir
    mask_convert_r(output_path = mydir)

    # After filling the mask, run the conversion (RStudio only)
    convert_r(
      mask_filepath = file.path(mydir, "mask_convert_r.xlsx"),
      output_path = mydir
    )

Similarly, rename_r() performs batch renaming based on an Excel mask created by mask_rename_r().

Example:

    mydir <- tempfile()
    dir.create(mydir)
    saveRDS(cars, file.path(mydir, "cars.rds"))
    saveRDS(mtcars, file.path(mydir, "mtcars.rds"))

    # Create the renaming mask for the files in mydir
    mask_rename_r(input_path = mydir)
    list.files(mydir)

    # After filling the mask, apply the renaming
    rename_r(mask_filepath = file.path(mydir, "mask_rename_r.xlsx"))
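
For readers who want to see the renaming idea without Excel, the sketch below applies a rename table with base R's `file.rename()`. The data frame `mask` and its columns `old_name` and `new_name` are hypothetical stand-ins; the actual mask written by mask_rename_r() may be organised differently.

```r
# Demo folder with two .rds files
demo_dir <- file.path(tempdir(), "rename_sketch")
dir.create(demo_dir, showWarnings = FALSE)
saveRDS(cars, file.path(demo_dir, "cars.rds"))
saveRDS(mtcars, file.path(demo_dir, "mtcars.rds"))

# Hypothetical rename table: column names are illustrative only
mask <- data.frame(
  old_name = c("cars.rds", "mtcars.rds"),
  new_name = c("speed_dist.rds", "mtcars.rds"),
  stringsAsFactors = FALSE
)

# Rename only the files whose name actually changes
changed <- mask[mask$old_name != mask$new_name, ]
ok <- file.rename(
  from = file.path(demo_dir, changed$old_name),
  to   = file.path(demo_dir, changed$new_name)
)

list.files(demo_dir)
```

`file.rename()` returns one logical per file, so `ok` can be checked afterwards to confirm that every requested rename succeeded.
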