Re-identification risk assessment that computes k-anonymity, l-diversity, and HIPAA Safe Harbor compliance on a dataset.
Healthcare & Life-Sciences — HIPAA, PHI, FHIR/HL7, and clinical data.
pip install cognis-deidproof
deidproof scan . # → prioritized findings in seconds
- Install the CLI:
  `pip install deidproof`
- Check a CSV dataset for re-identification risk, naming your quasi-identifier and sensitive columns:
  `deidproof check dataset.csv --quasi-identifiers zip,age,sex --sensitive diagnosis`
- Enforce thresholds by requiring a minimum k-anonymity and l-diversity:
  `deidproof check dataset.csv --quasi-identifiers zip,age,sex --sensitive diagnosis --min-k 5 --min-l 2`
- Read the output. Add `--format json` for machine-readable results:
  `deidproof check dataset.csv --quasi-identifiers zip,age,sex --format json > risk.json`
- Wire it into CI to block a data release that fails the k/l targets (non-zero exit):
  `deidproof check dataset.csv --quasi-identifiers zip,age,sex --sensitive diagnosis --min-k 5 || exit 1`
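To make the `--min-k` and `--min-l` thresholds in the steps above concrete, here is a minimal Python sketch of what the two metrics measure. It is an illustration under stated assumptions, not deidproof's actual implementation: the function name `k_and_l` is hypothetical, and it uses the simplest "distinct" form of l-diversity (the number of distinct sensitive values per group), which is only one of several published definitions.

```python
from collections import defaultdict


def k_and_l(rows, quasi_identifiers, sensitive):
    """Return (k, l) for a list of dict rows (hypothetical helper).

    k: size of the smallest equivalence class, i.e. the smallest group of
       rows that share the same quasi-identifier values.
    l: smallest number of distinct sensitive values found in any class
       (distinct l-diversity).
    """
    classes = defaultdict(list)
    for row in rows:
        # Rows with identical quasi-identifier values fall into one class.
        key = tuple(row[q] for q in quasi_identifiers)
        classes[key].append(row[sensitive])
    if not classes:
        return 0, 0
    k = min(len(values) for values in classes.values())
    l = min(len(set(values)) for values in classes.values())
    return k, l


# Made-up, already-generalized sample data for illustration only.
rows = [
    {"zip": "021**", "age": "30-39", "sex": "F", "diagnosis": "asthma"},
    {"zip": "021**", "age": "30-39", "sex": "F", "diagnosis": "diabetes"},
    {"zip": "021**", "age": "40-49", "sex": "M", "diagnosis": "asthma"},
    {"zip": "021**", "age": "40-49", "sex": "M", "diagnosis": "asthma"},
    {"zip": "021**", "age": "40-49", "sex": "M", "diagnosis": "flu"},
]

k, l = k_and_l(rows, ["zip", "age", "sex"], "diagnosis")
print(k, l)  # 2 2
```

In this sample the smallest group holds two rows, so a `--min-k 5` gate would be expected to fail, while `--min-l 2` would pass under the distinct definition.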
deidproof verifies that your 'de-identified' export actually is de-identified and emits a signed risk report: the safety net researchers cite before publishing or sharing data.
deidproof is single-purpose, scriptable, and self-hostable: point it at a target, get prioritized results in the format your workflow already speaks (table · JSON · SARIF), gate CI on it, and let agents drive it over MCP.
- ✅ k-anonymity
- ✅ l-diversity
- ✅ Safe Harbor scan
- ✅ Analyze rows
- ✅ Analyze CSV
- ✅ Runs on Linux/macOS/Windows · Docker · devcontainer
- ✅ Ports in Python, JavaScript, Go, and Rust (`ports/`)
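For readers unfamiliar with what a Safe Harbor scan looks for, the sketch below flags a few of the HIPAA Safe Harbor identifier categories in tabular rows: ZIP codes beyond the first three digits, ages over 89, dates more specific than a year, and SSN-shaped values. It is a rough heuristic for illustration, not deidproof's rule set; the function name `safe_harbor_flags`, the regular expressions, and the `age_column` parameter are all assumptions, and the real Safe Harbor method covers 18 identifier categories.

```python
import re

# Heuristic patterns (assumptions for illustration, not deidproof's rules).
FULL_DATE = re.compile(r"^(\d{4}-\d{1,2}-\d{1,2}|\d{1,2}/\d{1,2}/\d{2,4})$")
ZIP5 = re.compile(r"^\d{5}(-\d{4})?$")
SSN = re.compile(r"^\d{3}-\d{2}-\d{4}$")


def safe_harbor_flags(rows, age_column="age"):
    """Return a sorted list of (column, reason) pairs that look non-compliant."""
    flags = set()
    for row in rows:
        for column, value in row.items():
            text = str(value).strip()
            if column == age_column:
                # Safe Harbor requires ages over 89 to be aggregated (90+).
                if text.isdigit() and int(text) > 89:
                    flags.add((column, "age over 89"))
            elif SSN.match(text):
                flags.add((column, "SSN-like value"))
            elif ZIP5.match(text):
                # Only the first three ZIP digits may be kept, and only for
                # sufficiently populous areas.
                flags.add((column, "ZIP longer than 3 digits"))
            elif FULL_DATE.match(text):
                # Date elements other than the year must be removed.
                flags.add((column, "date more specific than year"))
    return sorted(flags)


# Made-up sample rows for illustration only.
rows = [
    {"zip": "02139", "age": "93", "admitted": "2021-03-14"},
    {"zip": "021", "age": "41", "admitted": "2021"},
]

for column, reason in safe_harbor_flags(rows):
    print(f"{column}: {reason}")
```

Pattern matching like this will produce false positives (any five-digit value looks like a ZIP code), which is one reason a column-aware tool is preferable to ad hoc regexes.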
pip install cognis-deidproof
deidproof --version
deidproof scan . # scan current project
deidproof scan . --format json # machine-readable
deidproof scan . --fail-on high # CI gate (non-zero exit)
$ deidproof scan .
[HIGH ] DEI-001 example finding (./src/app.py)
[MEDIUM ] DEI-002 another signal (./config.yaml)
2 findings · risk score 5 · 38ms
flowchart LR
IN[sources] --> P[deidproof<br/>curate + validate]
P --> OUT[query / analysis]
deidproof plugs into the common ways of using AI:
- MCP server: `deidproof mcp` (Claude Desktop, Cursor, Cognis.Studio, uncensored-fleet)
- OpenAI-compatible / JSON: pipe `deidproof scan . --format json` into any agent or LLM
- LangChain · CrewAI · AutoGen · LlamaIndex: wrap the CLI/JSON as a tool in one line
- CI / scripts: exit codes + SARIF for non-AI pipelines
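As one way to wrap the CLI as a tool or script step, the sketch below builds a `deidproof check` command from the flags shown earlier in this README and runs it with `subprocess`. The helper names `build_check_command` and `run_check` are hypothetical. It assumes that `--format json` can be combined with the threshold flags, that the JSON report is written to stdout, and that a failed threshold produces a non-zero exit code; the report schema is not assumed, so the parsed JSON is returned as-is.

```python
import json
import subprocess


def build_check_command(dataset, quasi_identifiers, sensitive=None,
                        min_k=None, min_l=None):
    """Build the argv list for `deidproof check` (hypothetical helper)."""
    cmd = [
        "deidproof", "check", dataset,
        "--quasi-identifiers", ",".join(quasi_identifiers),
        "--format", "json",
    ]
    if sensitive:
        cmd += ["--sensitive", sensitive]
    if min_k is not None:
        cmd += ["--min-k", str(min_k)]
    if min_l is not None:
        cmd += ["--min-l", str(min_l)]
    return cmd


def run_check(dataset, quasi_identifiers, **options):
    """Run the CLI and return (passed, report).

    `passed` reflects only the exit code. `report` is whatever JSON the CLI
    printed, or the raw text if stdout was not valid JSON.
    """
    proc = subprocess.run(
        build_check_command(dataset, quasi_identifiers, **options),
        capture_output=True,
        text=True,
    )
    try:
        report = json.loads(proc.stdout)
    except json.JSONDecodeError:
        report = {"raw": proc.stdout}
    return proc.returncode == 0, report


# Show the command that would be executed, without running it.
print(" ".join(build_check_command(
    "dataset.csv", ["zip", "age", "sex"], sensitive="diagnosis", min_k=5)))
```

Because `run_check` is a plain function that returns a boolean and a dictionary, it can be handed to most agent frameworks as a callable tool or used directly in a release script.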
| | Cognis deidproof | ARX Data Anonymization Tool |
|---|:---:|:---:|
| Self-hostable, no account | ✅ | varies |
| Single command, zero config | ✅ | |
| JSON + SARIF for CI | ✅ | varies |
| MCP-native (AI agents) | ✅ | ❌ |
| Polyglot ports (JS/Go/Rust) | ✅ | ❌ |
| Open license | ✅ COCL | varies |
Built in the spirit of ARX Data Anonymization Tool, re-framed the Cognis way. Missing a credit? Open a PR.
Pipes into your stack: SARIF for code-scanning, JSON for anything, an MCP server (deidproof mcp) for AI agents, and a webhook forwarder for SIEM/Slack/Jira. See docs/INTEGRATIONS.md.
pip install "git+https://github.com/cognis-digital/deidproof.git" # pip (works today)
pipx install "git+https://github.com/cognis-digital/deidproof.git" # isolated CLI
uv tool install "git+https://github.com/cognis-digital/deidproof.git" # uv
pip install cognis-deidproof # PyPI (when published)
docker run --rm ghcr.io/cognis-digital/deidproof:latest --help # Docker
brew install cognis-digital/tap/deidproof # Homebrew tap
curl -fsSL https://raw.githubusercontent.com/cognis-digital/deidproof/main/install.sh | sh
| Linux | macOS | Windows | Docker | Cloud |
|---|---|---|---|---|
| scripts/setup-linux.sh | scripts/setup-macos.sh | scripts/setup-windows.ps1 | docker run ghcr.io/cognis-digital/deidproof | DEPLOY.md (AWS/Azure/GCP/k8s) |
- `phiscrub`: Stream-scan logs, CSVs, and free-text notes for PHI (names, MRNs, SSNs, dates, addresses) and redact or tokenize in place.
- `dicomsweep`: De-identify DICOM imaging studies per the DICOM PS3.15 Annex E profile, scrubbing tags and burned-in pixel text.
- `fhirlint`: Validate FHIR R4/R5 resources and bundles against profiles (US Core, etc.) with precise, line-level error reporting.
- `hl7tap`: Parse, pretty-print, diff, and replay HL7 v2 messages over MLLP from the terminal.
- `consentledger`: Maintain a tamper-evident, hash-chained audit log of patient-data access and consent events.
- `synthcohort`: Generate statistically realistic synthetic patient cohorts (FHIR/CSV) from a schema spec for dev and testing.
Explore the suite → 🗂️ all 170+ tools · ⭐ awesome-cognis · 🔗 cognis-sources · 🤖 uncensored-fleet · 🧠 engram
PRs, new rules, and demo scenarios are welcome under the collaboration-pull model — see CONTRIBUTING.md and SECURITY.md.
deidproof composes with the 300+ tool Cognis suite through JSON in/out and a shared OpenAI-compatible /v1 backbone. See INTEROP.md for the suite map, composition patterns, and reference stacks.
Source-available under the Cognis Open Collaboration License (COCL) v1.0 — free for personal, internal-evaluation, research, and educational use; commercial / production use requires a license (licensing@cognis.digital). See LICENSE.