See your data clearly.
Aniwa is an open-source universal dataset profiling and intelligence tool for developers, analysts, data engineers, researchers, and modern data teams.
Aniwa helps users quickly understand datasets through:
- schema profiling
- data quality analysis
- statistical summaries
- intelligent insights
- rich terminal reports
- shareable reports
- configurable profiling workflows
Whether you're working with CSV files, Excel spreadsheets, JSON datasets, or Parquet files, Aniwa provides a fast and elegant way to inspect, understand, and trust your data.
Full documentation available here: https://reginalderzoah.github.io/Aniwa/
v0.1.1
Modern data workflows constantly involve unknown datasets.
Before trusting a dataset, teams need to answer questions like:
- What columns exist?
- What data types are present?
- Are there missing values?
- Are there duplicates?
- Are there suspicious patterns?
- Which columns may contain IDs or sensitive information?
- Is the dataset healthy?
Aniwa makes answering those questions fast, intelligent, and developer-friendly.
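To make the questions above concrete, here is a minimal standard-library sketch that answers three of them (which columns exist, where values are missing, and how many rows are duplicated) for a tiny inline CSV. It is an illustration only, not Aniwa's implementation: the sample data and every variable name are hypothetical, and it assumes an empty string means a missing value.

```python
import csv
import io

# Hypothetical sample data, inlined so the sketch is self-contained.
SAMPLE_CSV = """id,name,email,age
1,Ama,ama@example.com,34
2,Kofi,,29
3,Esi,esi@example.com,
2,Kofi,,29
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# What columns exist?
columns = list(rows[0].keys())

# Are there missing values? Count empty strings per column (assumption:
# an empty field represents a null).
missing = {col: sum(1 for row in rows if row[col] == "") for col in columns}

# Are there duplicates? Count rows identical to an earlier row.
seen = set()
duplicate_rows = 0
for row in rows:
    key = tuple(row[col] for col in columns)
    if key in seen:
        duplicate_rows += 1
    seen.add(key)

print("Columns:", columns)
print("Missing values:", missing)
print("Duplicate rows:", duplicate_rows)
```

A profiling tool automates checks like these across every column and file format, which is the manual work Aniwa aims to remove.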
Aniwa currently supports these file formats:
- CSV
- Excel (.xlsx/.xls)
- JSON
- Parquet
Future releases are planned to support:
- PostgreSQL
- MySQL
- DuckDB
- BigQuery
- Snowflake
Aniwa currently provides:
- row counts
- column counts
- dataset size analysis
- type inference
- schema overview
- mixed type detection
- null analysis
- duplicate detection
- uniqueness analysis
- sparse column detection
- minimum values
- maximum values
- mean
- median
- standard deviation
- possible ID detection
- high-cardinality warnings
- sparse column warnings
- suspicious quality patterns
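As a rough illustration of what several of the metrics listed above mean for a single column, the sketch below infers a type, counts nulls and unique values, flags a possible ID column, and computes basic statistics using only the standard library. The function names, the "every value is unique and non-null" ID rule, and the treatment of empty strings as nulls are assumptions made for this example; Aniwa's actual rules may differ.

```python
import statistics


def infer_type(values):
    """Classify raw string values as integer, float, string, mixed, or empty."""
    kinds = set()
    for value in values:
        if value == "":
            continue  # treated as null, ignored for type inference
        try:
            int(value)
            kinds.add("integer")
            continue
        except ValueError:
            pass
        try:
            float(value)
            kinds.add("float")
        except ValueError:
            kinds.add("string")
    if not kinds:
        return "empty"
    if kinds == {"integer", "float"}:
        return "float"  # integers widen to floats
    if len(kinds) > 1:
        return "mixed"  # e.g. numbers and free text in one column
    return kinds.pop()


def profile_column(values):
    """Return a small dict of per-column metrics for a list of raw strings."""
    non_null = [v for v in values if v != ""]
    unique = len(set(non_null))
    inferred = infer_type(values)
    profile = {
        "type": inferred,
        "nulls": len(values) - len(non_null),
        "unique": unique,
        # Assumed heuristic: a possible ID has no nulls and no repeats.
        "possible_id": len(values) > 0 and unique == len(values),
    }
    if inferred in ("integer", "float"):
        numbers = [float(v) for v in non_null]
        profile["min"] = min(numbers)
        profile["max"] = max(numbers)
        profile["mean"] = statistics.mean(numbers)
        profile["median"] = statistics.median(numbers)
        # The sample standard deviation needs at least two data points.
        profile["stdev"] = statistics.stdev(numbers) if len(numbers) > 1 else 0.0
    return profile


age_profile = profile_column(["34", "29", "", "41"])
id_profile = profile_column(["a1", "a2", "a3"])
print(age_profile)
print(id_profile)
```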
Aniwa currently supports these report formats:
- Rich terminal reports
- JSON reports
- HTML reports
Upcoming releases are planned to include:
- Markdown reports
- Excel reports
- PDF reports
- charts
- report templates
Install Aniwa from PyPI:
pip install aniwa
Verify installation:
aniwa --help
Upgrade Aniwa:
pip install --upgrade aniwa
Profile a dataset:
aniwa profile customers.csv
Generate a JSON report:
aniwa profile customers.csv --report json --output profile.json
Generate an HTML report:
aniwa profile customers.csv --report html --output profile.html
Run lightweight profiling:
aniwa profile customers.csv --mode fast
Run full profiling:
aniwa profile customers.csv --mode deep
Aniwa supports configuration-driven workflows.
Supported config formats:
- YAML
- TOML
- JSON
Aniwa automatically searches for:
aniwa.yaml
aniwa.yml
aniwa.toml
aniwa.json
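The automatic search can be pictured as a loop over the candidate file names listed above. This is a minimal sketch, not Aniwa's code: the `find_config` helper is hypothetical, and it assumes the search happens in a single directory and that the first match in the listed order wins, neither of which is confirmed here.

```python
import tempfile
from pathlib import Path

# Candidate names, in the order they are listed above (assumed priority).
CONFIG_NAMES = ["aniwa.yaml", "aniwa.yml", "aniwa.toml", "aniwa.json"]


def find_config(directory):
    """Return the first candidate config file found in `directory`, or None."""
    for name in CONFIG_NAMES:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


# Demonstrate on a throwaway directory so the sketch has no side effects.
with tempfile.TemporaryDirectory() as tmp:
    before = find_config(tmp)  # nothing exists yet
    (Path(tmp) / "aniwa.toml").write_text('mode = "fast"\n')
    found_name = find_config(tmp).name

print(before)
print(found_name)
```

Passing `--config` explicitly, as shown further down, avoids relying on any search order.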
Example:
mode: deep
report:
  format: html
  output_dir: reports/
sections:
  include:
    - summary
    - schema
    - statistics
    - insights
Use a custom config file:
aniwa profile customers.csv --config config.yaml
Aniwa supports configurable report sections.
Current sections include:
- summary
- schema
- quality
- statistics
- insights
Example:
aniwa profile customers.csv --include summary,statistics
Exclude sections:
aniwa profile customers.csv --exclude statistics
┌──────────────────────────────┐
│ Aniwa Dataset Profile │
├──────────────────────────────┤
│ Rows: 5 │
│ Columns: 5 │
│ Duplicate Rows: 1 │
└──────────────────────────────┘
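The `--include` and `--exclude` options shown earlier amount to filtering the list of known report sections. The sketch below shows one way that filtering could work. The helper names are hypothetical, and two behaviors are assumptions rather than documented Aniwa semantics: exclusion wins when a section appears in both lists, and unknown section names are silently ignored.

```python
# Section names as listed in this README.
ALL_SECTIONS = ["summary", "schema", "quality", "statistics", "insights"]


def parse_names(value):
    """Split a comma-separated option value into names; None if not given."""
    if not value:
        return None
    return [name.strip() for name in value.split(",") if name.strip()]


def select_sections(include=None, exclude=None):
    """Return the sections to render, preserving the canonical order."""
    included = parse_names(include)
    excluded = parse_names(exclude) or []
    return [
        section
        for section in ALL_SECTIONS
        if (included is None or section in included) and section not in excluded
    ]


only_two = select_sections(include="summary,statistics")
without_stats = select_sections(exclude="statistics")
print(only_two)
print(without_stats)
```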
Aniwa now includes a full documentation system.
Full documentation available here: https://reginalderzoah.github.io/Aniwa/
Documentation includes:
- getting started guides
- architecture documentation
- developer guides
- release notes
- roadmap
- philosophy
View documentation locally with MkDocs:
mkdocs serve
Build documentation:
mkdocs build
Documentation structure:
docs/
├── index.md
├── roadmap.md
├── philosophy.md
├── getting-started/
├── developer-guide/
└── release-notes/
Clone the repository:
git clone https://github.com/ReginaldErzoah/Aniwa.git
cd Aniwa
Create a virtual environment:
python -m venv .venv
Activate the environment:
On Windows (Git Bash):
source .venv/Scripts/activate
On macOS or Linux:
source .venv/bin/activate
Install dependencies:
pip install -r requirements.txt
Install Aniwa locally:
pip install -e .
Aniwa currently follows a modular layered architecture:
CLI
→ Configuration
→ Readers
→ Profiling Engine
→ Models
→ Reports
This architecture prioritizes:
- modularity
- maintainability
- scalability
- contributor friendliness
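The layered flow above can be sketched as a chain of small functions, each standing in for one layer. Everything here is hypothetical (the `Config` and `DatasetProfile` classes, the function names, and the two toy report formats); it only illustrates how the layers hand data to one another, not how Aniwa's modules are actually written.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass


@dataclass
class Config:
    """Configuration layer: settings that shape the run."""
    report_format: str = "json"


@dataclass
class DatasetProfile:
    """Models layer: a structured result, independent of any output format."""
    rows: int
    columns: int


def read_csv(text):
    """Readers layer: turn raw input into rows."""
    return list(csv.DictReader(io.StringIO(text)))


def build_profile(rows):
    """Profiling engine: compute metrics and return a model."""
    return DatasetProfile(rows=len(rows), columns=len(rows[0]) if rows else 0)


def render(profile, config):
    """Reports layer: format a model according to the configuration."""
    if config.report_format == "json":
        return json.dumps(asdict(profile))
    return f"Rows: {profile.rows}\nColumns: {profile.columns}"


def main(text, config=None):
    """CLI layer stand-in: wire the layers together in order."""
    config = config or Config()
    return render(build_profile(read_csv(text)), config)


json_report = main("a,b\n1,2\n3,4\n")
text_report = main("a,b\n1,2\n3,4\n", Config(report_format="text"))
print(json_report)
print(text_report)
```

Because each layer only depends on the output of the one before it, a new reader or report format can be added without touching the profiling logic, which is the kind of modularity the list above describes.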
Aniwa/
│
├── aniwa/
│   ├── cli.py
│   ├── config/
│   ├── core/
│   ├── io/
│   ├── models/
│   ├── reports/
│   ├── templates/
│   └── utils/
│
├── docs/
├── tests/
├── examples/
│
├── README.md
├── CONTRIBUTING.md
├── SPRINT.md
├── mkdocs.yml
├── pyproject.toml
└── requirements.txt
- universal dataset profiling
- reporting systems
- configuration workflows
- modular architecture
- developer-first UX
Planned features:
- Better HTML with charts.
- Better report templates.
- Report modes and presets.
- Better output management.
- Metadata
- Include and exclude sections.
- Config file support (YAML, JSON, and TOML).
- Improved documentation.
Planned features:
- correlation analysis
- anomaly detection
- semantic profiling
- improved insights
Planned features:
- PostgreSQL support
- MySQL support
- DuckDB support
- BigQuery support
- profiling history
- snapshot management
Planned features:
- plugin system
- custom profiling modules
- community extensions
Planned features:
- dataset summarization
- semantic understanding
- AI-assisted recommendations
- anomaly explanations
Aniwa is built around several core principles:
- universal
- developer-first
- fast
- modular
- intelligent
- beautiful
- automation-friendly
The long-term goal is to build:
universal data intelligence infrastructure
For deeper architectural and ecosystem thinking, see:
docs/philosophy.md
docs/roadmap.md
Contributions are welcome.
See:
- CONTRIBUTING.md
- SPRINT.md
- docs/developer-guide/
for:
- local development
- testing workflows
- architecture guidance
- release workflows
- contributor standards
If you find this project useful:
- star the repository
- share the repository with friends
- contribute by helping to solve open issues
- send recommendations to the maintainer
Aniwa is released under the MIT License.
See LICENSE for details.
