A Python tool that automatically builds graph representations and GraphRAGs from API documentation stored in CSV format.
API Graph Builder transforms CSV-based API documentation into queryable graph structures where:
- Nodes represent API tools and their JSON parameters
- Edges represent input/output relationships and data flows between APIs
- Semantic analysis (optional) identifies hidden connections using AI
- 📊 Parse CSV files containing API metadata (endpoints, payloads, responses)
- 🔍 Extract and flatten nested JSON structures from API inputs/outputs
- 🕸️ Build directed graphs showing API dependencies and parameter relationships
- 🔗 Detect potential API chains where one API's output feeds another's input
- 🤖 Use AI (Gemini) to find semantic relationships between parameters
- 💾 Export graphs in multiple formats (JSON, GraphML, DOT, Neo4j Cypher)
- 📈 Generate interactive visualizations for exploring API relationships
- Python 3.12+
- Virtual environment (recommended)
# Clone the repository
git clone <repository-url>
cd api-graph-builder
# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate # On Linux/Mac
# venv\Scripts\activate # On Windows
# Install dependencies
pip install -r requirements.txt
Parse CSV files and build a graph:
python -m src.main --input data/fms.csv data/hq.csv --output graph.json
Detect potential data flows between APIs:
python -m src.main \
--input data/*.csv \
--output graph.json \
--detect-flows
Export to other formats:
# Export as GraphML (for Gephi, Cytoscape)
python -m src.main --input data/*.csv --export graphml --output graph.graphml
# Export as DOT (for Graphviz)
python -m src.main --input data/*.csv --export dot --output graph.dot
# Export as Neo4j Cypher statements
python -m src.main --input data/*.csv --export neo4j --output graph.cypher
Command-line options:
--input, -i Input CSV file(s) (required)
--output, -o Output file path (default: graph.json)
--export, -e Export format: json, graphml, dot, neo4j (default: json)
--detect-flows Enable flow detection between APIs
--fuzzy-threshold Fuzzy matching threshold 0.0-1.0 (default: 0.8)
--verbose, -v Enable verbose logging
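One plausible reading of --detect-flows combined with --fuzzy-threshold: propose a potential_flow from tool A to tool B when an output parameter name of A is similar enough to an input parameter name of B. The sketch below assumes the similarity score is difflib's SequenceMatcher ratio on lower-cased names; the tool's actual metric is not documented here, and name_similarity, detect_flows and the input shape are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical input shape: tool ID -> {"inputs": [...], "outputs": [...]} parameter names.
ToolParams = dict[str, dict[str, list[str]]]


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0.0, 1.0] (assumed metric: difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def detect_flows(tools: ToolParams, threshold: float = 0.8) -> list[tuple[str, str, str, str]]:
    """Return (producer, consumer, output_param, input_param) candidate chains."""
    flows = []
    for producer, produced in tools.items():
        for consumer, consumed in tools.items():
            if producer == consumer:
                continue  # a tool feeding itself is not treated as a chain
            for out_name in produced["outputs"]:
                for in_name in consumed["inputs"]:
                    if name_similarity(out_name, in_name) >= threshold:
                        flows.append((producer, consumer, out_name, in_name))
    return flows


TOOLS: ToolParams = {
    "demo.get_user": {"inputs": ["user_id"], "outputs": ["name", "email"]},
    "demo.send_email": {"inputs": ["email", "subject"], "outputs": ["message_id"]},
    "demo.create_order": {"inputs": ["user_id", "product_id"], "outputs": ["order_id", "status"]},
}

if __name__ == "__main__":
    for flow in detect_flows(TOOLS):
        print(flow)
```

With the default threshold of 0.8, only email links demo.get_user to demo.send_email; order_id vs user_id scores about 0.67 and is rejected, but would pass at 0.6. Lowering the threshold finds more chains at the cost of more false positives.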
Input CSV files should contain the following columns:
- tool_name - API operation name
- api_endpoint - API URL
- input_payload - JSON string of input parameters
- output_response - JSON string of output data
- status_code - HTTP status code
- success - Boolean success flag
- curl_command - Example curl command
- timestamp - When the API was documented
Example:
tool_name,api_endpoint,input_payload,output_response,status_code
get_user,/api/users,"{""user_id"": 1}","{""name"": ""John"", ""email"": ""john@example.com""}",200
send_email,/api/email,"{""email"": ""test@example.com"", ""subject"": ""Test""}","{""message_id"": ""123""}",200
create_order,/api/orders,"{""user_id"": 1, ""product_id"": 42}","{""order_id"": ""ORD-001"", ""status"": ""pending""}",201
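The tool itself parses these files with pandas. As a dependency-free sketch of the same step, the standard-library csv module can read the example rows above, decode the two JSON columns, and build tool IDs in the {csv_filename}.{tool_name} form used for tool nodes. The function name parse_api_csv and the record layout are hypothetical, not the tool's csv_parser API.

```python
import csv
import io
import json
from pathlib import Path

SAMPLE = '''tool_name,api_endpoint,input_payload,output_response,status_code
get_user,/api/users,"{""user_id"": 1}","{""name"": ""John"", ""email"": ""john@example.com""}",200
send_email,/api/email,"{""email"": ""test@example.com"", ""subject"": ""Test""}","{""message_id"": ""123""}",200
create_order,/api/orders,"{""user_id"": 1, ""product_id"": 42}","{""order_id"": ""ORD-001"", ""status"": ""pending""}",201
'''


def parse_api_csv(text: str, csv_path: str) -> list[dict]:
    """Parse API documentation rows and decode the JSON payload columns."""
    prefix = Path(csv_path).stem  # "data/fms.csv" -> "fms"
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append({
            "id": f"{prefix}.{row['tool_name']}",
            "api_endpoint": row["api_endpoint"],
            # Empty cells are treated as empty JSON objects (an assumption).
            "input": json.loads(row["input_payload"] or "{}"),
            "output": json.loads(row["output_response"] or "{}"),
            "status_code": int(row["status_code"]),
        })
    return records


if __name__ == "__main__":
    for record in parse_api_csv(SAMPLE, "data/demo.csv"):
        print(record["id"], record["api_endpoint"], sorted(record["output"]))
```

For real files, read them with open(path, newline="", encoding="utf-8") and pass the contents in; the csv module handles the doubled quotes inside the JSON cells.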
- Tool Nodes - Represent API operations
  - ID format: {csv_filename}.{tool_name}
  - Example: fms.get_all_transporters
- Parameter Nodes - Represent JSON keys from inputs/outputs
  - ID format: parameter name (e.g., user_id, email)
  - Nested keys use dot notation: user.address.city
  - Array elements use brackets: items[0].id
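The parameter naming rules above can be sketched as a recursive walk over the decoded JSON. This is a minimal sketch, assuming only leaf values become parameter nodes (intermediate objects such as user or user.address are not emitted, and empty arrays produce nothing); flatten_keys is a hypothetical name, not the json_extractor API.

```python
from typing import Any


def flatten_keys(value: Any, prefix: str = "") -> list[str]:
    """Return leaf parameter names: dots for nested objects, [i] for array elements."""
    if isinstance(value, dict):
        names: list[str] = []
        for key, child in value.items():
            names.extend(flatten_keys(child, f"{prefix}.{key}" if prefix else str(key)))
        return names
    if isinstance(value, list):
        names = []
        for index, child in enumerate(value):
            names.extend(flatten_keys(child, f"{prefix}[{index}]"))
        return names
    # Scalar leaf (str, number, bool, None): its path is the parameter name.
    return [prefix] if prefix else []


if __name__ == "__main__":
    payload = {"user": {"address": {"city": "Oslo"}}, "items": [{"id": 7}], "email": "a@example.com"}
    print(flatten_keys(payload))  # ['user.address.city', 'items[0].id', 'email']
```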
- requires_input - Tool → Input Parameter
- produces_output - Tool → Output Parameter
- potential_flow - Tool → Tool (via matching parameters)
- semantic_flow - Tool → Tool (via AI-detected semantic match)
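These edge types map directly onto Neo4j relationship types, which is what the example Cypher queries further down rely on. A minimal sketch of emitting such statements, assuming Tool and Parameter labels with a name property and an api_endpoint property on tools (as those queries use); to_cypher and quote are hypothetical helpers, and the real --export neo4j output may differ.

```python
TOOL_TO_TOOL = {"potential_flow", "semantic_flow"}  # edge types whose target is another Tool


def quote(value: str) -> str:
    """Return value as a single-quoted Cypher string literal."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def to_cypher(tools: dict[str, str], edges: list[tuple[str, str, str]]) -> list[str]:
    """Emit MERGE statements for Tool/Parameter nodes and their relationships.

    tools maps a tool ID to its api_endpoint; edges are (source, edge_type, target).
    """
    statements = []
    for name, endpoint in sorted(tools.items()):
        statements.append(
            f"MERGE (t:Tool {{name: {quote(name)}}}) SET t.api_endpoint = {quote(endpoint)};"
        )
    # Every edge that is not tool-to-tool points at a Parameter node.
    parameters = sorted({dst for _, kind, dst in edges if kind not in TOOL_TO_TOOL})
    for name in parameters:
        statements.append(f"MERGE (:Parameter {{name: {quote(name)}}});")
    for src, kind, dst in edges:
        target = "Tool" if kind in TOOL_TO_TOOL else "Parameter"
        # Relationship types cannot be parameterized, so kind must be a trusted edge type.
        statements.append(
            f"MATCH (a:Tool {{name: {quote(src)}}}), (b:{target} {{name: {quote(dst)}}}) "
            f"MERGE (a)-[:{kind}]->(b);"
        )
    return statements


TOOL_ENDPOINTS = {"demo.get_user": "/api/users", "demo.send_email": "/api/email"}
EDGES = [
    ("demo.get_user", "requires_input", "user_id"),
    ("demo.get_user", "produces_output", "email"),
    ("demo.send_email", "requires_input", "email"),
    ("demo.get_user", "potential_flow", "demo.send_email"),
]

if __name__ == "__main__":
    print("\n".join(to_cypher(TOOL_ENDPOINTS, EDGES)))
```

Nodes are merged before relationships so every MATCH finds its endpoints, and each statement ends with a semicolon so the output can be piped into cypher-shell.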
Create an interactive HTML visualization with zoom, pan, and hover details:
# Full graph visualization (all nodes and edges)
python -m src.main \
--input data/*.csv \
--detect-flows \
--visualize \
--viz-output graph.html
# Flow-focused visualization (only API-to-API connections)
python -m src.main \
--input data/*.csv \
--detect-flows \
--flow-viz \
--viz-output flows.html
- Interactive: Zoom, pan, and hover over nodes for details
- Color-coded nodes:
  - 🔴 Red: API tools
  - 🔵 Teal: Input parameters
  - 🟢 Light teal: Output parameters
- Directional arrows: Show data flow direction
- Edge types:
  - Light blue: requires_input
  - Dark blue: produces_output
  - Red: potential_flow (API chains)
  - Orange: semantic_flow (AI-detected)
# Spring layout (default, force-directed)
--viz-layout spring
# Circular layout
--viz-layout circular
# Kamada-Kawai layout (energy-based)
--viz-layout kamada_kawai
After processing your API documentation:
Graph statistics: X nodes, Y edges
- N tool nodes (API operations)
- M parameter nodes
- K potential flows detected
Generated files:
- flows_pyvis.html - Interactive flow visualization
- graph.json - Graph data in JSON format
Run the test suite:
# Run all tests
pytest
# Run with coverage
pytest --cov=src --cov-report=html
# Run specific test file
pytest tests/test_json_extractor.py -v
# Run with verbose output
pytest -v
Project structure:
api-graph-builder/
├── data/ # CSV files with API documentation
├── src/ # Source code
│ ├── models.py # Data models
│ ├── csv_parser.py # CSV parsing
│ ├── json_extractor.py # JSON extraction
│ ├── graph_builder.py # Graph construction
│ ├── flow_detector.py # Flow detection
│ ├── export_engine.py # Export functionality
│ └── main.py # CLI entry point
├── tests/ # Test suite
├── requirements.txt # Dependencies
└── README.md # This file
To enable AI-powered semantic parameter matching:
- Get a Gemini API key from Google AI Studio
- Create a .env file:
  GEMINI_API_KEY=your_api_key_here
- Run with semantic analysis (feature coming soon):
  python -m src.main --input data/*.csv --semantic-matching
Import the graph into Neo4j for advanced querying:
# Export to Cypher
python -m src.main --input data/*.csv --export neo4j --output graph.cypher
# Import into Neo4j
cat graph.cypher | cypher-shell -u neo4j -p password
Example Neo4j queries:
// Find all tools that produce 'waybill_number' as output
MATCH (t:Tool)-[:produces_output]->(p:Parameter {name: 'waybill_number'})
RETURN t.name, t.api_endpoint
// Find API chains
MATCH (a:Tool)-[:potential_flow]->(b:Tool)
RETURN a.name, b.name
// Find all input parameters for a specific tool
MATCH (t:Tool {name: 'fms.get_all_transporters'})-[:requires_input]->(p:Parameter)
RETURN p.name
Development checks:
# Type checking
mypy src/ --strict
# Run linter
ruff check src/
# Format code
black src/ tests/
- Follow PEP 8 style guidelines
- Add type hints to all functions
- Write tests for new features
- Maintain 80%+ test coverage
See LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
Built with:
- NetworkX - Graph data structures
- pandas - CSV parsing
- PyVis - Interactive graph visualization
- Google Gemini - AI-powered semantic analysis
- pytest - Testing framework