Synthesis Project -- AI-Assisted Hardware Design Analysis

An AI-assisted synthesis & hardware-security workflow prototype -- exploring how graph algorithms and LLM-style assistance can augment conventional EDA flows.

An open-source, local-first research prototype, not production EDA.

Try it live: synthesisproject-5ax4oq8wquyjmdgp6z9rvy.streamlit.app -- pick a sample design (try array_mult16.v for the 1278-node visualization) or upload your own Verilog. Source: streamlit_app.py.

Why this matters

Modern synthesis, verification, and physical-design flows are predominantly script-heavy and commercial-tool-locked. Engineers spend significant time on tasks that are fundamentally graph problems on the netlist -- fanout exploration, cone tracing, combinational-loop detection, critical-path identification, FSM discovery -- yet day-to-day debugging is gated by proprietary GUIs and TCL.

This project is a research prototype asking two questions:

1. Can a local-first, open-source graph toolkit replicate the analytical core of commercial netlist analyzers using only NetworkX, PyVis, and Streamlit?
2. Where can AI-assistance plug into this flow -- explaining critical paths, summarizing fanout cones, suggesting buffer insertions, flagging hardware-security anti-patterns (unprotected reset chains, suspicious clock-gating)?

The repo answers #1 today (parser + graph algorithms + STA-lite + interactive viz) and is a scaffold for #2 (tools/demo2/ai_agent.py, tools/demo4/ai_agent.py).

Framing: this is an exploration of AI-assisted semiconductor design workflows, not a hackathon demo. The limitations section below is explicit about what it does not do.

Status

Sample designs live in samples/ -- five small textbook designs (adder, decoder, mux, traffic-light FSM, 3-stage pipeline) plus two scale-demo multipliers (array_mult8.v, array_mult16.v -- the 16x16 builds to 1278 nodes / 1985 edges). All are original public-domain designs written for this project -- zero third-party IP. Pre-generated interactive DAG visualizations live in outputs/. Bring your own Verilog too: any structural / gate-level .v works.

The ISCAS-85 benchmarks used in the benchmark tables (c17, c432, c1908, c6288) are bundled as well. Adding OpenCores designs as additional samples is on the roadmap.

Screenshots & demo

Scale demo -- full 16x16 array multiplier visualized as a DAG (1278 nodes / 1985 edges):

Close-up of gate-level nodes: color-coded nodes are `AND` gates (blue), `full_adder` instances (orange), and signals (grey). Edges are net connections.

Sequential / pipeline register graph: diamond-shaped nodes are sequential elements (registers). Clock/reset trees fan out from `rst_n`.

Demo -- 6-second walk-through of the analyzer (download MP4):

All visualizations are produced by launchers/generate_sample_outputs.py and live in outputs/. Open any .html file in a browser for full pan / zoom / hover.

Architecture

flowchart LR
 subgraph IN[Inputs]
 V[Verilog RTL / Gate-level netlist]
 X[Sample XML / IR]
 end

 subgraph PARSE[Parsing Layer]
 P1[Regex-based Verilog parser]
 P2[Instance & net extractor]
 P3[Library-cell classifier]
 end

 subgraph GRAPH[Graph Model - NetworkX DiGraph]
 G1[Nodes: gates / regs / IO]
 G2[Edges: driver -> sink nets]
 G3[Attrs: cell type, fanout, clock domain]
 end

 subgraph ALGO[Analysis Algorithms]
 A1[Fanout cone - forward BFS]
 A2[Critical path - longest path on DAG]
 A3[Combinational loops - Tarjan SCC]
 A4[Clock-domain propagation - DFS]
 A5[FSM detection - register graph patterns]
 A6[I/O dependency chains - topological sort]
 end

 subgraph AI[AI-Assistance Layer - prototype]
 AI1[Cone summarization]
 AI2[Anomaly detection - security patterns]
 AI3[Buffer-insertion suggestions]
 end

 subgraph UI[Visualization & UI - Streamlit + PyVis + Plotly]
 U1[Interactive 2D / 3D DAG]
 U2[Per-signal drill-down]
 U3[Critical-path heatmap]
 U4[Reports - HTML / JSON]
 end

 IN --> PARSE --> GRAPH --> ALGO --> UI
 GRAPH --> AI --> UI

The flow is intentionally graph-native end-to-end -- every analysis is a query against the same networkx.DiGraph, which is the most general representation of a netlist after elaboration. This mirrors how modern EDA research (e.g., GNN-based timing prediction, NVIDIA's recent work on graph learning for circuits) increasingly treats post-synthesis netlists as first-class graph objects rather than HDL text.
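To make the "one graph, many queries" idea concrete, here is a minimal sketch of turning gate-primitive instances into a `networkx.DiGraph`. It is not the project's parser: the regex, the `build_graph` name, and the `kind` / `role` attributes are assumptions chosen for illustration, and it only handles Verilog gate primitives whose first port is the output.

```python
import re
import networkx as nx

# Hypothetical pattern for primitive instances such as "nand g1 (y, a, b);".
# Verilog gate primitives list the output first, then the inputs.
GATE_RE = re.compile(
    r"\b(and|nand|or|nor|xor|xnor|not|buf)\s+(\w+)\s*\(([^)]*)\)\s*;"
)

def build_graph(verilog_text: str) -> nx.DiGraph:
    """Build a gate/signal graph with driver -> sink edges."""
    g = nx.DiGraph()
    for kind, name, ports in GATE_RE.findall(verilog_text):
        nets = [p.strip() for p in ports.split(",")]
        out_net, in_nets = nets[0], nets[1:]
        g.add_node(name, kind=kind.upper(), role="gate")
        g.add_node(out_net, role="signal")
        g.add_edge(name, out_net)          # gate drives its output net
        for net in in_nets:
            g.add_node(net, role="signal")
            g.add_edge(net, name)          # input net feeds the gate
    return g

SRC = """
module tiny(input a, b, c, output y);
  wire n1;
  nand g1 (n1, a, b);
  nor  g2 (y, n1, c);
endmodule
"""
G = build_graph(SRC)
print(G.number_of_nodes(), G.number_of_edges())
```

Every later analysis in this README can then be phrased as a traversal or query on `G`.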

Algorithms -- Technical Depth

This is the engineering core. Each tool reuses the same networkx.DiGraph and composes the algorithms below.

1. Fanout-cone extraction -- forward BFS

For a driver node $d$, the fanout cone is

$$\mathrm{Cone}(d) = \{\, v \mid d \rightsquigarrow v \text{ in } G \,\}$$

Implemented as a bounded-depth forward BFS from the driver along driver -> sink edges (networkx.descendants_at_distance, which returns the nodes at exactly one given distance, so the cone is the union over distances 1..depth). This keeps visualization interactive even on 10k+ gate designs. The depth limit is a UI knob -- engineers usually care about levels 1-4.
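A minimal sketch of bounded-depth cone extraction following the $\mathrm{Cone}(d)$ definition above. The `fanout_cone` helper is hypothetical; it uses `networkx.single_source_shortest_path_length` with a `cutoff` to collect every node within the depth limit in one call, which may differ from how the repo does it.

```python
import networkx as nx

def fanout_cone(g: nx.DiGraph, driver, max_depth: int) -> dict:
    """Return {node: hop distance} for nodes reachable from driver within max_depth."""
    dist = dict(nx.single_source_shortest_path_length(g, driver, cutoff=max_depth))
    dist.pop(driver, None)  # the driver itself is not part of its own cone
    return dist

# Toy netlist: a -> g1 -> n1, which fans out to g2 -> y and g3 -> z.
G = nx.DiGraph([("a", "g1"), ("g1", "n1"), ("n1", "g2"),
                ("n1", "g3"), ("g2", "y"), ("g3", "z")])
cone2 = fanout_cone(G, "a", 2)   # depth knob set to 2
cone4 = fanout_cone(G, "a", 4)   # depth knob set to 4
print(cone2)
```

The returned distances double as the "level" used when laying out or coloring the cone in the UI.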

2. Critical path -- longest path on DAG

Once the graph is cut at sequential boundaries (register outputs become path start points, register inputs become end points) and no combinational loops remain, the timing graph is a DAG. The longest path is computed in $O(|V|+|E|)$ via topological sort + DP -- no exponential search needed. Edge weights are unit (gate-count) by default; a kind-keyed delay model lives in tools/sta_lite -- see below.
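The topological-sort + DP idea can be sketched as follows, with unit edge weights. The `longest_path` helper is illustrative rather than the repo's code; NetworkX also ships `nx.dag_longest_path`, which does the same job.

```python
import networkx as nx

def longest_path(g: nx.DiGraph) -> list:
    """Longest path in a DAG by DP over a topological order (unit edge weights)."""
    depth, pred = {}, {}
    for v in nx.topological_sort(g):
        # Best predecessor is the one with the deepest path ending at it.
        best = max(g.predecessors(v), key=lambda u: depth[u], default=None)
        depth[v] = 0 if best is None else depth[best] + 1
        pred[v] = best
    end = max(depth, key=depth.get)
    path = [end]
    while pred[path[-1]] is not None:   # backtrace from the deepest endpoint
        path.append(pred[path[-1]])
    return path[::-1]

# a -> b -> c -> d is longer than the shortcut a -> d.
G = nx.DiGraph([("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")])
critical = longest_path(G)
print(critical)
```

Each node and edge is visited once, which is where the $O(|V|+|E|)$ bound comes from.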

2b. STA-lite -- arrival / required / slack DP

tools/sta_lite extends longest-path into a textbook static-timing analyzer:

Tags each driver node with a delay keyed by gate kind (NAND=0.10ns, AND=0.12ns, OR=0.15ns, XOR=0.20ns, NOT=0.05ns, submodule=0.30ns, register=boundary).
Forward DP over topological order -> arrival[v] = max over predecessors u of (arrival[u] + delay(u)).
Backward DP from primary outputs / register inputs -> required[u] = min over successors v of (required[v] - delay(u)), seeded with the target clock period at the endpoints.
Slack = required - arrival. Negative slack -> timing violation.
Critical-path backtrace + top-N slowest endpoints report.

Wired into the live Streamlit demo -- pick a sample, drag the clock-period slider, watch slack flip.
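The arrival / required / slack recurrences above can be sketched in a few lines. This is an illustration under stated assumptions, not the code in tools/sta_lite: the `sta` function, the `kind` node attribute, zero delay for I/O nodes, and arrival 0 at nodes without predecessors are all choices made here. The delay values are the ones listed above.

```python
import networkx as nx

# Kind-keyed delays in ns, taken from the list above.
DELAY_NS = {"NAND": 0.10, "AND": 0.12, "OR": 0.15, "XOR": 0.20, "NOT": 0.05}

def sta(g: nx.DiGraph, clock_period_ns: float):
    order = list(nx.topological_sort(g))
    # Assumption: nodes without a known kind (I/O, signals) contribute no delay.
    delay = {v: DELAY_NS.get(g.nodes[v].get("kind"), 0.0) for v in g}
    arrival = {}
    for v in order:                      # forward pass
        arrival[v] = max((arrival[u] + delay[u] for u in g.predecessors(v)),
                         default=0.0)
    required = {}
    for u in reversed(order):            # backward pass, endpoints get the period
        required[u] = min((required[v] - delay[u] for v in g.successors(u)),
                          default=clock_period_ns)
    slack = {v: required[v] - arrival[v] for v in g}
    return arrival, required, slack

G = nx.DiGraph()
G.add_node("g_xor", kind="XOR")
G.add_node("g_and", kind="AND")
G.add_edges_from([("in_a", "g_xor"), ("in_b", "g_xor"), ("g_xor", "g_and"),
                  ("in_c", "g_and"), ("g_and", "out_y")])

arrival, _, slack_tight = sta(G, 0.30)   # 0.30 ns target: path needs 0.32 ns
_, _, slack_loose = sta(G, 1.0)          # 1.0 ns target: timing is met
print(round(arrival["out_y"], 3), round(slack_tight["out_y"], 3))
```

Changing only the clock period moves every slack by the same amount, which is the "drag the slider, watch slack flip" behaviour described above.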

2c. Parallelism profile -- DAG levelization & Brent's-bound speedup (GL0AM-inspired)

tools/parallelism answers a different question on the same DAG: "if you ran this netlist on a parallel logic simulator, how much speedup could you possibly get?"

The approach is inspired by NVIDIA Research's GL0AM (GPU-Accelerated Gate-Level Logic Simulator, Zhang & Ren). GL0AM levelizes the netlist, schedules same-level gates onto GPU SMs in lock-step, and uses graph partitioning to minimize synchronization overhead. This module ports the analysis half of that idea to CPU Python:

Levelize the combinational DAG -- level[v] = 1 + max(level[u] for u in preds). All gates at the same level have no data dependency -> simulable in one parallel step.
Width-per-level histogram -- wide-and-shallow shape => lots of parallelism; tall-and-narrow => serial-bound.
Brent's bound -- $\text{speedup}_{\max} = |V| \,/\, L_{\max}$. Upper bound on parallel speedup for any parallel simulator on this design.
Partition count -- weakly-connected components in the register-cut graph. Each partition is an independent combinational cone; more partitions => easier GPU load-balancing.
Verdict -- coarse "is this design worth GPU-accelerating?" tag (trivial / serial-bound / moderate / GL0AM-regime).

Measured on this repo's samples: c17 is trivial (2.4x), c432 is moderate (9.2x, 36 gates per parallel step), c1908 lands in the GL0AM regime (14.4x, 38 gates wide), and c6288 (ISCAS-85 16x16 multiplier) tops the chart at 19.3x with 256 gates per parallel step -- exactly the size class where GPU acceleration starts paying off.

This is a research-prototype module; it identifies where GPU acceleration would be valuable, it does not perform GPU simulation itself.
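A small sketch of the levelization and speedup-bound steps, assuming sources sit at level 0 and that $L_{\max}$ is counted as the number of levels. The `parallelism_profile` name and the returned keys are hypothetical, not the API of tools/parallelism.

```python
from collections import Counter
import networkx as nx

def parallelism_profile(g: nx.DiGraph) -> dict:
    level = {}
    for v in nx.topological_sort(g):
        # Sources get level 0; everything else sits one past its deepest predecessor.
        level[v] = 1 + max((level[u] for u in g.predecessors(v)), default=-1)
    widths = Counter(level.values())         # nodes per level
    depth = max(level.values()) + 1          # number of levels (assumed L_max)
    return {
        "depth": depth,
        "max_width": max(widths.values()),
        "speedup_bound": g.number_of_nodes() / depth,
    }

# Four inputs feed two gates, which feed one gate: wide at the top, narrow below.
G = nx.DiGraph([("i1", "g1"), ("i2", "g1"), ("i3", "g2"), ("i4", "g2"),
                ("g1", "g3"), ("g2", "g3")])
profile = parallelism_profile(G)
print(profile)
```

The same arithmetic reproduces the reported figures: for c17, 17 nodes over a critical path of 7 gives 17 / 7 = 2.43x.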

2d. Hardware-security audit -- 5 heuristic rules over the DAG

tools/security is a static-analysis pass that surfaces five well-known hardware-security and design-integrity anti-patterns. All rules run in linear time over the same DAG that powers the rest of the suite -- no separate IR, no separate parse.

Rule	What it catches	Severity	Algorithm
`COMB_LOOP`	Non-trivial SCC in the combinational sub-graph -- a ring oscillator / latch loop	HIGH	Tarjan SCC
`RESET_GATING`	A `reset`/`rst` net reaches a register only after a combinational gate (data-dependent reset = fault-injection surface)	HIGH	Shortest-path reset->reg, count comb hops
`ASYNC_RESET_NO_SYNC`	A reset input drives a register directly with no 2-FF synchronizer chain (metastability / glitch attack surface)	MEDIUM	Successor-register check on the target reg
`DANGLING_LOGIC`	Combinational nodes with no forward path to any primary output or register (classic trojan hiding place)	LOW / MEDIUM	Ancestor-set complement; cone-size threshold
`MULTI_DRIVER`	Net with > 1 driver (X-prop / glitch / contention)	HIGH	In-degree check on net nodes

Findings are sorted HIGH -> LOW, each with a concrete `suggestion:` field. The ruleset is heuristic -- false positives are expected; the value is in surfacing candidates for human review, not in formal proof. Reset detection uses a conservative name pattern (reset|rst|reset_n|rstn); for production use you would replace this with a proper port-attribute lookup.

Wired into the live demo as Section 6 with severity-coloured metrics and a sortable findings table.
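Two of the rules in the table are simple enough to sketch directly. The function names and the `role` node attribute below are assumptions for illustration, not the interface of tools/security: `MULTI_DRIVER` is an in-degree check on signal nodes, and `DANGLING_LOGIC` is the complement of everything that can reach an endpoint.

```python
import networkx as nx

def multi_driver_nets(g: nx.DiGraph) -> list:
    """MULTI_DRIVER: signal nodes with more than one incoming driver edge."""
    return sorted(n for n, d in g.nodes(data=True)
                  if d.get("role") == "signal" and g.in_degree(n) > 1)

def dangling_gates(g: nx.DiGraph, endpoints) -> list:
    """DANGLING_LOGIC: gates with no forward path to any endpoint."""
    live, stack = set(endpoints), list(endpoints)
    while stack:                       # one backward sweep from all endpoints
        for u in g.predecessors(stack.pop()):
            if u not in live:
                live.add(u)
                stack.append(u)
    return sorted(n for n in g
                  if n not in live and g.nodes[n].get("role") == "gate")

G = nx.DiGraph()
for gate in ("g1", "g2", "g3", "g4"):
    G.add_node(gate, role="gate")
for net in ("a", "b", "n1", "y", "t"):
    G.add_node(net, role="signal")
G.add_edges_from([("a", "g1"), ("g1", "n1"), ("b", "g2"), ("g2", "n1"),  # n1 has two drivers
                  ("n1", "g3"), ("g3", "y"),
                  ("a", "g4"), ("g4", "t")])                             # g4 feeds nothing useful

contended = multi_driver_nets(G)
dangling = dangling_gates(G, endpoints=["y"])
print(contended, dangling)
```

Both checks touch each node and edge at most once, consistent with the linear-time claim above.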

3. Combinational-loop detection -- Tarjan's SCC

A combinational loop is a non-trivial strongly connected component of the combinational sub-graph: an SCC of size > 1, or a single node with a self-loop. Tarjan's algorithm finds all SCCs in $O(|V|+|E|)$. Each non-trivial SCC is reported with severity (CRITICAL / WARNING / INFO) based on cycle length and gate composition, with a suggestion of where to insert a register to break it.
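A minimal sketch of the loop check, assuming registers are supplied as a set of node names (the `combinational_loops` helper is hypothetical). `networkx.strongly_connected_components` is documented as a non-recursive variant of Tarjan's algorithm; self-loops are handled separately because a single-node SCC is only a loop when the node feeds itself.

```python
import networkx as nx

def combinational_loops(g: nx.DiGraph, registers: set) -> list:
    # Drop registers so that cycles through a flip-flop are not reported.
    comb = g.subgraph(n for n in g if n not in registers)
    loops = [sorted(c) for c in nx.strongly_connected_components(comb)
             if len(c) > 1]
    loops += [[n] for n in comb if comb.has_edge(n, n)]   # single-node self-loops
    return sorted(loops)

G = nx.DiGraph([
    ("g1", "g2"), ("g2", "g1"),   # genuine combinational loop
    ("g3", "r1"), ("r1", "g3"),   # feedback through register r1: legal
    ("g4", "g4"),                 # gate feeding itself
])
loops = combinational_loops(G, registers={"r1"})
print(loops)
```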

4. Clock-domain propagation -- DFS with attribute tagging

Clock signals are seed-detected by regex (clk, clock, _ck, ...). A DFS from each clock source propagates the domain attribute through combinational nodes, halting at registers and IOs. This yields the CDC (clock-domain-crossing) candidate set for free -- any combinational node visited by two different domains is a CDC candidate.
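The propagation described above can be sketched as an iterative DFS per clock source. Everything named here is an assumption for illustration: the shortened regex, the rule that a clock source is a matching node with no drivers, and the `propagate_domains` helper.

```python
import re
from collections import defaultdict
import networkx as nx

CLOCK_RE = re.compile(r"clk|clock|_ck", re.IGNORECASE)   # assumed seed pattern

def propagate_domains(g: nx.DiGraph, registers: set) -> dict:
    """Map each visited node to the set of clock sources that reach it."""
    domains = defaultdict(set)
    sources = [n for n in g if g.in_degree(n) == 0 and CLOCK_RE.search(str(n))]
    for src in sources:
        stack, seen = [src], {src}
        while stack:
            u = stack.pop()
            domains[u].add(src)
            if u in registers:
                continue                  # halt at the sequential boundary
            for v in g.successors(u):
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
    return domains

G = nx.DiGraph([
    ("clk_a", "cg1"), ("cg1", "r1"),      # gated clock into r1
    ("clk_a", "mux1"), ("clk_b", "mux1"), # two clocks meet at a mux
    ("mux1", "r2"), ("r2", "g5"),         # g5 sits behind a register
])
registers = {"r1", "r2"}
domains = propagate_domains(G, registers)
cdc = sorted(n for n, d in domains.items() if len(d) > 1 and n not in registers)
print(cdc)
```

Here `mux1` is reached from both `clk_a` and `clk_b`, so it is the only CDC candidate; `g5` is never visited because traversal stops at `r2`.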

5. FSM / pipeline detection -- register sub-graph patterns

The sub-graph induced by registers + their immediate combinational predecessors is matched against canonical FSM / pipeline templates (small strongly-connected register cliques -> FSM; long register chains -> pipeline). This is heuristic, not formal -- but it is fast and surfaces structural intent.
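One way to approximate this heuristic is to collapse the netlist into a register-to-register graph and then look for cycles (FSM candidates) and long chains (pipeline candidates). The sketch below is a simplification of the template matching described above; the helper names and the `min_stages` threshold are assumptions.

```python
import networkx as nx

def register_graph(g: nx.DiGraph, registers: set) -> nx.DiGraph:
    """Edge r1 -> r2 when r2 is reachable from r1 through combinational nodes only."""
    rg = nx.DiGraph()
    rg.add_nodes_from(registers)
    for r in registers:
        stack, seen = list(g.successors(r)), set()
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            if v in registers:
                rg.add_edge(r, v)          # stop at the next register
            else:
                stack.extend(g.successors(v))
    return rg

def classify(rg: nx.DiGraph, min_stages: int = 3) -> dict:
    fsm = []
    for comp in nx.strongly_connected_components(rg):
        node = next(iter(comp))
        if len(comp) > 1 or rg.has_edge(node, node):   # registers that feed back
            fsm.append(sorted(comp))
    in_fsm = {n for comp in fsm for n in comp}
    rest = rg.subgraph(n for n in rg if n not in in_fsm)   # acyclic by construction
    chain = nx.dag_longest_path(rest)
    return {"fsm": sorted(fsm),
            "pipeline": chain if len(chain) >= min_stages else []}

G = nx.DiGraph([
    ("p1", "c1"), ("c1", "p2"), ("p2", "c2"), ("c2", "p3"),   # 3-stage pipeline
    ("s0", "n0"), ("n0", "s1"), ("s1", "n1"), ("n1", "s0"),   # 2-register state loop
])
result = classify(register_graph(G, {"p1", "p2", "p3", "s0", "s1"}))
print(result)
```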

6. I/O dependency chains -- topological sort + path enumeration

For each primary input, a topological forward traversal yields all primary outputs influenced by it. The transitive-closure view exposes dead inputs, dead outputs, and maximum logic depth per output -- useful for both verification coverage and SoC-level timing budgeting. Detailed in docs/IO_CHAINS_FEATURE.md.
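A compact sketch of the dead-input / dead-output view, using `networkx.descendants` for forward reachability. The `io_dependencies` helper and its return shape are illustrative assumptions; the documented feature lives in docs/IO_CHAINS_FEATURE.md.

```python
import networkx as nx

def io_dependencies(g: nx.DiGraph, inputs, outputs) -> dict:
    out_set = set(outputs)
    # For each primary input, the primary outputs it can influence.
    reach = {pi: sorted(nx.descendants(g, pi) & out_set) for pi in inputs}
    driven = set().union(*reach.values())
    return {
        "reach": reach,
        "dead_inputs": sorted(pi for pi, outs in reach.items() if not outs),
        "dead_outputs": sorted(out_set - driven),
    }

G = nx.DiGraph([("a", "g1"), ("b", "g1"), ("g1", "y")])
G.add_nodes_from(["c", "z"])     # c drives nothing, z is driven by nothing
deps = io_dependencies(G, inputs=["a", "b", "c"], outputs=["y", "z"])
print(deps)
```

Calling `descendants` once per input is fine at this scale; a single pass over a topological order would avoid the repeated traversals on larger designs.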

7. AI-assistance hooks (prototype)

tools/demo2/ai_agent.py and tools/demo4/ai_agent.py provide an entry-point for LLM-driven assistance over the graph:

Cone summarization -- natural-language explanation of "why is signal X high?" given the local fanin sub-graph.
Anomaly detection -- pattern matching for hardware-security smells (unprotected resets, gated clocks without enable balancing, scan chains leaking into functional paths).
Buffer-insertion suggestions -- heuristics over fanout x estimated load.

These are stubs today; the data model is what makes them tractable.

Benchmarks

End-to-end results on the bundled sample designs (single-thread, Python 3.12, no caching). Parse + graph build + full STA-lite sweep, on commodity laptop hardware:

Sample	Source	Gates	Nodes	Edges	Critical path	Worst slack @ 1ns
`c17.v`	ISCAS-85	6	17	18	7	+0.700 ns (MET)
`decoder2to4.v`	textbook	6	15	20	5	+0.830 ns (MET)
`mux4to1.v`	textbook	7	20	25	7	+0.680 ns (MET)
`adder4.v`	textbook	5	34	37	9	-0.200 ns (4-bit ripple-carry)
`pipeline3.v`	textbook	0+6 reg	12	20	1	+1.000 ns (MET)
`fsm_traffic.v`	textbook	0+2 reg	9	10	1	+1.000 ns (MET)
`c432.v`	ISCAS-85	160	378	518	41	-1.210 ns at `N421`
`c1908.v`	ISCAS-85	479	991	1465	69	-2.000 ns (16-bit SEC)
`c6288.v`	ISCAS-85	2353	4738	7043	245	-17.070 ns (16x16 multiplier)
`array_mult8.v`	generated	320	326	489	43	-5.120 ns at `s_7_7`
`array_mult16.v`	generated	1450	1278	1985	91	-12.320 ns at `s_15_15`
Total CI runtime						< 5 s for full test suite

Parallelism profile (GL0AM-inspired)

Same DAG, different question -- "what's the upper bound on parallel-simulation speedup?"

Sample	Nodes	Critical path	Max parallel width	Theoretical speedup	Verdict
`c17.v`	17	7	5	2.43x	trivial - parallelism moot
`decoder2to4.v`	15	5	3	3.00x	trivial
`mux4to1.v`	20	7	6	2.86x	trivial
`adder4.v`	34	9	12	3.78x	trivial
`c432.v` (ISCAS-85)	378	41	36	9.22x	moderate - some speedup possible
`c1908.v` (ISCAS-85)	991	69	38	14.36x	excellent - good GPU candidate (GL0AM regime)
`c6288.v` (ISCAS-85)	4738	245	256	19.34x	excellent - good GPU candidate (GL0AM regime)
`array_mult8.v`	326	43	68	7.58x	moderate
`array_mult16.v`	1278	91	260	14.04x	excellent - good GPU candidate (GL0AM regime)

The headline: as netlists scale, available parallelism grows faster than the critical path -- exactly the economic case for GPU-accelerated logic simulation. The array_mult16 design exposes 260 gates evaluable in one parallel step at its widest level, which is comfortably in the regime where GL0AM-style scheduling pays off.

Worst-slack numbers use the illustrative delay model in tools/sta_lite/; they are not silicon-accurate but they are reproducible and they correctly track design complexity (16x16 multiplier critical path is 91 gates deep vs. adder's 9).

Reproduce locally:

pip install -e ".[dev]"
synthesis-generate-outputs    # regenerates outputs/*.html
pytest -v                      # 24 tests, < 5s

Tooling

Seven tools, all built on the same graph core:

#	Tool	Folder	Purpose
1	Hardware Debug Assistant	`tools/demo3`	Signal tracing, fanout analysis, buffer planning
2	Netlist Analyzer	`tools/demo2`	Interactive DAG + critical-path analysis + AI hooks
3	RTL Analyzer	`tools/rtl_analyzer`	Behavioral RTL analysis, FSM / pipeline detection
4	Advanced Debugger	`tools/final_debugger`	Comprehensive debugger combining all algorithms
5	DAG Visualizer	`tools/dag_visualizer`	Standalone interactive DAG generator
6	Verilog DAG Tool	`tools/demo4`	Enhanced netlist analyzer variant
7	Early prototype	`tools/demo1`	Original Verilog DAG / cone-tracing prototype

Repository Layout

.
|-- README.md # you are here
|-- LICENSE # MIT
|-- requirements.txt # Python dependencies for all tools
|-- .gitignore
|
|-- tools/ # all analyzer apps (Streamlit / Python)
| |-- demo1/ # early Verilog DAG prototype
| |-- demo2/ # Netlist Analyzer (port 8620)
| |-- demo3/ # Hardware Debug Assistant (port 8610)
| |-- demo4/ # Netlist Analyzer variant (port 8550)
| |-- rtl_analyzer/ # RTL Analyzer (port 8630)
| |-- final_debugger/ # Advanced Debugger (port 8640)
| \-- dag_visualizer/ # Standalone 3D DAG gen (port 8650)
|
|-- launchers/ # convenience scripts
|-- samples/ # textbook Verilog designs (see samples/README.md)
|-- outputs/ # pre-generated interactive DAG visualizations
|-- docs/ # extended documentation
\-- lib/ # PyVis static assets used by visualizations

Quick Start

# Clone
git clone https://github.com/MaxTern-cyber/Synthesis_project.git
cd Synthesis_project

# Set up environment
python -m venv .venv
.\.venv\Scripts\Activate.ps1   # Windows PowerShell; on Linux/macOS use: source .venv/bin/activate
pip install -r requirements.txt

# Launch interactive menu
python launchers/PRESENTATION_LAUNCHER.py

Install as a package (editable)

Prefer a proper Python package over loose scripts? The project ships a PEP-621 pyproject.toml, so you can install it in editable mode and get console entry points on your PATH:

pip install -e .                 # core install
pip install -e ".[dev,viz,export]"  # everything (tests, plots, Excel export)

# Installed console scripts:
synthesis-generate-outputs       # rebuild outputs/ for every sample
synthesis-generate-mult 24       # generate samples/array_mult24.v
synthesis-fix-unicode            # normalize stray unicode to ASCII

Launch a single tool

Tool	Command	URL
Hardware Debug Assistant	`streamlit run tools/demo3/debug_assistant.py --server.port 8610`	http://localhost:8610
Netlist Analyzer	`streamlit run tools/demo2/local_analyzer.py --server.port 8620`	http://localhost:8620
RTL Analyzer	`streamlit run tools/rtl_analyzer/rtl_analyzer.py --server.port 8630`	http://localhost:8630
Advanced Debugger	`streamlit run tools/final_debugger/advanced_debugger.py --server.port 8640`	http://localhost:8640
DAG Visualizer	`streamlit run tools/dag_visualizer/dag_visualizer.py --server.port 8650`	http://localhost:8650

Sample inputs: seven textbook designs ship in samples/ -- try samples/adder4.v (critical path), samples/decoder2to4.v (fanout), samples/fsm_traffic.v (FSM detection), or samples/array_mult16.v (16x16 multiplier, ~1280 graph nodes -- scale demo). Pre-built interactive visualizations for each are in outputs/. Any other structural / gate-level Verilog .v works too.

Regenerate sample outputs

python launchers/generate_sample_outputs.py

Generate a larger multiplier on demand

python samples/generate_array_mult.py 24 # 24x24 -> ~3300 primitives

Design Decisions

Decision	Rationale
NetworkX `DiGraph` as the single source of truth	Decouples parsing from analysis; every algorithm is a graph query. Trivially swappable for `igraph` / `graph-tool` if performance demands.
Regex parser, not a full SystemVerilog frontend	Targeted at post-synthesis structural Verilog -- the format most relevant for analysis. Frees the project from an antlr/pyverilog dependency-tree and licensing concerns.
Streamlit + PyVis instead of Qt/Tk	Browser-based UI is friction-free for engineers, deploys to Streamlit Cloud with one click, and matches how modern EDA dashboards are evolving.
Local-first, no cloud / no API keys	Designs are IP. Anything that ships RTL or post-synth netlists to a third-party endpoint is a non-starter inside chip companies. AI hooks are designed to plug into local model backends.
One folder per "demo" tool	Each tool is an isolated Streamlit app that can be run/deployed/forked independently.

Limitations & Honest Disclaimers

This is a research prototype. Calling out what it is not is part of taking it seriously.

STA-lite uses illustrative delays, not silicon-accurate ones. tools/sta_lite ships a kind-keyed unit-delay model (NAND=0.10ns, AND=0.12ns, ...) sufficient to demonstrate arrival/required/slack propagation. A real flow would consume a .lib Liberty file -- which is on the roadmap.
No SDC / constraints parsing. Clock period is a UI slider, not a file input.
Parser is structural-Verilog-only. Generate-blocks, parameterised modules, and full SystemVerilog constructs are out of scope.
AI-assistance hooks are scaffolds, not production agents. They demonstrate where an LLM fits -- model selection (Llama-3, Phi-3, local Ollama) is intentionally pluggable.
No DRC / LVS / physical-design checks -- this is netlist analysis, not signoff.
Tested on small-to-medium designs (<= ~10K gates). Scaling to multi-million-gate designs would require swapping NetworkX for a C++ graph backend (igraph, graph-tool).
Not a replacement for commercial tooling (Conformal, PrimeTime, Genus, DC). The point is to demonstrate that the analytical core is reproducible with open primitives, not to compete on signoff quality.

Roadmap

Shipped:

CI -- pytest suite (24 tests) + GitHub Actions + mypy type-checking on every push
Streamlit Cloud deployment -- public live demo
Delay-aware STA-lite -- arrival / required / slack DP with kind-keyed delays (tools/sta_lite/)
ISCAS-85 benchmarks -- c17, c432 (160 gates), c1908 (479 gates, 16-bit SEC), c6288 (2353 gates, 16x16 multiplier) as canonical academic samples
PEP-621 packaging -- pip install -e . with console entry points

Open (see issues):

Local-LLM agent (#3) -- Ollama-backed cone summarization (Llama-3.2-3B-instruct, Phi-3-mini)
GNN inference experiment (#4) -- predict critical-path location from structural features
Hardware-security ruleset -- codified anti-patterns for clock-gating, reset trees, scan-chain isolation
GraphML / DEF export -- interoperate with OpenROAD / Yosys / open-source PD flows
Larger ISCAS-85 designs (c1908, c6288 -- 2400 gates)

Future research directions (open to collaboration):

LLM-assisted RTL debugging (explain a failing path in natural language)
Equivalence-aware optimization hints
Formal-verification integration (Conformal-style LEC pre-checks)
Intelligent synthesis-recommendation systems (predict good compile_ultra settings from netlist features)
AI-guided timing/power tradeoff exploration

Contributions and ideas welcome -- see CONTRIBUTING.md.

Documentation

Doc	Description
docs/DOCUMENTATION.md	Architecture, algorithms, internal API
docs/USAGE_GUIDE.md	Step-by-step usage examples
docs/QUICK_START.md	Fastest path to a running demo
docs/IO_CHAINS_FEATURE.md	I/O dependency-chain feature
docs/ENHANCEMENTS_SUMMARY.md	Feature additions
docs/LINKEDIN_POST.md	Draft LinkedIn announcement
docs/BLOG_POST.md	Long-form technical write-up
CREDITS.md	Sample-file provenance, third-party libraries, and academic references

Credits & references

All sample designs are original public-domain textbook circuits written for this project. All third-party libraries are open source under permissive licenses (BSD / MIT / Apache-2.0). See CREDITS.md for the full attribution table, library license list, and academic references underlying each algorithm.

Author

Mallikarjuna A L -- EDA engineer building open-source tools for hardware verification.

If you work in EDA, formal verification, synthesis, hardware security, or AI-for-chip-design -- I'd love to talk.

License

MIT -- use freely, attribution appreciated.
