[FEAT] Implement Native LLM Structured Outputs for Finding Intelligence

## Summary

Migrate the Vulnerability Enrichment and Intelligence Engine to use native LLM Structured Outputs instead of relying on prompt engineering and manual JSON parsing. This will guarantee that the AI-generated security findings strictly adhere to the required JSON schema, data types, and enumerations.

## Problem

Currently, the application extracts JSON responses from the LLM via prompt instructions (e.g., "Return ONLY JSON") and manual string parsing when enriching security findings. This is highly brittle:

* **Pipeline Instability:** A single malformed JSON property or missing bracket breaks downstream database ingestion and reporting.
* **Type Enforcement:** Numerical values like `cvss_score` are sometimes returned as strings (e.g., `"score: 8.5"`) instead of strict floats, breaking dashboard metrics.
* **Enum Hallucinations:** The LLM may hallucinate non-standard severity levels (e.g., `Severe` or `Warning` instead of `Critical`, `High`, `Medium`, etc.), which breaks dashboard filters and routing logic.
* **Collection Crashes:** Remediation steps might be returned as a single string instead of an array, causing client-side `.map()` errors on the frontend.

## Proposed solution

Leverage the native Structured Outputs feature (JSON Schema validation) supported by modern LLM APIs.

1. Define a strict schema (e.g., using a Pydantic model) for the enriched security finding payload, explicitly typing fields like `severity` (Enum), `cvss_score` (float), and `remediation_steps` (list of strings).
2. Pass this schema directly into the LLM provider's structured output parameter.
3. Remove legacy fallback logic, string manipulation, and markdown-stripping (````json`) currently used to salvage AI outputs.

## Suggested scope

* Suggested files or directories: `backend/secuscan/finding_intelligence.py` and potentially `backend/secuscan/models.py` for schema definitions.
* Related route, page, component, API, or plugin: The finding intelligence service/API layer handling vulnerability enrichment.

## Acceptance criteria

* [ ] A strict schema (Pydantic or standard JSON schema) is defined for the AI finding enrichment payload.
* [ ] The LLM API call in the intelligence module is updated to use the provider's native structured outputs parameter.
* [ ] All legacy string-stripping and manual JSON loading functions are removed from this flow.
* [ ] The `severity` field strictly returns expected enums, and `cvss_score` strictly returns a float.
* [ ] Existing unit and integration tests for finding enrichment pass successfully.

## Test plan

1. Trigger a scan using a plugin that feeds raw data into the finding intelligence module (e.g., a raw Nuclei or ZAP finding).
2. Verify that the resulting enriched data is correctly persisted to the database without parsing exceptions.
3. Inspect the API response to the frontend to ensure `cvss_score` is a native JSON number (not a string) and `remediation_steps` is a native JSON array.
4. Run the backend test suite (e.g., `pytest testing/backend/unit/test_finding_intelligence.py` if available) to ensure no regressions.

## Alternatives considered

* **Improved Regex/Parsing logic:** We considered writing more robust regex to extract JSON blocks, but this does not solve the issue of the LLM hallucinating incorrect property names or data types inside the block.
* **Using a validation library (like Guardrails AI):** While this enforces schemas, it adds unnecessary latency and an extra dependency, whereas native structured outputs solve the problem at the API level directly.

## Additional context

I am a contributor participating in GSSoC 2026. This architectural optimization will significantly improve the core stability of the SecuScan data pipeline, and I would love to be assigned to implement it!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEAT] Implement Native LLM Structured Outputs for Finding Intelligence #643

Summary

Problem

Proposed solution

Suggested scope

Acceptance criteria

Test plan

Alternatives considered

Additional context

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

[FEAT] Implement Native LLM Structured Outputs for Finding Intelligence #643

Description

Summary

Problem

Proposed solution

Suggested scope

Acceptance criteria

Test plan

Alternatives considered

Additional context

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions