Fix issues in langfuse observability#468

Open
nuwangeek wants to merge 8 commits into buerokratt:wip from rootcodelabs:llm/langfuse-observability

@nuwangeek

Collaborator

This pull request changes how LLM connections are identified across the codebase: SQL queries and API contracts now use vault_uuid instead of the internal database ID, giving a consistent identifier that can also be referenced externally. It also adds Ruuter endpoints for storing inference results in the production and testing environments, and extends observability and usage tracking for LLM calls, particularly in the NeMo Guardrails integration. Streaming error handling and the code quality configuration are improved as well.

Key changes:

LLM Connection Identifier Refactor

  • Updated the relevant SQL queries and API interfaces to identify LLM connections by vault_uuid instead of the internal id. This affects updating connection status, updating budget usage, and storing inference results. (DSL/Resql/rag-search/POST/deactivate-llm-connection-budget-exceed.sql, DSL/Resql/rag-search/POST/update-llm-connection-used-budget.sql, DSL/Resql/rag-search/POST/store-inference-result.sql, DSL/Resql/rag-search/POST/store-testing-inference-result.sql, DSL/Ruuter.public/rag-search/POST/inference/results/store.yml)
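For reviewers who want a concrete picture of the identifier change, here is a minimal sketch using Python's built-in sqlite3 module. It is not the Resql SQL from this PR: the table names (llm_connections, inference_results), the column names, and both helper functions are hypothetical stand-ins. It only illustrates the pattern described above, where callers pass vault_uuid and the query resolves the internal id itself.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical tables for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE llm_connections (
            id INTEGER PRIMARY KEY,
            vault_uuid TEXT NOT NULL UNIQUE,
            used_budget REAL NOT NULL DEFAULT 0
        );
        CREATE TABLE inference_results (
            id INTEGER PRIMARY KEY,
            llm_connection_id INTEGER NOT NULL REFERENCES llm_connections(id),
            answer TEXT NOT NULL
        );
        INSERT INTO llm_connections (vault_uuid) VALUES ('uuid-a');
        """
    )
    return conn


def add_used_budget(conn: sqlite3.Connection, vault_uuid: str, amount: float) -> int:
    """Add `amount` to the used budget of the connection matching vault_uuid.

    Returns the number of rows updated (0 when no connection matches).
    """
    cur = conn.execute(
        "UPDATE llm_connections SET used_budget = used_budget + ? WHERE vault_uuid = ?",
        (amount, vault_uuid),
    )
    conn.commit()
    return cur.rowcount


def store_inference_result(conn: sqlite3.Connection, vault_uuid: str, answer: str) -> None:
    """Insert a result row, resolving the internal id from vault_uuid in the query.

    If no connection has this vault_uuid, the SELECT yields no rows and nothing is stored.
    """
    conn.execute(
        "INSERT INTO inference_results (llm_connection_id, answer) "
        "SELECT id, ? FROM llm_connections WHERE vault_uuid = ?",
        (answer, vault_uuid),
    )
    conn.commit()
```

One consequence of this design is worth checking in review: an unknown vault_uuid makes both statements silently affect zero rows, so callers may want to inspect the affected row count rather than assume success.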

New Ruuter Endpoints for Inference Result Storage

  • Added a new private Ruuter endpoint for storing comprehensive production inference results, including chat context, refined questions, and chunk metadata. (DSL/Ruuter.private/rag-search/POST/inference/results/production/store.yml)
  • Added a new private Ruuter endpoint for storing minimal testing inference results, supporting the new vault_uuid identifier. (DSL/Ruuter.private/rag-search/POST/inference/results/test/store.yml)

Observability and Usage Tracking Enhancements

  • Integrated Langfuse's observe decorator and improved usage tracking in the NeMo Guardrails LLM adapter, capturing prompt/response previews, usage stats, and metadata for both sync and async calls. (src/guardrails/dspy_nemo_adapter.py, src/utils/observation_utils.py)
  • Ensured DSPy is configured with usage tracking enabled in the LLM manager. (src/llm_orchestrator_config/llm_manager.py)
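To make "prompt/response previews and usage stats" concrete, the following is a rough sketch of the kind of payload an adapter might assemble before attaching it to an observation. It does not reproduce src/utils/observation_utils.py and calls no Langfuse API; the function names, the 200-character limit, and the key names in the returned dict are all assumptions for illustration.

```python
from typing import Any, Dict, Mapping, Optional


def preview(text: str, limit: int = 200) -> str:
    """Return `text` cut to `limit` characters, appending '...' when truncated."""
    return text if len(text) <= limit else text[:limit] + "..."


def build_observation_payload(
    prompt: str,
    response: str,
    usage: Optional[Mapping[str, Any]],
    model: str,
) -> Dict[str, Any]:
    """Assemble previews and token usage into one dict (hypothetical shape).

    Missing usage data is treated as zero tokens so tracing never fails
    just because a provider did not report usage.
    """
    usage = usage or {}
    prompt_tokens = int(usage.get("prompt_tokens", 0))
    completion_tokens = int(usage.get("completion_tokens", 0))
    return {
        "input": preview(prompt),
        "output": preview(response),
        "model": model,
        "usage": {
            "input": prompt_tokens,
            "output": completion_tokens,
            "total": prompt_tokens + completion_tokens,
        },
    }
```

Truncating previews keeps trace payloads small and limits how much user text ends up in the observability backend; whether the PR uses the same limit is not stated here.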

Streaming and Error Handling Improvements

  • Wrapped streaming orchestration responses with observation context for better traceability, and improved error logging and SSE error message delivery during streaming. (src/llm_orchestration_service_api.py)
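The SSE error delivery described above can be sketched roughly as follows. This is an illustration only, not the code in src/llm_orchestration_service_api.py: the function names, the event name "error", and the JSON fields are assumptions. The idea is that a failure in the middle of a stream is logged and turned into a final SSE event, so the client sees an explicit error instead of a dropped connection.

```python
import asyncio
import json
import logging
from typing import AsyncIterator, Optional

logger = logging.getLogger("streaming_demo")


def sse_event(data: dict, event: Optional[str] = None) -> str:
    """Format one Server-Sent Event: optional 'event:' line, a 'data:' line, blank line."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


async def stream_with_error_event(chunks: AsyncIterator[str]) -> AsyncIterator[str]:
    """Relay chunks as SSE data events; on failure, log and emit one error event."""
    try:
        async for chunk in chunks:
            yield sse_event({"content": chunk})
    except Exception as exc:  # broad on purpose: any failure should reach the client
        logger.exception("Streaming failed")
        yield sse_event(
            {"message": "Streaming failed", "type": type(exc).__name__},
            event="error",
        )
```

The error event deliberately carries only the exception type, not its message, on the assumption that internal details should stay in the server logs.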

Code Quality and Minor Improvements

  • Added code quality exemptions for a utility file and exposed connection_id as a property in the LLM manager for easier access. (pyproject.toml, src/llm_orchestrator_config/llm_manager.py)

These changes collectively improve the maintainability, observability, and consistency of LLM connection management and result tracking throughout the system.

Copilot AI left a comment

Pull request overview

This PR improves Langfuse observability and inference/budget tracking by standardizing on vault_uuid (instead of internal DB IDs), adding reusable observation helpers for streaming/non-streaming flows, and expanding usage/cost capture across tool-classifier, guardrails, and response generation.

Changes:

  • Refactor LLM connection identification to use vault_uuid across inference result storage and budget updates (incl. Resql + Ruuter DSL updates).
  • Add src/utils/observation_utils.py and integrate Langfuse observation updates across streaming + non-streaming LLM call sites.
  • Improve streaming pipeline handling (budget updates based on DSPy history deltas; unified streaming inference storage hooks across workflows).
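As a rough illustration of "budget updates based on DSPy history deltas", the sketch below records the history length before a call and sums the cost of entries appended afterwards. It is not taken from src/llm_orchestration_service.py. It assumes each history entry is a dict with an optional cost field, which should be verified against the DSPy version in use; the function name cost_since is hypothetical.

```python
from typing import Any, Mapping, Sequence


def cost_since(history: Sequence[Mapping[str, Any]], start_index: int) -> float:
    """Sum the `cost` of history entries appended at or after `start_index`.

    Entries with no cost (missing key or None) contribute nothing.
    """
    total = 0.0
    for entry in history[start_index:]:
        cost = entry.get("cost")
        if cost is not None:
            total += float(cost)
    return total


# Usage: snapshot the length, run the streaming call, then charge only the delta.
history = [{"cost": 0.01}]           # stand-in for an LM history before the request
start = len(history)
history.append({"cost": 0.02})       # entries appended while the request ran
history.append({"cost": None})       # e.g. a cached call with no reported cost
delta = cost_since(history, start)   # only the new entries are counted
```

An index-based delta like this would miscount if the history is truncated during the request or shared by concurrent requests, so the real implementation may need additional safeguards.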

Reviewed changes

Copilot reviewed 28 out of 28 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/utils/production_store.py | Switch inference storage payload from llm_connection_id to vault_uuid; remove connection ID fetcher usage. |
| src/utils/observation_utils.py | New canonical helpers for safe Langfuse observation context + updates (streaming/non-streaming patterns). |
| src/utils/cost_utils.py | Add streaming-safe cost extraction fallbacks (including token-based estimation fallback). |
| src/utils/budget_tracker.py | Update budget tracking to post vault_uuid to Resql update endpoint. |
| src/tool_classifier/workflows/service_workflow.py | Add Langfuse @observe spans and attach observation metadata/usage to workflow steps. |
| src/tool_classifier/workflows/context_workflow.py | Add Langfuse observation updates and store streaming inference outputs for context workflow. |
| src/tool_classifier/workflows/api_tool_workflow.py | Ensure final streamed answer (incl. guardrail-blocked output) is what gets stored as inference. |
| src/tool_classifier/param_extractor.py | Add Langfuse generation tracing and usage capture for param extraction (sync + streaming). |
| src/tool_classifier/multi_response_formatter.py | Add Langfuse generation tracing + safe observation context for streaming formatter. |
| src/tool_classifier/intent_detector.py | Add Langfuse generation tracing + usage capture for service intent detection LLM calls. |
| src/tool_classifier/intent_decomposer.py | Add safe observation context + generation updates for async decomposition step. |
| src/tool_classifier/context_analyzer.py | Add Langfuse tracing across context detection/summary/streaming generation paths. |
| src/tool_classifier/classifier.py | Pass shared context through streaming fallback chain to preserve metadata. |
| src/tool_classifier/api_semantic_searcher.py | Wrap disambiguation with safe observation context and usage capture; add result wrapper for DSPy callback compatibility. |
| src/tool_classifier/api_response_formatter.py | Add Langfuse tracing + safe observation context for streaming response formatter. |
| src/response_generator/response_generate.py | Add Langfuse tracing for streaming/non-streaming response generation and quick scope checks. |
| src/llm_orchestrator_config/llm_manager.py | Expose connection_id property (vault UUID) and enable DSPy usage tracking (track_usage=True). |
| src/llm_orchestration_service.py | Cache retrievers, improve streaming budget updates via DSPy history deltas, and unify streaming inference storage calls. |
| src/llm_orchestration_service_api.py | Wrap streaming endpoint with safe Langfuse observation context; improve error logging context. |
| src/guardrails/dspy_nemo_adapter.py | Add Langfuse generation tracing + usage capture for NeMo guardrails LLM calls (sync + async). |
| pyproject.toml | Add Ruff ANN401 exemption for the new observation utils helper module. |
| DSL/Ruuter.public/rag-search/POST/inference/results/store.yml | Replace llm_connection_id with vault_uuid in public inference storage contract. |
| DSL/Ruuter.private/rag-search/POST/inference/results/test/store.yml | Add private testing inference storage endpoint using vault_uuid. |
| DSL/Ruuter.private/rag-search/POST/inference/results/production/store.yml | Add private production inference storage endpoint (comprehensive payload). |
| DSL/Resql/rag-search/POST/update-llm-connection-used-budget.sql | Update budget update query to target connections by vault_uuid. |
| DSL/Resql/rag-search/POST/store-testing-inference-result.sql | Store testing inference results by resolving DB connection id via vault_uuid. |
| DSL/Resql/rag-search/POST/store-inference-result.sql | Store inference results by resolving DB connection id via vault_uuid. |
| DSL/Resql/rag-search/POST/deactivate-llm-connection-budget-exceed.sql | Update deactivation query to target connections by vault_uuid. |
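
The "token-based estimation fallback" listed for src/utils/cost_utils.py can be pictured with a sketch like the one below. It is an assumption about the general technique, not the PR's implementation: the function name, its parameters, and the per-1,000-token prices are hypothetical. The reported cost is used when available, and the token counts are priced only when it is missing, which can happen for streamed responses.

```python
from typing import Optional


def extract_cost(
    reported_cost: Optional[float],
    prompt_tokens: int,
    completion_tokens: int,
    price_per_1k_input: float,
    price_per_1k_output: float,
) -> float:
    """Return the provider-reported cost, or estimate it from token counts.

    The estimate is used only when no positive cost was reported.
    Prices are per 1,000 tokens and are caller-supplied assumptions.
    """
    if reported_cost is not None and reported_cost > 0:
        return float(reported_cost)
    input_cost = prompt_tokens / 1000 * price_per_1k_input
    output_cost = completion_tokens / 1000 * price_per_1k_output
    return input_cost + output_cost
```

An estimate from tokens is only as accurate as the price table behind it, so it is best treated as a lower-confidence value than a provider-reported cost.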

Comment thread src/tool_classifier/workflows/service_workflow.py Outdated
Comment thread DSL/Resql/rag-search/POST/update-llm-connection-used-budget.sql
Comment thread DSL/Ruuter.private/rag-search/POST/inference/results/test/store.yml Outdated
@nuwangeek nuwangeek marked this pull request as ready for review June 10, 2026 06:11
@nuwangeek nuwangeek requested a review from Thirunayan22 June 10, 2026 06:11
@nuwangeek nuwangeek linked an issue Jun 10, 2026 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

Update Langfuse observability with newly updated features

2 participants