arm.com disclaimer, performix docs, and new parser for developer.arm.com (#83)

Open
JoeStech wants to merge 3 commits into main from arm-disclaimer

JoeStech (Member) commented May 7, 2026

No description provided.

Copilot AI review requested due to automatic review settings May 7, 2026 22:28
Copilot AI (Contributor) left a comment

Pull request overview

This PR adds Arm.com-specific response disclaimers to the MCP knowledge-base search output, expands the embedding corpus to include Arm Performix documentation, and introduces a new ingestion/parser path for developer.arm.com/documentation/... by fetching and decoding the Arm Documentation Service JSON payloads.

Changes:

  • Add an Arm.com-domain disclaimer field to KB search results returned by the MCP server.
  • Add Arm Performix as a new vector DB source and corresponding retrieval evaluation questions.
  • Implement a new developer.arm.com documentation fetch/parse flow (service URL mapping + base64 HTML decode + chunking).
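
The first change in the list above can be pictured roughly as follows. This is a minimal sketch rather than the PR's code: the function names `is_arm_domain` and `add_arm_disclaimer`, the `url` and `disclaimer` field names, and the disclaimer wording are all assumptions made for illustration.

```python
from urllib.parse import urlparse

# Placeholder wording (assumption); the PR's actual disclaimer text is not shown.
ARM_DISCLAIMER = "This result comes from an Arm.com page; check the source page for details."


def is_arm_domain(url: str) -> bool:
    """Return True for arm.com and any of its subdomains (hypothetical helper)."""
    host = (urlparse(url).hostname or "").lower()
    return host == "arm.com" or host.endswith(".arm.com")


def add_arm_disclaimer(results: list[dict]) -> list[dict]:
    """Return copies of the results, adding a disclaimer field to Arm-hosted ones."""
    annotated = []
    for result in results:
        item = dict(result)  # shallow copy so the caller's dicts are not mutated
        if is_arm_domain(item.get("url", "")):
            item["disclaimer"] = ARM_DISCLAIMER
        annotated.append(item)
    return annotated
```

Matching on the parsed hostname, rather than searching the whole URL for "arm.com", avoids false positives such as `https://example.com/arm.com` or `https://notarm.com/`.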

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| mcp-local/utils/kb_response.py | New helper to detect Arm domains and inject a standard disclaimer into search results. |
| mcp-local/server.py | Applies the disclaimer injection to knowledge_base_search() results. |
| embedding-generation/vector-db-sources.csv | Adds Arm Performix documentation as a new source URL. |
| embedding-generation/tests/test_generate_chunks.py | Removes the existing embedding-generation unit test suite. |
| embedding-generation/generate-chunks.py | Adds Arm documentation chunk generation via the Arm documentation-service JSON API. |
| embedding-generation/eval_questions.json | Adds new evaluation questions targeting Arm Performix pages. |
| embedding-generation/document_chunking.py | Adds Arm developer/service URL mapping + JSON/base64 parsing for Arm documentation API responses. |

Comment on lines +409 to +423

    data = json.loads(response_content.decode("utf-8", errors="ignore"))
    topic = data.get("topic", data)
    content = topic.get("content", "")
    if not content:
        return ParsedDocument(
            source_url=source_url,
            resolved_url=resolved_url,
            display_title=fallback_title,
            content_type="html",
            sections=[],
        )

    html = base64.b64decode(content).decode("utf-8", errors="ignore")
    title = data.get("title") or fallback_title
    return parse_html(html, source_url, resolved_url, title)

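
For context on the decode step in the snippet above: `errors="ignore"` silently drops undecodable bytes, and `base64.b64decode` without `validate=True` discards characters outside the base64 alphabet, so a malformed payload can yield truncated HTML instead of an error. One possible stricter variant is sketched below. The helper name `decode_topic_html` is hypothetical, and the payload shape (`topic` → `content` holding base64-encoded HTML) is inferred only from the snippet, not from any service documentation.

```python
import base64
import json
from typing import Optional


def decode_topic_html(response_content: bytes) -> Optional[str]:
    """Return decoded HTML from a documentation-service payload, or None.

    Hypothetical helper; mirrors the `topic` / `content` lookup in the snippet.
    """
    try:
        data = json.loads(response_content.decode("utf-8"))
    except ValueError:  # covers UnicodeDecodeError and json.JSONDecodeError
        return None
    if not isinstance(data, dict):
        return None

    topic = data.get("topic", data)
    content = topic.get("content") if isinstance(topic, dict) else None
    if not isinstance(content, str) or not content:
        return None

    try:
        # validate=True rejects characters outside the base64 alphabet
        raw = base64.b64decode(content, validate=True)
        return raw.decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return None
```

Returning `None` lets the caller decide whether to fall back to an empty `ParsedDocument`, log the failure, or skip the topic; which of those fits this pipeline is a design choice for the PR author.
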
Comment on lines +939 to +942

    root_data = json.loads(root_response.content.decode("utf-8", errors="ignore"))
    keywords = _arm_metadata_keywords(root_data, keywords_value, source_name)
    document_title = root_data.get("title") or source_name
    topic_links = _arm_topic_links(root_data.get("topic", {}))

Comment on lines 879 to 884

    def create_chunks_for_source(source_url, source_name, doc_type, keywords_value):
        if doc_type == "Ecosystem Dashboard":
            return create_ecosystem_dashboard_chunk(source_url, source_name, keywords_value)
        if is_arm_developer_documentation_url(source_url):
            return create_arm_documentation_chunks(source_url, source_name, doc_type, keywords_value)
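
The dispatch above routes a source to the Arm documentation path based on its URL. The body of `is_arm_developer_documentation_url` is not shown in this excerpt, so the sketch below is only an assumption about what such a predicate could check, based on the `developer.arm.com/documentation/...` pattern named in the overview. It uses a different, hypothetical name to avoid implying it is the PR's implementation.

```python
from urllib.parse import urlparse


def looks_like_arm_developer_doc_url(url: str) -> bool:
    """Heuristic check for developer.arm.com documentation pages (hypothetical)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    return (
        parsed.scheme in ("http", "https")
        and host == "developer.arm.com"
        and parsed.path.startswith("/documentation/")
    )
```

Because the `"Ecosystem Dashboard"` check runs first in the dispatch, a source with that `doc_type` is never routed to the Arm documentation path, whatever its URL.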
