Skip to content

fix(skills): replace binary PPTX fixture with programmatic generator#2081

Open
PratikWayase wants to merge 4 commits into
microsoft:mainfrom
PratikWayase:fix/pptx-fixture-generator
Open

fix(skills): replace binary PPTX fixture with programmatic generator#2081
PratikWayase wants to merge 4 commits into
microsoft:mainfrom
PratikWayase:fix/pptx-fixture-generator

Conversation

@PratikWayase

Copy link
Copy Markdown
Contributor

Pull Request

Description

This PR resolves the security and auditability concerns raised in #1135 regarding the committed binary minimal_test_fixture.pptx.

Changes Made:

  1. Removed Opaque Binary: Deleted tests/fixtures/minimal_test_fixture.pptx to eliminate the risk of hidden macros, OLE objects, or malicious XML embedded in an unreviewable binary blob.
  2. Programmatic Fixture Generation: Added logic in conftest.py to dynamically generate the minimal PPTX fixture at test runtime using python-pptx. Used lxml to manipulate the theme XML directly to ensure deterministic theme color resolution (dk1, accent1).
  3. Roundtrip Validation: Added a new integration test test_generated_fixture_passes_validate_deck in test_extract_content_integration.py. This ensures the programmatically generated PPTX passes structural validation via validate_deck.py before extraction tests run, creating a closed-loop guarantee.

Benefits:

  • Security: No more opaque binary blobs in the repository.
  • Auditability: The exact contents of the test fixture are now fully visible and reviewable as standard Python code.
  • Maintainability: The fixture can be easily updated in the future without needing external tools to regenerate a .pptx file.

Related Issue(s)

Fixes #1135

Type of Change

Select all that apply:

Code & Documentation:

  • Bug fix (non-breaking change fixing an issue)
  • New feature (non-breaking change adding functionality)
  • Breaking change (fix or feature causing existing functionality to change)
  • Documentation update

Infrastructure & Configuration:

  • GitHub Actions workflow
  • Linting configuration (markdown, PowerShell, etc.)
  • Security configuration (Removes opaque binary blob from repository)
  • DevContainer configuration
  • Dependency update

AI Artifacts:

  • Reviewed contribution with prompt-builder agent and addressed all feedback
  • Copilot instructions (.github/instructions/*.instructions.md)
  • Copilot prompt (.github/prompts/*.prompt.md)
  • Copilot agent (.github/agents/*.agent.md)
  • Copilot skill (.github/skills/*/SKILL.md)

Other:

  • Script/automation (.ps1, .sh, .py)
  • Other (please describe): Test infrastructure / Pytest fixtures

Testing

  • Ran the full Python test suite locally using npm run test:py.
  • Verified that all 758 PowerPoint tests pass, including the new test_generated_fixture_passes_validate_deck roundtrip validation test.
  • Confirmed that the extraction tests correctly resolve the dynamically generated theme colors and metadata.
  • (Note: The tts-voiceover test failures observed in the CI logs are pre-existing argparse issues in a completely separate skill and are unrelated to this PR).

Checklist

Required Checks

  • Documentation is updated (if applicable)
  • Files follow existing naming conventions
  • Changes are backwards compatible (if applicable)
  • Tests added for new functionality (if applicable)

Required Automated Checks

The following validation commands must pass before merging:

  • Markdown linting: npm run lint:md
  • Spell checking: npm run spell-check
  • Frontmatter validation: npm run lint:frontmatter
  • Skill structure validation: npm run validate:skills
  • Link validation: npm run lint:md-links
  • PowerShell analysis: npm run lint:ps
  • Plugin freshness: npm run plugin:generate
  • Docusaurus tests: npm run docs:test

Security Considerations

  • This PR does not contain any sensitive or NDA information
  • Any new dependencies have been reviewed for security issues (No new dependencies added; utilizes existing python-pptx and lxml)
  • Security-related scripts follow the principle of least privilege

@PratikWayase PratikWayase requested a review from a team as a code owner June 19, 2026 14:49
Comment thread .github/skills/experimental/powerpoint/tests/test_extract_content_integration.py Outdated
Comment thread .github/skills/experimental/powerpoint/tests/test_extract_content_integration.py Outdated
Comment thread .github/skills/experimental/powerpoint/tests/conftest.py Outdated
@codecov-commenter

codecov-commenter commented Jun 23, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.25%. Comparing base (8fc30bc) to head (da5f569).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2081      +/-   ##
==========================================
+ Coverage   81.61%   87.25%   +5.64%     
==========================================
  Files         130       86      -44     
  Lines       19489    11554    -7935     
  Branches       12        0      -12     
==========================================
- Hits        15906    10082    -5824     
+ Misses       3580     1472    -2108     
+ Partials        3        0       -3     
Flag Coverage Δ
docusaurus ?
pester 85.93% <ø> (-0.02%) ⬇️
pytest 90.49% <ø> (+11.90%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.
see 46 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@katriendg katriendg left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling this — replacing the opaque binary .pptx with a transparent, code-reviewable generator is a real security and auditability win, and the roundtrip validate_deck test is a nice closed-loop guarantee.
Note I reviewed the current version of the branch, though due to a merge conflict some of the comments may be stale. Please rebase and re-run the CI checks before merge.

Before merge, a few items: the two modified Python files currently fail the repo's ruff lint gate (npm run lint:py) with 13 errors — 9 trailing-whitespace blank lines, 2 lines over 88 chars, and 2 unsorted import blocks. Most are auto-fixable with ruff check --fix / ruff format. Beyond lint, please add type hints to the new helper functions, mark the new roundtrip test with @pytest.mark.integration (matching its sibling tests) and give it AAA structure, and reconsider the direct lxml import (it's only a transitive dep of python-pptx) and the author = "ChatGPT" fixture value. Details are in the inline review.

@katriendg katriendg left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline findings from the code review (the overall verdict remains Request changes from my earlier review). Each comment maps to a finding in the review summary.

Comment thread .github/skills/experimental/powerpoint/tests/conftest.py
Comment thread .github/skills/experimental/powerpoint/tests/conftest.py Outdated
Comment thread .github/skills/experimental/powerpoint/tests/conftest.py Outdated
Comment thread .github/skills/experimental/powerpoint/tests/conftest.py Outdated
Comment thread .github/skills/experimental/powerpoint/tests/conftest.py Outdated
Comment thread .github/skills/experimental/powerpoint/tests/conftest.py Outdated
Comment thread .github/skills/experimental/powerpoint/tests/conftest.py Outdated
Comment thread .github/skills/experimental/powerpoint/tests/conftest.py Outdated
Comment thread .github/skills/experimental/powerpoint/tests/test_extract_content_integration.py Outdated
Comment thread .github/skills/experimental/powerpoint/tests/test_extract_content_integration.py Outdated
@PratikWayase PratikWayase force-pushed the fix/pptx-fixture-generator branch from b9ab10b to e23cca5 Compare June 27, 2026 14:49
@PratikWayase PratikWayase force-pushed the fix/pptx-fixture-generator branch from e23cca5 to 2fad6cf Compare June 28, 2026 03:59

@katriendg katriendg left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @PratikWayase for addressing the comments in this PR. Leaving one small comment which would be a nice one to add.
Also please ensure you rebase and then looking good.

Comment thread .github/skills/experimental/powerpoint/tests/conftest.py
@katriendg katriendg requested a review from jkim323 July 1, 2026 18:22
WilliamBerryiii and others added 2 commits July 1, 2026 15:27
- Read theme XML via etree.fromstring(theme_part.blob); a base python-pptx Part has no .element, which crashed minimal_test_fixture generation.

- Restore malformed_pdf_dir, minimal_valid_pdf, oversized_pdf, and many_page_pdf fixtures dropped during the conftest rewrite; test_pdf_safety, test_export_slides, test_export_svg, and test_render_pdf_images depend on them.

- Declare lxml as a direct dependency (previously transitive via python-pptx).
@agreaves-ms agreaves-ms changed the title Fix : replace binary PPTX fixture with programmatic generator fix(skills): replace binary PPTX fixture with programmatic generator Jul 1, 2026
@WilliamBerryiii WilliamBerryiii dismissed katriendg’s stale review July 1, 2026 22:56

addressed all feedback

@auyidi1 auyidi1 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New findings only. Excludes items already raised by katriendg/jkim323 and already addressed:

  • ruff lint / trailing whitespace / line length / import ordering
  • type hints on new helper functions
  • @pytest.mark.integration marker on the roundtrip test
  • author = "ChatGPT"HVE Core Test Fixture
  • direct lxml dependency thread (resolved in current head by adding lxml to
    pyproject.toml + uv.lock)

Each finding was verified by reproducing the generator and running validate_deck against the output.

Findings

1. [Medium] Roundtrip guarantee is weaker than advertised + load-bearing empty-notes line

  • File: .github/skills/experimental/powerpoint/tests/test_extract_content_integration.py (L137)
  • Related: .github/skills/experimental/powerpoint/tests/conftest.py (L151)

The roundtrip test asserts severity in ("info", "none"). Reproduction shows validate_deck
returns info (not none), because slide 2 has a notes slide with empty text
("Speaker notes present but empty").

That outcome depends entirely on conftest.py L151:

_ = slide2.notes_slide  # ensure notes part exists

Remove or reorder that line and slide 2 has no notes slide, check_speaker_notes emits a
warning, and the test fails — but the comment gives no hint the roundtrip test depends on it.

Recommendation:

  • If the intent is a genuinely clean deck, give slide 2 real notes text and assert
    severity == "none".
  • At minimum, make the coupling explicit in the comment so a future edit does not silently break
    the roundtrip test.

2. [Medium] set_color silently no-ops when a theme node is missing

  • File: .github/skills/experimental/powerpoint/tests/conftest.py (L98–L108)

The two guards immediately above raise ValueError (missing theme_part at L86–L90, missing
clrScheme at L95–L96), but set_color does the opposite:

node = clr_scheme.find(f"a:{color_name}", ns)
if node is not None:
    ...   # no else — silently skips

dk1/accent1 are the source of truth for EXPECTED_FIXTURE["theme_colors"]
(test file L34–L37). If a future python-pptx default template renames or restructures these
nodes, generation silently produces the wrong theme and the failure surfaces as an opaque
assertion mismatch in the extraction tests rather than a clear error at the point of the bug.

Recommendation: Raise ValueError when node is None, consistent with the surrounding guards.

3. [Medium] Runtime generation couples fixtures to python-pptx's bundled default template

  • Files: .github/skills/experimental/powerpoint/tests/test_extract_content_integration.py
    (L20, L28), .github/skills/experimental/powerpoint/pyproject.toml (L6)

Generating at test time makes the tests depend on python-pptx's bundled default.pptx staying
stable — layout names "Title Slide" / "Title and Content" and the clrScheme structure edited
in _set_theme_colors. The committed binary was self-contained; this trades that for a dependency
on upstream defaults. python-pptx is unpinned in pyproject.toml, so a contributor outside the
uv.lock pin could hit drift.

Recommendation: Add a minimum version constraint (or a short comment noting the tests assume the
bundled template's layout/theme names).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

test(skills): replace committed PPTX binary fixture with programmatic generation

7 participants