Knowledge Workflow Skills

Auditable Codex workflow for turning videos, audio, subtitles, and transcripts into evidence-grounded knowledge reports.

For research-oriented users of Codex or a local agent: turn long videos, audio, subtitles, and transcripts into auditable knowledge assets.

What This Is

This project is not a universal video crawler and not a casual video summarizer. It is a Codex skill package and local workflow that:

checks whether first-hand material exists before analysis,
turns transcripts, subtitles, or transcribable media into structured source artifacts,
separates Source / Inference / Extension in the final report,
writes degraded reports instead of pretending when primary material is unavailable.
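
As a rough illustration of the Source / Inference / Extension separation, the sketch below tags each claim with one of the three labels and groups the claims for the report. The README does not define a data model for these labels, so the `Claim` class, its fields, and `group_by_label` are hypothetical. The rule that a Source claim must carry a supporting span is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional

LABELS = ("Source", "Inference", "Extension")


@dataclass(frozen=True)
class Claim:
    """Hypothetical claim record; field names are illustrative."""
    text: str
    label: str
    span: Optional[str] = None  # e.g. a timestamp range in the transcript

    def __post_init__(self) -> None:
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label!r}")
        # Assumption: a claim attributed to the source must cite a span.
        if self.label == "Source" and not self.span:
            raise ValueError("Source claims need a supporting span")


def group_by_label(claims: list[Claim]) -> dict[str, list[Claim]]:
    """Bucket claims so each label gets its own report section."""
    grouped: dict[str, list[Claim]] = {label: [] for label in LABELS}
    for claim in claims:
        grouped[claim.label].append(claim)
    return grouped
```

Keeping the label on the claim itself, rather than inferring it from position in the report, makes it harder for an inference to be presented as something the source said.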

Who It Is For

Use it if you:

use Codex or a local coding agent as a research assistant,
analyze long videos, courses, interviews, podcasts, talks, or research material,
need evidence-linked notes, reports, scripts, or knowledge-base inputs,
care about knowing when a workflow is blocked or degraded.

It is not for bypassing CAPTCHA, paywalls, private videos, region locks, account permission barriers, or platform access controls.

Three-Minute Start

First run the local transcript demo. Do not start with platform URLs.

On Windows:

git clone https://github.com/sitabanubanu/codex-knowledge-workflow-skills
cd codex-knowledge-workflow-skills

.\sync_to_codex_skills.ps1 -DryRun
.\sync_to_codex_skills.ps1
.\sync_to_codex_skills.ps1 -VerifyOnly

python .\kw.py demo

Optional local CLI install:

python -m pip install -e .
kw demo

On macOS or Linux:

git clone https://github.com/sitabanubanu/codex-knowledge-workflow-skills
cd codex-knowledge-workflow-skills

./sync_to_codex_skills.sh --dry-run
./sync_to_codex_skills.sh
./sync_to_codex_skills.sh --verify-only

python kw.py demo

After the demo finishes, open:

outputs/knowledge-workflow/demo-transcript/result_index.md

That file tells you the status, whether full analysis was allowed, where the final report is, and what to inspect next.

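For a slower but stricter acceptance check, run the command below. It and the remaining commands in this README use Windows PowerShell syntax; on macOS or Linux, use forward slashes, drop the leading .\, and replace the trailing backtick line continuation with a backslash: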

python .\tests\real_workflow_acceptance.py

For v0.5.0-style realistic offline samples:

python .\kw.py batch `
  --input .\examples\real_world\batch_links.csv `
  --output-root .\outputs\knowledge-workflow\real-world-batch

Track the result against docs/real-world-validation-log.md and docs/output-quality-standard.md.

Why The Demo Comes First

Platform URLs can fail because of missing subtitles, login state, bot checks, cookies, region rules, player changes, or network conditions. The demo uses a local transcript, so it proves the core workflow first:

local transcript
  -> source gate
  -> transcript normalization
  -> segmentation
  -> inventory
  -> source logic
  -> evidence audit
  -> video_analysis_pack.md
  -> document planning
  -> quality_gate.json
  -> final_report.md
  -> result_index.md
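
The stage order above can be sketched as a gated pipeline. This is a minimal illustration, not the project's runner: the stage names mirror the diagram, while `run_pipeline` and its `has_primary_material` flag are hypothetical. The assumption that a failed source gate still writes `result_index.md` follows from the README's statement that it is the entry point for every run.

```python
# Illustrative only: stage names follow the diagram above; the function
# and its argument are hypothetical, not the project's actual API.
STAGES = [
    "source gate",
    "transcript normalization",
    "segmentation",
    "inventory",
    "source logic",
    "evidence audit",
    "video_analysis_pack.md",
    "document planning",
    "quality_gate.json",
    "final_report.md",
    "result_index.md",
]


def run_pipeline(has_primary_material: bool) -> list[str]:
    """Return the stages that would run, in order."""
    if not has_primary_material:
        # Assumption: a failed gate skips analysis but still writes the
        # index, so the run keeps a user-facing entry point.
        return ["source gate", "result_index.md"]
    return list(STAGES)
```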

Product Modes

Mode	Use When	Primary Requirement	Allowed Output
`quick`	Low-cost first look.	Metadata or visible context.	Non-primary triage only.
`standard`	Video decomposition.	Transcript, subtitles, or ASR media.	Pack when source gates allow.
`audit`	Final report or asset.	Source gate + evidence audit.	Final report when approved.
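
One way to read the table is as a decision rule from mode and gate results to the strongest output a run may produce. The sketch below is an interpretation under stated assumptions: the function name, its boolean parameters, and the returned strings are hypothetical, and "approved" is taken to mean that both the source gate and the evidence audit passed.

```python
def allowed_output(
    mode: str,
    source_gate_passed: bool,
    evidence_audit_passed: bool = False,
) -> str:
    """Return the strongest output the given mode may produce."""
    if mode == "quick":
        # Quick mode never claims primary-source analysis.
        return "triage"
    if mode == "standard":
        return "pack" if source_gate_passed else "degraded"
    if mode == "audit":
        if source_gate_passed and evidence_audit_passed:
            return "final report"
        return "degraded"
    raise ValueError(f"unknown mode: {mode!r}")
```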

Unified CLI

kw.py is a thin product wrapper around the existing scripts. It does not replace the three skills; it makes the first-run path easier.

python .\kw.py doctor
python .\kw.py demo
python .\kw.py preflight --input .\examples\demo_transcript\input.txt --mode audit
python .\kw.py run --input .\examples\demo_transcript\input.txt --mode audit --language en --final-language en
python .\kw.py status --project-root .\outputs\knowledge-workflow\demo-transcript
python .\kw.py result --project-root .\outputs\knowledge-workflow\demo-transcript
python .\kw.py export --project-root .\outputs\knowledge-workflow\demo-transcript --format md
python .\kw.py quality --project-root .\outputs\knowledge-workflow\demo-transcript
python .\kw.py template --project-root .\outputs\knowledge-workflow\demo-transcript --template research_brief
python .\kw.py batch `
  --input .\examples\batch_research\batch_links.csv `
  --output-root .\outputs\knowledge-workflow\batch-demo

doctor prints a short route-readiness summary by default. Use --pretty for full JSON diagnostics or --output-md doctor.md for a Markdown report.

For Codex usage, you can still ask the agent directly:

Use knowledge-workflow-console for this input.
Run preflight first.
If first-hand material is available, create the video analysis pack and final report.
If primary material is unavailable, do not write a complete analysis.
Write the degraded status and tell me what material is needed next.

Supported Inputs

Input	Stability	Notes
Local transcript (`.txt`, `.md`, `.jsonl`, `.json`)	High	Best first-run path.
Local subtitles (`.srt`, `.vtt`)	High	Preserves timestamped source spans when available.
Local audio/video	Medium-high	Requires ASR dependencies for real transcription.
YouTube public URL	Medium-high	Best effort when subtitles or audio are available.
X / Xiaohongshu / Douyin URLs	Low to medium	Often blocked or degraded.
Private or gated pages	Not a bypass target	Records blocked/degraded status only.
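
The local rows of the table can be approximated by a file-extension check. The sketch below is illustrative and not how the project routes inputs: `classify_input` and its return values are hypothetical, and only the extensions listed in the table are recognized.

```python
from pathlib import Path

# Extensions taken from the table above.
TRANSCRIPT_EXTS = {".txt", ".md", ".jsonl", ".json"}
SUBTITLE_EXTS = {".srt", ".vtt"}


def classify_input(value: str) -> str:
    """Guess the input kind from a URL prefix or a file extension."""
    if value.startswith(("http://", "https://")):
        return "url"
    suffix = Path(value).suffix.lower()
    if suffix in TRANSCRIPT_EXTS:
        return "transcript"
    if suffix in SUBTITLE_EXTS:
        return "subtitles"
    # Audio, video, and anything unrecognized need further inspection.
    return "media-or-unknown"
```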

What Success Produces

outputs/knowledge-workflow/<project>/
  result_index.md
  logs/
    preflight.json
    run_state.json
    status_summary.json
    result_index.json
  10_video/
    00_source/source_status.json
    01_transcript/clean_transcript.jsonl
    05_gap_check/evidence_audit.json
    video_analysis_pack.md
  20_document/
    claim_map.json
    quality_gate.json
    final_report.md
  30_final/

Start with result_index.md. It is the user-facing entry point for every run.
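
To confirm that a run produced the layout above, a short check like the following can list what is missing. It is a sketch, not a project script: the relative paths come from the tree above, while `missing_artifacts` is a hypothetical helper. A degraded or blocked run may legitimately omit later-stage files, so a non-empty result is not necessarily an error.

```python
from pathlib import Path

# Relative paths copied from the output tree above.
EXPECTED_ARTIFACTS = [
    "result_index.md",
    "logs/preflight.json",
    "logs/run_state.json",
    "logs/status_summary.json",
    "logs/result_index.json",
    "10_video/00_source/source_status.json",
    "10_video/01_transcript/clean_transcript.jsonl",
    "10_video/05_gap_check/evidence_audit.json",
    "10_video/video_analysis_pack.md",
    "20_document/claim_map.json",
    "20_document/quality_gate.json",
    "20_document/final_report.md",
]


def missing_artifacts(project_root) -> list[str]:
    """Return the expected files that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in EXPECTED_ARTIFACTS if not (root / rel).is_file()]


if __name__ == "__main__":
    for rel in missing_artifacts("outputs/knowledge-workflow/demo-transcript"):
        print("missing:", rel)
```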

What Happens When It Fails

The workflow should not fake a complete report. If it cannot get first-hand material, it writes a degraded or blocked result that explains:

the source status,
whether full analysis is allowed,
which route failed,
what you can provide next: transcript, subtitles, local audio/video, or an authorized cookies file.
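
The four points above can be pictured as a small record plus a plain-text rendering. This is a sketch under assumptions: the key names, `degraded_result`, and `render` are hypothetical and do not describe the actual schema of the workflow's JSON logs.

```python
def degraded_result(source_status: str, failed_route: str) -> dict:
    """Build a hypothetical degraded-result record."""
    return {
        "source_status": source_status,
        "full_analysis_allowed": False,
        "failed_route": failed_route,
        # The follow-up options listed in the README.
        "provide_next": [
            "transcript",
            "subtitles",
            "local audio/video",
            "authorized cookies file",
        ],
    }


def render(result: dict) -> str:
    """Format the record as short plain-text lines for a reader."""
    allowed = "yes" if result["full_analysis_allowed"] else "no"
    lines = [
        f"Source status: {result['source_status']}",
        f"Full analysis allowed: {allowed}",
        f"Failed route: {result['failed_route']}",
        "Provide next: " + ", ".join(result["provide_next"]),
    ]
    return "\n".join(lines)
```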

Skill Package

The released package contains three skills:

knowledge-workflow-console: route selection, preflight, end-to-end runner, status summaries, result index.
knowledge-video-decomposer: source gates, acquisition checks, transcript normalization, ASR, segmentation, inventory, source logic, evidence audit, video analysis pack.
knowledge-document-composer: document planning, Source / Inference / Extension separation, final report writer, quality gate.

subagent-supervisor is not part of this release package. It may be used locally as an optional coordination layer only when explicitly requested.

Direct Script Entrypoints

The CLI wraps these scripts, but advanced users can still call them directly:

python .\skills\knowledge-video-decomposer\scripts\doctor.py --self-test
python .\skills\knowledge-workflow-console\scripts\workflow_preflight.py --self-test
python .\skills\knowledge-workflow-console\scripts\end_to_end_runner.py --self-test
python .\skills\knowledge-workflow-console\scripts\workflow_status_summary.py --self-test
python .\skills\knowledge-workflow-console\scripts\result_index_writer.py --self-test
python .\skills\knowledge-document-composer\scripts\final_report_writer.py --self-test

Tests

Default tests are offline and fixture-based:

python .\tests\knowledge_workflow_regression.py
python .\tests\live_platform_smoke.py
python .\tests\asr_integration.py
python .\tests\real_workflow_acceptance.py

The live platform and real ASR tests are opt-in. They require explicit environment variables and user-provided samples:

$env:KW_LIVE_PLATFORM_SMOKE='1'
$env:KW_REAL_ASR_SMOKE='1'
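
A test can guard its live path with a check like the one below. The variable names come from the lines above; `opt_in_enabled` is a hypothetical helper, and treating only the exact value "1" as enabled is an assumption based on the example values.

```python
import os
from typing import Mapping, Optional


def opt_in_enabled(flag: str, environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when the flag is set to exactly "1"."""
    env = os.environ if environ is None else environ
    return env.get(flag) == "1"


if __name__ == "__main__":
    for name in ("KW_LIVE_PLATFORM_SMOKE", "KW_REAL_ASR_SMOKE"):
        state = "enabled" if opt_in_enabled(name) else "skipped"
        print(f"{name}: {state}")
```

Passing the environment as a parameter keeps the check easy to exercise without changing the real process environment.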

Current Status

Beta. The local transcript/subtitle path is the strongest route. Local media ASR is usable when dependencies are installed. Platform URL handling is intentionally conservative and may stop at degraded status when first-hand material is unavailable.

Current work on the product entry experience covers quickstart, examples, result indexing, the unified CLI, security and privacy docs, batch research, output templates, Chrome probe normalization, validation matrices, real-world offline examples, failure-path checks, and output quality standards.
