scoring-framework

Here are 3 public repositories matching this topic.

kajogo777 / the-agent-sandbox-taxonomy

An open taxonomy and scoring framework for evaluating AI agent sandboxes: 7 defense layers, 7 threat categories, 3 evaluation dimensions, 27 "sandboxes" scored.

security devops taxonomy sandbox threat-modeling ai-agents container-security microvm defense-in-depth infrastructure-security llm-agents agent-safety scoring-framework compute-isolation

Updated Jun 10, 2026
Go

syed-waleed-ahmed / LLM-as-Judge

A Streamlit web app that uses a Groq-powered LLM (Llama 3) as an impartial judge to evaluate and compare two model outputs. It supports custom criteria and presets such as creativity and brand tone, and returns structured scores, explanations, and a winner. Built end-to-end with Python, the Groq API, and Streamlit.

python code-evaluation a-b-testing text-evaluation groq streamlit model-benchmarking ai-automation ai-evaluation llm prompt-evaluation llama3 llm-judge output-evaluation scoring-framework

Updated Jul 4, 2026
HTML

ahmedanees-m / pen-score

Multi-axis scoring framework that ranks programmable genome editors across eight orthogonal axes into a single PenScore to guide experimental design and benchmarking.

machine-learning bioinformatics computational-biology crispr crispr-cas9 genome-editing genetic-engineering gene-therapy base-editors recombinases scoring-framework transposase genome-atlas mech-class prime-editors therapeutic-genome-editing pen-assemble

Updated Jul 5, 2026
Python
