Persona Vectors

Extract persona vectors from language models, then probe, project, or steer with them.

A persona vector is the mean hidden-state activation a model produces while answering as a given persona. Extraction saves one (num_layers, hidden_size) tensor per persona, prompt variant, model, and mask strategy; everything downstream reads those tensors back.

personas + QA pairs -> prompts -> token masks -> hidden states -> saved vectors -> analysis

Install

uv sync
cp .env.example .env

Requires Python >=3.12. Set NDIF_API_KEY in .env for remote extraction.

Dataset loading comes from persona-data; the

Quickstart

# Extract — one (num_layers, hidden_size) vector per persona/variant/mask
uv run python main.py extract --model google/gemma-2-9b-it --backend remote

# Analyze — normalized PCA/UMAP, similarity, clustering, scree plots
uv run python main.py analyze --model google/gemma-2-9b-it --variant biography

# Probe — linear probes per persona attribute
uv run python main.py probe --model google/gemma-2-9b-it --variant templated

# Steer — biography minus templated direction
uv run python main.py steer --model google/gemma-2-9b-it --persona-id <UUID> --layer 20

# Push extracted vectors to the Hub
uv run python main.py push --model google/gemma-2-9b-it --repo implicit-personalization/synth-persona-vectors

Notebooks under notebooks/ cover the same flows interactively.

Extraction scripts

# Steering: train split, push to Hub, refresh the dataset card
MODEL=google/gemma-2-9b-it scripts/extraction_train_split.sh

# All-questions (explicit only): first 100 personas under artifacts/persona-vectors/,
# then push and refresh the dataset card
MODEL=google/gemma-2-9b-it scripts/extraction_all_questions.sh

What gets saved

artifacts/activations/<model_dir>/<mask_strategy>/<prompt_variant>/
├── manifest.json
└── <persona_id>.safetensors

<model_dir> is the HF id with / → __. Each safetensors file holds one activations tensor — the persona vector for that variant, averaged across QA pairs and selected tokens. scripts/extraction_all_questions.sh writes under artifacts/persona-vectors/ to keep all-questions runs separate from train-split runs; pass --activations-dir artifacts/persona-vectors to read it back. See the artifacts docs for the full layout.

Layout

src/persona_vectors/
├── activations.py   # low-level hidden-state extraction
├── extraction.py    # prompt formatting, masks, persona extraction flow
├── preview.py       # token-mask preview for --verbose
├── artifacts.py     # PersonaVectorStore (local) + HFPersonaVectorStore (Hub)
├── hub.py           # push to / discover Hub vector datasets
├── analysis.py      # aligned dataset loading, PCA, cosine similarity, clustering
├── plots/           # Plotly figures
├── attributes.py    # attribute schema + color helpers for plots
├── probes.py        # linear probes over saved persona vectors
├── steering.py      # experimental steering vectors
└── parser.py        # CLI parser

See the docs for API details.

Name		Name	Last commit message	Last commit date
Latest commit History 128 Commits
.github/workflows		.github/workflows
docs		docs
notebooks		notebooks
scripts		scripts
src/persona_vectors		src/persona_vectors
tests		tests
.env.example		.env.example
.gitignore		.gitignore
README.md		README.md
main.py		main.py
mkdocs.yml		mkdocs.yml
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Persona Vectors

Install

Quickstart

Extraction scripts

What gets saved

Layout

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Persona Vectors

Install

Quickstart

Extraction scripts

What gets saved

Layout

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages