TCR-Fold

Structure-informed TCR specificity analysis and binding prediction.

Overview

TCR-Fold investigates whether predicted TCR structures reveal specificity signal that linear sequence representations miss. We find that TCR binding surfaces show strong structural convergence among co-specific TCRs (147x enrichment), that this signal is concentrated in the α-chain CDR loops, and that adding structural features rescues binding prediction on exactly those epitopes where sequence-based models fail. The final MHC-aware model (NeuralFusion v3) reaches 0.960 ROC / 0.913 PR-AUC on an in-distribution per-epitope split with MHC-matched negatives.

Headline Results

Finding	Value	Statistic
Structural convergence enrichment	147x	p < 0.001 (permutation)
Random-structure null	1.06 ± 0.39	empirical p = 0.0099
Most important CDR (by ablation)	CDR3α	147x → 30.5x when removed (−79%)
Binding prediction (our data, 176 epitopes)	0.847 PR-AUC	Multi-task CNN+FiLM, #1 overall
Binding prediction (Lu CDR3β-only benchmark)	0.843 AUPRC	Beats epiTCR (#1 in Nature Methods 2025)
Binding prediction (Lu multi-chain benchmark)	0.841 AUPRC	ESM-C+CNN+CE, 10-seed ensemble; beats TCRconv (~0.76) by ~0.08; see §3h
6-CDR fingerprint (structure alone)	0.813 PR-AUC	Independent signal, 27 dimensions
Structure rescue × convergence	ρ = 0.231	p = 0.015 — only our method shows this

See docs/LIMITATIONS.md for the honest accounting of what this work does and doesn't establish.

Pipeline Overview

Project Phases

Phase 1: Data Curation ✓

Unified 5 public databases into a benchmark dataset:

49,057 unique TCR-pMHC binding entries from TCR3d, ATLAS, VDJdb, IEDB
338 entries with experimental PDB structures
697 binding affinity measurements (Kd, ΔΔG)
Epitope-based train/val/test splits with zero epitope leakage
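
A minimal sketch of an epitope-level split with a leakage check. It assumes records are `(tcr_id, epitope)` tuples; the function names, the 80/10/10 epitope fractions, and the seed are illustrative assumptions and are not taken from scripts/curate_data.py.

```python
import random

def split_by_epitope(records, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign whole epitopes to train/val/test so no epitope spans two splits."""
    epitopes = sorted({epitope for _, epitope in records})
    random.Random(seed).shuffle(epitopes)
    n_train = int(fractions[0] * len(epitopes))
    n_val = int(fractions[1] * len(epitopes))
    assignment = {}
    for idx, epitope in enumerate(epitopes):
        if idx < n_train:
            assignment[epitope] = "train"
        elif idx < n_train + n_val:
            assignment[epitope] = "val"
        else:
            assignment[epitope] = "test"
    splits = {"train": [], "val": [], "test": []}
    for record in records:
        splits[assignment[record[1]]].append(record)
    return splits

def epitope_leakage(splits):
    """Return the set of epitopes that occur in more than one split."""
    seen = {name: {ep for _, ep in recs} for name, recs in splits.items()}
    return ((seen["train"] & seen["val"])
            | (seen["train"] & seen["test"])
            | (seen["val"] & seen["test"]))
```

An empty set from `epitope_leakage` is what "zero epitope leakage" means in this sketch.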

Phase 2: Structure Prediction Benchmark ✓

213 non-redundant TCR-pMHC complexes evaluated across 4 methods:

Method	Mean DockQ	Median	High Quality (≥0.80)
Boltz-2 v2.2.1	0.913	0.960	88.3%
AlphaFold 3 v3.0.1	0.807	0.841	63.4%
Protenix v1.0.7	0.788	0.823	59.6%
Chai-1 v0.6.1	0.743	0.778	46.0%

Phase 3: Structural Convergence Analysis ✓

Scaled TCR structure prediction from 213 pilot complexes to 35,174 unique paired TCRs across 1,460 epitopes, then analyzed structural convergence and binding prediction.

The 6-CDR centroid fingerprint

Our core structural descriptor reduces a ~220-residue 3D TCR structure to 16 numbers that capture the spatial arrangement of the binding surface:

Full TCR structure          Cα atoms per CDR loop       6 CDR centroids              Fingerprint
(~3,500 atoms)              (~60 Cα positions)          (6 points in 3D)             (16 numbers)

 ╔══════════════╗           CDR1α: 12 Cα atoms          CDR3β ●                      d(CDR1α,CDR2α) = 12.3Å
 ║   α chain    ║     →     CDR2α: 10 Cα atoms     →   /      \              →      d(CDR1α,CDR3α) = 18.7Å
 ║  CDR1α,2α,3α ║           CDR3α: 13 Cα atoms    CDR2β●      ●CDR3α               ...15 pairwise distances
 ╠══════════════╣           CDR1β: 12 Cα atoms         \      /                     + Vα-Vβ docking angle
 ║   β chain    ║           CDR2β: 10 Cα atoms     CDR1α●──●CDR1β                   ─────────────────────
 ║  CDR1β,2β,3β ║           CDR3β: 14 Cα atoms         |                            = 16-dim SE(3)-invariant
 ╚══════════════╝                                   CDR2α●                              descriptor

How it works:

Locate the 6 CDR loops in the predicted structure (CDR1α, CDR2α, CDR3α, CDR1β, CDR2β, CDR3β)
Compute the centroid (mean Cα position) of each loop — each loop becomes one point in 3D
Measure 15 pairwise distances between the 6 centroids (C(6,2) = 15)
Add the Vα-Vβ docking angle between the α and β domain principal axes

Properties:

SE(3)-invariant: pairwise distances don't change under rotation or translation — two TCRs can be compared regardless of orientation
CDR-loop level: operates at the level of whole CDR loops (not individual atoms or residues). Each CDR loop is represented as a single 3D point.
Geometry only, no chemistry: does not use amino acid identity. An all-Ala loop and an all-Trp loop with the same backbone positions give the same centroid. This is why ESM-2 (which captures amino acid identity) provides complementary information.
Highly compressed: 220 residues × 3 coordinates = 660 numbers → 16 numbers. Loses per-residue detail but captures the overall binding surface shape.
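
The four steps above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the code in scripts/extract_surface_fingerprints.py: it assumes the CDR loops have already been located and are passed as arrays of Cα coordinates, it takes each domain's principal axis as the first right-singular vector of its centered Cα coordinates, and it folds the angle into [0°, 90°] because a principal axis has no intrinsic sign. All names are hypothetical.

```python
from itertools import combinations

import numpy as np

CDR_ORDER = ["CDR1a", "CDR2a", "CDR3a", "CDR1b", "CDR2b", "CDR3b"]

def principal_axis(coords):
    """First principal axis of a point cloud (sign is arbitrary)."""
    centered = coords - coords.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

def centroid_fingerprint(cdr_coords, alpha_ca, beta_ca):
    """15 centroid-centroid distances plus one inter-domain angle (degrees)."""
    # Step 2: one centroid (mean C-alpha position) per CDR loop.
    centroids = [cdr_coords[name].mean(axis=0) for name in CDR_ORDER]
    # Step 3: C(6, 2) = 15 pairwise distances between centroids.
    distances = [np.linalg.norm(a - b) for a, b in combinations(centroids, 2)]
    # Step 4: angle between the alpha and beta domain principal axes.
    # abs() removes the sign ambiguity of each axis (an assumption of this sketch).
    cosine = abs(np.dot(principal_axis(alpha_ca), principal_axis(beta_ca)))
    angle = np.degrees(np.arccos(np.clip(cosine, 0.0, 1.0)))
    return np.array(distances + [angle])
```

Because only distances and an unsigned angle are used, applying the same rotation and translation to every input leaves the 16 values unchanged, which is the SE(3)-invariance property listed above.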

3a. Pilot on 1,351 TCRs

The initial pilot on the 30 top epitopes established the analytical framework: 6-CDR centroid fingerprints, length-matched cross-epitope controls, and per-epitope enrichment stratification. The pilot detected ~7x enrichment of structural similarity among same-epitope TCRs.

3b. Full-scale analysis (35,174 TCRs)

Predicted paired α/β structures with IgFold on HPC (4 machines × 1 GPU, ~6 hours total), extracted 6-CDR centroid fingerprints (15 pairwise distances + Vα-Vβ angle), and ran enrichment analysis over 134 million pairs:

Metric	Peak enrichment	Threshold	Signal range
Centroid fingerprint (full surface)	147x	0.25	down to ~1x at 2.0
CDR3β RMSD alone	33x	0.25 Å	down to ~1x at 1.5 Å

Per-epitope: 190 epitopes with ≥10 TCRs. Top convergent epitopes include TFEYVSQPFLMDLE (5.8%), IVCPICSQK (2.7%), LPRWYFYYL (2.6%).

3c. Validation (random structure control)

Shuffled entry_id → fingerprint mapping 100 times to test whether the signal is an artifact of IgFold's structural homogeneity:

Condition	Enrichment
Observed	147x
Random shuffle null	1.06 ± 0.39
Empirical p-value	0.0099

The signal is not explained by Ig-fold structural homogeneity.
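
A hedged sketch of this kind of shuffle control follows. The enrichment definition used here (close-pair rate among same-epitope pairs divided by the rate among cross-epitope pairs) is an assumption, since the exact statistic lives in analysis/ablations.py; shuffling the epitope labels is equivalent to shuffling the entry-to-fingerprint mapping. Function names are hypothetical.

```python
import numpy as np

def enrichment(fingerprints, labels, threshold):
    """Close-pair rate for same-epitope pairs over the rate for cross-epitope pairs."""
    i, j = np.triu_indices(len(fingerprints), k=1)
    close = np.linalg.norm(fingerprints[i] - fingerprints[j], axis=1) < threshold
    same = labels[i] == labels[j]
    rate_cross = close[~same].mean()
    return close[same].mean() / rate_cross if rate_cross > 0 else np.inf

def shuffle_null(fingerprints, labels, threshold, n_perm=100, seed=0):
    """Observed enrichment, shuffled-label null, and empirical p-value."""
    rng = np.random.default_rng(seed)
    observed = enrichment(fingerprints, labels, threshold)
    null = np.array([
        enrichment(fingerprints, rng.permutation(labels), threshold)
        for _ in range(n_perm)
    ])
    # Add-one estimator: the observed value counts as one draw from the null.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, null, p_value
```

With 100 shuffles and no shuffled value reaching the observed one, the add-one estimator gives 1/101 ≈ 0.0099, which is consistent with the empirical p-value reported in the table.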

3d. CDR loop ablation

Removing each CDR loop from the centroid fingerprint:

Removed	Enrichment	Contribution
None (baseline)	147x	—
CDR3α	30.5x	−79% (most important)
CDR1α	50x	−66%
CDR3β	63x	−57%
CDR2α	93x	−37%
CDR1β	135x	−8%
CDR2β	167x	+14% (removing helps)

CDR3α is the dominant driver of the structural specificity signal, with CDR1α second. Alpha-chain CDRs contribute more than beta-chain CDRs. Interestingly, removing CDR2β slightly increases enrichment, suggesting it introduces noise.
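
For illustration, the ablation can be thought of as dropping every distance column that involves the removed loop and then re-running the enrichment analysis. The sketch below assumes the 15 distances are ordered as `itertools.combinations` would produce them and says nothing about how the docking angle is handled during ablation; both are assumptions. It also reproduces the "Contribution" column from the reported enrichments.

```python
from itertools import combinations

CDRS = ["CDR1a", "CDR2a", "CDR3a", "CDR1b", "CDR2b", "CDR3b"]
PAIRS = list(combinations(CDRS, 2))  # 15 centroid pairs (column order is assumed)

def columns_without(cdr):
    """Indices of the pairwise-distance columns that do not involve `cdr`."""
    return [k for k, pair in enumerate(PAIRS) if cdr not in pair]

def relative_change(ablated, baseline=147.0):
    """Percent change in enrichment relative to the full-fingerprint baseline."""
    return 100.0 * (ablated - baseline) / baseline

# Enrichment values copied from the ablation table above.
reported = {"CDR3a": 30.5, "CDR1a": 50, "CDR3b": 63,
            "CDR2a": 93, "CDR1b": 135, "CDR2b": 167}
contributions = {cdr: round(relative_change(v)) for cdr, v in reported.items()}
```

Removing one loop leaves C(5,2) = 10 of the 15 distances, so each ablated fingerprint is compared on a smaller descriptor than the baseline.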

3e. Specificity grouping benchmark

Clustering quality on test-split TCRs against ground-truth epitope labels (190 epitopes, 176 shared across train/val/test):

Method	V-measure	ARI
Raw fingerprint	0.309	0.018
GVP-GNN	0.315	0.012
ESM-2 (paired α+β)	0.003	0.001
ESM-2 (CDR3β only)	0.100	0.017
GLIPH2	0.473	0.0001

GLIPH2 produces many small tight clusters (high V-measure, low ARI); structural methods produce larger functional groupings.
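
The high V-measure / low ARI pattern can be reproduced on toy labels with scikit-learn. The data below is synthetic and only illustrates the metric behaviour; it is not output from analysis/benchmark_specificity.py.

```python
from sklearn.metrics import adjusted_rand_score, v_measure_score

def score_clustering(true_labels, predicted_clusters):
    """V-measure and adjusted Rand index against ground-truth epitope labels."""
    return {
        "v_measure": v_measure_score(true_labels, predicted_clusters),
        "ari": adjusted_rand_score(true_labels, predicted_clusters),
    }

# Toy case: two epitopes with 50 TCRs each, clustered into 50 pure pairs.
true_epitope = [0] * 50 + [1] * 50
tiny_pure_clusters = [i // 2 for i in range(100)]
scores = score_clustering(true_epitope, tiny_pure_clusters)
```

Every cluster is pure, so homogeneity is perfect and the V-measure stays moderate (roughly 0.3 here), while ARI is close to zero (roughly 0.02) because almost all same-epitope pairs end up in different clusters.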

3f. Binding prediction (in-distribution per-epitope split)

Standard TCR binding evaluation: split TCRs within each epitope (80/10/10), epitope-mismatched negatives (10:1 ratio). We developed a NeuralFusion architecture with FiLM gating — structural geometry modulates which per-residue sequence features matter:

Fingerprint v2 (27d) → MLP → sigmoid gate → MODULATES sequence features
CDR3α BLOSUM (500d) → MLP ─┐
CDR3β BLOSUM (500d) → MLP ─┤→ gated by structure → fused TCR embedding
V genes → learned embeddings ┘                              ↓
                                                    bilinear × pMHC embedding
                                                              ↓
                                                        binding score

                   ┌── epitope ESM-2 (1280d) ──┐
       v3 pMHC =   │                           │→ MLP → pMHC embedding (128d)
                   └── MHC pseudo BLOSUM (680d)┘   (v2 uses peptide only)

Model	ROC-AUC	PR-AUC	Notes
NeuralFusion v3 (+MHC, 5-seed ensemble)	0.960	0.913	#1 overall — MHC-aware
NeuralFusion v2 (5-seed ensemble)	0.943	0.875	Prior best, no MHC
NeuralFusion v2 (single seed mean)	0.952	0.813	Each seed beats DeepTCR
XGB Combined (struct+seq)	0.937	0.771	XGBoost on fingerprint + BLOSUM
Fingerprint v2 only	0.927	0.732	27-dim geometry beats ESM-2
ESM-2 (650M)	0.919	0.706	Sequence baseline
DeepTCR	0.944	0.782	Previous best (retrained)
epiTCR	0.930	0.724	#1 in Lu et al. Nature Methods 2025

Key innovations: (1) Fingerprint v2 adds CDR3 shape descriptors (end-to-end distance, Rg, max span, loop length, inter-CDR3 contacts) to centroid distances — independently beats ESM-2. (2) FiLM gating lets structure tell the model which sequence features matter. (3) V gene embeddings add germline context. (4) 5-seed ensemble with early stopping. (5) v3: MHC-aware pMHC encoder — NetMHCpan 34-residue pseudo-sequence BLOSUM-encoded (680-dim) is concatenated with peptide ESM-2, giving the model direct access to HLA restriction context (adds +1.8 ROC / +3.8 PR over v2).
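
A forward-pass sketch of the gating idea in NumPy, to make the data flow concrete. It is deliberately simplified and is not the code in models/neural_binding_v2.py: the two CDR3 BLOSUM branches are concatenated into one 1000-d input, V-gene embeddings and biases are omitted, weights are random, and the hidden sizes are assumptions. Only the 27-d fingerprint, the 500-d CDR3 encodings, and the 128-d pMHC embedding come from the diagram above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp(x, w1, w2):
    """Two-layer perceptron with a ReLU hidden layer (biases omitted for brevity)."""
    return np.maximum(x @ w1, 0.0) @ w2

def init_params(rng, fp_dim=27, seq_dim=1000, pmhc_dim=128, hidden=64, embed=128):
    """Random weights; hidden and embed sizes are illustrative assumptions."""
    def w(*shape):
        return rng.normal(scale=0.05, size=shape)
    return {
        "gate": (w(fp_dim, hidden), w(hidden, embed)),
        "seq": (w(seq_dim, hidden), w(hidden, embed)),
        "bilinear": w(embed, pmhc_dim),
    }

def binding_score(fingerprint, seq_features, pmhc_embedding, params):
    gate = sigmoid(mlp(fingerprint, *params["gate"]))  # structure-derived gate in (0, 1)
    seq_hidden = mlp(seq_features, *params["seq"])     # sequence branch
    tcr_embedding = gate * seq_hidden                  # structure modulates sequence features
    logit = tcr_embedding @ params["bilinear"] @ pmhc_embedding  # bilinear TCR x pMHC
    return sigmoid(logit), gate
```

A gate value near 0 suppresses the corresponding sequence feature and a value near 1 passes it through, which is the sense in which structure "tells the model which sequence features matter".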

Evaluation honesty — MHC-matched negatives. A naive v3 with random-epitope negatives scored 0.994 ROC / 0.972 PR. Diagnosis: 93.9% of the sampled negatives had mismatching MHC alleles because 92% of epitopes in our data have only a single observed restriction — the model was learning the trivial shortcut "MHC ≠ positive's MHC → negative." We fixed this by MHC-matched negative sampling: for each positive, negatives are drawn only from epitopes restricted to the same allele when ≥2 such candidates exist (random-epitope fallback otherwise). This brings 82.2% of test negatives to MHC-matched status and yields the honest 0.960 / 0.913 above — still a clear win over v2, now unambiguously from MHC-aware pairing rather than allele shortcut.
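
The sampling rule described above can be sketched as follows. This is a simplified illustration: it assumes one restricting allele per epitope (true for most but not all epitopes in the data), the function name is hypothetical, and a real sampler would also need to exclude epitopes the TCR is known to bind. The toy peptide and allele names are placeholders.

```python
import random

def sample_negative_epitope(positive_epitope, epitope_to_allele, rng, min_candidates=2):
    """Pick a negative epitope, preferring ones restricted to the same MHC allele.

    Returns (epitope, mhc_matched). Falls back to a random other epitope when
    fewer than `min_candidates` same-allele epitopes exist.
    """
    allele = epitope_to_allele[positive_epitope]
    matched = [e for e, a in epitope_to_allele.items()
               if a == allele and e != positive_epitope]
    if len(matched) >= min_candidates:
        return rng.choice(matched), True
    others = [e for e in epitope_to_allele if e != positive_epitope]
    return rng.choice(others), False

# Toy restriction map; only the two HLA-A*02:01 epitopes come from this README.
restrictions = {
    "GILGFVFTL": "HLA-A*02:01",
    "GLCTLVAML": "HLA-A*02:01",
    "toy_pep_1": "HLA-A*02:01",
    "toy_pep_2": "toy_allele_B",
    "toy_pep_3": "toy_allele_B",
}
rng = random.Random(0)
negative, is_matched = sample_negative_epitope("GILGFVFTL", restrictions, rng)
```

Tracking the `mhc_matched` flag per negative is what allows a figure like the 82.2% matched rate to be reported for a test set.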

3g. Structure rescues sequence-hard epitopes (key finding)

Per-epitope paired analysis across 176 test epitopes:

Test	Statistic	p-value
Fingerprint+ESM-2 > ESM-2 alone (paired Wilcoxon)	105/176 wins	p = 0.011
Median per-epitope gain	+0.029 ROC	—
Structure gain × convergence rate (Spearman)	ρ = 0.231	p = 0.015

Top-20 most structurally convergent epitopes: mean structure gain +0.063 (6.3 ROC points), 15/20 wins. Bottom-20 least convergent epitopes: −0.022 gain, 11/20 wins.
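
A sketch of how these per-epitope statistics can be computed with SciPy. The inputs are one ROC-AUC per epitope for each model plus a per-epitope convergence rate. Whether the reported Wilcoxon test was one-sided or two-sided is not stated, so the two-sided default here is an assumption, and the function name is hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr, wilcoxon

def structure_rescue_stats(auc_seq, auc_fused, convergence_rate):
    """Paired per-epitope comparison of a fused model against a sequence-only model."""
    auc_seq = np.asarray(auc_seq, dtype=float)
    auc_fused = np.asarray(auc_fused, dtype=float)
    gain = auc_fused - auc_seq
    _, p_wilcoxon = wilcoxon(auc_fused, auc_seq)        # paired signed-rank test
    rho, p_rho = spearmanr(gain, convergence_rate)      # gain vs convergence rate
    return {
        "n_epitopes": len(gain),
        "wins": int(np.sum(gain > 0)),
        "median_gain": float(np.median(gain)),
        "wilcoxon_p": float(p_wilcoxon),
        "spearman_rho": float(rho),
        "spearman_p": float(p_rho),
    }
```

The Spearman correlation is computed on the per-epitope gain rather than on raw AUCs, so it asks specifically whether structure helps more where repertoires are more convergent.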

Concrete rescues (ESM-2 fails, structure fixes):

Epitope	ESM-2	Fingerprint+ESM-2	Δ
LPRWYFYYL	0.610	0.981	+0.371
FLYALALLL	0.587	0.828	+0.241
LLLDRLNQL	0.637	0.843	+0.206
ALAGIGILTV	0.628	0.786	+0.158
TTDPSFLGRY	0.563	0.663	+0.101

Conclusion: structural features rescue binding prediction specifically on epitopes with more convergent TCR repertoires — closing the loop between the convergence discovery (Phase 3b) and practical downstream utility.

3h. Head-to-head with SOTA competitors

We benchmarked against the top methods from Lu et al. (Nature Methods 2025) — the most comprehensive TCR binding prediction benchmark (46 methods evaluated):

On our in-distribution split (176 epitopes, retrained):

Rank	Method	ROC-AUC	PR-AUC	Source
#1	Ours: NeuralFusion v3 (+MHC)	0.960	0.913	This work (5-seed ensemble, MHC-aware)
#2	Ours: NeuralFusion v2	0.943	0.875	This work (5-seed ensemble)
#3	DeepTCR	0.944	0.782	Sidhom et al. 2021
#4	XGB Combined (ours)	0.937	0.771	This work
#5	epiTCR	0.930	0.724	#1 in Lu et al.
#6	ATM-TCR	0.929	0.714	Top-3 in Lu et al.
#7	TEIM	0.901	0.636	Top-3 in Lu et al.

On the Lu et al. benchmark (their exact CDR3β-only test set):

Rank	Method	AUPRC
#1	Ours: NeuralFusion v2	0.843
#2	epiTCR (retrained)	0.83
#3	TEPCAM (retrained)	0.82
#4	TEIM (retrained)	~0.80
...	(42 more methods)	...

(Lu et al. supply only CDR3β + epitope; their test set has no paired α/V-gene/MHC metadata, so v3's MHC-aware branch can't be evaluated in that setting. v2 is the fair comparison.)

The Lu multi-chain track: a benchmark that's adversarial by construction

Lu et al. also publish a second track with richer features (CDR3α, V/J genes, MHC, full chains): 6,824 training pairs across 57 epitopes and 478 test pairs covering only 2 of those epitopes (GILGFVFTL and GLCTLVAML, both HLA-A*02:01-restricted). The test epitopes are not held out: about 2% of training pairs involve them. We ran three variants of our model on it and reached the same conclusion every time: this track is an adversarial generalization test that penalizes any model with enough capacity to learn per-TCR features.

Results (5-seed ensemble, multi-chain test):

Configuration	Test ROC	Test AUPRC	Train pairs	Val PR (best)
Random baseline	0.500	0.500	—	—
XGB Fingerprint+ESM-2	0.511	0.545	6,824	—
NeuralFusion v2 (with shortcut head)	0.405	0.436	6,824	~0.95
NeuralFusion v3 (no-shortcut head + MHC)	0.387	0.413	6,824	~0.95
NeuralFusion v3 + CDR3β-only augmentation (75× more test-epitope signal)	0.391	0.413	17,474	~0.93

Every neural variant lands at ~0.40 — worse than chance. The simpler XGBoost featurizer lands near random (0.51). Val PR stays around 0.93–0.95 throughout, so the model isn't undertrained — it just generalizes negatively from train to test.

Root cause (verified directly in the data):

Epitope asymmetry: 2% of training pairs involve the 2 test epitopes; 98% involve the other 55 epitopes. So most of the model's capacity is spent learning TCR patterns for epitopes it will never be tested on.
Training negative scheme: every training CDR3 appears exactly twice — once as a positive for its cognate epitope, once as a negative for a shuffled epitope. This teaches the model "TCR X binds epitope Y" in a way that leaks a strong TCR-identity prior.
Test negative construction — the trap: test negatives are TCRs that bound other epitopes in training, now re-paired with GILGFVFTL/GLCTLVAML. We measured: 184/239 (77%) of test-negative CDR3Bs appear as positives in the training set, while only 12/239 (5%) of test-positive CDR3Bs appear anywhere in training. A model that encodes "this TCR looks like a binder" from training will score test negatives higher than test positives. Inversion is the mathematically expected outcome.
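
The overlap check behind these counts is simple to reproduce. The sketch below assumes each split is available as a list of CDR3β strings; the helper name is hypothetical and the sequences in any usage would come from the Lu et al. files, which are not reproduced here.

```python
def overlap_fraction(test_cdr3s, reference_cdr3s):
    """Count test CDR3 sequences that also occur in a reference collection."""
    reference = set(reference_cdr3s)
    hits = sum(1 for cdr3 in test_cdr3s if cdr3 in reference)
    return hits, len(test_cdr3s), hits / len(test_cdr3s)

# The fractions quoted above follow directly from the reported counts.
negatives_seen_as_train_positives = 184 / 239   # about 0.77
positives_seen_in_train = 12 / 239              # about 0.05
```

Running `overlap_fraction` once with test negatives against training positives, and once with test positives against all training CDR3βs, yields the two asymmetric rates that explain the inversion.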

What we tried and why it didn't help:

No-shortcut head (v3 no_shortcut=True): removes the tcr + pmhc additive pathway so the output must depend on TCR×pMHC alignment. Didn't change the outcome because the tcr * pmhc interaction still carries a TCR-identity signal.
75× more training signal for the test epitopes (augment with the CDR3β-only training pool filtered to {GILGFVFTL, GLCTLVAML}, 10,650 extra pairs, bringing test-epitope share from 2% → 62% of train): the model trains on far more positives for the test epitopes, but the inversion persists because the negative sampling at test time still exploits the TCR-identity shortcut.

Within this binary setup, the only thing that would fix the inversion is a TCR encoder so weak that it can't memorize TCR identity, which is effectively what the XGBoost baseline is, and it lands at 0.51 ROC (barely above random, not skilled). Lu's own leaderboard on this track confirms the difficulty: most retrained methods cluster in the 0.50–0.60 AUPRC range, which reflects the negative-sampling design capping any identity-aware method's ceiling rather than weak modeling.

Beating TCRconv: ESM-C features + hardened warmup. Our learned 64d AA embeddings overfit to TCR identity on Lu's tiny 6K-pair training set. Swapping the input encoding for frozen per-residue ESM-C 600M features (1152d, no fine-tuning) into the same multi-scale CNN+CE pipeline, with a hardened warmup schedule (20-epoch ramp + min_best_epoch=20 to skip the early "lazy minimum"), gives a 10-seed ensemble of 0.841 AUPRC — ~0.08 above TCRconv:

Method	Test ROC	Test AUPRC	Approach
Ours: ESM-C+CNN+CE (10-seed ensemble, hardened)	0.874	0.841	Frozen ESM-C 600M + multi-scale CNN + CE; warmup=20, min_best_epoch=20
TCRconv (Lu's best retrained)	~0.76	~0.76	ProtBERT + CNN + CE
Ours: ESM-C+CNN+CE (10-seed, original warmup=10)	0.744	0.717	Same architecture; 3 of 10 seeds collapse at ep<10
Ours: CNN+CE + residual struct (Mode E)	0.633	0.659	Learned 64d embed + struct fingerprint, residual fusion
Ours: CNN+CE + FiLM struct (Mode C)	0.617	0.643	Learned 64d embed + struct fingerprint, FiLM gating
Ours: CNN+CE + branch struct (Mode D)	0.604	0.624	Learned 64d embed + struct fingerprint, MLP-fused
CDR3 BLOSUM k-NN	0.595	0.618	Retrieval (no training)
Ours: CNN+CE pure (Mode A)	0.578	0.608	Learned 64d embed, no struct
Ours: TCRconv-reimpl (CDR3β-only)	0.561	0.587	Learned 64d embed, CDR3β only
Ours: CNN+CE+mixup	0.552	0.567	Learned 64d + embedding-level mixup; rejected
epiTCR / ATM-TCR / TEIM	~0.50	~0.50	Classification (fails)
NeuralFusion v2/v3 (binary)	~0.40	~0.41	Binary BCE (inverted)

Per-epitope on the hardened 10-seed ensemble: GLCTLVAML (n=44) ROC 0.901, AUPRC 0.885; GILGFVFTL (n=434) ROC 0.872, AUPRC 0.838. Both test epitopes are well above TCRconv's ~0.76 and roughly double the ~0.41 AUPRC of the binary-BCE methods that get inverted by the adversarial negatives. ESM-C's pretrained protein-language features regularize the encoder away from raw TCR-identity memorization on 6K pairs, which is exactly the bottleneck the 64d-from-scratch encoder hit.

Why two ESM-C rows in the table. The original ESM-C run used a 10-epoch LR warmup and tracked best_epoch from epoch 1, which let three of ten seeds get trapped in a "lazy minimum" at epoch 5–7 (val_loss ≈ 2.97). Those seeds never escaped — Phase 2 retrained for 5–7 epochs only and landed at 0.53 AUPRC, dragging the ensemble down to 0.717. The hardened run doubles the warmup (10→20 epochs, gentler ramp) and gates best_epoch tracking to ep ≥ 20 (forcing the model past the lazy minimum). All ten hardened seeds converged with best_epoch ∈ [46, 97] and individual AUPRC ∈ [0.760, 0.823] — no outliers, ensemble jumps from 0.717 to 0.841. Both rows are kept in the table to document the failure mode and the fix.
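
The two hardening changes can be sketched independently of any training framework. This is a minimal illustration assuming 1-based epoch numbers and a linear ramp; the actual schedule shape in scripts/tcrconv_esmc.py is not specified here, and the class and function names are hypothetical. Only `warmup=20` and `min_best_epoch=20` come from the text above.

```python
def warmup_lr(epoch, base_lr, warmup_epochs=20):
    """Linear learning-rate ramp over the first `warmup_epochs` epochs (1-based)."""
    return base_lr * min(1.0, epoch / warmup_epochs)

class BestEpochTracker:
    """Track the best validation loss, ignoring epochs before `min_best_epoch`."""

    def __init__(self, min_best_epoch=20):
        self.min_best_epoch = min_best_epoch
        self.best_epoch = None
        self.best_loss = float("inf")

    def update(self, epoch, val_loss):
        """Return True if this epoch becomes the new best checkpoint."""
        if epoch < self.min_best_epoch:
            return False  # early "lazy minimum" epochs are never selected
        if val_loss < self.best_loss:
            self.best_epoch, self.best_loss = epoch, val_loss
            return True
        return False
```

With `min_best_epoch=1` the tracker behaves like the original run and can lock onto an early dip in validation loss; with `min_best_epoch=20` that dip is skipped and only later epochs can be selected.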

Correction history. An early version of this README reported "0.845 AUPRC, beats TCRconv 0.76" for CNN+CE on Lu multi-chain (commit 29838f9). That figure was produced by selecting the best training epoch on test-set ROC — a hyperparameter leak. With proper held-out validation (commit bd3ec5d), the same learned-embedding scripts land at 0.59–0.66 across variants. ESM-C closed the gap to 0.717 (10-seed) and 0.78 (good-7) (commit 4b4530a). Hardened warmup pushes the all-10 ensemble to 0.841 (this commit). scripts/tcrconv_reimpl.py and scripts/cnn_ce_struct.py use proper validation; the ESM-C variant lives at scripts/tcrconv_esmc.py.

What full-length chains, CE, ESM-C, and hardened warmup each buy.

Step	AUPRC	Δ
Binary BCE on CDR3-only (NeuralFusion v2/v3)	0.41	—
Multi-class CE on full-length chains, learned 64d embed	0.59	+0.18 (escape inverted-shortcut regime)
+ 6-CDR structural fingerprint (residual fusion)	0.66	+0.07 (small, but real)
+ Frozen ESM-C 600M features (replace 64d embed, original warmup)	0.72	+0.06 (matches good-seeds mean ~0.78 if you filter)
+ Hardened warmup (20-epoch ramp, min_best_epoch=20)	0.84	+0.12 (eliminates the 3-seed lazy-minimum collapse)

The dominant lift comes from cross-entropy on full-length chains (escapes the 0.41 inverted-shortcut regime) and PLM features (regularizes against identity memorization on small data). Structural fingerprints contribute a smaller but real bump. The final +0.12 from the warmup hardening is purely a stability fix — same model, same features, just no failure-mode seeds — but it turns a "matches-on-good-seeds" result into a "decisively beats" result.

3i. Unified feature ablation

We ran the same NeuralFusion v2 architecture (FiLM gating + V genes + ResBlocks) with three feature configurations across all benchmarks to test whether structure adds value:

Benchmark	seq_only (ROC / PR)	seq+fp, FiLM (ROC / PR)	fp_only (ROC / PR)	Struct Δ PR
176 epitopes	0.963 / 0.837	0.964 / 0.838	0.957 / 0.813	+0.001
31 struct-dep	0.954 / 0.802	0.954 / 0.797	0.949 / 0.782	-0.005
Lu multi-chain	0.543 / 0.588	0.472 / 0.523	0.557 / 0.525	-0.066

Key finding: the structural fingerprint carries real, independent signal (fp_only achieves 0.813 PR-AUC on 176 epitopes using just the 27 structural dimensions of fingerprint v2). However, this signal is largely redundant with CDR3 sequence: combining structure + sequence adds only +0.001 PR over sequence alone. Structure and sequence appear to capture the same underlying biology through different lenses.

The key differentiator: while all SOTA sequence methods improve binding prediction uniformly across epitopes, only our structural fingerprint shows improvement that correlates with structural convergence rate:

Method	ρ(gain, convergence)	p-value
Ours: Fingerprint+ESM-2	+0.231	0.015
DeepTCR	+0.127	0.183
ATM-TCR	+0.096	0.315
epiTCR	+0.079	0.412
TEIM	−0.083	0.385

SOTA sequence methods are blind to structural convergence. Our structural features provide targeted improvement where the convergence mechanism predicts they should.

Zero-shot epitope generalization: unsolved

Evaluated on the strict epitope-split (test epitopes never seen in training), all methods, including end-to-end GVP-GNN trained on binding directly, hover around 0.51–0.62 ROC-AUC. This limit is not specific to our features; it is the fundamental difficulty of predicting binding for unseen epitopes. See docs/LIMITATIONS.md for details.

Figures

Publication figures in results/paper_figures/:

Figure	Content
`fig2_convergence`	Enrichment sweep + per-epitope convergence heatmap
`fig3_surface_vs_cdr3`	Peak enrichment + signal decay curves (centroid vs CDR3β)
`fig4_benchmark`	Clustering quality (5 methods) + retrieval precision@k
`fig5_binding`	In-distribution ROC curves + model comparison
`fig6_ablations`	CDR loop ablation + controls (random, length, V-gene)
`fig7_structure_rescue`	Structure gain vs convergence rate + rescued epitope examples
`fig8_competitors`	Head-to-head ROC/PR comparison (11 methods)
`fig9_competitor_convergence`	Convergence correlation: only ours is significant
`fig_model_architecture`	Unified CNN+FiLM+CE+BCE architecture diagram
`fig_results_comparison`	PR-AUC across all benchmarks vs competitors
`fig_feature_ablation`	3x3 ablation table: seq / seq+fp / fp_only

Repository Structure

TCR-FOLD/
├── scripts/
│   ├── curate_data.py                   # Phase 1: merge 5 databases
│   ├── select_benchmark.py              # Phase 2: non-redundant complexes
│   ├── prepare_inputs.py                # Phase 2: method-specific inputs
│   ├── run_benchmark.py                 # Phase 2: Boltz-2/AF3/Protenix/Chai-1
│   ├── evaluate_predictions.py          # Phase 2: DockQ evaluation
│   ├── scale_structure_prediction.py    # Phase 3: IgFold on ~35K TCRs
│   ├── extract_surface_fingerprints.py  # Phase 3: 6-CDR centroid fingerprints
│   ├── compute_epitope_embeddings.py    # Phase 3: ESM-2 for peptides
│   ├── create_indist_splits.py          # Phase 3: per-epitope 80/10/10
│   ├── run_baselines.py                 # Phase 3: GLIPH2, TCRdist3, ESM-2
│   ├── run_combined_model.py            # Phase 3: struct+seq XGBoost combined
│   ├── run_lu_neural.py                 # Phase 3: evaluation on Lu et al. benchmark (CDR3β-only + multi-chain v2)
│   ├── run_lu_neural_v3.py              # Phase 3: v3 MHC-aware + no-shortcut + optional CDR3β-only augmentation
│   ├── tcrconv_reimpl.py                # Phase 3: TCRconv-style CNN+CE reimplementation with held-out validation (Lu MC)
│   ├── truly_unified.py                 # Phase 3: unified CNN+FiLM+CE+BCE across all benchmarks
│   ├── unified_v2_ablation.py           # Phase 3: 3×3 feature ablation (seq/seq+fp/fp_only)
│   ├── prepare_lu_complex_yaml.py       # Phase 3: Boltz-2 complex YAML generator
│   ├── extract_pmhc_interface.py        # Phase 3: TCR-pMHC interface feature extractor
│   ├── generate_model_figures.py        # Publication architecture + result figures
│   ├── compute_mhc_features.py          # Phase 3: MHC pseudo-sequence + BLOSUM encoding
│   ├── pilot_structural_convergence.py  # Phase 3a: pilot experiment
│   └── pilot_binding_surface.py         # Phase 3a: pilot surface analysis
├── models/
│   ├── geometric_encoder.py             # GVP-GNN with CDR-masked pooling
│   ├── neural_binding.py               # FiLM fusion model v1
│   ├── neural_binding_v2.py            # FiLM fusion v2 + V genes + ensemble
│   ├── neural_binding_v3.py            # v2 + MHC pseudo-sequence (MHC-aware)
│   ├── train.py                         # Contrastive pretraining
│   ├── train_binding.py                 # End-to-end binding supervision
│   ├── eval_binding.py                  # Test evaluation of checkpoints
│   └── specificity_classifier.py        # XGBoost binding classifiers
├── analysis/
│   ├── convergence_analysis.py          # 134M-pair enrichment sweep
│   ├── benchmark_specificity.py         # Head-to-head clustering benchmark
│   ├── ablations.py                     # Random control + CDR ablation + filters
│   ├── paper_figures.py                 # Publication figures
│   └── plot_structure_gain.py           # fig7: rescue vs convergence
├── data/
│   ├── benchmark/
│   │   ├── tcr_pmhc_master.tsv          # 49K unified entries
│   │   ├── benchmark_set.tsv            # 213 Phase 2 complexes
│   │   ├── splits/                      # Phase 1 epitope-based splits
│   │   └── splits_indist/               # Phase 3 per-epitope splits
│   ├── full_structures/
│   │   └── all_tcrs.tsv                 # 35,174 unique TCRs + reconstructed chains
│   └── pilot_convergence/
│       └── pilot_tcrs.tsv               # 1,351 pilot TCRs
├── competitors/                          # Competitor method runners
│   ├── run_epitcr.py                    # epiTCR reimplementation
│   ├── run_atmtcr.py                    # ATM-TCR reimplementation
│   ├── run_teim.py                      # TEIM reimplementation
│   └── run_deeptcr.py                   # DeepTCR runner
├── results/
│   ├── dockq_results.tsv                # Phase 2 benchmark
│   ├── convergence/                     # Full-scale enrichment + per-epitope
│   ├── ablations/                       # CDR ablation + controls
│   ├── specificity_benchmark/           # Clustering + retrieval metrics
│   ├── competitors_benchmark/           # epiTCR, ATM-TCR, TEIM, DeepTCR results
│   ├── binding_v2/                      # Fingerprint v2 binding results
│   ├── neural_binding/                  # FiLM fusion v1 results
│   ├── neural_v2/                       # FiLM fusion v2 ensemble results
│   ├── neural_v3/                       # FiLM fusion v3 (+MHC) ensemble results
│   ├── lu_benchmark/                    # Lu et al. Nature Methods evaluation
│   ├── lu_neural/                       # NeuralFusion v2 on Lu et al. CDR3β-only + multi-chain
│   ├── lu_neural_v3/                    # NeuralFusion v3 on Lu multi-chain (no-shortcut ± aug)
│   ├── lu_contrastive/                  # Contrastive learning + CNN+CE + unified ablation
│   ├── lu_complex_pilot/                # Boltz-2 TCR-pMHC complex prediction pilot
│   └── paper_figures/                   # Publication figures (PNG + PDF)
├── tests/
│   └── test_surface_extraction.py       # 3 unit tests
└── docs/
    ├── LIMITATIONS.md                   # Honest accounting of caveats
    └── superpowers/plans/               # Implementation plan

Dataset Statistics

Source	Records	Unique Epitopes
TCR3d	372 structural complexes	228
ATLAS	697 affinity measurements	—
VDJdb	30,163 paired TCR entries	1,493
IEDB	33,260 paired entries	2,972
Unified master	49,057	2,935
Phase 3 structures	35,174	1,460
Phase 3 binding eval (≥10 TCRs/epitope)	32,423	190

Limitations

See docs/LIMITATIONS.md for the honest accounting. Key caveats:

Zero-shot epitope generalization is unsolved: all methods (structural and sequence) sit at roughly 0.51–0.62 ROC-AUC on truly unseen epitopes.
Single prediction method (IgFold) — no cross-validation with Boltz-2 or crystal structures at scale. Boltz-2 requires MSAs which are impractical for batch prediction without local databases.
CDR1/CDR2 positions are heuristic — scaled from the CDR3 anchor position rather than strict IMGT numbering. Could be off by 3-7 residues for unusual V genes.
Competitor methods are reimplementations — epiTCR, ATM-TCR, TEIM were faithfully reimplemented rather than using original code (installation issues on HPC). Validated by matching published performance ranges.
Convergence measured on predicted structures — not experimentally validated. Random-structure control rules out prediction homogeneity but not all possible artifacts.

License

Apache-2.0
