princello/TCR-FOLD

TCR-Fold

Structure-informed TCR specificity analysis and binding prediction.

Overview

TCR-Fold investigates whether predicted TCR structures reveal specificity signal that linear sequence representations miss. We find that TCR binding surfaces show strong structural convergence among co-specific TCRs (147x enrichment), that this signal is concentrated in the α-chain CDR loops, and that adding structural features rescues binding prediction on exactly those epitopes where sequence-based models fail. The final MHC-aware model (NeuralFusion v3) reaches 0.960 ROC / 0.913 PR-AUC on an in-distribution per-epitope split with MHC-matched negatives.

Headline Results

| Finding | Value | Statistic |
|---|---|---|
| Structural convergence enrichment | 147x | p < 0.001 (permutation) |
| Random-structure null | 1.06 ± 0.39 | empirical p = 0.0099 |
| Most important CDR (by ablation) | CDR3α | 147x → 30.5x when removed (−79%) |
| Binding prediction (our data, 176 epitopes) | 0.847 PR-AUC | Multi-task CNN+FiLM, #1 overall |
| Binding prediction (Lu CDR3β-only benchmark) | 0.843 AUPRC | Beats epiTCR (#1 in Nature Methods 2025) |
| Binding prediction (Lu multi-chain benchmark) | 0.841 AUPRC | ESM-C+CNN+CE, 10-seed ensemble; beats TCRconv (~0.76) by ~0.08; see §3h |
| 6-CDR fingerprint (structure alone) | 0.813 PR-AUC | Independent signal, 27 dimensions |
| Structure rescue × convergence | ρ = 0.231 | p = 0.015; only our method shows this |

See docs/LIMITATIONS.md for the honest accounting of what this work does and doesn't establish.

Pipeline Overview

[Figure: TCR-Fold pipeline overview]

Project Phases

Phase 1: Data Curation ✓

Unified 5 public databases into a benchmark dataset:

  • 49,057 unique TCR-pMHC binding entries from TCR3d, ATLAS, VDJdb, IEDB
  • 338 entries with experimental PDB structures
  • 697 binding affinity measurements (Kd, ΔΔG)
  • Epitope-based train/val/test splits with zero epitope leakage
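
An epitope-based split assigns whole epitopes, not individual TCRs, to each partition. The following is a minimal sketch of that idea, not the logic of `scripts/curate_data.py`: the function names, the `(tcr_id, epitope)` entry layout, and the 80/10/10 fractions are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def epitope_split(entries, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign whole epitopes to train/val/test so no epitope spans two splits.

    `entries` is a list of (tcr_id, epitope) pairs. The fractions are an
    illustrative assumption, applied to the number of epitopes.
    """
    by_epitope = defaultdict(list)
    for tcr_id, epitope in entries:
        by_epitope[epitope].append((tcr_id, epitope))

    epitopes = sorted(by_epitope)            # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(epitopes)

    n_train = int(fractions[0] * len(epitopes))
    n_val = int(fractions[1] * len(epitopes))
    groups = {
        "train": epitopes[:n_train],
        "val": epitopes[n_train:n_train + n_val],
        "test": epitopes[n_train + n_val:],
    }
    return {name: [row for ep in eps for row in by_epitope[ep]]
            for name, eps in groups.items()}


def epitope_leakage(splits):
    """Return the set of epitopes that appear in more than one split."""
    seen = {name: {ep for _, ep in rows} for name, rows in splits.items()}
    return ((seen["train"] & seen["val"])
            | (seen["train"] & seen["test"])
            | (seen["val"] & seen["test"]))


# Toy data: 50 placeholder TCRs spread over 10 placeholder epitopes.
toy_entries = [(f"tcr{i}", f"epitope{i % 10}") for i in range(50)]
toy_splits = epitope_split(toy_entries)
print({name: len(rows) for name, rows in toy_splits.items()})
print("leaked epitopes:", epitope_leakage(toy_splits))
```

Checking `epitope_leakage` after every re-split is a cheap way to keep the "zero epitope leakage" property from regressing.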

Phase 2: Structure Prediction Benchmark ✓

213 non-redundant TCR-pMHC complexes evaluated across 4 methods:

| Method | Mean DockQ | Median DockQ | High quality (DockQ ≥ 0.80) |
|---|---|---|---|
| Boltz-2 v2.2.1 | 0.913 | 0.960 | 88.3% |
| AlphaFold 3 v3.0.1 | 0.807 | 0.841 | 63.4% |
| Protenix v1.0.7 | 0.788 | 0.823 | 59.6% |
| Chai-1 v0.6.1 | 0.743 | 0.778 | 46.0% |

Phase 3: Structural Convergence Analysis ✓

Scaled TCR structure prediction from 213 pilot complexes to 35,174 unique paired TCRs across 1,460 epitopes, then analyzed structural convergence and binding prediction.

The 6-CDR centroid fingerprint

Our core structural descriptor reduces a ~220-residue 3D TCR structure to 16 numbers that capture the spatial arrangement of the binding surface:

Full TCR structure          Cα atoms per CDR loop       6 CDR centroids              Fingerprint
(~3,500 atoms)              (~60 Cα positions)          (6 points in 3D)             (16 numbers)

 ╔══════════════╗           CDR1α: 12 Cα atoms          CDR3β ●                      d(CDR1α,CDR2α) = 12.3Å
 ║   α chain    ║     →     CDR2α: 10 Cα atoms     →   /      \              →      d(CDR1α,CDR3α) = 18.7Å
 ║  CDR1α,2α,3α ║           CDR3α: 13 Cα atoms    CDR2β●      ●CDR3α               ...15 pairwise distances
 ╠══════════════╣           CDR1β: 12 Cα atoms         \      /                     + Vα-Vβ docking angle
 ║   β chain    ║           CDR2β: 10 Cα atoms     CDR1α●──●CDR1β                   ─────────────────────
 ║  CDR1β,2β,3β ║           CDR3β: 14 Cα atoms         |                            = 16-dim SE(3)-invariant
 ╚══════════════╝                                   CDR2α●                              descriptor

How it works:

  1. Locate the 6 CDR loops in the predicted structure (CDR1α, CDR2α, CDR3α, CDR1β, CDR2β, CDR3β)
  2. Compute the centroid (mean Cα position) of each loop — each loop becomes one point in 3D
  3. Measure 15 pairwise distances between the 6 centroids (C(6,2) = 15)
  4. Add the Vα-Vβ docking angle between the α and β domain principal axes
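
The four steps can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the code in `scripts/extract_surface_fingerprints.py`: the function names and the toy coordinates are made up, the loop Cα coordinates are assumed to be extracted already, and the two domain axes are passed in directly because the principal-axis computation is not specified here.

```python
import math
from itertools import combinations

CDR_ORDER = ["CDR1a", "CDR2a", "CDR3a", "CDR1b", "CDR2b", "CDR3b"]


def centroid(coords):
    """Mean position of a list of (x, y, z) C-alpha coordinates."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def centroid_distances(loops):
    """15 pairwise distances between the 6 CDR centroids, in a fixed order."""
    cents = [centroid(loops[name]) for name in CDR_ORDER]
    return [math.dist(a, b) for a, b in combinations(cents, 2)]


def angle_deg(u, v):
    """Angle in degrees between two axis vectors (assumed to be given)."""
    dot = sum(a * b for a, b in zip(u, v))
    cos = dot / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def fingerprint(loops, alpha_axis, beta_axis):
    """15 centroid distances followed by the docking angle: 16 numbers."""
    return centroid_distances(loops) + [angle_deg(alpha_axis, beta_axis)]


# Toy input: 3 made-up C-alpha positions per loop (real loops have ~10-14).
toy_loops = {name: [(float(i), float(2 * i + k), float(k)) for k in range(3)]
             for i, name in enumerate(CDR_ORDER)}
fp = fingerprint(toy_loops, (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))


def rigid_move(p):
    """Rotate 90 degrees about z, then translate: a rigid-body motion."""
    x, y, z = p
    return (-y + 10.0, x - 5.0, z + 3.0)


moved_loops = {name: [rigid_move(p) for p in pts] for name, pts in toy_loops.items()}
# Axes are directions, so they are rotated but not translated.
fp_moved = fingerprint(moved_loops, (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0))
print(len(fp), all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(fp, fp_moved)))
```

The final comparison illustrates the invariance property: moving the whole structure rigidly leaves all 16 values unchanged up to floating-point error.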

Properties:

  • SE(3)-invariant: pairwise distances don't change under rotation or translation — two TCRs can be compared regardless of orientation
  • CDR-loop level: operates at the level of whole CDR loops (not individual atoms or residues). Each CDR loop is represented as a single 3D point.
  • Geometry only, no chemistry: does not use amino acid identity. An all-Ala loop and an all-Trp loop with the same backbone positions give the same centroid. This is why ESM-2 (which captures amino acid identity) provides complementary information.
  • Highly compressed: 220 residues × 3 coordinates = 660 numbers → 16 numbers. Loses per-residue detail but captures the overall binding surface shape.

3a. Pilot on 1,351 TCRs

Initial pilot on 30 top epitopes established the analytical framework: 6-CDR centroid fingerprints, length-matched cross-epitope controls, per-epitope enrichment stratification. Pilot detected ~7x enrichment of structural similarity among same-epitope TCRs.

3b. Full-scale analysis (35,174 TCRs)

Predicted paired α/β structures with IgFold on HPC (4 machines × 1 GPU, ~6 hours total), extracted 6-CDR centroid fingerprints (15 pairwise distances + Vα-Vβ angle), and ran enrichment analysis over 134 million pairs:

| Metric | Peak enrichment | Threshold | Signal range |
|---|---|---|---|
| Centroid fingerprint (full surface) | 147x | 0.25 | down to ~1x at 2.0 |
| CDR3β RMSD alone | 33x | 0.25 Å | down to ~1x at 1.5 Å |

Per-epitope: 190 epitopes with ≥10 TCRs. Top convergent epitopes include TFEYVSQPFLMDLE (5.8%), IVCPICSQK (2.7%), LPRWYFYYL (2.6%).
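
One common way to define such an enrichment is the fraction of same-epitope pairs closer than the threshold divided by the fraction of cross-epitope pairs closer than the threshold. The sketch below assumes that definition and Euclidean distance between fingerprints; it is not the code in `analysis/convergence_analysis.py`, it omits the length-matched controls, and the toy points are made up.

```python
import math
from itertools import combinations


def enrichment(fingerprints, epitopes, threshold):
    """Ratio of 'close pair' rates: same-epitope pairs vs cross-epitope pairs."""
    same_hits = same_total = cross_hits = cross_total = 0
    for i, j in combinations(range(len(fingerprints)), 2):
        close = math.dist(fingerprints[i], fingerprints[j]) < threshold
        if epitopes[i] == epitopes[j]:
            same_total += 1
            same_hits += close
        else:
            cross_total += 1
            cross_hits += close
    if same_total == 0 or cross_hits == 0:
        return float("nan")  # ratio undefined without a cross-epitope baseline
    return (same_hits / same_total) / (cross_hits / cross_total)


# Toy 2-D "fingerprints" for two placeholder epitopes, A and B.
toy_fps = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),   # epitope A
           (5.0, 5.0), (5.1, 5.0), (0.1, 0.1)]   # epitope B (last one sits near A)
toy_labels = ["A", "A", "A", "B", "B", "B"]
toy_enrichment = enrichment(toy_fps, toy_labels, threshold=0.25)
print(toy_enrichment)
```

In the toy data, 4 of 6 same-epitope pairs and 3 of 9 cross-epitope pairs fall under the threshold, so the ratio is 2. Sweeping `threshold` over a grid gives an enrichment curve of the kind summarized in the table.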

3c. Validation (random structure control)

Shuffled entry_id → fingerprint mapping 100 times to test whether the signal is an artifact of IgFold's structural homogeneity:

| Condition | Enrichment |
|---|---|
| Observed | 147x |
| Random shuffle null | 1.06 ± 0.39 |
| Empirical p-value | 0.0099 |

The signal is not explained by Ig-fold structural homogeneity.
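
With 100 shuffles, the smallest empirical p-value available under the usual add-one estimator is 1/101 ≈ 0.0099, which is consistent with the value reported above if no shuffle reached the observed enrichment. A minimal sketch follows; the function names are hypothetical, and the add-one estimator is an assumption about how the p-value was computed.

```python
import random


def empirical_p(observed, null_values):
    """Add-one empirical p-value: (1 + #null >= observed) / (1 + #null)."""
    exceed = sum(1 for v in null_values if v >= observed)
    return (exceed + 1) / (len(null_values) + 1)


def shuffled_null(statistic, fingerprints, labels, n_shuffles=100, seed=0):
    """Recompute `statistic` after shuffling which fingerprint goes with which label."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_shuffles):
        permuted = list(fingerprints)
        rng.shuffle(permuted)
        null.append(statistic(permuted, labels))
    return null


# If none of 100 null values reaches the observed statistic:
p_floor = empirical_p(147.0, [1.06] * 100)
print(round(p_floor, 4))
```

The floor matters when reading the result: more shuffles would be needed to resolve anything smaller than 1/(n_shuffles + 1).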

3d. CDR loop ablation

Removing each CDR loop from the centroid fingerprint:

| Removed | Enrichment | Contribution |
|---|---|---|
| None (baseline) | 147x | |
| CDR3α | 30.5x | −79% (most important) |
| CDR1α | 50x | −66% |
| CDR3β | 63x | −57% |
| CDR2α | 93x | −37% |
| CDR1β | 135x | −8% |
| CDR2β | 167x | +14% (removing helps) |

CDR3α is the dominant driver of the structural specificity signal, with CDR1α second. Taken together, the α-chain CDRs contribute more than the β-chain CDRs, although CDR3β (−57%) still outranks CDR2α (−37%). Removing CDR2β slightly increases enrichment (+14%), suggesting that it adds noise to the fingerprint.
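
The contribution column is the relative change in peak enrichment against the 147x baseline. The sketch below reproduces that arithmetic from the reported values and shows one plausible way to ablate a loop, namely dropping the 5 of 15 distances that involve its centroid. That mechanism and the names `drop_cdr` and `percent_change` are assumptions, not a description of `analysis/ablations.py`.

```python
from itertools import combinations

CDRS = ["CDR1a", "CDR2a", "CDR3a", "CDR1b", "CDR2b", "CDR3b"]
PAIRS = list(combinations(CDRS, 2))  # same order as the 15 centroid distances


def drop_cdr(distances, removed):
    """Keep only the distances that do not involve the removed CDR centroid."""
    return [d for d, pair in zip(distances, PAIRS) if removed not in pair]


def percent_change(ablated, baseline=147.0):
    """Relative change in enrichment versus the unablated baseline, in percent."""
    return 100.0 * (ablated - baseline) / baseline


# Peak enrichment values reported in the ablation table.
reported = {"CDR3a": 30.5, "CDR1a": 50.0, "CDR3b": 63.0,
            "CDR2a": 93.0, "CDR1b": 135.0, "CDR2b": 167.0}
changes = {name: round(percent_change(value)) for name, value in reported.items()}
print(changes)
print(len(drop_cdr(list(range(15)), "CDR3a")))  # 10 distances remain
```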

3e. Specificity grouping benchmark

Clustering quality on test-split TCRs against ground-truth epitope labels (190 epitopes, 176 shared across train/val/test):

| Method | V-measure | ARI |
|---|---|---|
| Raw fingerprint | 0.309 | 0.018 |
| GVP-GNN | 0.315 | 0.012 |
| ESM-2 (paired α+β) | 0.003 | 0.001 |
| ESM-2 (CDR3β only) | 0.100 | 0.017 |
| GLIPH2 | 0.473 | 0.0001 |

GLIPH2 produces many small tight clusters (high V-measure, low ARI); structural methods produce larger functional groupings.

3f. Binding prediction (in-distribution per-epitope split)

Standard TCR binding evaluation: split TCRs within each epitope (80/10/10), epitope-mismatched negatives (10:1 ratio). We developed a NeuralFusion architecture with FiLM gating — structural geometry modulates which per-residue sequence features matter:

Fingerprint v2 (27d) → MLP → sigmoid gate → MODULATES sequence features
CDR3α BLOSUM (500d) → MLP ─┐
CDR3β BLOSUM (500d) → MLP ─┤→ gated by structure → fused TCR embedding
V genes → learned embeddings ┘                              ↓
                                                    bilinear × pMHC embedding
                                                              ↓
                                                        binding score

                   ┌── epitope ESM-2 (1280d) ──┐
       v3 pMHC =   │                           │→ MLP → pMHC embedding (128d)
                   └── MHC pseudo BLOSUM (680d)┘   (v2 uses peptide only)

| Model | ROC-AUC | PR-AUC | Notes |
|---|---|---|---|
| NeuralFusion v3 (+MHC, 5-seed ensemble) | 0.960 | 0.913 | #1 overall; MHC-aware |
| NeuralFusion v2 (5-seed ensemble) | 0.943 | 0.875 | Prior best, no MHC |
| NeuralFusion v2 (single seed mean) | 0.952 | 0.813 | Each seed beats DeepTCR |
| XGB Combined (struct+seq) | 0.937 | 0.771 | XGBoost on fingerprint + BLOSUM |
| Fingerprint v2 only | 0.927 | 0.732 | 27-dim geometry beats ESM-2 |
| ESM-2 (650M) | 0.919 | 0.706 | Sequence baseline |
| DeepTCR | 0.944 | 0.782 | Previous best (retrained) |
| epiTCR | 0.930 | 0.724 | #1 in Lu et al. Nature Methods 2025 |

Key innovations: (1) Fingerprint v2 adds CDR3 shape descriptors (end-to-end distance, Rg, max span, loop length, inter-CDR3 contacts) to the centroid distances and independently beats ESM-2. (2) FiLM gating lets structure tell the model which sequence features matter. (3) V gene embeddings add germline context. (4) 5-seed ensemble with early stopping. (5) v3: MHC-aware pMHC encoder. The NetMHCpan 34-residue pseudo-sequence, BLOSUM-encoded (680-dim), is concatenated with peptide ESM-2, giving the model direct access to HLA restriction context (about +1.7 ROC / +3.8 PR-AUC points over the v2 ensemble in the table above).
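
The gating step in the diagram (fingerprint → sigmoid gate → element-wise modulation of sequence features) can be illustrated without a deep-learning framework. This is a deliberately reduced sketch, not the model in `models/neural_binding_v2.py`: a single linear layer stands in for the MLP, the dimensions are tiny, and the weights are hand-set placeholders. A general FiLM layer would also learn an additive shift, which is omitted here because the diagram shows only the gate.

```python
import math


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def structure_gate(fingerprint, weights, bias):
    """Map the structural fingerprint to one gate value in (0, 1) per feature."""
    return [sigmoid(z) for z in linear(fingerprint, weights, bias)]


def gate_sequence_features(seq_features, gate):
    """Element-wise modulation: structure decides how much of each feature passes."""
    return [g * s for g, s in zip(gate, seq_features)]


# Placeholder sizes: 3-dim fingerprint gating 4 sequence features.
toy_fp = [12.3, 18.7, 45.0]
toy_seq = [2.0, -4.0, 6.0, 8.0]
zero_w = [[0.0] * 3 for _ in range(4)]      # untrained gate: every gate = 0.5
neutral = gate_sequence_features(toy_seq, structure_gate(toy_fp, zero_w, [0.0] * 4))

# A large positive bias opens a gate, a large negative bias closes it.
selective = gate_sequence_features(
    toy_seq, structure_gate(toy_fp, zero_w, [20.0, -20.0, 20.0, -20.0]))
print(neutral)
print([round(v, 3) for v in selective])
```

In a trained model the gate depends on the fingerprint through learned weights, so two TCRs with the same CDR3 sequence features but different loop geometry can be weighted differently.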

Evaluation honesty — MHC-matched negatives. A naive v3 with random-epitope negatives scored 0.994 ROC / 0.972 PR. Diagnosis: 93.9% of the sampled negatives had mismatching MHC alleles because 92% of epitopes in our data have only a single observed restriction — the model was learning the trivial shortcut "MHC ≠ positive's MHC → negative." We fixed this by MHC-matched negative sampling: for each positive, negatives are drawn only from epitopes restricted to the same allele when ≥2 such candidates exist (random-epitope fallback otherwise). This brings 82.2% of test negatives to MHC-matched status and yields the honest 0.960 / 0.913 above — still a clear win over v2, now unambiguously from MHC-aware pairing rather than allele shortcut.
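
The sampling rule described above can be sketched as follows. The function name, the `epitope_to_mhc` mapping, and the placeholder epitope and allele labels are hypothetical; only the rule itself (same-allele candidates when at least 2 exist, random-epitope fallback otherwise) comes from the text.

```python
import random


def sample_negative_epitopes(positive_epitope, epitope_to_mhc, n_neg, rng,
                             min_matched=2):
    """Pick decoy epitopes for one positive TCR-epitope pair.

    Prefer epitopes restricted to the same MHC allele as the positive; fall
    back to any other epitope when fewer than `min_matched` candidates exist.
    """
    allele = epitope_to_mhc[positive_epitope]
    others = [e for e in epitope_to_mhc if e != positive_epitope]
    matched = [e for e in others if epitope_to_mhc[e] == allele]
    pool = matched if len(matched) >= min_matched else others
    return [rng.choice(pool) for _ in range(n_neg)]


# Placeholder epitopes and alleles, not real restriction data.
toy_map = {"pepA1": "allele_A", "pepA2": "allele_A", "pepA3": "allele_A",
           "pepB1": "allele_B", "pepC1": "allele_C"}
rng = random.Random(0)
matched_negs = sample_negative_epitopes("pepA1", toy_map, 10, rng)   # same-allele decoys
fallback_negs = sample_negative_epitopes("pepB1", toy_map, 10, rng)  # no match: any epitope
print(sorted(set(matched_negs)), sorted(set(fallback_negs)))
```

Reporting the share of negatives that end up MHC-matched, as done above with the 82.2% figure, shows how often the fallback branch was taken.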

3g. Structure rescues sequence-hard epitopes (key finding)

Per-epitope paired analysis across 176 test epitopes:

| Test | Statistic | p-value |
|---|---|---|
| Fingerprint+ESM-2 > ESM-2 alone (paired Wilcoxon) | 105/176 wins | p = 0.011 |
| Median per-epitope gain | +0.029 ROC | |
| Structure gain × convergence rate (Spearman) | ρ = 0.231 | p = 0.015 |

Top-20 most structurally convergent epitopes: mean structure gain +0.063 (6.3 ROC points), 15/20 wins. Bottom-20 least convergent epitopes: −0.022 gain, 11/20 wins.
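
The Spearman statistic in the table is a Pearson correlation computed on ranks. A dependency-free sketch follows for readers who want to recompute it from per-epitope results. The toy gains and convergence rates are made up, and this is not the analysis script; in practice a library routine such as SciPy's `spearmanr` would also supply the p-value.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rho: Pearson correlation of the rank-transformed inputs."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up per-epitope values: convergence rate and ROC gain from adding structure.
toy_convergence = [0.058, 0.027, 0.026, 0.010, 0.004, 0.001]
toy_gain = [0.20, 0.37, 0.05, -0.01, 0.02, -0.03]
toy_rho = spearman(toy_convergence, toy_gain)
toy_wins = sum(1 for g in toy_gain if g > 0)
print(round(toy_rho, 3), f"{toy_wins}/{len(toy_gain)} wins")
```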

Concrete rescues (ESM-2 fails, structure fixes):

| Epitope | ESM-2 | Fingerprint+ESM-2 | Δ |
|---|---|---|---|
| LPRWYFYYL | 0.610 | 0.981 | +0.371 |
| FLYALALLL | 0.587 | 0.828 | +0.241 |
| LLLDRLNQL | 0.637 | 0.843 | +0.206 |
| ALAGIGILTV | 0.628 | 0.786 | +0.158 |
| TTDPSFLGRY | 0.563 | 0.663 | +0.101 |

Conclusion: structural features rescue binding prediction specifically on epitopes with more convergent TCR repertoires — closing the loop between the convergence discovery (Phase 3b) and practical downstream utility.

3h. Head-to-head with SOTA competitors

We benchmarked against the top methods from Lu et al. (Nature Methods 2025) — the most comprehensive TCR binding prediction benchmark (46 methods evaluated):

On our in-distribution split (176 epitopes, retrained):

| Rank | Method | ROC-AUC | PR-AUC | Source |
|---|---|---|---|---|
| #1 | Ours: NeuralFusion v3 (+MHC) | 0.960 | 0.913 | This work (5-seed ensemble, MHC-aware) |
| #2 | Ours: NeuralFusion v2 | 0.943 | 0.875 | This work (5-seed ensemble) |
| #3 | DeepTCR | 0.944 | 0.782 | Sidhom et al. 2021 |
| #4 | XGB Combined (ours) | 0.937 | 0.771 | This work |
| #5 | epiTCR | 0.930 | 0.724 | #1 in Lu et al. |
| #6 | ATM-TCR | 0.929 | 0.714 | Top-3 in Lu et al. |
| #7 | TEIM | 0.901 | 0.636 | Top-3 in Lu et al. |

On the Lu et al. benchmark (their exact CDR3β-only test set):

| Rank | Method | AUPRC |
|---|---|---|
| #1 | Ours: NeuralFusion v2 | 0.843 |
| #2 | epiTCR (retrained) | 0.83 |
| #3 | TEPCAM (retrained) | 0.82 |
| #4 | TEIM (retrained) | ~0.80 |
| ... | (42 more methods) | ... |

(Lu et al. supply only CDR3β + epitope; their test set has no paired α/V-gene/MHC metadata, so v3's MHC-aware branch can't be evaluated in that setting. v2 is the fair comparison.)

The Lu multi-chain track: a benchmark that's adversarial by construction

Lu et al. also publish a second track with richer features (CDR3α, V/J genes, MHC, full chains): 6,824 training pairs across 57 epitopes and 478 test pairs across only 2 test epitopes (GILGFVFTL and GLCTLVAML, both HLA-A*02:01-restricted). These 2 epitopes are not unseen: they account for about 2% of the training pairs. We ran three variants of our model on this track and reached the same conclusion every time: it is an adversarial generalization test that penalizes any model with enough capacity to learn per-TCR features.

Results (5-seed ensemble, multi-chain test):

| Configuration | Test ROC | Test AUPRC | Train pairs | Val PR (best) |
|---|---|---|---|---|
| Random baseline | 0.500 | 0.500 | | |
| XGB Fingerprint+ESM-2 | 0.511 | 0.545 | 6,824 | |
| NeuralFusion v2 (with shortcut head) | 0.405 | 0.436 | 6,824 | ~0.95 |
| NeuralFusion v3 (no-shortcut head + MHC) | 0.387 | 0.413 | 6,824 | ~0.95 |
| NeuralFusion v3 + CDR3β-only augmentation (75× more test-epitope signal) | 0.391 | 0.413 | 17,474 | ~0.93 |

Every neural variant lands at ~0.40 — worse than chance. The simpler XGBoost featurizer lands near random (0.51). Val PR stays around 0.93–0.95 throughout, so the model isn't undertrained — it just generalizes negatively from train to test.

Root cause (verified directly in the data):

  1. Epitope asymmetry: 2% of training pairs involve the 2 test epitopes; 98% involve the other 55 epitopes. So most of the model's capacity is spent learning TCR patterns for epitopes it will never be tested on.
  2. Training negative scheme: every training CDR3 appears exactly twice — once as a positive for its cognate epitope, once as a negative for a shuffled epitope. This teaches the model "TCR X binds epitope Y" in a way that leaks a strong TCR-identity prior.
  3. Test negative construction — the trap: test negatives are TCRs that bound other epitopes in training, now re-paired with GILGFVFTL/GLCTLVAML. We measured: 184/239 (77%) of test-negative CDR3Bs appear as positives in the training set, while only 12/239 (5%) of test-positive CDR3Bs appear anywhere in training. A model that encodes "this TCR looks like a binder" from training will score test negatives higher than test positives. Inversion is the mathematically expected outcome.
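
The overlap measurement in point 3 is a set-membership count. The sketch below shows the shape of that check and reproduces the two percentages from the reported counts. The CDR3 strings are placeholders rather than real sequences, and the helper names are hypothetical.

```python
def count_seen(query_cdr3s, reference_cdr3s):
    """Return (hits, total): how many query CDR3s also occur in the reference."""
    reference = set(reference_cdr3s)
    hits = sum(1 for cdr3 in query_cdr3s if cdr3 in reference)
    return hits, len(query_cdr3s)


def percent(hits, total):
    return 100.0 * hits / total


# Placeholder identifiers standing in for CDR3β sequences.
train_positive = ["cdr3_a", "cdr3_b", "cdr3_c", "cdr3_d"]
train_all = train_positive + ["cdr3_e"]
test_negative = ["cdr3_a", "cdr3_b", "cdr3_c", "cdr3_x"]
test_positive = ["cdr3_y", "cdr3_z", "cdr3_e", "cdr3_w"]

print(count_seen(test_negative, train_positive))  # negatives seen as training positives
print(count_seen(test_positive, train_all))       # positives seen anywhere in training

# Counts reported above for the Lu multi-chain test set.
print(round(percent(184, 239)), round(percent(12, 239)))
```

Running this kind of check before training on any new benchmark is a quick way to spot a test set whose negatives are more familiar to the model than its positives.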

What we tried and why it didn't help:

  • No-shortcut head (v3 no_shortcut=True): removes the tcr + pmhc additive pathway so the output must depend on TCR×pMHC alignment. Didn't change the outcome because the tcr * pmhc interaction still carries a TCR-identity signal.
  • 75× more training signal for the test epitopes (augment with the CDR3β-only training pool filtered to {GILGFVFTL, GLCTLVAML}, 10,650 extra pairs, bringing test-epitope share from 2% → 62% of train): the model trains on far more positives for the test epitopes, but the inversion persists because the negative sampling at test time still exploits the TCR-identity shortcut.

The only thing that would fix this is a TCR encoder so weak that it can't memorize TCR identity — which is effectively what the XGBoost baseline is, and it lands at 0.51 (barely random, not skilled). Lu's own leaderboard on this track confirms the difficulty: most retrained methods cluster in the 0.50–0.60 AUPRC range, not because the task is easy and we're bad at it, but because the negative-sampling design caps any identity-aware method's ceiling.

Beating TCRconv: ESM-C features + hardened warmup. Our learned 64d AA embeddings overfit to TCR identity on Lu's tiny 6K-pair training set. Swapping the input encoding for frozen per-residue ESM-C 600M features (1152d, no fine-tuning) into the same multi-scale CNN+CE pipeline, with a hardened warmup schedule (20-epoch ramp + min_best_epoch=20 to skip the early "lazy minimum"), gives a 10-seed ensemble of 0.841 AUPRC — ~0.08 above TCRconv:

| Method | Test ROC | Test AUPRC | Approach |
|---|---|---|---|
| Ours: ESM-C+CNN+CE (10-seed ensemble, hardened) | 0.874 | 0.841 | Frozen ESM-C 600M + multi-scale CNN + CE; warmup=20, min_best_epoch=20 |
| TCRconv (Lu's best retrained) | ~0.76 | ~0.76 | ProtBERT + CNN + CE |
| Ours: ESM-C+CNN+CE (10-seed, original warmup=10) | 0.744 | 0.717 | Same architecture; 3 of 10 seeds collapse at ep<10 |
| Ours: CNN+CE + residual struct (Mode E) | 0.633 | 0.659 | Learned 64d embed + struct fingerprint, residual fusion |
| Ours: CNN+CE + FiLM struct (Mode C) | 0.617 | 0.643 | Learned 64d embed + struct fingerprint, FiLM gating |
| Ours: CNN+CE + branch struct (Mode D) | 0.604 | 0.624 | Learned 64d embed + struct fingerprint, MLP-fused |
| CDR3 BLOSUM k-NN | 0.595 | 0.618 | Retrieval (no training) |
| Ours: CNN+CE pure (Mode A) | 0.578 | 0.608 | Learned 64d embed, no struct |
| Ours: TCRconv-reimpl (CDR3β-only) | 0.561 | 0.587 | Learned 64d embed, CDR3β only |
| Ours: CNN+CE+mixup | 0.552 | 0.567 | Learned 64d + embedding-level mixup; rejected |
| epiTCR / ATM-TCR / TEIM | ~0.50 | ~0.50 | Classification (fails) |
| NeuralFusion v2/v3 (binary) | ~0.40 | ~0.41 | Binary BCE (inverted) |

Per-epitope on the hardened 10-seed ensemble: GLCTLVAML (n=44) ROC 0.901, AUPRC 0.885; GILGFVFTL (n=434) ROC 0.872, AUPRC 0.838. Both test epitopes are well above TCRconv's ~0.76 and roughly double the ~0.41 AUPRC of the binary-BCE methods, which are inverted by the adversarial negatives. ESM-C's pretrained protein-language features regularize the encoder away from raw TCR-identity memorization on 6K pairs, which is exactly the bottleneck the 64d-from-scratch encoder hit.

Why two ESM-C rows in the table. The original ESM-C run used a 10-epoch LR warmup and tracked best_epoch from epoch 1, which let three of ten seeds get trapped in a "lazy minimum" at epoch 5–7 (val_loss ≈ 2.97). Those seeds never escaped — Phase 2 retrained for 5–7 epochs only and landed at 0.53 AUPRC, dragging the ensemble down to 0.717. The hardened run doubles the warmup (10→20 epochs, gentler ramp) and gates best_epoch tracking to ep ≥ 20 (forcing the model past the lazy minimum). All ten hardened seeds converged with best_epoch ∈ [46, 97] and individual AUPRC ∈ [0.760, 0.823] — no outliers, ensemble jumps from 0.717 to 0.841. Both rows are kept in the table to document the failure mode and the fix.
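
The effect of gating best-epoch tracking can be shown on a synthetic validation curve. This is a sketch under stated assumptions, not the logic of `scripts/tcrconv_esmc.py`: the linear ramp shape, the patience-based early stopping, the function names, and the loss values are all made up for illustration. Only `warmup=20` and `min_best_epoch=20` come from the text.

```python
def warmup_lr(epoch, base_lr, warmup_epochs=20):
    """Linear ramp to base_lr over the first `warmup_epochs` epochs (1-indexed)."""
    return base_lr * min(1.0, epoch / warmup_epochs)


def best_epoch_with_early_stopping(val_losses, patience=5, min_best_epoch=1):
    """Track the best validation epoch, ignoring epochs before `min_best_epoch`.

    Training stops once `patience` epochs pass without a new best.
    """
    best_epoch, best_loss = None, float("inf")
    for epoch, loss in enumerate(val_losses, start=1):
        if epoch < min_best_epoch:
            continue  # too early: neither track a best nor count patience
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break
    return best_epoch


# Synthetic curve: an early dip at epoch 6, a plateau, then steady improvement.
toy_val = ([3.5, 3.3, 3.1, 3.0, 2.98, 2.97]
           + [3.05] * 14
           + [3.0 - 0.01 * k for k in range(1, 41)])

trapped = best_epoch_with_early_stopping(toy_val, min_best_epoch=1)
hardened = best_epoch_with_early_stopping(toy_val, min_best_epoch=20)
print(trapped, hardened)
print(warmup_lr(10, 1e-3), warmup_lr(25, 1e-3))
```

On this toy curve the ungated tracker locks onto the early dip and stops during the plateau, while the gated tracker runs through to the later, lower losses. The selection uses validation loss only, never a test-set metric.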

Correction history. An early version of this README reported "0.845 AUPRC, beats TCRconv 0.76" for CNN+CE on Lu multi-chain (commit 29838f9). That figure was produced by selecting the best training epoch on test-set ROC — a hyperparameter leak. With proper held-out validation (commit bd3ec5d), the same learned-embedding scripts land at 0.59–0.66 across variants. ESM-C closed the gap to 0.717 (10-seed) and 0.78 (good-7) (commit 4b4530a). Hardened warmup pushes the all-10 ensemble to 0.841 (this commit). scripts/tcrconv_reimpl.py and scripts/cnn_ce_struct.py use proper validation; the ESM-C variant lives at scripts/tcrconv_esmc.py.

What full-length chains, CE, ESM-C, and hardened warmup each buy.

| Step | AUPRC | Δ |
|---|---|---|
| Binary BCE on CDR3-only (NeuralFusion v2/v3) | 0.41 | |
| Multi-class CE on full-length chains, learned 64d embed | 0.59 | +0.18 (escape inverted-shortcut regime) |
| + 6-CDR structural fingerprint (residual fusion) | 0.66 | +0.07 (small, but real) |
| + Frozen ESM-C 600M features (replace 64d embed, original warmup) | 0.72 | +0.06 (matches good-seeds mean ~0.78 if you filter) |
| + Hardened warmup (20-epoch ramp, min_best_epoch=20) | 0.84 | +0.12 (eliminates the 3-seed lazy-minimum collapse) |

The dominant lift comes from cross-entropy on full-length chains (escapes the 0.41 inverted-shortcut regime) and PLM features (regularizes against identity memorization on small data). Structural fingerprints contribute a smaller but real bump. The final +0.12 from the warmup hardening is purely a stability fix — same model, same features, just no failure-mode seeds — but it turns a "matches-on-good-seeds" result into a "decisively beats" result.

3i. Unified feature ablation

We ran the same NeuralFusion v2 architecture (FiLM gating + V genes + ResBlocks) with three feature configurations across all benchmarks to test whether structure adds value:

| Benchmark | seq_only (ROC / PR) | seq+fp, FiLM (ROC / PR) | fp_only (ROC / PR) | Struct Δ PR |
|---|---|---|---|---|
| 176 epitopes | 0.963 / 0.837 | 0.964 / 0.838 | 0.957 / 0.813 | +0.001 |
| 31 struct-dep | 0.954 / 0.802 | 0.954 / 0.797 | 0.949 / 0.782 | -0.005 |
| Lu multi-chain | 0.543 / 0.588 | 0.472 / 0.523 | 0.557 / 0.525 | -0.066 |

Key finding: the 6-CDR centroid fingerprint carries real, independent signal (fp_only achieves 0.813 PR-AUC on 176 epitopes using just 27 structural dimensions). However, this signal is redundant with CDR3 sequence — combining structure + sequence adds only +0.001 PR over sequence alone. Structure and sequence capture the same underlying biology through different lenses.

[Figures: Architecture; Results comparison; Feature ablation]

The key differentiator: while all SOTA sequence methods improve binding prediction uniformly across epitopes, only our structural fingerprint shows improvement that correlates with structural convergence rate:

| Method | ρ(gain, convergence) | p-value |
|---|---|---|
| Ours: Fingerprint+ESM-2 | +0.231 | 0.015 |
| DeepTCR | +0.127 | 0.183 |
| ATM-TCR | +0.096 | 0.315 |
| epiTCR | +0.079 | 0.412 |
| TEIM | −0.083 | 0.385 |

SOTA sequence methods are blind to structural convergence. Our structural features provide targeted improvement where the convergence mechanism predicts they should.

[Figure: Fig 9, competitor convergence comparison]

Zero-shot epitope generalization: unsolved

Evaluated on the strict epitope split (test epitopes never seen in training), all methods, including an end-to-end GVP-GNN trained directly on binding, hover around 0.51–0.62 ROC-AUC. This limit is not specific to our features; it reflects the fundamental difficulty of predicting binding for unseen epitopes. See docs/LIMITATIONS.md for details.

Figures

Publication figures in results/paper_figures/:

| Figure | Content |
|---|---|
| fig2_convergence | Enrichment sweep + per-epitope convergence heatmap |
| fig3_surface_vs_cdr3 | Peak enrichment + signal decay curves (centroid vs CDR3β) |
| fig4_benchmark | Clustering quality (5 methods) + retrieval precision@k |
| fig5_binding | In-distribution ROC curves + model comparison |
| fig6_ablations | CDR loop ablation + controls (random, length, V-gene) |
| fig7_structure_rescue | Structure gain vs convergence rate + rescued epitope examples |
| fig8_competitors | Head-to-head ROC/PR comparison (11 methods) |
| fig9_competitor_convergence | Convergence correlation: only ours is significant |
| fig_model_architecture | Unified CNN+FiLM+CE+BCE architecture diagram |
| fig_results_comparison | PR-AUC across all benchmarks vs competitors |
| fig_feature_ablation | 3x3 ablation table: seq / seq+fp / fp_only |

[Figures: Fig 2, convergence; Fig 6, ablations; Fig 7, structure rescue]

Repository Structure

TCR-FOLD/
├── scripts/
│   ├── curate_data.py                   # Phase 1: merge 5 databases
│   ├── select_benchmark.py              # Phase 2: non-redundant complexes
│   ├── prepare_inputs.py                # Phase 2: method-specific inputs
│   ├── run_benchmark.py                 # Phase 2: Boltz-2/AF3/Protenix/Chai-1
│   ├── evaluate_predictions.py          # Phase 2: DockQ evaluation
│   ├── scale_structure_prediction.py    # Phase 3: IgFold on ~35K TCRs
│   ├── extract_surface_fingerprints.py  # Phase 3: 6-CDR centroid fingerprints
│   ├── compute_epitope_embeddings.py    # Phase 3: ESM-2 for peptides
│   ├── create_indist_splits.py          # Phase 3: per-epitope 80/10/10
│   ├── run_baselines.py                 # Phase 3: GLIPH2, TCRdist3, ESM-2
│   ├── run_combined_model.py            # Phase 3: struct+seq XGBoost combined
│   ├── run_lu_neural.py                 # Phase 3: evaluation on Lu et al. benchmark (CDR3β-only + multi-chain v2)
│   ├── run_lu_neural_v3.py              # Phase 3: v3 MHC-aware + no-shortcut + optional CDR3β-only augmentation
│   ├── tcrconv_reimpl.py                # Phase 3: TCRconv-style CNN+CE reimplementation (held-out validation)
│   ├── truly_unified.py                 # Phase 3: unified CNN+FiLM+CE+BCE across all benchmarks
│   ├── unified_v2_ablation.py           # Phase 3: 3×3 feature ablation (seq/seq+fp/fp_only)
│   ├── prepare_lu_complex_yaml.py       # Phase 3: Boltz-2 complex YAML generator
│   ├── extract_pmhc_interface.py        # Phase 3: TCR-pMHC interface feature extractor
│   ├── generate_model_figures.py        # Publication architecture + result figures
│   ├── compute_mhc_features.py          # Phase 3: MHC pseudo-sequence + BLOSUM encoding
│   ├── pilot_structural_convergence.py  # Phase 3a: pilot experiment
│   └── pilot_binding_surface.py         # Phase 3a: pilot surface analysis
├── models/
│   ├── geometric_encoder.py             # GVP-GNN with CDR-masked pooling
│   ├── neural_binding.py                # FiLM fusion model v1
│   ├── neural_binding_v2.py             # FiLM fusion v2 + V genes + ensemble
│   ├── neural_binding_v3.py             # v2 + MHC pseudo-sequence (MHC-aware)
│   ├── train.py                         # Contrastive pretraining
│   ├── train_binding.py                 # End-to-end binding supervision
│   ├── eval_binding.py                  # Test evaluation of checkpoints
│   └── specificity_classifier.py        # XGBoost binding classifiers
├── analysis/
│   ├── convergence_analysis.py          # 134M-pair enrichment sweep
│   ├── benchmark_specificity.py         # Head-to-head clustering benchmark
│   ├── ablations.py                     # Random control + CDR ablation + filters
│   ├── paper_figures.py                 # Publication figures
│   └── plot_structure_gain.py           # fig7: rescue vs convergence
├── data/
│   ├── benchmark/
│   │   ├── tcr_pmhc_master.tsv          # 49K unified entries
│   │   ├── benchmark_set.tsv            # 213 Phase 2 complexes
│   │   ├── splits/                      # Phase 1 epitope-based splits
│   │   └── splits_indist/               # Phase 3 per-epitope splits
│   ├── full_structures/
│   │   └── all_tcrs.tsv                 # 35,174 unique TCRs + reconstructed chains
│   └── pilot_convergence/
│       └── pilot_tcrs.tsv               # 1,351 pilot TCRs
├── competitors/                          # Competitor method runners
│   ├── run_epitcr.py                    # epiTCR reimplementation
│   ├── run_atmtcr.py                    # ATM-TCR reimplementation
│   ├── run_teim.py                      # TEIM reimplementation
│   └── run_deeptcr.py                   # DeepTCR runner
├── results/
│   ├── dockq_results.tsv                # Phase 2 benchmark
│   ├── convergence/                     # Full-scale enrichment + per-epitope
│   ├── ablations/                       # CDR ablation + controls
│   ├── specificity_benchmark/           # Clustering + retrieval metrics
│   ├── competitors_benchmark/           # epiTCR, ATM-TCR, TEIM, DeepTCR results
│   ├── binding_v2/                      # Fingerprint v2 binding results
│   ├── neural_binding/                  # FiLM fusion v1 results
│   ├── neural_v2/                       # FiLM fusion v2 ensemble results
│   ├── neural_v3/                       # FiLM fusion v3 (+MHC) ensemble results
│   ├── lu_benchmark/                    # Lu et al. Nature Methods evaluation
│   ├── lu_neural/                       # NeuralFusion v2 on Lu et al. CDR3β-only + multi-chain
│   ├── lu_neural_v3/                    # NeuralFusion v3 on Lu multi-chain (no-shortcut ± aug)
│   ├── lu_contrastive/                  # Contrastive learning + CNN+CE + unified ablation
│   ├── lu_complex_pilot/                # Boltz-2 TCR-pMHC complex prediction pilot
│   └── paper_figures/                   # Publication figures (PNG + PDF)
├── tests/
│   └── test_surface_extraction.py       # 3 unit tests
└── docs/
    ├── LIMITATIONS.md                   # Honest accounting of caveats
    └── superpowers/plans/               # Implementation plan

Dataset Statistics

| Source | Records | Unique epitopes |
|---|---|---|
| TCR3d | 372 structural complexes | 228 |
| ATLAS | 697 affinity measurements | |
| VDJdb | 30,163 paired TCR entries | 1,493 |
| IEDB | 33,260 paired entries | 2,972 |
| Unified master | 49,057 | 2,935 |
| Phase 3 structures | 35,174 | 1,460 |
| Phase 3 binding eval (≥10 TCRs/epitope) | 32,423 | 190 |

Limitations

See docs/LIMITATIONS.md for the honest accounting. Key caveats:

  • Zero-shot epitope generalization is unsolved: all methods (structural and sequence) sit at roughly 0.51–0.62 ROC-AUC on truly unseen epitopes.
  • Single prediction method (IgFold): no cross-check against Boltz-2 or crystal structures at scale. Boltz-2 requires MSAs, which are impractical for batch prediction without local databases.
  • CDR1/CDR2 positions are heuristic — scaled from the CDR3 anchor position rather than strict IMGT numbering. Could be off by 3-7 residues for unusual V genes.
  • Competitor methods are reimplementations — epiTCR, ATM-TCR, TEIM were faithfully reimplemented rather than using original code (installation issues on HPC). Validated by matching published performance ranges.
  • Convergence measured on predicted structures — not experimentally validated. Random-structure control rules out prediction homogeneity but not all possible artifacts.

License

Apache-2.0
