Implement dynamic token importance scoring pipeline #12
…or, models, training, integration, evaluation)" Co-authored-by: hanyiliu <35943468+hanyiliu@users.noreply.github.com>
| (tokenizer, model) — model is in eval mode with attention outputs. | ||
| """ | ||
| print(f"Loading fine-tuned CodeBERT from '{checkpoint_dir}'...") | ||
| tokenizer = cast( |
We don't use Roberta here. We should be exclusively using FinetunedCodeBERT.
Fixed in 571a7de. load_finetuned_codebert now creates a FinetunedCodeBERT instance, calls load_checkpoint(), and returns the backbone via get_backbone_and_tokenizer() — no direct HuggingFace loading.
| RobertaTokenizer.from_pretrained(checkpoint_dir, extra_special_tokens=[]), | ||
| ) | ||
| model = cast( | ||
| T5ForConditionalGeneration, |
Same here. We should be using FineTunedCodeT5.
Fixed in 571a7de. load_finetuned_codet5 now uses FineTunedCodeT5.load_checkpoint() and get_backbone_and_tokenizer() to get the backbone, consistent with the CodeBERT approach.
| # --------------------------------------------------------------------------- | ||
| @torch.no_grad() | ||
| def _extract_cls_scores( |
We do not need to redefine attention extraction. This has already been done in attention/attention_extraction.py.
Fixed in 571a7de. Removed _extract_cls_scores entirely. Now calls extract_batch_cls(batch, tokenizer, model, device, is_t5=False) from attention_extraction.py and passes the result through a small _enrich_batch_output helper that adds token_ids, positions, and category_ids.
| @torch.no_grad() | ||
| def _extract_cross_attn_scores( |
Same here. Attention extraction is already defined in attention/attention_extraction.py.
Fixed in 571a7de. Removed _extract_cross_attn_scores entirely. Now calls extract_batch_enc_dec(batch, tokenizer, model, device) from attention_extraction.py and enriches the output with _enrich_batch_output.
MRR and BLEU-4 evaluation has already been implemented in evaluation/code_search_evaluator.py and evaluation/code_summarizataion_evaluator.py. Can we reuse logic from there?
Fixed in 571a7de. compute_mrr now imports and uses _iter_samples + _score_pairs from code_search_evaluator (no local duplicate). compute_bleu delegates entirely to CodeSummarizationEvaluator.compute_bleu — the integrated model is injected via a new model= constructor parameter added to CodeSummarizationEvaluator. Since IntegratedCodeT5.forward_generation already applies dynamic simplification, no additional changes are needed in the BLEU path.
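For readers unfamiliar with the metric being reused here, MRR can be sketched in a few lines. This is only an illustration and not the logic in code_search_evaluator: the function name mean_reciprocal_rank and the square score-matrix layout (candidate i is the correct match for query i) are assumptions made for the example.

```python
from typing import Sequence


def mean_reciprocal_rank(scores: Sequence[Sequence[float]]) -> float:
    """Compute MRR for a square score matrix.

    Assumption (hypothetical layout): scores[i][j] is the similarity between
    query i and candidate j, and candidate i is the correct match for query i.
    """
    if not scores:
        return 0.0
    total = 0.0
    for i, row in enumerate(scores):
        target = row[i]
        # Rank = 1 + number of candidates scored strictly higher than the target.
        rank = 1 + sum(1 for s in row if s > target)
        total += 1.0 / rank
    return total / len(scores)
```

With this layout, a query whose correct candidate is ranked first contributes 1.0, second contributes 0.5, and so on; ties are resolved in favor of the target, which is one of several possible conventions.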
| import torch.nn as nn | ||
| class CNNScoringModel(nn.Module): |
Consider if the CNN model should use the same idea as the RNN model, where we pass a length value so that padding tokens are ignored in this computation.
Added in 571a7de. CNNScoringModel.forward now accepts an optional lengths: Optional[torch.Tensor] parameter. When provided, an arange mask zeroes out padding-position scores, exactly mirroring the RNN model's approach. The training loop and both integrated model helpers also pass lengths_t to the CNN.
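The arange-mask idea described above can be sketched roughly as follows. This is not the CNNScoringModel from the PR: the class name MaskedCNNScorer, the layer sizes, and the assumption that the input is already an embedded (batch, seq_len, embed_dim) tensor are all hypothetical and chosen only to keep the example self-contained.

```python
from typing import Optional

import torch
import torch.nn as nn


class MaskedCNNScorer(nn.Module):
    """Hypothetical 1D CNN scorer; layer sizes are illustrative only."""

    def __init__(self, embed_dim: int = 16, kernel_size: int = 3) -> None:
        super().__init__()
        # Padding of kernel_size // 2 keeps one score per position (odd kernels).
        self.conv = nn.Conv1d(embed_dim, 1, kernel_size, padding=kernel_size // 2)

    def forward(
        self, x: torch.Tensor, lengths: Optional[torch.Tensor] = None
    ) -> torch.Tensor:
        # x: (batch, seq_len, embed_dim); Conv1d expects (batch, channels, seq_len).
        scores = torch.sigmoid(self.conv(x.transpose(1, 2))).squeeze(1)
        if lengths is not None:
            positions = torch.arange(scores.size(1), device=scores.device)
            # mask[b, t] is True where t < lengths[b], i.e. at a real token.
            mask = positions.unsqueeze(0) < lengths.to(scores.device).unsqueeze(1)
            scores = scores * mask.to(scores.dtype)
        return scores
```

Note that a mask of this kind only zeroes the scores at padding positions. The convolution windows near the end of each sequence still see the padded inputs, so scores for the last real tokens can be slightly affected by whatever values the padding carries.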
@copilot Address all the comments given above.
…CNN lengths param Co-authored-by: hanyiliu <35943468+hanyiliu@users.noreply.github.com>
- src/data/dynamic/dynamic_dataset_generator.py: generates (token features, attention scores) training pairs from finetuned checkpoints; includes DynamicScoringDataset and dynamic_collate_fn
- src/models/dynamic/linear_model.py: per-token MLP scoring model (token embed + position + category embed → sigmoid scalar)
- src/models/dynamic/cnn_model.py: 1D CNN scoring model with configurable kernel size; optional lengths param to zero out padding (matching RNN model)
- src/models/dynamic/rnn_model.py: bidirectional GRU many-to-many scoring model with packed-sequence support
- src/training/dynamic/train_dynamic.py: training orchestration for all dynamic models; passes lengths to CNN and RNN; MSE loss, checkpoint saving, load_dynamic_model() utility
- src/models/integrated/integrated_codebert.py: IntegratedCodeBERT extends FinetunedCodeBERT with dynamic scoring → simplification step; CNN path passes lengths_t
- src/models/integrated/integrated_codet5.py: IntegratedCodeT5 extends FineTunedCodeT5 with dynamic scoring → simplification step; CNN path passes lengths_t
- src/evaluation/dynamic/dynamic_evaluator.py: DynamicEvaluator reuses _iter_samples / _score_pairs from code_search_evaluator and delegates BLEU-4 to CodeSummarizationEvaluator
- model= parameter to CodeSummarizationEvaluator for clean model injection
- FinetunedCodeBERT / FineTunedCodeT5 in dataset generator; reuse extract_batch_cls / extract_batch_enc_dec from attention_extraction.py; remove duplicate _iter_samples; add CNN lengths support
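As a rough illustration of what "bidirectional GRU many-to-many scoring with packed-sequence support" typically looks like in PyTorch, here is a minimal sketch. It is not the code in rnn_model.py: the class name PackedGRUScorer, the dimensions, and the pre-embedded input tensor are assumptions made for the example.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence


class PackedGRUScorer(nn.Module):
    """Hypothetical per-token scorer; sizes are illustrative only."""

    def __init__(self, embed_dim: int = 16, hidden_dim: int = 8) -> None:
        super().__init__()
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden_dim, 1)

    def forward(self, x: torch.Tensor, lengths: torch.Tensor) -> torch.Tensor:
        # Packing lets the GRU skip padding steps; lengths must live on the CPU.
        packed = pack_padded_sequence(
            x, lengths.cpu(), batch_first=True, enforce_sorted=False
        )
        out, _ = self.gru(packed)
        # Restore the original padded shape: (batch, seq_len, 2 * hidden_dim).
        out, _ = pad_packed_sequence(out, batch_first=True, total_length=x.size(1))
        scores = torch.sigmoid(self.head(out)).squeeze(-1)
        # Padded steps come back as zeros, but sigmoid(bias) is nonzero,
        # so padding scores are zeroed explicitly with an arange mask.
        positions = torch.arange(x.size(1), device=x.device)
        mask = positions.unsqueeze(0) < lengths.to(x.device).unsqueeze(1)
        return scores * mask.to(scores.dtype)
```

Unlike the CNN case, packing means the recurrent states for real tokens are never influenced by padding, so the explicit mask here only cleans up the output positions.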