Implement dynamic token importance scoring pipeline #12

Draft
Copilot wants to merge 4 commits into dynamic-approach-to-simplification from copilot/implement-dynamic-approaches

Copilot AI commented Mar 12, 2026

  • Create src/data/dynamic/dynamic_dataset_generator.py: generates (token features, attention scores) training pairs from finetuned checkpoints; includes DynamicScoringDataset and dynamic_collate_fn
  • Create src/models/dynamic/linear_model.py: per-token MLP scoring model (token embed + position + category embed → sigmoid scalar)
  • Create src/models/dynamic/cnn_model.py: 1D CNN scoring model with configurable kernel size; optional lengths param to zero out padding (matching the RNN model)
  • Create src/models/dynamic/rnn_model.py: bidirectional GRU many-to-many scoring model with packed-sequence support
  • Create src/training/dynamic/train_dynamic.py: training orchestration for all dynamic models; passes lengths to the CNN and RNN; MSE loss, checkpoint saving, load_dynamic_model() utility
  • Create src/models/integrated/integrated_codebert.py: IntegratedCodeBERT extends FinetunedCodeBERT with a dynamic scoring → simplification step; the CNN path passes lengths_t
  • Create src/models/integrated/integrated_codet5.py: IntegratedCodeT5 extends FineTunedCodeT5 with a dynamic scoring → simplification step; the CNN path passes lengths_t
  • Create src/evaluation/dynamic/dynamic_evaluator.py: DynamicEvaluator reuses _iter_samples/_score_pairs from code_search_evaluator and delegates BLEU-4 to CodeSummarizationEvaluator
  • Add an optional model= parameter to CodeSummarizationEvaluator for clean model injection
  • Address comments: use FinetunedCodeBERT/FineTunedCodeT5 in the dataset generator; reuse extract_batch_cls/extract_batch_enc_dec from attention_extraction.py; remove the duplicate _iter_samples; add CNN lengths support
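For readers unfamiliar with the "packed-sequence support" mentioned for the RNN model, here is a minimal sketch of a bidirectional GRU many-to-many scorer. It is not the code from rnn_model.py: the class name GRUScorerSketch, the layer sizes, and the use of token embeddings alone as input are assumptions made for illustration.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence


class GRUScorerSketch(nn.Module):
    """Hypothetical per-token scorer; sizes and inputs are illustrative only."""

    def __init__(self, vocab_size: int = 100, embed_dim: int = 16, hidden_dim: int = 8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden_dim, 1)  # forward + backward states

    def forward(self, token_ids: torch.Tensor, lengths: torch.Tensor) -> torch.Tensor:
        seq_len = token_ids.size(1)
        # Packing lets the GRU skip padded positions; lengths must live on the CPU.
        packed = pack_padded_sequence(
            self.embed(token_ids), lengths.cpu(), batch_first=True, enforce_sorted=False
        )
        out, _ = self.gru(packed)
        # Unpack back to (batch, seq_len, 2 * hidden_dim); padded rows are zeros.
        out, _ = pad_packed_sequence(out, batch_first=True, total_length=seq_len)
        scores = torch.sigmoid(self.head(out)).squeeze(-1)  # (batch, seq_len)
        # sigmoid(bias) is non-zero at padded positions, so mask them explicitly.
        positions = torch.arange(seq_len, device=scores.device).unsqueeze(0)
        mask = positions < lengths.to(scores.device).unsqueeze(1)
        return scores * mask.to(scores.dtype)
```

Calling `GRUScorerSketch()(token_ids, lengths)` with a `(batch, seq_len)` tensor of ids and a `(batch,)` tensor of true lengths returns one score in [0, 1] per real token and 0 for padding.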

Copilot AI self-assigned this Mar 12, 2026
…or, models, training, integration, evaluation)"

Co-authored-by: hanyiliu <35943468+hanyiliu@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Implement dynamic approaches as per outlined spec" to "Implement dynamic token importance scoring pipeline" Mar 12, 2026
(tokenizer, model) — model is in eval mode with attention outputs.
"""
print(f"Loading fine-tuned CodeBERT from '{checkpoint_dir}'...")
tokenizer = cast(

Owner

We don't use Roberta directly here. We should be using FinetunedCodeBERT exclusively.

Author

Fixed in 571a7de. load_finetuned_codebert now creates a FinetunedCodeBERT instance, calls load_checkpoint(), and returns the backbone via get_backbone_and_tokenizer() — no direct HuggingFace loading.

RobertaTokenizer.from_pretrained(checkpoint_dir, extra_special_tokens=[]),
)
model = cast(
T5ForConditionalGeneration,

Owner

Same here. We should be using FineTunedCodeT5.

Author

Fixed in 571a7de. load_finetuned_codet5 now uses FineTunedCodeT5.load_checkpoint() and get_backbone_and_tokenizer() to get the backbone, consistent with the CodeBERT approach.

# ---------------------------------------------------------------------------

@torch.no_grad()
def _extract_cls_scores(

Owner

We do not need to redefine attention extraction. This has already been done in attention/attention_extraction.py.

Author

Fixed in 571a7de. Removed _extract_cls_scores entirely. Now calls extract_batch_cls(batch, tokenizer, model, device, is_t5=False) from attention_extraction.py and passes the result through a small _enrich_batch_output helper that adds token_ids, positions, and category_ids.



@torch.no_grad()
def _extract_cross_attn_scores(

Owner

Same here. Attention extraction is already defined in attention/attention_extraction.py.

Author

Fixed in 571a7de. Removed _extract_cross_attn_scores entirely. Now calls extract_batch_enc_dec(batch, tokenizer, model, device) from attention_extraction.py and enriches the output with _enrich_batch_output.

Owner

MRR and BLEU-4 evaluation have already been implemented in evaluation/code_search_evaluator.py and evaluation/code_summarization_evaluator.py. Can we reuse the logic from there?

Author

Fixed in 571a7de. compute_mrr now imports and uses _iter_samples + _score_pairs from code_search_evaluator (no local duplicate). compute_bleu delegates entirely to CodeSummarizationEvaluator.compute_bleu — the integrated model is injected via a new model= constructor parameter added to CodeSummarizationEvaluator. Since IntegratedCodeT5.forward_generation already applies dynamic simplification, no additional changes are needed in the BLEU path.
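As background for the MRR path, the metric itself can be sketched in a few lines. This is a generic illustration of mean reciprocal rank, not the repository's compute_mrr: the signatures of _iter_samples and _score_pairs are not shown in this thread, so the sketch does not call them, and the function name and the score-matrix input format are assumptions.

```python
from typing import Sequence


def mean_reciprocal_rank(
    score_rows: Sequence[Sequence[float]], gold_indices: Sequence[int]
) -> float:
    """MRR over queries; score_rows[i][j] is the similarity of query i to candidate j."""
    if not score_rows:
        return 0.0
    total = 0.0
    for scores, gold in zip(score_rows, gold_indices):
        # Rank = 1 + number of candidates scoring strictly higher than the gold one.
        rank = 1 + sum(1 for s in scores if s > scores[gold])
        total += 1.0 / rank
    return total / len(score_rows)


# Three queries whose gold candidates are ranked 1st, 2nd and 2nd.
example_scores = [[0.9, 0.1, 0.2], [0.3, 0.8, 0.5], [0.2, 0.4, 0.1]]
example_gold = [0, 2, 0]
example_mrr = mean_reciprocal_rank(example_scores, example_gold)  # (1 + 1/2 + 1/2) / 3
```

Ties are resolved optimistically here (only strictly higher scores push the gold candidate down); the existing evaluator may handle ties differently.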

import torch.nn as nn


class CNNScoringModel(nn.Module):

Owner

Consider if the CNN model should use the same idea as the RNN model, where we pass a length value so that padding tokens are ignored in this computation.

Author

Added in 571a7de. CNNScoringModel.forward now accepts an optional lengths: Optional[torch.Tensor] parameter. When provided, an arange mask zeroes out padding-position scores, exactly mirroring the RNN model's approach. The training loop and both integrated model helpers also pass lengths_t to the CNN.
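The arange mask described in this reply could look roughly like the sketch below. It is an approximation under stated assumptions, not the committed CNNScoringModel: the class name CNNScorerSketch, the single convolution layer, and the embedding-only input are all hypothetical.

```python
from typing import Optional

import torch
import torch.nn as nn


class CNNScorerSketch(nn.Module):
    """Hypothetical 1D CNN scorer; layer sizes are illustrative only."""

    def __init__(self, vocab_size: int = 100, embed_dim: int = 16, kernel_size: int = 3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # With an odd kernel size, this padding keeps one output per input token.
        self.conv = nn.Conv1d(embed_dim, 1, kernel_size, padding=kernel_size // 2)

    def forward(
        self, token_ids: torch.Tensor, lengths: Optional[torch.Tensor] = None
    ) -> torch.Tensor:
        x = self.embed(token_ids).transpose(1, 2)        # (batch, embed_dim, seq_len)
        scores = torch.sigmoid(self.conv(x)).squeeze(1)  # (batch, seq_len)
        if lengths is not None:
            # positions[j] < lengths[i] is True only for real (non-padding) tokens.
            positions = torch.arange(scores.size(1), device=scores.device).unsqueeze(0)
            mask = positions < lengths.to(scores.device).unsqueeze(1)
            scores = scores * mask.to(scores.dtype)
        return scores
```

Note that masking the output zeroes the scores at padding positions but does not stop padding embeddings from entering the receptive field of the last real tokens when the kernel size is greater than 1. Whether that matters here depends on how the padding embedding is initialised and trained.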

@hanyiliu

Owner

@copilot Address all the comments given above.

…CNN lengths param

Co-authored-by: hanyiliu <35943468+hanyiliu@users.noreply.github.com>
Copilot AI requested a review from hanyiliu March 12, 2026 17:19