Implement dynamic token importance scoring pipeline by Copilot · Pull Request #12 · hanyiliu/fitcode

Copilot · 2026-03-12T02:34:28Z


…or, models, training, integration, evaluation)" Co-authored-by: hanyiliu <35943468+hanyiliu@users.noreply.github.com>

hanyiliu · 2026-03-12T14:55:04Z

+        (tokenizer, model) — model is in eval mode with attention outputs.
+    """
+    print(f"Loading fine-tuned CodeBERT from '{checkpoint_dir}'...")
+    tokenizer = cast(


We don't use Roberta here. We should be exclusively using FinetunedCodeBert.

Fixed in 571a7de. load_finetuned_codebert now creates a FinetunedCodeBERT instance, calls load_checkpoint(), and returns the backbone via get_backbone_and_tokenizer() — no direct HuggingFace loading.

hanyiliu · 2026-03-12T14:55:27Z

+            RobertaTokenizer.from_pretrained(checkpoint_dir, extra_special_tokens=[]),
+        )
+    model = cast(
+        T5ForConditionalGeneration,


Same here. We should be using FinetunedCodeT5.

Fixed in 571a7de. load_finetuned_codet5 now uses FineTunedCodeT5.load_checkpoint() and get_backbone_and_tokenizer() to get the backbone, consistent with the CodeBERT approach.

hanyiliu · 2026-03-12T14:56:30Z

+# ---------------------------------------------------------------------------
+
+@torch.no_grad()
+def _extract_cls_scores(


No need to redefine attention extraction. This has already been done in attention/attention_extraction.py.

Fixed in 571a7de. Removed _extract_cls_scores entirely. Now calls extract_batch_cls(batch, tokenizer, model, device, is_t5=False) from attention_extraction.py and passes the result through a small _enrich_batch_output helper that adds token_ids, positions, and category_ids.

hanyiliu · 2026-03-12T14:56:49Z

+
+
+@torch.no_grad()
+def _extract_cross_attn_scores(


Same here. Attention extraction is already defined in attention/attention_extraction.py.

Fixed in 571a7de. Removed _extract_cross_attn_scores entirely. Now calls extract_batch_enc_dec(batch, tokenizer, model, device) from attention_extraction.py and enriches the output with _enrich_batch_output.

hanyiliu · 2026-03-12T14:58:38Z

MRR and BLEU-4 evaluation has already been implemented in evaluation/code_search_evaluator.py and evaluation/code_summarizataion_evaluator.py. Can we reuse logic from there?

Fixed in 571a7de. compute_mrr now imports and uses _iter_samples + _score_pairs from code_search_evaluator (no local duplicate). compute_bleu delegates entirely to CodeSummarizationEvaluator.compute_bleu — the integrated model is injected via a new model= constructor parameter added to CodeSummarizationEvaluator. Since IntegratedCodeT5.forward_generation already applies dynamic simplification, no additional changes are needed in the BLEU path.
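For reference, MRR is the mean over queries of 1/rank of the correct snippet. The sketch below is a minimal, standalone illustration of that metric only. It does not reproduce `_iter_samples` or `_score_pairs` from code_search_evaluator, whose signatures are not shown in this thread; `rank_of_correct` and `mean_reciprocal_rank` are hypothetical names, and the scores are made-up toy values.

```python
from typing import Sequence


def rank_of_correct(scores: Sequence[float], correct_index: int) -> int:
    """Return the 1-based rank of the correct candidate (higher score = better)."""
    target = scores[correct_index]
    # Every candidate that scores strictly higher pushes the correct one down.
    return 1 + sum(1 for s in scores if s > target)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all queries; 0.0 for an empty input."""
    if not ranks:
        return 0.0
    return sum(1.0 / r for r in ranks) / len(ranks)


# Toy data (hypothetical): (similarity scores per candidate, index of the correct one)
query_scores = [
    ([0.9, 0.1, 0.3], 0),
    ([0.2, 0.8, 0.5], 2),
    ([0.7, 0.6, 0.1, 0.9], 2),
]
ranks = [rank_of_correct(scores, idx) for scores, idx in query_scores]
mrr = mean_reciprocal_rank(ranks)
```

Ties are resolved optimistically here (equal scores do not lower the rank); the repository's evaluator may break ties differently.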

hanyiliu · 2026-03-12T15:17:14Z

+import torch.nn as nn
+
+
+class CNNScoringModel(nn.Module):


Consider whether the CNN model should use the same idea as the RNN model, where we pass a length value so that padding tokens are ignored in the computation.

Added in 571a7de. CNNScoringModel.forward now accepts an optional lengths: Optional[torch.Tensor] parameter. When provided, an arange mask zeroes out padding-position scores, exactly mirroring the RNN model's approach. The training loop and both integrated model helpers also pass lengths_t to the CNN.
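A minimal sketch of the arange-mask idea described above, assuming per-token scores of shape `(batch, seq_len)` and a `lengths` tensor of shape `(batch,)`. The helper name `mask_padding_scores` is hypothetical and this is not the actual body of `CNNScoringModel.forward`; it only illustrates how padding positions can be zeroed out.

```python
import torch


def mask_padding_scores(scores: torch.Tensor, lengths: torch.Tensor) -> torch.Tensor:
    """Zero out scores at padding positions.

    scores:  (batch, seq_len) per-token scores
    lengths: (batch,) number of real (non-padding) tokens in each row
    """
    seq_len = scores.size(1)
    # Position indices 0..seq_len-1, shaped (1, seq_len) for broadcasting.
    positions = torch.arange(seq_len, device=scores.device).unsqueeze(0)
    # True where the position belongs to a real token: (batch, seq_len).
    mask = positions < lengths.to(scores.device).unsqueeze(1)
    return scores * mask.to(scores.dtype)


scores = torch.ones(2, 4)
lengths = torch.tensor([2, 4])
masked = mask_padding_scores(scores, lengths)
```

In this toy case the first row keeps its first two scores and the last two become zero, while the second row (full length) is unchanged.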

hanyiliu · 2026-03-12T16:58:46Z

@copilot Address all the comments given above.

…CNN lengths param Co-authored-by: hanyiliu <35943468+hanyiliu@users.noreply.github.com>

Copilot AI self-assigned this Mar 12, 2026

Initial plan

7a53570

Copilot AI assigned hanyiliu Mar 12, 2026

Copilot started work on behalf of hanyiliu March 12, 2026 02:34

Implement dynamic approaches for code simplification (dataset generat…

13f329b

…or, models, training, integration, evaluation)" Co-authored-by: hanyiliu <35943468+hanyiliu@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Implement dynamic approaches as per outlined spec~~ Implement dynamic token importance scoring pipeline Mar 12, 2026

Copilot finished work on behalf of hanyiliu March 12, 2026 02:55

hanyiliu reviewed Mar 12, 2026


Copilot started work on behalf of hanyiliu March 12, 2026 16:59

Address review feedback: use wrapper classes, reuse evaluator logic, …

571a7de

…CNN lengths param Co-authored-by: hanyiliu <35943468+hanyiliu@users.noreply.github.com>

Copilot AI requested a review from hanyiliu March 12, 2026 17:19

Copilot finished work on behalf of hanyiliu March 12, 2026 17:19

Update dynamic_approaches.md

6d37e06



