The scorer computes its scores with sequence_cross_entropy_with_logits(). I noticed that the start index used when slicing the arguments differs from the implementation in EPR.
In UDR:
loss_list = sequence_cross_entropy_with_logits(logits=output.logits[:, :-1].contiguous(), targets=entry.input_ids[:, 1:].contiguous(), weights=pad_mask, average=None)
In EPR:
loss_list = sequence_cross_entropy_with_logits(logits=output.logits, targets=entry.input_ids[:,1:], weights=pad_mask, average=None)
So I wondered what the input actually is, and found this in scorer_dsr.py:
tokenized_example = self.tokenizer.encode_plus(enc_text, truncation=True, add_special_tokens=False, return_tensors='pt')
tokenized_labels = self.tokenizer.encode_plus(test_answer, truncation=True, add_special_tokens=False, return_tensors='pt')
Since no special tokens are added to the inputs, why do we need to exclude the first token of the inputs and the last position of the logits?
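To make the question concrete, here is a minimal sketch of the shift as I understand it, assuming the scorer uses a causal (decoder-only) language model, where the logits at position t score the token at position t+1. Under that assumption the slicing would be about next-token alignment rather than special tokens. The function name shifted_token_losses and the toy numbers are hypothetical and are not taken from UDR or EPR; plain Python stands in for the tensor code.

```python
import math

def shifted_token_losses(logits, input_ids):
    """Per-token cross-entropy for next-token prediction (hypothetical helper).

    Assumes logits[t] was produced after reading input_ids[:t+1],
    so it is scored against input_ids[t+1].
    """
    shifted_logits = logits[:-1]     # drop last position: no token follows it
    shifted_targets = input_ids[1:]  # drop first token: no position predicts it
    losses = []
    for row, target in zip(shifted_logits, shifted_targets):
        log_z = math.log(sum(math.exp(x) for x in row))  # log-sum-exp normaliser
        losses.append(log_z - row[target])               # -log softmax(row)[target]
    return losses

# Toy setup: vocabulary of 3 tokens, sequence of 3 tokens, no special tokens.
input_ids = [0, 2, 1]
logits = [
    [0.0, 0.0, 5.0],  # after token 0: favours token 2, the true next token
    [0.0, 5.0, 0.0],  # after token 2: favours token 1, the true next token
    [5.0, 0.0, 0.0],  # after token 1: there is no target for this position
]
losses = shifted_token_losses(logits, input_ids)

# Same logits scored without the shift, for comparison.
unshifted_first = math.log(sum(math.exp(x) for x in logits[0])) - logits[0][input_ids[0]]
```

In this toy case the shifted losses are small because each row favours the token that actually comes next, while scoring logits[0] against input_ids[0] gives a large loss. If the model in the scorer is indeed causal, this would apply whether or not special tokens are present.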