The problem about the beginning index

The scorer computes the scores with `sequence_cross_entropy_with_logits()`. I noticed that the beginning index of the arguments differs from the implementation in EPR.
In UDR:
```
loss_list = sequence_cross_entropy_with_logits(logits=output.logits[:, :-1].contiguous(),
                                               targets=entry.input_ids[:, 1:].contiguous(),
                                               weights=pad_mask,
                                               average=None)
```
In EPR:
```
loss_list = sequence_cross_entropy_with_logits(logits=output.logits,
                                               targets=entry.input_ids[:, 1:],
                                               weights=pad_mask,
                                               average=None)
```
So I wondered what the input actually is, and found this in `scorer_dsr.py`:
```
tokenized_example = self.tokenizer.encode_plus(enc_text, truncation=True, add_special_tokens=False,
                                               return_tensors='pt')
tokenized_labels = self.tokenizer.encode_plus(test_answer, truncation=True, add_special_tokens=False,
                                              return_tensors='pt')
```
Since special tokens aren't added to the inputs, why do we need to exclude the first token of the inputs and the last position of the logits?
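
For context, here is a minimal sketch of the usual next-token alignment in a causal language model, where the logits at position `t` score the token at position `t + 1`. Under that convention the one-step shift comes from the prediction offset, not from special tokens. Whether this is the intent in UDR or EPR is an assumption; the helper `shifted_token_losses` and the toy `logits` are hypothetical and written in plain Python rather than with the repository's tensors.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def shifted_token_losses(logits, input_ids):
    """Per-token negative log-likelihood with the next-token shift.

    logits:    T rows (one per input position), each of vocabulary size V.
    input_ids: T token ids.
    Returns T - 1 losses: the row at position t is scored against token t + 1.
    """
    assert len(logits) == len(input_ids)
    shifted_logits = logits[:-1]  # the last row predicts a token beyond the input
    targets = input_ids[1:]       # the first token has no preceding prediction
    return [-log_softmax(row)[tok] for row, tok in zip(shifted_logits, targets)]


# Toy example (hypothetical values): vocabulary of 3 tokens, 3 input positions.
input_ids = [2, 0, 1]
logits = [
    [5.0, 0.0, 0.0],  # position 0 favours token 0, which is input_ids[1]
    [0.0, 5.0, 0.0],  # position 1 favours token 1, which is input_ids[2]
    [0.0, 0.0, 5.0],  # position 2 predicts past the end and is dropped
]
losses = shifted_token_losses(logits, input_ids)
print(len(losses), losses)
```

In this sketch the shift yields one loss per predicted token (`T - 1` values), which mirrors `logits[:, :-1]` paired with `input_ids[:, 1:]` in the UDR call.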
