As I understand it, during inference a single `past_key_values` is shared between the augment model and the anchor model. Is this correct, or have I misunderstood something? The hidden states in the augment model differ from those in the anchor model, so I would expect the cached keys and values to differ as well. I'm asking to clarify my understanding.
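To make the concern concrete, here is a minimal toy sketch of what I mean. It is not the repository's code: `ToyModel`, `make_projection`, and `project` are hypothetical names, and each "model" is reduced to a single key projection with its own random weights. It only illustrates that two models with different weights produce different cached keys for the same token, which is why I would expect each model to keep its own cache.

```python
import random


def make_projection(dim, seed):
    # Hypothetical helper: a random dim x dim weight matrix.
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]


def project(weights, hidden):
    # Plain matrix-vector product.
    return [sum(w * h for w, h in zip(row, hidden)) for row in weights]


class ToyModel:
    """Hypothetical stand-in for one model: a single key projection."""

    def __init__(self, dim, seed):
        self.w_k = make_projection(dim, seed)

    def step(self, hidden, cache):
        # Append this token's key to the cache that was passed in.
        cache.append(project(self.w_k, hidden))
        return cache


dim = 4
anchor = ToyModel(dim, seed=0)   # different seeds give different weights
augment = ToyModel(dim, seed=1)
token_hidden = [1.0, 0.5, -0.5, 2.0]

# Each model writes into its own cache.
anchor_cache = anchor.step(token_hidden, [])
augment_cache = augment.step(token_hidden, [])

# The cached keys for the same token are not interchangeable.
keys_differ = anchor_cache[0] != augment_cache[0]
print(keys_differ)
```

In this toy setup the two caches disagree even for identical input, so reusing one cache for both models would feed one model keys computed by the other. I may be missing a detail of how the real implementation separates them.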