Hi, may I know whether I can use SimA instead of multi-head attention in the decoder to reduce complexity? Thanks!