multiheaded-self-attention

A multi-head self-attention transformer architecture implementation, written without using PyTorch transformers. It is very inefficient. Credit to Andrej Karpathy for a nice tutorial!
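For readers who want to see the core idea before opening the code, here is a minimal sketch of multi-head self-attention. It is not the code in this repository: it uses only the Python standard library instead of PyTorch, and every name in it (`attention_head`, `multi_head_self_attention`, `random_matrix`, the weight layout) is hypothetical. It also assumes a causal mask, so each position attends only to itself and earlier positions, and it leaves out biases, dropout, and training. Like the implementation described above, it favors clarity over speed.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(x, w_q, w_k, w_v, causal=True):
    """One head: softmax(Q K^T / sqrt(d_head)) V, optionally causally masked."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_head = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = []
    for i, row in enumerate(scores):
        scaled = [s / math.sqrt(d_head) for s in row]
        if causal:
            # Assumption: position i may only attend to positions 0..i.
            scaled = [s if j <= i else float("-inf") for j, s in enumerate(scaled)]
        weights.append(softmax(scaled))
    return matmul(weights, v), weights


def multi_head_self_attention(x, heads, w_o, causal=True):
    """Run every head, concatenate per-token outputs, then project with w_o."""
    outputs = [attention_head(x, *h, causal=causal)[0] for h in heads]
    concat = [sum((out[t] for out in outputs), []) for t in range(len(x))]
    return matmul(concat, w_o)


def random_matrix(rows, cols, rng):
    """Small random weights; a stand-in for learned parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
seq_len, d_model, n_heads = 4, 8, 2
d_head = d_model // n_heads
x = random_matrix(seq_len, d_model, rng)  # one "sentence" of 4 token vectors
heads = [tuple(random_matrix(d_model, d_head, rng) for _ in range(3)) for _ in range(n_heads)]
w_o = random_matrix(d_model, d_model, rng)
out = multi_head_self_attention(x, heads, w_o)
print(len(out), len(out[0]))  # 4 8
```

Each head projects the input into its own smaller query, key, and value spaces, so the heads can attend to different positions; concatenating them restores the model width before the final projection.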