
Fix transpose in attention scores equation #59

Merged
shensquared merged 1 commit into 390introml:main from taziksh:patch-1 on Jun 5, 2026

Contributor

@taziksh commented on Jun 5, 2026

9.3.2: Remove outer transpose in attention scores equation

The outer transpose in the attention scores equation appears inconsistent with the stated shape $a_i \in \mathbb{R}^{1 \times n}$.

The current expression is:

$$ a_i = \text{softmax}\left( \frac{[q_i^T k_1, q_i^T k_2, \dots, q_i^T k_n]} {\sqrt{d_k}} \right)^T \in \mathbb{R}^{1 \times n} $$

However, the vector

$$ [q_i^T k_1, q_i^T k_2, \dots, q_i^T k_n] $$

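is treated elsewhere as a row vector of shape $1 \times n$: each entry $q_i^T k_j$ is a scalar, and the $n$ entries sit side by side in one row. Softmax normalizes across those $n$ entries without changing the shape, so the outer transpose would turn the result into an $n \times 1$ column vector, contradicting the stated shape $1 \times n$.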
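With the outer transpose removed, the equation as proposed in this PR reads

$$ a_i = \text{softmax}\left( \frac{[q_i^T k_1, q_i^T k_2, \dots, q_i^T k_n]} {\sqrt{d_k}} \right) \in \mathbb{R}^{1 \times n} $$

The shape argument can be checked with a minimal pure-Python sketch. It assumes $q_i$ and each $k_j$ are $d_k$-dimensional vectors (so $q_i^T k_j$ is a dot product), represents a $1 \times n$ row vector as a list holding a single list, and uses made-up example values. The helpers `softmax_row`, `attention_scores`, `transpose`, `shape`, and `dot` are hypothetical illustrations, not code from the course notes.

```python
import math

# Hypothetical helpers for illustration only; not from the course notes.
# A matrix is a list of rows, so a 1 x n row vector is [[x_1, ..., x_n]].

def shape(mat):
    """Return (rows, cols) of a list-of-rows matrix."""
    return (len(mat), len(mat[0]))

def transpose(mat):
    """Swap rows and columns: a 1 x n row becomes an n x 1 column."""
    return [list(col) for col in zip(*mat)]

def dot(u, v):
    """Dot product q^T k of two equal-length vectors (a scalar)."""
    return sum(a * b for a, b in zip(u, v))

def softmax_row(row):
    """Numerically stable softmax over a 1 x n row; output is still 1 x n."""
    (scores,) = row  # unpack the single row
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [[e / total for e in exps]]

def attention_scores(q_i, keys):
    """a_i = softmax([q_i^T k_1, ..., q_i^T k_n] / sqrt(d_k)), no outer transpose."""
    d_k = len(q_i)
    row = [[dot(q_i, k_j) / math.sqrt(d_k) for k_j in keys]]  # shape 1 x n
    return softmax_row(row)

# Made-up example: d_k = 3, n = 4 keys.
q_i = [1.0, 0.0, 2.0]
keys = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [2.0, 0.0, 1.0], [0.5, 0.5, 0.5]]

a_i = attention_scores(q_i, keys)
print(shape(a_i))             # (1, 4): matches R^{1 x n}
print(shape(transpose(a_i)))  # (4, 1): what an extra outer transpose would give
```

The raw scores here are $1, 2, 4, 1.5$ before scaling, so the third key receives the largest weight, and the weights in the single row sum to 1.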

9.3.2: Corrected the transpose notation in the attention scores equation
@shensquared
Member

awesome. thanks!

shensquared merged commit 4b48a9a into 390introml:main on Jun 5, 2026

2 participants