
Adding Dynamic Tanh (DyT) element-wise operation (paper from Meta) #5456

@coder0143

Summary

An important paper from authors at Meta (FAIR), MIT, NYU, and Princeton, posted to arXiv in March 2025, shows that transformers can perform as well as, if not better than, their normalized counterparts when Layer Normalization is replaced with an element-wise Dynamic Tanh (DyT) operation. The PR for this issue will aim to add this technique to the nnx src (norms) with minimal code. A successor paper introduces a Dynamic erf (Derf) function, for which I will open a separate issue.
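For reference, the operation described in the paper is DyT(x) = gamma * tanh(alpha * x) + beta, with a learnable scalar alpha and learnable per-feature gamma and beta. Below is a minimal sketch in plain JAX, not the proposed nnx implementation: the names `init_dyt` and `dyt` and the parameter-dict layout are hypothetical, chosen only for illustration, and the initial alpha of 0.5 follows what I understand to be the paper's default.

```python
import jax
import jax.numpy as jnp


def init_dyt(num_features, alpha_init=0.5, dtype=jnp.float32):
    """Create DyT parameters (hypothetical helper, not part of nnx).

    alpha: learnable scalar that scales the input before tanh.
    gamma, beta: learnable per-feature scale and shift, as in LayerNorm.
    """
    return {
        "alpha": jnp.full((), alpha_init, dtype=dtype),
        "gamma": jnp.ones((num_features,), dtype=dtype),
        "beta": jnp.zeros((num_features,), dtype=dtype),
    }


def dyt(params, x):
    """Apply DyT(x) = gamma * tanh(alpha * x) + beta over the last axis."""
    return params["gamma"] * jnp.tanh(params["alpha"] * x) + params["beta"]


# Usage: a batch of 2 sequences, 4 tokens each, 8 features per token.
params = init_dyt(num_features=8)
x = jax.random.normal(jax.random.PRNGKey(0), (2, 4, 8))
y = dyt(params, x)

# All three parameters receive gradients, so they can be trained as usual.
grads = jax.grad(lambda p: dyt(p, x).sum())(params)
```

Unlike Layer Normalization, this computes no mean or variance over the feature axis; every element is transformed independently, which is why it fits as a small element-wise module.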

I am attaching the related links here:
Paper: https://arxiv.org/pdf/2503.10622
Title: Transformers without Normalization
Code: https://github.com/jiachenzhu/DyT
Website: https://jiachenzhu.github.io/DyT/

cc: @cgarciae @vfdev-5 @samanklesaria

