Adding Dynamic Tanh (DyT) element-wise operation (paper from Meta) #5456

### Summary

A paper from authors at Meta (FAIR), MIT, NYU, and Princeton, posted to arXiv in March 2025, shows that transformers can perform as well as, if not better than, their normalized counterparts when Layer Normalization is replaced with a Dynamic Tanh (DyT) operation, an element-wise function of the form `DyT(x) = scale * tanh(alpha * x) + bias` with a learnable scalar `alpha`. The PR for this issue will aim to add this technique to the nnx source (norms) with minimal code. There is also a successor paper featuring a Dynamic erf (Derf) function, for which I will open a separate issue.
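For discussion, here is a minimal sketch of the DyT operation in plain JAX. It is not the proposed nnx API: the names `init_dyt_params` and `dyt` and the dict-based parameter layout are hypothetical, chosen only for illustration, and the `alpha_init=0.5` default is assumed from the paper's reported default. How this would be wrapped as an nnx module is left to the PR.

```python
import jax
import jax.numpy as jnp


def init_dyt_params(num_features, alpha_init=0.5):
    """Create DyT parameters (hypothetical helper, not an nnx API).

    alpha is a single learnable scalar; scale and bias are per-feature
    vectors, playing the same role as the affine terms in LayerNorm.
    """
    return {
        "alpha": jnp.asarray(alpha_init, dtype=jnp.float32),
        "scale": jnp.ones((num_features,), dtype=jnp.float32),
        "bias": jnp.zeros((num_features,), dtype=jnp.float32),
    }


def dyt(params, x):
    """Apply DyT element-wise: scale * tanh(alpha * x) + bias.

    Unlike LayerNorm, no mean or variance is computed over the feature
    axis, so each element is transformed independently.
    """
    return params["scale"] * jnp.tanh(params["alpha"] * x) + params["bias"]


# Example: a batch of 2 sequences, 4 tokens each, 8 features per token.
params = init_dyt_params(num_features=8)
x = jax.random.normal(jax.random.PRNGKey(0), (2, 4, 8))
y = dyt(params, x)

# alpha receives a gradient like any other parameter.
grads = jax.grad(lambda p: dyt(p, x).sum())(params)
```

Because `scale` and `bias` broadcast over the last axis, the function can stand in wherever a LayerNorm over the feature axis is applied, without computing any activation statistics.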

I am attaching the related links here:
Paper: https://arxiv.org/pdf/2503.10622
Title: Transformers without Normalization
Code: https://github.com/jiachenzhu/DyT
Website: https://jiachenzhu.github.io/DyT/

cc: @cgarciae @vfdev-5 @samanklesaria 

### Additional context

<img width="732" height="490" alt="Image" src="https://github.com/user-attachments/assets/9f9d7c63-f158-4eb7-a3e2-288f8dc18399" />

<img width="1576" height="348" alt="Image" src="https://github.com/user-attachments/assets/a5df82be-046b-4071-9e8a-091e1b0fe2c3" />
