# layer_norm Normalizes over the last `D` dimensions (specified via `axes`); mean and variance are computed across those axes for each remaining slice. **ATen source:** `aten.native_layer_norm` ## Inputs | Name | Shape | Description | |---|---|---| | `input` | `(*batch, *D)` | Tensor to normalize | | `gamma` | matches normalized dims | Scale applied to the normalized tensor | | `beta` | matches normalized dims | Shift added after the scale | ## Attributes | Name | Type | Description | |---|---|---| | `axes` | `list[int]` | Dimensions over which mean/variance are computed (the trailing `D` dims) | | `eps` | `float` | Numerical-stability epsilon | | `version` | `int` | Composite op version | ## Output | Name | Shape | Description | |---|---|---| | `output` | `(*batch, *D)` | Same shape as `input` | ## Data types `fp16`, `fp32`, `bf16`. ## PyTorch example ```python import torch from torch.nn.functional import layer_norm N, C, H, W = 20, 5, 10, 10 input = torch.randn(N, C, H, W) # Normalize over the last three dims (C, H, W) output = layer_norm(input, normalized_shape=[C, H, W], weight=None, bias=None, eps=1e-5) ``` ## Reference [`torch.nn.LayerNorm`](https://docs.pytorch.org/docs/stable/generated/torch.nn.LayerNorm.html)