layer_norm¶

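Applies layer normalization over the trailing D dimensions of `input`, which are listed in `axes`. The mean and variance are computed across those axes, independently for each index of the remaining leading dimensions. The normalized result is then scaled by `gamma` and shifted by `beta`.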
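The computation described above can be sketched in plain PyTorch. This is a minimal illustration, not the op's actual implementation: the helper name `layer_norm_reference` is hypothetical, and it assumes the biased variance with `eps` added inside the square root, which is the convention `torch.nn.functional.layer_norm` uses.

```python
import torch
import torch.nn.functional as F

def layer_norm_reference(x, axes, eps=1e-5):
    # Hypothetical helper: statistics are taken over `axes` and kept as
    # size-1 dims so they broadcast back against x.
    mean = x.mean(dim=axes, keepdim=True)
    var = x.var(dim=axes, unbiased=False, keepdim=True)  # biased variance
    return (x - mean) / torch.sqrt(var + eps)

x = torch.randn(4, 3, 8)

# Normalize over the last two dims (axes 1 and 2).
manual = layer_norm_reference(x, axes=(1, 2))
builtin = F.layer_norm(x, normalized_shape=[3, 8], eps=1e-5)

matches = torch.allclose(manual, builtin, atol=1e-5)
```

Each of the 4 leading slices gets its own mean and variance here, so changing one slice of `x` does not affect the output of the others.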

ATen source: aten.native_layer_norm

Inputs¶

Name	Shape	Description
`input`	`(batch, D)`	Tensor to normalize
`gamma`	matches normalized dims	Scale applied to the normalized tensor
`beta`	matches normalized dims	Shift added after the scale

Attributes¶

Name	Type	Description
`axes`	`list[int]`	Dimensions over which mean/variance are computed (the trailing `D` dims)
`eps`	`float`	Numerical-stability epsilon
`version`	`int`	Composite op version
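PyTorch describes the normalized dims by shape (`normalized_shape`), while this op takes them as `axes`. One way to relate the two is sketched below. The helper `trailing_axes` is hypothetical and assumes non-negative axis indices; whether the op also accepts negative indices is not stated here.

```python
def trailing_axes(ndim, num_normalized):
    # Hypothetical helper: indices of the last `num_normalized` dims
    # of a tensor with `ndim` dims.
    return list(range(ndim - num_normalized, ndim))

# A 4-D input (N, C, H, W) normalized over (C, H, W):
axes = trailing_axes(ndim=4, num_normalized=3)  # [1, 2, 3]
```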

Output¶

Name	Shape	Description
`output`	`(batch, D)`	Same shape as `input`

Data types¶

fp16, fp32, bf16.

PyTorch example¶

import torch
from torch.nn.functional import layer_norm

N, C, H, W = 20, 5, 10, 10
x = torch.randn(N, C, H, W)  # named x to avoid shadowing the built-in input()

# Normalize over the last three dims (C, H, W); no scale or shift is applied
output = layer_norm(x, normalized_shape=[C, H, W], weight=None, bias=None, eps=1e-5)
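For completeness, a variant with a scale and shift supplied, calling the ATen op named at the top of this page directly. This is a sketch: the values chosen for `gamma` and `beta` are arbitrary, and the variable names are illustrative only. `torch.ops.aten.native_layer_norm` returns a tuple of the output, the mean, and the reciprocal standard deviation.

```python
import torch
import torch.nn.functional as F

N, C, H, W = 20, 5, 10, 10
x = torch.randn(N, C, H, W)

# gamma and beta have the shape of the normalized dims (C, H, W).
gamma = torch.rand(C, H, W) + 0.5
beta = torch.randn(C, H, W)

expected = F.layer_norm(x, [C, H, W], weight=gamma, bias=beta, eps=1e-5)

# The first element of the returned tuple is the normalized output.
out, mean, rstd = torch.ops.aten.native_layer_norm(x, [C, H, W], gamma, beta, 1e-5)

same = torch.allclose(out, expected, atol=1e-5)
```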

Reference¶

torch.nn.LayerNorm