layer_norm

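Normalizes the input over its trailing dimensions: the ones listed in axes, written *D in the shapes below. Mean and variance are computed across those dimensions separately for each index into the remaining leading (batch) dimensions.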
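To make the per-slice statistics concrete, here is a minimal pure-Python sketch for a 2-D input normalized over its last dimension. The helper name layer_norm_rows is hypothetical, gamma and beta are left out, and the biased variance (divide by N rather than N - 1) is assumed to follow the torch.nn.LayerNorm convention.

```python
import math

def layer_norm_rows(rows, eps=1e-5):
    """Hypothetical helper: normalize each row of a 2-D nested list
    over its last dimension."""
    out = []
    for row in rows:
        mean = sum(row) / len(row)
        # Biased (population) variance: divide by N, not N - 1.
        var = sum((v - mean) ** 2 for v in row) / len(row)
        inv_std = 1.0 / math.sqrt(var + eps)
        out.append([(v - mean) * inv_std for v in row])
    return out

# Each row is one "remaining slice" and gets its own mean and variance.
rows = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
normalized = layer_norm_rows(rows)
```

Both rows end up with roughly the same normalized values even though their scales differ by a factor of ten, because the statistics are never shared between slices.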

ATen source: aten.native_layer_norm

Inputs

Name    Shape                    Description
input   (*batch, *D)             Tensor to normalize
gamma   matches normalized dims  Scale applied to the normalized tensor
beta    matches normalized dims  Shift added after the scale

Attributes

Name     Type       Description
axes     list[int]  Dimensions over which mean/variance are computed (the trailing D dims)
eps      float      Numerical-stability epsilon
version  int        Composite op version
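The role of axes and eps can be sketched as a plain reduction in PyTorch. This is an illustrative reconstruction, not the op's actual implementation: the helper layer_norm_over_axes is hypothetical, and it assumes axes holds non-negative indices of the trailing dims and that the variance is the biased one.

```python
import torch
import torch.nn.functional as F

def layer_norm_over_axes(x, axes, eps=1e-5):
    # Hypothetical helper: reduce only over `axes`; keepdim=True lets the
    # statistics broadcast back against x.
    dims = tuple(axes)
    mean = x.mean(dim=dims, keepdim=True)
    var = ((x - mean) ** 2).mean(dim=dims, keepdim=True)  # biased variance
    # eps is added inside the square root to avoid division by zero.
    return (x - mean) / torch.sqrt(var + eps)

x = torch.randn(4, 5, 6, 7)
axes = [1, 2, 3]  # assumed convention: the trailing three dims
manual = layer_norm_over_axes(x, axes)

# Library result with normalized_shape equal to the sizes of those dims.
reference = F.layer_norm(x, normalized_shape=x.shape[1:], eps=1e-5)
```

Under these assumptions the two results should agree to within floating-point tolerance, which is a quick way to check that a given axes list matches the intended normalized_shape.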

Output

Name    Shape         Description
output  (*batch, *D)  Same shape as input

Data types

fp16, fp32, bf16.

PyTorch example

import torch
from torch.nn.functional import layer_norm

N, C, H, W = 20, 5, 10, 10
input = torch.randn(N, C, H, W)

# Normalize over the last three dims (C, H, W)
output = layer_norm(input, normalized_shape=[C, H, W], weight=None, bias=None, eps=1e-5)
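The example above passes weight=None and bias=None, so no scale or shift is applied. The following sketch adds them, assuming that gamma and beta correspond to the weight and bias arguments of torch.nn.functional.layer_norm and that both have the shape of the normalized dims (C, H, W). The constant values 2.0 and 0.5 are arbitrary.

```python
import torch
import torch.nn.functional as F

N, C, H, W = 20, 5, 10, 10
x = torch.randn(N, C, H, W)

# gamma and beta take the shape of the normalized dims (C, H, W).
gamma = torch.full((C, H, W), 2.0)
beta = torch.full((C, H, W), 0.5)

# Without scale/shift, then with them.
plain = F.layer_norm(x, [C, H, W], weight=None, bias=None, eps=1e-5)
affine = F.layer_norm(x, [C, H, W], weight=gamma, bias=beta, eps=1e-5)

# The scale is applied to the normalized tensor first, then the shift.
expected = plain * gamma + beta
```

Here affine and expected should match to within floating-point tolerance, and both keep the input shape (N, C, H, W).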

Reference

torch.nn.LayerNorm