# batch_norm Inference-time batch normalization using running statistics: $$y = \gamma \cdot \frac{x - \mu}{\sqrt{\sigma^2 + \varepsilon}} + \beta$$ The mean and variance are pre-computed running statistics passed in as inputs; `momentum` (a training-only construct) is dropped during conversion. **ATen source:** `aten._native_batch_norm_legit_no_training` ## Inputs | Name | Shape | Description | |---|---|---| | `input` | `(N, C, *spatial)` | Supported ranks: 2, 3, 4, 5 — `(N, C)`, `(N, C, L)`, `(N, C, H, W)`, `(N, C, D, H, W)` | | `gamma` | `(C,)` | Per-channel scale, applied after normalization | | `beta` | `(C,)` | Per-channel shift, added after the scale | | `mean` | `(C,)` | Per-channel running mean | | `variance` | `(C,)` | Per-channel running variance | ## Attributes | Name | Type | Description | |---|---|---| | `eps` | `float` | Numerical-stability epsilon added to the variance | | `version` | `int` | Composite op version | ## Output | Name | Shape | Description | |---|---|---| | `output` | `(N, C, *spatial)` | Same shape as `input` | ## Data types `fp16`, `fp32`, `bf16` for all tensor inputs and the output. ## PyTorch example ```python import torch N, C, H, W = 20, 5, 10, 10 input = torch.randn(N, C, H, W) running_mean = torch.zeros(C) running_var = torch.ones(C) output = torch.ops.aten._native_batch_norm_legit_no_training( input, weight=torch.ones(C), bias=torch.zeros(C), running_mean=running_mean, running_var=running_var, momentum=0.1, eps=1e-5, ) ``` ## Reference [`torch.nn.BatchNorm2d`](https://docs.pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html)