=============
Data Samplers
=============

CVNet offers data samplers with three sampling strategies:

1. Single-scale with fixed batch size (SSc-FBS)
2. Multi-scale with fixed batch size (MSc-FBS)
3. Multi-scale with variable batch size (MSc-VBS)

For details about these samplers, please see the `MobileViT <https://arxiv.org/abs/2110.02178>`_ paper.

Single-scale with fixed batch size (SSc-FBS)
============================================

This method is the default sampling strategy in most deep learning frameworks (e.g., PyTorch, TensorFlow, and MXNet)
and in libraries built on top of them (e.g., the *timm* library). At the :math:`t`-th training iteration, this method
samples a batch of :math:`b` images per GPU [1]_ with a pre-defined spatial resolution of height :math:`H` and
width :math:`W`.

Multi-scale with fixed batch size (MSc-FBS)
===========================================

The SSc-FBS method allows a network to learn representations at a single scale (or resolution). However, objects in
the real world appear at different scales. To allow a network to learn representations at multiple scales, MSc-FBS
extends SSc-FBS to multiple scales. Unlike the SSc-FBS method, which takes a single pre-defined spatial resolution as
input, this method takes a sorted set of :math:`n` spatial resolutions
:math:`\mathcal{S} = \{ (H_1, W_1), (H_2, W_2), \cdots, (H_n, W_n)\}` as input. At the :math:`t`-th iteration, this
method randomly samples :math:`b` images per GPU at a spatial resolution :math:`(H_t, W_t) \in \mathcal{S}`.

Multi-scale with variable batch size (MSc-VBS)
==============================================

Networks trained using the MSc-FBS method are more robust to scale changes than those trained using SSc-FBS. However,
depending on the maximum spatial resolution in :math:`\mathcal{S}`, the MSc-FBS method may have a higher peak GPU
memory utilization than SSc-FBS (see the figure labeled ``fig:sampler_perf_cost``), which can cause out-of-memory
errors on GPUs with limited memory.
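
A rough way to see where the peak comes from is to count the scalar values in one image batch at each resolution.
The sketch below is only an illustration: the helper names, the 3-channel assumption, and the example resolutions are
hypothetical and are not part of CVNet. It ignores activations and model parameters, so it compares input tensors only.

```python
def batch_elements(height, width, batch_size, channels=3):
    """Number of scalar values in one image batch (assumes 3-channel images)."""
    return batch_size * channels * height * width


def peak_batch_elements(resolutions, batch_size, channels=3):
    """Largest batch over all resolutions when the batch size is fixed (MSc-FBS)."""
    return max(batch_elements(h, w, batch_size, channels) for h, w in resolutions)


# Illustrative values only: a single-scale baseline and a small multi-scale set.
baseline = batch_elements(224, 224, batch_size=128)
peak = peak_batch_elements([(160, 160), (224, 224), (256, 256)], batch_size=128)

# The ratio reduces to (H_max * W_max) / (H * W), because batch size and channels cancel.
print(f"MSc-FBS peak / SSc-FBS baseline = {peak / baseline:.2f}")
```

With a fixed batch size, the ratio depends only on the largest resolution in the set relative to the single-scale
resolution.
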
For example, MSc-FBS with :math:`\mathcal{S} = \{ (128, 128), (192, 192), (224, 224), (320, 320)\}` and
:math:`b=256` would need about :math:`2\times` the GPU memory (for images only) of SSc-FBS with a spatial resolution
of :math:`(224, 224)` and :math:`b=256`.

To address this memory issue, we extend MSc-FBS to variable batch sizes. For a given sorted set of spatial resolutions
:math:`\mathcal{S} = \{ (H_1, W_1), (H_2, W_2), \cdots, (H_n, W_n)\}` and a batch size :math:`b` for the maximum
spatial resolution :math:`(H_n, W_n)`, a spatial resolution :math:`(H_t, W_t) \in \mathcal{S}` with a batch size of
:math:`b_t = \frac{H_n W_n b}{H_t W_t}` is sampled randomly at the :math:`t`-th training iteration on each GPU.

Variably-sized video sampler
----------------------------

These samplers can also be extended to videos. CVNet provides a variably-sized sampler for videos, with which
researchers can control different video-related input variables (e.g., number of frames, number of clips per video,
and video spatial resolution) for learning space- and time-invariant representations.

Data Sampler Objects
====================

.. automodule:: data.sampler.batch_sampler
   :members:
   :undoc-members:
   :show-inheritance:

.. automodule:: data.sampler.multi_scale_sampler
   :members:
   :undoc-members:
   :show-inheritance:

.. automodule:: data.sampler.variable_batch_sampler
   :members:
   :undoc-members:
   :show-inheritance:

.. automodule:: data.sampler.video_variable_seq_sampler
   :members:
   :undoc-members:
   :show-inheritance:

.. [1] The effective batch size is the number of images per GPU times the number of GPUs.
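
As a rough illustration of the MSc-VBS rule :math:`b_t = \frac{H_n W_n b}{H_t W_t}`, the sketch below pairs each
resolution with its batch size and then draws one pair at random per iteration. It is a minimal standalone example,
not the CVNet implementation: the function names are hypothetical, and rounding :math:`b_t` down to an integer (with a
minimum of 1) is an assumption, since the formula itself does not say how fractional values are handled.

```python
import random


def variable_batch_sizes(resolutions, base_batch_size):
    """Pair each (H, W) with b_t = H_n * W_n * b / (H_t * W_t), rounded down (assumption)."""
    ordered = sorted(resolutions, key=lambda hw: hw[0] * hw[1])
    h_n, w_n = ordered[-1]  # largest resolution, which uses the base batch size b
    return [(h, w, max(1, (h_n * w_n * base_batch_size) // (h * w))) for h, w in ordered]


def sample_batches(num_images, resolutions, base_batch_size, seed=0):
    """Yield (H, W, indices); the resolution is drawn at random for each iteration."""
    rng = random.Random(seed)
    choices = variable_batch_sizes(resolutions, base_batch_size)
    start = 0
    while start < num_images:
        h, w, b = rng.choice(choices)
        yield h, w, list(range(start, min(start + b, num_images)))
        start += b


scales = [(128, 128), (192, 192), (224, 224), (320, 320)]
for h, w, b in variable_batch_sizes(scales, base_batch_size=64):
    print(f"{h}x{w}: batch size {b}, pixels per batch {h * w * b}")
```

Because :math:`b_t` scales inversely with :math:`H_t W_t`, the number of pixels per batch never exceeds that of the
largest resolution, which is what keeps peak memory bounded.
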