# How to Create a New Dataset Type

Each dataset class in CVNets should be registered with `data.datasets.DATASET_REGISTRY`. You can either create a new dataset class from scratch or extend one of the existing ones. The registry's class decorator lets you set a `name` and a task `type` for the dataset class:

```python
from data.datasets import DATASET_REGISTRY
from data.datasets.dataset_base import BaseImageDataset


@DATASET_REGISTRY.register(name="ade20k", type="segmentation")
class ADE20KDataset(BaseImageDataset):
    # PyTorch Dataset type.
    ...
```

This allows you to select the dataset in your config file using the following format:

```yaml
dataset:
  name: "ade20k"
  category: "segmentation"
  # Where the data is stored for train/validation (can be different)
  root_train: "/mnt/vision_datasets/ADEChallengeData2016/"
  root_val: "/mnt/vision_datasets/ADEChallengeData2016/"
```

The `name` and `category` keys correspond to the `name` and `type` passed to the decorator. You can optionally specify the data location using `root_train` and `root_val`. `BaseImageDataset` will choose the correct path based on the `is_training` and `is_evaluation` parameters.

Currently, all datasets in CVNets are subclasses of either `BaseImageDataset` or `BaseVideoDataset`, which are both subclasses of `BaseDataset`. For now, this is only a soft requirement.

## Extending an Existing Dataset

Most of the time, there is no need to create a new dataset class from scratch. Instead, you can extend an existing dataset such as `ImagenetDataset`. `ImagenetDataset` follows the `ImageFolder` class in `torchvision.datasets`.
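
The `ImageFolder` convention mentioned above stores one sub-directory per class under the dataset root, with the images of that class inside it. The following is a minimal, standard-library-only sketch of that layout and of how class names can be mapped to integer labels by sorting the directory names. It does not use torchvision or CVNets code, and `scan_image_folder` is a hypothetical helper written only for illustration.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def scan_image_folder(root: Path) -> Tuple[Dict[str, int], List[Tuple[str, int]]]:
    """Hypothetical helper: index class sub-directories and list (file, label) pairs."""
    # Class names are the sub-directory names, sorted alphabetically.
    classes = sorted(p.name for p in root.iterdir() if p.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    # Every file inside a class directory gets that class's index as its label.
    samples = [
        (path.name, class_to_idx[cls])
        for cls in classes
        for path in sorted((root / cls).iterdir())
        if path.is_file()
    ]
    return class_to_idx, samples


# Build a tiny throwaway tree: <root>/<class_name>/<image file>
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for cls, files in {"cat": ["a.jpg", "b.jpg"], "dog": ["c.jpg"]}.items():
        (root / cls).mkdir()
        for name in files:
            (root / cls / name).touch()
    class_to_idx, samples = scan_image_folder(root)
```

A training root and a validation root would each follow this same structure.
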
If your data follows the same format, you can extend `ImagenetDataset` and change only the parts you need, such as adding your amazing new transforms:

```python
from typing import Union

from data.datasets import DATASET_REGISTRY
from data.datasets.classification.imagenet import ImagenetDataset


@DATASET_REGISTRY.register(name="my-new-dataset", type="classification")
class AmazingDataset(ImagenetDataset):
    def training_transforms(self, size: Union[tuple, int]):
        # My amazing new training-time transforms
        ...
```

Keep in mind that you will probably need to change the `root_train` and `root_val` paths to point to where your data is located:

```yaml
dataset:
  name: "my-new-dataset"
  category: "classification"
  root_train: "<path-to-training-data>"
  root_val: "<path-to-validation-data>"
```
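
To make the link between registration, subclassing, and the config file more concrete, here is a small stand-alone sketch of the pattern. It is not the CVNets implementation: `ToyRegistry`, `ToyBaseDataset`, and `ToyAmazingDataset` are hypothetical names, and the lookup by `(name, type)` is an assumption about how such a registry could work.

```python
from typing import Callable, Dict, Tuple, Type


class ToyRegistry:
    """Hypothetical stand-in for a dataset registry keyed by (name, type)."""

    def __init__(self) -> None:
        self._classes: Dict[Tuple[str, str], Type] = {}

    def register(self, name: str, type: str) -> Callable[[Type], Type]:
        def decorator(cls: Type) -> Type:
            key = (name, type)
            if key in self._classes:
                raise ValueError(f"Dataset {key} is already registered")
            self._classes[key] = cls
            return cls  # Return the class unchanged so it stays usable directly.

        return decorator

    def get(self, name: str, type: str) -> Type:
        return self._classes[(name, type)]


TOY_REGISTRY = ToyRegistry()


@TOY_REGISTRY.register(name="toy-base", type="classification")
class ToyBaseDataset:
    def __init__(self, root_train: str, root_val: str, is_training: bool = True) -> None:
        # Pick the data root based on the mode (assumed behaviour for this sketch).
        self.root = root_train if is_training else root_val

    def training_transforms(self, size: int) -> str:
        return f"resize({size})"


@TOY_REGISTRY.register(name="my-new-dataset", type="classification")
class ToyAmazingDataset(ToyBaseDataset):
    # Only the transforms are overridden; everything else is inherited.
    def training_transforms(self, size: int) -> str:
        return f"amazing({size})"


# A dict mirroring the YAML config: `category` maps to the registered `type`.
config = {
    "dataset": {
        "name": "my-new-dataset",
        "category": "classification",
        "root_train": "<path-to-training-data>",
        "root_val": "<path-to-validation-data>",
    }
}
ds_cfg = config["dataset"]
dataset_cls = TOY_REGISTRY.get(name=ds_cfg["name"], type=ds_cfg["category"])
dataset = dataset_cls(ds_cfg["root_train"], ds_cfg["root_val"], is_training=False)
```

In this sketch the subclass is registered under its own name, so selecting it is purely a config change, and the parent class remains available under its original key.
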