dnikit_torch#

DNIKit PyTorch integration and support.

class dnikit_torch.ProducerTorchDataset(producer, mapping, batch_size=100, transforms=None)[source]#

Bases: IterableDataset

Adaptor that transforms any Producer into a PyTorch IterableDataset. The Producer can be something simple like an ImageProducer or a more complex pipeline of stages.

Instances are given a mapping that describes how to transform the structured data in a Batch.ElementType (the type of a single item in batch.elements) into the unstructured tuple that PyTorch expects from a Dataset. The same mapping can be used to map the positional values from PyTorch back into a dnikit Producer via TorchProducer.

This class also supports an optional transforms attribute that works similarly to the transforms attribute on PyTorch image datasets.

Parameters:
batch_size: int = 100#

The size of batch to read from the producer. This is independent of the downstream batch size in PyTorch.

mapping: Sequence[str | DictMetaKey | MetaKey | Callable[[ElementType], Any]]#

Describes how to map a Batch.ElementType to the Dataset result. Typically the first value returned from a Dataset is an array-like piece of data, e.g. an image field in a typical Batch.

The mapping supports several different types of values:

  • string – names a field in batch.fields to copy into the output

  • DictMetaKey / MetaKey – names a metadata key in batch.metadata to copy into the output

  • callable – a function that computes a custom result from the element

For example:

# consider a Batch.ElementType with data like this:
im = np.random.randint(255, size=(64, 64), dtype=np.uint8)
fields = {
    "image": im,
    "image2": im,
}
key1 = Batch.DictMetaKey[dict]("KEY1")
metadata = {
    key1: {"k1": "v1", "k2": "v2"}
}

# it's possible to define the mapping like this:

def transform(element: Batch.ElementType) -> np.ndarray:
    # note: pytorch requires a writable copy of the ndarray
    return element.fields["image"].reshape((128, 32)).copy()

ds = ProducerTorchDataset(producer, ["image", "image2", key1, transform])

In this example the Dataset will produce two ndarrays, a dictionary and a reshaped ndarray.

producer: Producer#

The Producer to represent as a PyTorch Dataset.

transforms: Mapping[str, Callable[[Tensor], Tensor]] | None = None#

Optional transforms (https://pytorch.org/vision/stable/transforms.html). This is a mapping from field name to a Tensor transform, e.g. image and audio transforms.

Typical PyTorch Datasets provide a transform and target_transform to transform the first and second values. This class requires passing in specific field names for the transforms to apply to.

For example:

dataset = ProducerTorchDataset(
    producer, ["image", "mask", "heights"],
    transforms={
        "image": transforms.RandomCrop(32, 32),
        "mask": transforms.Compose([
             transforms.CenterCrop(10),
             transforms.ColorJitter(),
        ]),
    })
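The per-field dispatch can be sketched as follows. `apply_field_transforms` is a hypothetical illustration, not the DNIKit implementation: positions whose mapping entry is a field name with a matching key in transforms are transformed, and every other value passes through unchanged.

```python
from typing import Any, Callable, Mapping, Sequence

def apply_field_transforms(
    result: Sequence[Any],
    mapping: Sequence[Any],
    transforms: Mapping[str, Callable[[Any], Any]],
) -> tuple:
    """Apply per-field transforms to a positional Dataset result (sketch)."""
    out = []
    for value, entry in zip(result, mapping):
        # only string entries (field names) can have a registered transform
        transform = transforms.get(entry) if isinstance(entry, str) else None
        out.append(transform(value) if transform is not None else value)
    return tuple(out)
```

In the example above, "image" and "mask" would be transformed while "heights" passes through untouched, since no transform is registered for it.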

class dnikit_torch.TorchProducer(data_loader, mapping, anonymous_field_name='_')[source]#

Bases: Producer

Adaptor that transforms a PyTorch DataLoader into a DNIKit Producer. This enables reuse of PyTorch Datasets with DNIKit pipelines.

Instances are given a mapping that describes how to transform the unstructured tuple that a PyTorch Dataset produces into a structured DNIKit Batch. The same mapping can be used to map a Batch back into a Dataset result via ProducerTorchDataset.

Parameters:
anonymous_field_name: str = '_'#

The field name to use when mapping non-dictionary metadata to DictMetaKey. For example, if a PyTorch Dataset produces:

yield ndarray, [10, 20, 30]

it can be mapped into a DictMetaKey like this:

key1 = Batch.DictMetaKey[t.List[int]]("KEY1")
producer = TorchProducer(loader, ["image", key1])

element = next(iter(producer(1))).elements[0]

# this is how the metadata is surfaced
element.metadata[key1] == { "_": [10, 20, 30] }

Ideally a MetaKey is used in these cases.
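The wrapping behaviour can be sketched with a small helper (hypothetical, not DNIKit code): DictMetaKey metadata is keyed by field name, so a bare value gets wrapped in a single-entry dictionary under anonymous_field_name.

```python
from typing import Any, Mapping

def wrap_for_dict_meta_key(value: Any, anonymous_field_name: str = "_") -> Mapping[str, Any]:
    """Wrap non-dictionary metadata values under the anonymous field name (sketch)."""
    if isinstance(value, dict):
        # already keyed by field name; use as-is
        return value
    return {anonymous_field_name: value}
```

Dictionary values pass through unchanged, which is why the {"k1": "v1", "k2": "v2"} example later in this page surfaces without an anonymous wrapper.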

property batch_size: int#
data_loader: DataLoader#

The PyTorch DataLoader to adapt to a Producer.

mapping: Sequence[str | DictMetaKey | MetaKey | Callable[[Any, Builder], None]]#

This mapping defines how the positional values in a PyTorch Dataset map back to a structured Batch. This is essentially the same mapping used in ProducerTorchDataset – the same mapping could be used to round-trip the data between PyTorch and dnikit.

The values in the mapping correspond to the positions in the Dataset result and convert values as follows:

  • string – copies the value into batch.fields under that name

  • DictMetaKey / MetaKey – copies the value into batch.metadata (a non-dictionary value is wrapped in a dictionary under anonymous_field_name)

  • None – discards the value at that position

  • callable – custom code that receives the value and the Batch.Builder

For example, given a Dataset that produced data like this:

yield ndarray, ndarray, 50, {"k1": "v1", "k2": "v2"}

it can be mapped into dnikit metadata like this:

key1 = Batch.DictMetaKey[int]("KEY1")
key2 = Batch.DictMetaKey[t.Mapping[str, str]]("KEY2")
producer = TorchProducer(loader, ["image", None, key1, key2])

That will map the first value into batch.fields["image"] as a numpy.ndarray. The second value will be discarded. The third and fourth values will come across as metadata like this:

element.metadata[key1] == { "_": 50 }
element.metadata[key2] == {"k1": "v1", "k2": "v2"}

If the Dataset only produces image data, a single-entry mapping is sufficient: ["image"]