dnikit_torch#
DNIKit PyTorch integration and support.
- class dnikit_torch.ProducerTorchDataset(producer, mapping, batch_size=100, transforms=None)[source]#
Bases: IterableDataset
Adaptor that transforms any Producer into a PyTorch IterableDataset. The Producer can be something simple like an ImageProducer or a more complex pipeline of stages.
Instances are given a mapping that describes how to transform the structured data in a Batch.ElementType (the type of a single batch.elements entry) into the unstructured tuple that PyTorch expects from a Dataset. This same mapping can be used to map the positional values from PyTorch back into a dnikit Producer via TorchProducer.
This class also supports an optional transforms that works similarly to the transforms attribute on PyTorch image datasets.
- See Also
TorchProducer – converts a PyTorch Dataset/DataLoader into a Producer
- Parameters:
mapping (Sequence[str | DictMetaKey | MetaKey | Callable[[ElementType], Any]]) – see mapping
batch_size (int) – [optional] see batch_size
transforms (Mapping[str, Callable[[Tensor], Tensor]] | None) – [optional] see transforms
- batch_size: int = 100#
The size of each batch to read from the producer. This is independent of the downstream batch size in PyTorch.
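To illustrate why the two batch sizes are independent, here is a minimal sketch in plain Python (no dnikit or torch; all names are illustrative stand-ins, not the real implementation): the adaptor flattens producer batches into a stream of elements, and the downstream consumer re-batches that stream at whatever size it likes.

```python
def producer(batch_size):
    """Stand-in for a dnikit Producer: yields lists of `batch_size` elements."""
    data = list(range(250))  # 250 elements total
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

def element_stream(batches):
    """Flatten producer batches into individual elements (what the Dataset yields)."""
    for batch in batches:
        yield from batch

def rebatch(elements, loader_batch_size):
    """Stand-in for a DataLoader: groups elements at its own batch size."""
    chunk = []
    for e in elements:
        chunk.append(e)
        if len(chunk) == loader_batch_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

# read from the producer in batches of 100, consume in batches of 32
loader_batches = list(rebatch(element_stream(producer(100)), 32))
```

Here 250 elements read in producer batches of 100 come out as seven full loader batches of 32 plus a final partial batch; neither side needs to know the other's batch size.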
- mapping: Sequence[str | DictMetaKey | MetaKey | Callable[[ElementType], Any]]#
Describes how to map a Batch.ElementType to the Dataset result. Typically the first value returned from a Dataset is an array-like piece of data, e.g. an image field in a typical Batch.
The mapping supports several different types of values:
string – names a batch.fields entry to copy into the output
DictMetaKey/MetaKey – names a batch.metadata entry to copy into the output
callable – custom code to produce a custom result
For example:
# consider a Batch.ElementType with data like this:
im = np.random.randint(255, size=(64, 64), dtype=np.uint8)
fields = {
    "image": im,
    "image2": im,
}
key1 = Batch.DictMetaKey[dict]("KEY1")
metadata = {
    key1: {"k1": "v1", "k2": "v2"}
}

# it's possible to define the mapping like this:
def transform(element: Batch.ElementType) -> np.ndarray:
    # note: pycharm requires a writable copy of the ndarray
    return element.fields["image"].reshape((128, 32)).copy()

ds = ProducerTorchDataset(producer, ["image", "image2", key1, transform])
In this example the Dataset will produce two ndarrays, a dictionary and a reshaped ndarray.
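The dispatch on mapping-entry type can be sketched in plain Python (no dnikit; `MetaKey`, `Element`, and `apply_mapping` below are illustrative stand-ins, not the real API): strings index into the fields, metadata keys index into the metadata, and callables receive the whole element.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Sequence

class MetaKey:
    """Illustrative stand-in for Batch.DictMetaKey/MetaKey."""
    def __init__(self, name: str):
        self.name = name

@dataclass
class Element:
    fields: Mapping[str, Any]
    metadata: Mapping[Any, Any]

def apply_mapping(element: Element, mapping: Sequence[Any]) -> tuple:
    out = []
    for entry in mapping:
        if isinstance(entry, str):        # field name -> copy the field
            out.append(element.fields[entry])
        elif isinstance(entry, MetaKey):  # metadata key -> copy the metadata
            out.append(element.metadata[entry])
        else:                             # callable -> custom result
            out.append(entry(element))
    return tuple(out)

KEY1 = MetaKey("KEY1")
element = Element(
    fields={"image": [1, 2, 3], "image2": [4, 5, 6]},
    metadata={KEY1: {"k1": "v1", "k2": "v2"}},
)
result = apply_mapping(
    element, ["image", "image2", KEY1, lambda e: len(e.fields["image"])]
)
```

The result is one positional tuple per element, which is exactly the unstructured shape a PyTorch Dataset is expected to yield.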
- transforms: Mapping[str, Callable[[Tensor], Tensor]] | None = None#
Optional transforms (https://pytorch.org/vision/stable/transforms.html). This is a mapping from field name to a Tensor transform, e.g. image and audio transforms.
Typical PyTorch Datasets provide a transform and target_transform to transform the first and second values. This class instead requires passing the specific field names for the transforms to apply to.
For example:
dataset = ProducerTorchDataset(
    producer, ["image", "mask", "heights"],
    transforms={
        "image": transforms.RandomCrop(32, 32),
        "mask": transforms.Compose([
            transforms.CenterCrop(10),
            transforms.ColorJitter(),
        ]),
    })
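The difference from positional transform/target_transform can be shown with a small plain-Python sketch (no torch; `apply_transforms` is an illustrative helper, not part of the API): each transform is looked up by the field name of the value it applies to, and fields without an entry pass through unchanged.

```python
from typing import Any, Callable, Mapping, Optional, Sequence

def apply_transforms(
    names: Sequence[str],
    values: Sequence[Any],
    transforms: Optional[Mapping[str, Callable[[Any], Any]]],
) -> tuple:
    """Apply per-field transforms keyed by field name (illustrative only)."""
    if transforms is None:
        return tuple(values)
    return tuple(
        transforms[name](value) if name in transforms else value
        for name, value in zip(names, values)
    )

# only "image" has a transform; "mask" and "heights" pass through untouched
out = apply_transforms(
    ["image", "mask", "heights"],
    [[1, 2, 3], [4, 5], 7],
    {"image": lambda v: [x * 2 for x in v]},
)
```

Keying by field name means the transforms keep working even if the order of entries in the mapping changes.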
- class dnikit_torch.TorchProducer(data_loader, mapping, anonymous_field_name='_')[source]#
Bases: Producer
Adaptor that transforms a PyTorch DataLoader into a DNIKit Producer. This enables reuse of PyTorch Datasets with DNIKit pipelines.
Instances are given a mapping that describes how to transform the unstructured tuple that a PyTorch Dataset produces into a structured DNIKit Batch. This same mapping can be used to map a Batch into a Dataset in ProducerTorchDataset.
- See Also
ProducerTorchDataset – converts a Producer into a PyTorch Dataset
- Parameters:
data_loader (DataLoader) – see data_loader
mapping (Sequence[str | DictMetaKey | MetaKey | Callable[[Any, Builder], None]]) – see mapping
anonymous_field_name (str) – see anonymous_field_name
- anonymous_field_name: str = '_'#
The field name to use when mapping non-dictionary metadata to a DictMetaKey. For example, if a PyTorch Dataset produces:
yield ndarray, [10, 20, 30]
it can be mapped into a DictMetaKey like this:
key1 = Batch.DictMetaKey[t.List[int]]("KEY1")
producer = TorchProducer(loader, ["image", key1])

element = next(iter(producer(1))).elements[0]

# this is how the metadata is surfaced
element.metadata[key1] == { "_": [10, 20, 30] }
Ideally a MetaKey is used in these cases.
- data_loader: DataLoader#
The PyTorch DataLoader to adapt to a Producer.
- mapping: Sequence[str | DictMetaKey | MetaKey | Callable[[Any, Builder], None]]#
This mapping defines how the positional values in a PyTorch Dataset map back to a structured Batch. This is essentially the same mapping used in ProducerTorchDataset – the same mapping can be used to round-trip the data between PyTorch and dnikit.
The values in the mapping correspond to the positions in the Dataset result and convert values as follows:
string – map a Tensor into a numpy.ndarray stored in batch.fields
DictMetaKey/MetaKey – map a value into batch.metadata
callable – perform custom conversion and update the Batch.Builder
None – discard a value
For example, given a Dataset that produces data like this:
yield ndarray, ndarray, 50, {"k1": "v1", "k2": "v2"}
it can be mapped into dnikit metadata like this:
key1 = Batch.DictMetaKey[int]("KEY1")
key2 = Batch.DictMetaKey[t.Mapping[str, str]]("KEY2")
producer = TorchProducer(loader, ["image", None, key1, key2])
That will map the first field into batch.fields["image"] as a numpy.ndarray. The second field will be discarded. The third and fourth fields will come across as metadata like this:
element.metadata[key1] == { "_": 50 }
element.metadata[key2] == {"k1": "v1", "k2": "v2"}
If the Dataset only produces image data, a single-entry mapping is sufficient:
["image"]