Processors API#
- class dnikit.processors.Processor(func, *, fields=None)[source]#
Bases: PipelineStage

Class to apply transformations to the fields of a Batch. All other processors in DNIKit should inherit from this class. Note that this is not an abstract base class: a valid custom processor can be instantiated by simply passing a function, as shown in the next example.
Example
    def to_db_func(data: np.ndarray) -> np.ndarray:
        ref_value = 1e-5
        return 20 * np.log10(data / ref_value)

    processor = Processor(to_db_func)  # processor can now be used with a pipeline
- class dnikit.processors.Cacher(storage_path=None)[source]#
Bases: PipelineStage

Cacher is a PipelineStage that will cache to disk the batches produced by the previous Producer in a pipeline created with pipeline().

The first time a pipeline with a Cacher is executed, Cacher stores the batches to disk. Every time the pipeline is called after that, batches will be read directly from disk, without doing any computation for previous stages.

Note that batches may be quite large and this may require a large portion of available disk space. Be mindful when using Cacher.

If the data from the producer does not have Batch.StdKeys.IDENTIFIER, this class will assign a numeric identifier. This cannot be used across calls to Cacher but will be consistent for all uses of the pipelined_producer.

Example
    producer = ...   # create a valid dnikit Producer
    processor = ...  # create a valid dnikit Processor
    cacher = Cacher()

    # Pipeline everything
    pipelined_producer = pipeline(producer, processor, cacher)

    # No results have been cached
    cacher.cached  # returns False

    # Trigger pipeline
    batches = list(pipelined_producer(batch_size=32))  # producer and processor are invoked

    # Results have been cached
    cacher.cached  # returns True

    # Trigger pipeline again (fast, because batch_size has the same value as before)
    list(pipelined_producer(batch_size=32))  # producer and processor are NOT invoked

    # Trigger pipeline once more (slower, because batch_size is different from first time)
    list(pipelined_producer(batch_size=48))  # producer and processor are NOT invoked
The typical use-case for this class is to cache the results of expensive computation (such as inference and post-processing) to avoid re-doing said computation more than once.
Note
Just as with Model and Processor, no computation (or in this case, caching) will be executed until the pipeline is triggered.

See also

dnikit.base.multi_introspect(), which allows several introspectors to use the same batches without storing them in the file system. multi_introspect() may be a better option for very large datasets.

Warning

Cacher has the ability to resize batches if batches of different sizes are requested (see example). However, doing so is relatively computationally expensive, since it involves concatenating and splitting batches. Therefore it is recommended to use this feature sparingly.

Warning

Unlike other PipelineStages, Cacher will raise a DNIKitException if it is used with more than one pipeline. This is to avoid reading batches generated from another pipeline with different characteristics.

- Parameters:
storage_path (Path | None) – [optional] If set, Cacher will store batches in storage_path, otherwise it will create a random temporary directory.
- as_producer()[source]#
Get a CachedProducer which loads the batches stored by this Cacher.

- Raises:
DNIKitException – if called before caching has been completed.
- Return type:
- static clear(storage_path=None)[source]#
Clears files produced by Cacher and CachedProducer.

- Parameters:
storage_path (Path | None) – if None (the default case), the function will clear all dnikit caches under the system's temporary directory. Otherwise it will clear all dnikit caches under the specified directory.

- Raises:
NotADirectoryError – if storage_path is not a valid directory.

- Return type:
None
Warning
Make sure to only call this function once pipelines are no longer needed (or before pipelines are used at all). Otherwise, a cache that is already in use may be destroyed!
- class dnikit.processors.Composer(filter)[source]#
Bases: PipelineStage

Apply a filter function to all batches, e.g. composing filter(b).

- Parameters:
filter (Callable[[Batch], Batch | None]) – The filter function to apply to every batch in the pipeline. The filter should take a single Batch as input and return a transformed batch (e.g. a subset) or None (to produce an empty batch).
- classmethod from_dict_metadata(metadata_key, label_dimension, label)[source]#
Initialize a Composer to filter Batches by restrictions on their metadata, as accessed with a DictMetaKey, e.g., Batch.StdKeys.LABELS.

- Parameters:
metadata_key (DictMetaKey[str]) – DictMetaKey to look for in the batch's elements
label_dimension (str) – label dimension in the batch's metadata_key metadata to filter by
label (str) – label value to filter by, for the batch metadata's metadata_key and label_dimension
- Returns:
- Return type:
- classmethod from_element_filter(elem_filter)[source]#
Initialize a Composer that filters batch data based on element-wise filter criteria.

- Parameters:
elem_filter (Callable[[ElementType], bool]) – Batch element-wise validation function. Returns True if valid, else False.

- Returns:
Composer that filters batches to only elements that meet the filter criteria

- Return type:
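As a rough illustration of the element-wise semantics, here is a plain-Python sketch: elements for which the predicate returns True are kept, all others are dropped. (The labels and list here are hypothetical stand-ins; the real elem_filter receives dnikit Batch elements, not strings.)

```python
# Hypothetical predicate: keep only elements labeled "cat".
def elem_filter(label: str) -> bool:
    return label == "cat"

# Stand-in for a batch's elements; Composer keeps only those passing the filter.
elements = ["cat", "dog", "cat", "bird"]
kept = [e for e in elements if elem_filter(e)]
print(kept)  # ['cat', 'cat']
```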
- class dnikit.processors.Concatenator(dim, output_field, fields)[source]#
Bases: PipelineStage

This PipelineStage will concatenate 2 or more fields in the Batch and produce a new field with the given output_field.

Example

If there were fields M and N with dimensions BxM1xZ and BxN1xZ and they were concatenated along dimension 1, the result will have a new field of size Bx(M1+N1)xZ.

- Parameters:
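The shape arithmetic in the example above can be sketched with NumPy (NumPy stands in here for the underlying array operation; Concatenator itself operates on Batch fields inside a pipeline):

```python
import numpy as np

# Fields M and N share the batch dim B and trailing dim Z, but differ along dim 1.
B, M1, N1, Z = 4, 3, 5, 2
m = np.zeros((B, M1, Z))
n = np.zeros((B, N1, Z))

# Concatenating along dimension 1 yields a field of size B x (M1 + N1) x Z.
out = np.concatenate([m, n], axis=1)
print(out.shape)  # (4, 8, 2)
```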
- class dnikit.processors.FieldRemover(*, fields, keep=False)[source]#
Bases: PipelineStage

A PipelineStage that removes some fields from a Batch.
- class dnikit.processors.FieldRenamer(mapping)[source]#
Bases: PipelineStage

A PipelineStage that renames some fields from a Batch.
- class dnikit.processors.Flattener(order='C', fields=None)[source]#
Bases: Processor

A Processor that collapses an array of shape BxN1xN2x... into BxN.

- Parameters:
order (str) – [optional] one of {C, F, A, K}: C (default) means to flatten in row-major (C-style) order. F means to flatten in column-major (Fortran-style) order. A means to flatten in column-major order if the array is Fortran contiguous in memory, row-major order otherwise. K means to flatten in the order the elements occur in memory.
fields (None | str | Collection[str]) – [optional] a single field name, or an iterable of field names, to be flattened. If the fields param is None, then all the fields in the batch will be flattened.

- Raises:
ValueError – if the order param is not one of {C, F, A, K}
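The order argument follows NumPy's flattening orders. A minimal sketch of how C (row-major) and F (column-major) differ when collapsing BxN1xN2 into BxN, using NumPy directly as a stand-in for the processor:

```python
import numpy as np

# B=1, N1=2, N2=3: batch element is [[0, 1, 2], [3, 4, 5]].
x = np.arange(6).reshape(1, 2, 3)

row_major = x.reshape(1, -1)             # order='C' (default): rows first
col_major = x.reshape(1, -1, order="F")  # order='F': columns first

print(row_major[0].tolist())  # [0, 1, 2, 3, 4, 5]
print(col_major[0].tolist())  # [0, 3, 1, 4, 2, 5]
```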
- class dnikit.processors.ImageGammaContrastProcessor(gamma=1.0, *, fields=None)[source]#
Bases: Processor

Processor that gamma-corrects images in a data field from a Batch. BxCHW and BxHWC images are accepted, with non-normalized values (between 0 and 255). The image I is contrast-adjusted using the formula (I/255)^gamma * 255.

- Parameters:
- Raises:
DNIKitException – if OpenCV is not installed.
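The documented formula (I/255)^gamma * 255 can be sketched directly with NumPy (an illustration of the arithmetic only; the real processor applies the correction to batch fields and requires OpenCV):

```python
import numpy as np

# Gamma correction per the documented formula, on un-normalized (0-255) values.
def gamma_contrast(img: np.ndarray, gamma: float) -> np.ndarray:
    return (img / 255.0) ** gamma * 255.0

img = np.array([[0.0, 63.75, 255.0]])
print(gamma_contrast(img, 1.0))  # gamma=1.0 (the default) leaves values unchanged
print(gamma_contrast(img, 2.0))  # mid-tones darken: 63.75 -> 15.9375
```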
- class dnikit.processors.ImageGaussianBlurProcessor(sigma=0.0, *, fields=None)[source]#
Bases: Processor

Processor that blurs images in a data field from a Batch. BxCHW and BxHWC images are accepted, with non-normalized values (between 0 and 255).

- Parameters:
sigma (float) – [optional] blur filter size; recommended values between 0 and 3, but values beyond this range are acceptable.
fields (None | str | Collection[str]) – [keyword arg, optional] a single field name, or an iterable of field names, to be processed. If the fields param is None, then all fields will be processed.
- Raises:
DNIKitException – if OpenCV is not installed.
ValueError – if sigma is not positive
- class dnikit.processors.ImageResizer(*, pixel_format, size, fields=None)[source]#
Bases: Processor

Initialize an ImageResizer. This uses OpenCV to resize images. This can convert responses with the structure BxHxWxC (see ImageFormat for alternatives) to a new HxW value. This does not honor aspect ratio – the new image will be exactly the size given. This uses the default OpenCV interpolation, INTER_LINEAR.

- Parameters:
pixel_format (ImageFormat) – [keyword arg] the layout of the pixel data, see ImageFormat
size (Tuple[int, int]) – [keyword arg] the size to scale to, (width, height)
fields (None | str | Collection[str]) – [keyword arg, optional] a single field name, or an iterable of field names, to be processed. If the fields param is None, then all fields will be resized.
- Raises:
DNIKitException – if OpenCV is not installed.
ValueError – if size elements ((width, height)) are not positive
- class dnikit.processors.ImageRotationProcessor(angle=0.0, pixel_format=ImageFormat.HWC, *, cval=(0, 0, 0), fields=None)[source]#
Bases: Processor

Processor that performs image rotation along the y-axis on data in a data field from a Batch. BxCHW and BxHWC images are accepted, with non-normalized values (between 0 and 255).

- Parameters:
angle (float) – [optional] angle (in degrees) of image rotation; positive values mean counter-clockwise rotation
pixel_format (ImageFormat) – [optional] the layout of the pixel data, see ImageFormat
cval (Tuple[int, int, int]) – [keyword arg, optional] RGB color value to fill areas outside the image; defaults to (0, 0, 0) (black)
fields (None | str | Collection[str]) – [keyword arg, optional] a single field name, or an iterable of field names, to be processed. If the fields param is None, then all fields will be processed.
- Raises:
DNIKitException – if OpenCV is not installed.
- class dnikit.processors.MeanStdNormalizer(*, mean, std, fields=None)[source]#
Bases: Processor

A Processor that standardizes a field of a Batch by subtracting the mean and adjusting the standard deviation to 1.

More precisely, if x is the data to be processed, the following processing is applied: (x - mean) / std.

- Parameters:
mean (float) – [keyword arg] The mean to be applied
std (float) – [keyword arg] The standard deviation to be applied
fields (None | str | Collection[str]) – [keyword arg, optional] a single field name, or an iterable of field names, to be processed. If the fields param is None, then all fields will be processed.
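The transform (x - mean) / std can be sketched in NumPy (illustration of the arithmetic only; the processor applies it to batch fields). When mean and std are estimated from the data itself, the result has zero mean and unit standard deviation:

```python
import numpy as np

# Standardize data using its own sample statistics.
x = np.array([2.0, 4.0, 6.0, 8.0])
normalized = (x - x.mean()) / x.std()

print(round(float(normalized.mean()), 6))  # 0.0
print(round(float(normalized.std()), 6))   # 1.0
```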
- class dnikit.processors.MetadataRemover(*, meta_keys=None, keys=None, keep=False)[source]#
Bases: PipelineStage

A PipelineStage that removes some metadata from a Batch.

- Parameters:
meta_keys (None | MetaKey | DictMetaKey | Collection[MetaKey | DictMetaKey]) – [keyword arg, optional] either a single instance or an iterable of Batch.MetaKey / Batch.DictMetaKey that may be removed. If None (the default case), this processor will operate on all metadata keys.
keys (Any) – [keyword arg, optional] key within metadata to be removed. Metadata with metadata key type Batch.DictMetaKey is a mapping from str to data-type. This argument specifies the str key-field that will be removed from the batch's metadata, where the metadata must have metadata key type Batch.DictMetaKey. If None (the default case), this processor will operate on all key-fields for metadata with a Batch.DictMetaKey metadata key.
keep (bool) – [keyword arg, optional] if True, the selected meta_keys and keys instead specify what to keep, and all other data will be removed.
- class dnikit.processors.MetadataRenamer(mapping, *, meta_keys=None)[source]#
Bases: PipelineStage

A PipelineStage that renames some metadata fields in a Batch. This only works with metadata that has key type Batch.DictMetaKey.

- Parameters:
mapping (Mapping[str, str]) – a dictionary (or similar) whose keys are the old metadata field names and whose values are the new metadata field names.
meta_keys (None | DictMetaKey | Collection[DictMetaKey]) – [keyword arg, optional] either a single instance or an iterable of metadata keys of type Batch.DictMetaKey whose key-fields will be renamed. If None (the default case), all key-fields for all metadata keys will be renamed.
Note
MetadataRenamer only works with Batch.DictMetaKey (which has entries that can be renamed).
- class dnikit.processors.PipelineDebugger(label='', first_only=True, dump_fields=False, fields=None)[source]#
Bases: PipelineStage

A PipelineStage that can be used to inspect batches in a pipeline.

- Parameters:
first_only (bool) – [optional] see first_only
dump_fields (bool) – [optional] see dump_fields
fields (None | str | Collection[str]) – [optional] see fields
- static dump(batch, label='', dump_fields=False, fields=None)[source]#
Utility method to produce a dump of a Batch or a Batch.Builder.

- Parameters:
batch (Batch | Builder) – Batch or Batch.Builder to dump
dump_fields (bool) – [optional] see dump_fields
fields (None | str | Collection[str]) – [optional] see fields
- Return type:
- fields: None | str | Collection[str] = None#
List of fields of interest. Default is None, which means all. See dump_fields.
- class dnikit.processors.Pooler(*, dim, method, fields=None)[source]#
Bases: Processor

A Processor that pools the axes of a data field from a Batch with a specific method.

- Parameters:
dim (int | Collection[int]) – [keyword arg] The dimension (one or many) to be pooled. E.g., spatial pooling is generally (1, 2).
method (Method) – [keyword arg] Pooling method. See Pooler.Method for the full list of options.
fields (None | str | Collection[str]) – [keyword arg, optional] a single field name, or an iterable of field names, to be pooled. If the fields param is None, then all the fields in the batch will be pooled.
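Spatial pooling over dims (1, 2) can be sketched in NumPy: each HxW plane of a BxHxWxC array collapses to a single value per batch element and channel. Mean pooling is shown as a representative reduction; Pooler.Method defines the methods actually supported.

```python
import numpy as np

# B=2, H=2, W=3, C=2 array; pool away the spatial dims (1, 2).
x = np.arange(24, dtype=float).reshape(2, 2, 3, 2)
pooled = x.mean(axis=(1, 2))

print(pooled.shape)  # (2, 2): one value per batch element and channel
print(pooled[0, 0])  # 5.0 (mean of 0, 2, 4, 6, 8, 10)
```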
- class dnikit.processors.SnapshotRemover(snapshots=None, keep=False)[source]#
Bases: PipelineStage

A PipelineStage that removes snapshots from a Batch. If used with no arguments, this will remove all snapshots.

- Parameters:
- class dnikit.processors.SnapshotSaver(save='snapshot', fields=None, keep=True)[source]#
Bases: PipelineStage

A PipelineStage that attaches the current Batch as the snapshot.

- Parameters:
- class dnikit.processors.Transposer(*, dim, fields=None)[source]#
Bases: Processor

A Processor that transposes dimensions in a data field from a Batch. This processor will reorder the dimensions of the data as specified in the dim param.

Example

To reorder NHWC to NCHW, specify Transposer(dim=[0, 3, 1, 2]); to reorder NCHW to NHWC, use Transposer(dim=[0, 2, 3, 1]).

- Parameters:
dim (Sequence[int]) – [keyword arg] the new order of the dimensions. It is illegal to reorder the 0th dimension.
fields (None | str | Collection[str]) – [keyword arg, optional] a single
fieldname, or an iterable offieldnames, to be transposed. Iffieldsparam isNone, then allfieldswill be transposed.
See also
- Raises:
ValueError – if input specifies reordering the 0th dimension
- Parameters:
fields (None | str | Collection[str]) –
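The dim argument can be read like the axes argument of numpy.transpose: output axis i takes input axis dim[i], and the 0th (batch) dimension must stay in place. A NumPy sketch of the reordering (illustration only; Transposer applies this to batch fields):

```python
import numpy as np

# dim=[0, 3, 1, 2] applied to an NHWC array yields NCHW; the batch
# dimension (axis 0) is left in place, as the class requires.
x = np.zeros((8, 32, 32, 3))               # NHWC
reordered = np.transpose(x, (0, 3, 1, 2))  # NCHW

print(reordered.shape)  # (8, 3, 32, 32)
```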