data.datasets.multi_modal_img_text package
Subpackages
- data.datasets.multi_modal_img_text.zero_shot package
Submodules
data.datasets.multi_modal_img_text.base_multi_modal_img_text module
- class data.datasets.multi_modal_img_text.base_multi_modal_img_text.BaseMultiModalImgText(opts, *args, **kwargs)
Bases:
BaseImageDataset
Base class for Image-Text multi-modal learning
- Parameters:
opts – command-line arguments
- get_zero_shot_dataset(*args, **kwargs) → BaseZeroShotDataset | None
If zero-shot evaluation is enabled, the zero-shot dataset is returned; otherwise, None is returned.
- get_dataset(*args, **kwargs) → Any
Helper function to get the dataset. Child classes must override this function.
Returns the number of classes in detection dataset along with super-class arguments.
- classmethod add_arguments(parser: ArgumentParser) → ArgumentParser
Add dataset-specific arguments to the parser.
- get_zero_shot_pair(img_index: int) → Tuple[Image, str | List[str] | List[List[str]], int]
Gets the image-text pair at the given index of the zero-shot dataset, along with its classification label.
- Parameters:
img_index – Image index
- Returns:
A tuple of PIL image, captions, and class label
- data.datasets.multi_modal_img_text.base_multi_modal_img_text.multi_modal_img_text_collate_fn(batch: List[Mapping[str, Tensor | Mapping[str, Tensor]]], opts: Namespace) → Mapping[str, Tensor | Mapping[str, Tensor]]
Combines a list of dictionaries into a single dictionary by concatenating matching fields.
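The "concatenate matching fields" behavior can be sketched in plain Python so it runs without PyTorch. This is a minimal sketch, not the library's implementation: the names `collate_dicts` and `concat_lists` are hypothetical, per-sample values are stood in for by lists, and the real function's use of `opts` is not reproduced. With tensors, one would pass something like `lambda parts: torch.cat(parts, dim=0)` as `concat`.

```python
from typing import Any, Callable, Dict, List, Mapping, Sequence


def concat_lists(parts: List[Any]) -> List[Any]:
    """Hypothetical stand-in for tensor concatenation: flattens a list of lists."""
    return [item for part in parts for item in part]


def collate_dicts(
    batch: Sequence[Mapping[str, Any]],
    concat: Callable[[List[Any]], Any] = concat_lists,
) -> Dict[str, Any]:
    """Merge per-sample dicts into one dict, concatenating values under matching keys.

    Nested mappings (e.g. a text sub-dict of token tensors) are merged recursively.
    """
    if not batch:
        return {}
    merged: Dict[str, Any] = {}
    # Keys of the first sample define the fields; every sample must provide them.
    for key in batch[0]:
        values = [sample[key] for sample in batch]
        if isinstance(values[0], Mapping):
            merged[key] = collate_dicts(values, concat)
        else:
            merged[key] = concat(values)
    return merged
```

For example, two samples `{"image": [1], "text": {"ids": [10]}}` and `{"image": [2], "text": {"ids": [20]}}` collapse into `{"image": [1, 2], "text": {"ids": [10, 20]}}`. Keeping `concat` pluggable separates the dictionary traversal from the tensor operation.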
data.datasets.multi_modal_img_text.flickr module
- class data.datasets.multi_modal_img_text.flickr.FlickrDataset(opts, *args, **kwargs)
Bases:
BaseMultiModalImgText
Dataset loader for Flickr-30k and Flickr-8k datasets.
- For more info see:
http://hockenmaier.cs.illinois.edu/8k-pictures.html https://shannon.cs.illinois.edu/DenotationGraph/
- Splits: train, val, and test
Also known in the literature as the Karpathy splits: https://cs.stanford.edu/people/karpathy/deepimagesent/
- Tracking license info:
The captions are released under the CC BY 3.0 license (see the links above). The splits are released under the BSD License (see the NeuralTalk GitHub repository by Karpathy et al.). The images are from Flickr; we do not own them, and they are used for research purposes only.
- Parameters:
opts – command-line arguments
is_training (Optional[bool]) – A flag used to indicate training or validation mode. Default: True
is_evaluation (Optional[bool]) – A flag used to indicate evaluation (or inference) mode. Default: False
- get_dataset(*args, **kwargs) → None
The data under self.root is expected to consist of:
dataset.json  # Karpathy splits + captions
images/       # Raw images
- The metadata can be downloaded from:
https://cs.stanford.edu/people/karpathy/deepimagesent/flickr30k.zip
- Images can be obtained from:
Flickr-8k: http://hockenmaier.cs.illinois.edu/8k-pictures.html
Flickr-30k: https://shannon.cs.illinois.edu/DenotationGraph/
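Selecting one split from dataset.json can be sketched as follows. This assumes the layout of Karpathy's released JSON files: a top-level "images" list whose entries carry "filename", "split", and a "sentences" list of dicts with a "raw" caption. The helper name `load_karpathy_split` is hypothetical and is not part of FlickrDataset.

```python
import json
from typing import List, Tuple


def load_karpathy_split(json_text: str, split: str) -> List[Tuple[str, List[str]]]:
    """Return (image filename, raw captions) pairs for one split.

    Assumes the Karpathy dataset.json layout: {"images": [{"filename": ...,
    "split": ..., "sentences": [{"raw": ...}, ...]}, ...]}.
    """
    data = json.loads(json_text)
    pairs = []
    for entry in data["images"]:
        if entry["split"] != split:
            continue  # keep only images assigned to the requested split
        captions = [sentence["raw"] for sentence in entry["sentences"]]
        pairs.append((entry["filename"], captions))
    return pairs
```

Each returned filename would then be resolved against the images/ directory under self.root; each image in these splits typically has several captions.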
data.datasets.multi_modal_img_text.img_text_tar_dataset module
- data.datasets.multi_modal_img_text.img_text_tar_dataset.extract_content(tar_file: TarFile, file_name: str) → AnyStr
Extracts the content of a particular file inside a tar file and returns it.
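Reading a single member from an open archive can be sketched with the standard-library `tarfile` module. This is a minimal sketch assuming the content is wanted as raw bytes; it is not necessarily how the library implements `extract_content`.

```python
import tarfile


def extract_content(tar_file: tarfile.TarFile, file_name: str) -> bytes:
    """Return the raw bytes of `file_name` stored inside an open tar archive."""
    # extractfile() raises KeyError if the member is missing and returns
    # None for directories and other non-regular members.
    member = tar_file.extractfile(file_name)
    if member is None:
        raise ValueError(f"{file_name!r} is not a regular file in the archive")
    with member:
        return member.read()
```

Typical usage would open a shard with `tarfile.open("00000000_0_1000.tar.gz", "r:gz")` and call `extract_content(tar, "00000000_0_text")` for each member.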
- data.datasets.multi_modal_img_text.img_text_tar_dataset.decode_image(byte_data) → Image
Reads the byte image data and returns the PIL image.
- data.datasets.multi_modal_img_text.img_text_tar_dataset.decode_text(byte_data) → str
Reads the byte text data and returns the decoded string.
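Both decoders can be sketched in a few lines. This assumes captions are stored as UTF-8 text and that images should be converted to RGB; both are assumptions, not documented behavior. Pillow is imported lazily inside `decode_image` so that `decode_text` works even where Pillow is not installed.

```python
import io


def decode_text(byte_data: bytes) -> str:
    # Assumption: captions are stored as UTF-8 encoded text.
    return byte_data.decode("utf-8")


def decode_image(byte_data: bytes):
    # Lazy import: only needed when image bytes are actually decoded.
    from PIL import Image

    image = Image.open(io.BytesIO(byte_data))
    # Assumption: convert to RGB so grayscale, palette, and RGBA images
    # share one mode; convert() also forces the lazy decode to complete.
    return image.convert("RGB")
```

Paired with an `extract_content`-style reader, `decode_text(extract_content(tar, "00000000_0_text"))` would yield the caption string for pair 0.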
- data.datasets.multi_modal_img_text.img_text_tar_dataset.async_download_file_from_s3(opts: Namespace, tar_file_name: str, cache_loc: str, *args, **kwargs) → None
Helper function to download the files asynchronously from S3.
- Parameters:
opts – command-line arguments
tar_file_name – Name of the tar file
cache_loc – Caching location on the local machine
- class data.datasets.multi_modal_img_text.img_text_tar_dataset.ImgTextTarDataset(opts, *args, **kwargs)
Bases:
BaseMultiModalImgText
Dataset class for image-text pairs stored as tar files, where each tar file contains multiple pairs.
The dataset should be stored in the following format, where img_text_tar_dataset is the directory that contains all the tar files.
img_text_tar_dataset
|--- 00000000_0_1000.tar.gz
|-------- 00000000_0_image
|-------- 00000000_0_text
|-------- 00000000_1_image
|-------- 00000000_1_text
|-------- …
|--- 00000000_1000_2000.tar.gz
|-------- 00000000_1000_image
|-------- 00000000_1000_text
|-------- 00000000_1001_image
|-------- 00000000_1001_text
|-------- …
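Given the member naming in the layout above, grouping the members of one tar file into image-text pairs can be sketched as follows. The regular expression assumes names of the form `<prefix>_<index>_image` / `<prefix>_<index>_text`, with the pair index as the second field; the meaning of the leading `00000000` prefix is not documented, and the helper `group_pairs` is hypothetical.

```python
import re
from typing import Dict, List

# Assumed member naming: "<prefix>_<index>_image" and "<prefix>_<index>_text".
MEMBER_RE = re.compile(r"^(\d+)_(\d+)_(image|text)$")


def group_pairs(member_names: List[str]) -> Dict[int, Dict[str, str]]:
    """Map each pair index to its {"image": name, "text": name} members."""
    pairs: Dict[int, Dict[str, str]] = {}
    for name in member_names:
        match = MEMBER_RE.match(name)
        if match is None:
            continue  # skip members that do not follow the naming scheme
        index, kind = int(match.group(2)), match.group(3)
        pairs.setdefault(index, {})[kind] = name
    # Keep only complete pairs, ordered by index.
    return {i: p for i, p in sorted(pairs.items()) if "image" in p and "text" in p}
```

The member names can be obtained from an open archive with `tar.getnames()`.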
- Parameters:
opts – An argparse.Namespace instance.
- get_dataset(*args, **kwargs) → Dict[str, str]
Reads the metadata file and returns a mapping from the index of each stored image-text pair to the name of the tar file that contains it.
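The shape of this mapping can be sketched by deriving it from the tar file names rather than from the metadata file, whose format is not documented here. The sketch assumes that a name such as `00000000_0_1000.tar.gz` holds pairs with indices 0 through 999 (exclusive end, consistent with the next shard starting at 1000). The helper `index_to_tar` is hypothetical.

```python
import re
from typing import Dict, Iterable

# Assumed naming: "<prefix>_<start>_<end>.tar.gz" holds pair indices start..end-1.
TAR_NAME_RE = re.compile(r"^\d+_(\d+)_(\d+)\.tar\.gz$")


def index_to_tar(tar_names: Iterable[str]) -> Dict[str, str]:
    """Map each pair index (as a string key) to the tar file that stores it."""
    mapping: Dict[str, str] = {}
    for name in tar_names:
        match = TAR_NAME_RE.match(name)
        if match is None:
            continue  # ignore files that are not data shards
        start, end = int(match.group(1)), int(match.group(2))
        for index in range(start, end):
            mapping[str(index)] = name
    return mapping
```

One entry per pair mirrors the documented `Dict[str, str]` return type; for very large datasets, storing only each shard's start offset and looking indices up with `bisect` would use far less memory.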