turicreate.SFrame.pack_columns
SFrame.pack_columns(column_names=None, column_name_prefix=None, dtype=<class 'list'>, fill_na=None, remove_prefix=True, new_column_name=None)
Pack columns of the current SFrame into one single column. The result is a new SFrame with the unaffected columns from the original SFrame plus the newly created column.
The columns to pack are chosen through either the column_names or the column_name_prefix parameter; only one of the two may be provided. column_names explicitly lists the columns to pack, while column_name_prefix packs all columns whose names start with the given prefix.
The type of the resulting column is decided by the dtype parameter. Allowed values for dtype are dict, array.array and list:
- dict: pack to a dictionary SArray where each column name becomes a dictionary key and the column value becomes the dictionary value
- array.array: pack all values from the packing columns into an array
- list: pack all values from the packing columns into a list
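The three dtype options can be pictured with a small pure-Python model of what happens to a single row. This is only an illustrative sketch: pack_row, the row contents and the column names are hypothetical and are not part of turicreate, which performs the packing internally.

```python
import array


def pack_row(row, columns, dtype=list):
    """Hypothetical pure-Python model of packing one row; not turicreate code."""
    values = [row[name] for name in columns]
    if dtype is dict:
        # Each column name becomes a key, each column value becomes its value.
        return dict(zip(columns, values))
    if dtype is array.array:
        # Type code 'd' stores every value as a double-precision float.
        return array.array('d', values)
    # Default: a plain list in column order.
    return values


row = {'business': 1, 'score.a': 1, 'score.b': 2}
packed_columns = ['score.a', 'score.b']

as_list = pack_row(row, packed_columns, list)
as_dict = pack_row(row, packed_columns, dict)
as_array = pack_row(row, packed_columns, array.array)
print(as_list, as_dict, as_array)
```

The unpacked column (business here) is left untouched, which corresponds to the unaffected columns that remain in the returned SFrame.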
Parameters:
- column_names : list[str], optional
A list of column names to be packed. If omitted and column_name_prefix is not specified, all columns from current SFrame are packed. This parameter is mutually exclusive with the column_name_prefix parameter.
- column_name_prefix : str, optional
Pack all columns with the given column_name_prefix. This parameter is mutually exclusive with the column_names parameter.
- dtype : dict | array.array | list, optional
The resulting packed column type. If not provided, dtype is list.
- fill_na : value, optional
Value to fill into packed column if missing value is encountered. If packing to dictionary, fill_na is only applicable to dictionary values; missing keys are not replaced.
- remove_prefix : bool, optional
If True and column_name_prefix is specified, the dictionary key will be constructed by removing the prefix from the column name. This option is only applicable when packing to dict type.
- new_column_name : str, optional
Name of the packed column. If not given and column_name_prefix is given, the prefix is used as the new column name; otherwise a name is generated automatically.
Returns:
- out : SFrame
An SFrame that contains columns that are not packed, plus the newly packed column.
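How column_name_prefix and remove_prefix interact can be sketched in plain Python. The helper below is hypothetical and only mirrors what the examples on this page show: columns are selected by prefix, and with remove_prefix=True a key such as 'category.retail' becomes 'retail'. Dropping the '.' separator together with the prefix is an assumption inferred from that example, not a statement about the library's internals.

```python
def select_and_name(column_names, prefix, remove_prefix=True):
    """Hypothetical model: pick prefixed columns and derive their dict keys."""
    selected = [name for name in column_names if name.startswith(prefix)]
    keys = []
    for name in selected:
        key = name
        if remove_prefix:
            key = name[len(prefix):]
            # Assumption: a leading '.' separator is dropped with the prefix.
            if key.startswith('.'):
                key = key[1:]
        keys.append(key)
    return selected, keys


columns = ['business', 'category.retail', 'category.food']
selected, short_keys = select_and_name(columns, 'category')
_, full_keys = select_and_name(columns, 'category', remove_prefix=False)
print(selected, short_keys, full_keys)
```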
See also
Notes
- If packing to a dictionary, a missing key is always dropped. Missing values are dropped if fill_na is not provided; otherwise, each missing value is replaced by fill_na. If packing to a list or array, missing values are kept unless fill_na is provided, in which case each missing value is replaced with the fill_na value.
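The missing-value rules above can be modelled for one row in plain Python. This is a sketch under stated assumptions: pack_values is a hypothetical helper, not a turicreate function, and it treats fill_na=None as "not provided". It covers only the dict and list cases.

```python
def pack_values(row, dtype=list, fill_na=None):
    """Hypothetical model of the missing-value rules for one row."""
    if dtype is dict:
        packed = {}
        for key, value in row.items():
            if value is None:
                if fill_na is None:
                    # No fill value: the missing entry is dropped entirely.
                    continue
                value = fill_na
            packed[key] = value
        return packed
    # List packing keeps every position; missing values stay None
    # unless a fill value is supplied.
    return [fill_na if value is None else value for value in row.values()]


# One row with two missing values (None).
row = {'food': None, 'retail': 1, 'service': 1, 'shop': None}

dict_dropped = pack_values(row, dict)
dict_filled = pack_values(row, dict, fill_na=0)
list_kept = pack_values(row, list)
list_filled = pack_values(row, list, fill_na=0)
print(dict_dropped, dict_filled, list_kept, list_filled)
```

The practical difference is that a packed dictionary can shrink when values are missing, while a packed list always has one entry per packed column.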
Examples
Suppose ‘sf’ is an SFrame that maintains business category information:
>>> sf = turicreate.SFrame({'business': range(1, 5),
...                         'category.retail': [1, None, 1, None],
...                         'category.food': [1, 1, None, None],
...                         'category.service': [None, 1, 1, None],
...                         'category.shop': [1, 1, None, 1]})
>>> sf
+----------+-----------------+---------------+------------------+---------------+
| business | category.retail | category.food | category.service | category.shop |
+----------+-----------------+---------------+------------------+---------------+
|    1     |        1        |       1       |       None       |       1       |
|    2     |       None      |       1       |        1         |       1       |
|    3     |        1        |      None     |        1         |      None     |
|    4     |       None      |      None     |       None       |       1       |
+----------+-----------------+---------------+------------------+---------------+
[4 rows x 5 columns]
To pack all category columns into a list:
>>> sf.pack_columns(column_name_prefix='category')
+----------+-----------------------+
| business |        category       |
+----------+-----------------------+
|    1     |    [1, 1, None, 1]    |
|    2     |    [1, None, 1, 1]    |
|    3     |   [None, 1, 1, None]  |
|    4     | [None, None, None, 1] |
+----------+-----------------------+
[4 rows x 2 columns]
To pack all category columns into a dictionary, with a new column name:
>>> sf.pack_columns(column_name_prefix='category', dtype=dict,
...                 new_column_name='new name')
+----------+-------------------------------+
| business |            new name           |
+----------+-------------------------------+
|    1     | {'food': 1, 'shop': 1, 're... |
|    2     | {'food': 1, 'shop': 1, 'se... |
|    3     |  {'retail': 1, 'service': 1}  |
|    4     |          {'shop': 1}          |
+----------+-------------------------------+
[4 rows x 2 columns]
To keep the column prefix in the resulting dict keys:
>>> sf.pack_columns(column_name_prefix='category', dtype=dict, remove_prefix=False)
+----------+-------------------------------+
| business |            category           |
+----------+-------------------------------+
|    1     | {'category.retail': 1, 'ca... |
|    2     | {'category.food': 1, 'cate... |
|    3     | {'category.retail': 1, 'ca... |
|    4     |     {'category.shop': 1}      |
+----------+-------------------------------+
[4 rows x 2 columns]
To explicitly pack a set of columns:
>>> sf.pack_columns(column_names=['business', 'category.retail',
...                               'category.food', 'category.service',
...                               'category.shop'])
+--------------------------+
|            X1            |
+--------------------------+
|    [1, 1, 1, None, 1]    |
|    [2, None, 1, 1, 1]    |
|  [3, 1, None, 1, None]   |
| [4, None, None, None, 1] |
+--------------------------+
[4 rows x 1 columns]
To pack all columns whose names start with ‘category’ into an array, with missing values replaced by 0:
>>> import array
>>> sf.pack_columns(column_name_prefix="category", dtype=array.array,
...                 fill_na=0)
+----------+----------------------+
| business |       category       |
+----------+----------------------+
|    1     | [1.0, 1.0, 0.0, 1.0] |
|    2     | [1.0, 0.0, 1.0, 1.0] |
|    3     | [0.0, 1.0, 1.0, 0.0] |
|    4     | [0.0, 0.0, 0.0, 1.0] |
+----------+----------------------+
[4 rows x 2 columns]