turicreate.SFrame.pack_columns

SFrame.pack_columns(column_names=None, column_name_prefix=None, dtype=list, fill_na=None, remove_prefix=True, new_column_name=None)

Pack columns of the current SFrame into a single column. The result is a new SFrame containing the unaffected columns of the original SFrame plus the newly created column.

The columns to pack are chosen through either the column_names or the column_name_prefix parameter; only one of the two may be provided. column_names explicitly lists the columns to pack, while column_name_prefix selects every column whose name starts with the given prefix.
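The selection rule can be sketched in plain Python, without Turi Create. `select_columns_to_pack` is a hypothetical helper, not part of the library; it only mirrors the rule described here, including the fallback to all columns noted under Parameters. The choice of `ValueError` for the conflicting case is an assumption.

```python
def select_columns_to_pack(all_columns, column_names=None, column_name_prefix=None):
    """Hypothetical helper mirroring the column selection rule."""
    if column_names is not None and column_name_prefix is not None:
        # The two selectors are mutually exclusive; the exception type is an assumption.
        raise ValueError("provide column_names or column_name_prefix, not both")
    if column_name_prefix is not None:
        # Prefix selection: keep every column whose name starts with the prefix.
        return [c for c in all_columns if c.startswith(column_name_prefix)]
    if column_names is not None:
        # Explicit selection: use the names exactly as given.
        return list(column_names)
    # Neither selector given: every column is packed.
    return list(all_columns)


columns = ['business', 'category.retail', 'category.food']
by_prefix = select_columns_to_pack(columns, column_name_prefix='category')
by_name = select_columns_to_pack(columns, column_names=['business'])
everything = select_columns_to_pack(columns)
```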

The type of the resulting column is decided by the dtype parameter. Allowed values for dtype are dict, array.array and list:

  • dict: pack into a dictionary SArray, where each column name becomes a dictionary key and each column value becomes the corresponding dictionary value.
  • array.array: pack all values from the packed columns into an array.
  • list: pack all values from the packed columns into a list.
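As a rough illustration of the three target types, the following plain-Python sketch packs the selected values of a single row. `pack_row` is a hypothetical helper and not the library's implementation; the `'d'` (double) type code is an assumption chosen to match the float values shown in the array example in the Examples section.

```python
import array


def pack_row(row, columns, dtype=list):
    """Hypothetical helper: pack the values of `columns` from one row (a dict)."""
    values = [row[c] for c in columns]
    if dtype is dict:
        # Column name becomes the key, column value becomes the value.
        return dict(zip(columns, values))
    if dtype is array.array:
        # Assumption: values are stored as doubles.
        return array.array('d', values)
    if dtype is list:
        return values
    raise ValueError("dtype must be dict, array.array or list")


row = {'business': 1, 'x': 2, 'y': 3}
as_list = pack_row(row, ['x', 'y'])
as_dict = pack_row(row, ['x', 'y'], dtype=dict)
as_array = pack_row(row, ['x', 'y'], dtype=array.array)
```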
Parameters:
column_names : list[str], optional

A list of column names to be packed. If omitted and column_name_prefix is not specified, all columns of the current SFrame are packed. This parameter is mutually exclusive with the column_name_prefix parameter.

column_name_prefix : str, optional

Pack all columns whose names start with the given column_name_prefix. This parameter is mutually exclusive with the column_names parameter.

dtype : dict | array.array | list, optional

The resulting packed column type. If not provided, dtype is list.

fill_na : value, optional

Value to fill into the packed column when a missing value is encountered. If packing to a dictionary, fill_na applies only to dictionary values; missing keys are not replaced.

remove_prefix : bool, optional

If True and column_name_prefix is specified, the dictionary key will be constructed by removing the prefix from the column name. This option is only applicable when packing to dict type.

new_column_name : str, optional

Name of the packed column. If not given and column_name_prefix is given, the prefix is used as the new column name; otherwise a name is generated automatically.
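The effect of remove_prefix on dictionary keys can be sketched as follows. `dict_keys_for` is a hypothetical helper; the handling of the `.` separator is an assumption inferred from the example output in the Examples section, where the column 'category.retail' yields the key 'retail', and the library's actual rule may differ.

```python
def dict_keys_for(columns, prefix, remove_prefix=True):
    """Hypothetical helper: derive dictionary keys from packed column names."""
    if not remove_prefix:
        # Keys are the full column names.
        return list(columns)
    keys = []
    for name in columns:
        key = name[len(prefix):]
        # Assumption: a '.' separator right after the prefix is dropped too.
        if key.startswith('.'):
            key = key[1:]
        keys.append(key)
    return keys


packed = ['category.retail', 'category.food']
short_keys = dict_keys_for(packed, 'category')
full_keys = dict_keys_for(packed, 'category', remove_prefix=False)
```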

Returns:
out : SFrame

An SFrame that contains columns that are not packed, plus the newly packed column.

See also

unpack

Notes

  • If packing to a dictionary, a missing key is always dropped. Missing values are dropped if fill_na is not provided; otherwise, each missing value is replaced by fill_na.
  • If packing to a list or array, missing values are kept unless fill_na is provided, in which case each missing value is replaced with fill_na.
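A minimal plain-Python sketch of these rules for the list and dict cases, using None to mark a missing value. `pack_with_missing` is a hypothetical helper, not the library's code, and the array case is left out.

```python
def pack_with_missing(columns, values, dtype=list, fill_na=None):
    """Hypothetical helper following the missing-value rules for list and dict."""
    if fill_na is not None:
        # Replace every missing value with fill_na.
        values = [fill_na if v is None else v for v in values]
    if dtype is dict:
        # Entries whose value is still missing are dropped.
        return {k: v for k, v in zip(columns, values) if v is not None}
    # Lists keep any remaining missing values in place.
    return list(values)


cols = ['retail', 'food', 'service', 'shop']
vals = [None, 1, 1, 1]

list_kept = pack_with_missing(cols, vals)
list_filled = pack_with_missing(cols, vals, fill_na=0)
dict_dropped = pack_with_missing(cols, vals, dtype=dict)
dict_filled = pack_with_missing(cols, vals, dtype=dict, fill_na=0)
```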

Examples

Suppose ‘sf’ is an SFrame that maintains business category information:

>>> sf = turicreate.SFrame({'business': range(1, 5),
...                       'category.retail': [1, None, 1, None],
...                       'category.food': [1, 1, None, None],
...                       'category.service': [None, 1, 1, None],
...                       'category.shop': [1, 1, None, 1]})
>>> sf
+----------+-----------------+---------------+------------------+---------------+
| business | category.retail | category.food | category.service | category.shop |
+----------+-----------------+---------------+------------------+---------------+
|    1     |        1        |       1       |       None       |       1       |
|    2     |       None      |       1       |        1         |       1       |
|    3     |        1        |      None     |        1         |      None     |
|    4     |       None      |      None     |       None       |       1       |
+----------+-----------------+---------------+------------------+---------------+
[4 rows x 5 columns]

To pack all category columns into a list:

>>> sf.pack_columns(column_name_prefix='category')
+----------+-----------------------+
| business |        category       |
+----------+-----------------------+
|    1     |    [1, 1, None, 1]    |
|    2     |    [1, None, 1, 1]    |
|    3     |   [None, 1, 1, None]  |
|    4     | [None, None, None, 1] |
+----------+-----------------------+
[4 rows x 2 columns]

To pack all category columns into a dictionary, with a new column name:

>>> sf.pack_columns(column_name_prefix='category', dtype=dict,
...                 new_column_name='new name')
+----------+-------------------------------+
| business |            new name           |
+----------+-------------------------------+
|    1     | {'food': 1, 'shop': 1, 're... |
|    2     | {'food': 1, 'shop': 1, 'se... |
|    3     |  {'retail': 1, 'service': 1}  |
|    4     |          {'shop': 1}          |
+----------+-------------------------------+
[4 rows x 2 columns]

To keep the column prefix in the resulting dict keys:

>>> sf.pack_columns(column_name_prefix='category', dtype=dict,
...                 remove_prefix=False)
+----------+-------------------------------+
| business |            category           |
+----------+-------------------------------+
|    1     | {'category.retail': 1, 'ca... |
|    2     | {'category.food': 1, 'cate... |
|    3     | {'category.retail': 1, 'ca... |
|    4     |      {'category.shop': 1}     |
+----------+-------------------------------+
[4 rows x 2 columns]

To explicitly pack a set of columns:

>>> sf.pack_columns(column_names=['business', 'category.retail',
...                               'category.food', 'category.service',
...                               'category.shop'])
+--------------------------+
|            X1            |
+--------------------------+
|    [1, 1, 1, None, 1]    |
|    [2, None, 1, 1, 1]    |
|  [3, 1, None, 1, None]   |
| [4, None, None, None, 1] |
+--------------------------+
[4 rows x 1 columns]

To pack all columns whose names start with ‘category’ into an array, with missing values replaced by 0:

>>> import array
>>> sf.pack_columns(column_name_prefix="category", dtype=array.array,
...                 fill_na=0)
+----------+----------------------+
| business |       category       |
+----------+----------------------+
|    1     | [1.0, 1.0, 0.0, 1.0] |
|    2     | [1.0, 0.0, 1.0, 1.0] |
|    3     | [0.0, 1.0, 1.0, 0.0] |
|    4     | [0.0, 0.0, 0.0, 1.0] |
+----------+----------------------+
[4 rows x 2 columns]