turicreate.SFrame.unpack

SFrame.unpack(column_name=None, column_name_prefix=None, column_types=None, na_value=None, limit=None)

Expand one column of this SFrame to multiple columns with each value in a separate column. Returns a new SFrame with the unpacked column replaced with a list of new columns. The column must be of list/array/dict type.

For more details on name generation, missing-value handling, and other behavior, refer to the SArray version of unpack().

Parameters:
column_name : str, optional

Name of the column to unpack. If omitted and the SFrame has only one column, that column is unpacked. If the SFrame has multiple columns, the name must be provided to identify which column to unpack.

column_name_prefix : str, optional

If provided, the unpacked column names start with the given prefix. If not provided, the prefix defaults to the name of the unpacked column.

column_types : [type], optional

Column types for the unpacked columns. If not provided, column types are inferred automatically from the first 100 rows. For array columns, the default column type is float. If provided, column_types also restricts how many columns are unpacked.

na_value : flexible_type, optional

If provided, all values equal to na_value are converted to missing values (None).

limit : list[str] | list[int], optional

Unpack only a subset of each list/array/dict value. For a dictionary column, limit is a list of dictionary keys to unpack. For a list/array column, limit is a list of integer indexes into the list/array value.

Returns:
out : SFrame

A new SFrame that contains the remaining columns of the original SFrame, with the given column replaced by a collection of unpacked columns.

Examples

>>> sf = turicreate.SFrame({'id': [1,2,3],
...                      'wc': [{'a': 1}, {'b': 2}, {'a': 1, 'b': 2}]})
>>> sf
+----+------------------+
| id |        wc        |
+----+------------------+
| 1  |     {'a': 1}     |
| 2  |     {'b': 2}     |
| 3  | {'a': 1, 'b': 2} |
+----+------------------+
[3 rows x 2 columns]
>>> sf.unpack('wc')
+----+------+------+
| id | wc.a | wc.b |
+----+------+------+
| 1  |  1   | None |
| 2  | None |  2   |
| 3  |  1   |  2   |
+----+------+------+
[3 rows x 3 columns]

To omit the prefix from the generated column names:

>>> sf.unpack('wc', column_name_prefix="")
+----+------+------+
| id |  a   |  b   |
+----+------+------+
| 1  |  1   | None |
| 2  | None |  2   |
| 3  |  1   |  2   |
+----+------+------+
[3 rows x 3 columns]

To unpack only a subset of the keys:

>>> sf.unpack('wc', limit=['b'])
+----+------+
| id | wc.b |
+----+------+
| 1  | None |
| 2  |  2   |
| 3  |  2   |
+----+------+
[3 rows x 2 columns]
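
The naming and missing-value behavior shown in the dictionary examples above can be modeled in plain Python. This is only a sketch of the semantics as described on this page, not turicreate's implementation: `unpack_dict_rows` is a hypothetical helper, it works on a list of dicts rather than an SFrame column, and the first-seen key order it uses for the output columns is an assumption.

```python
def unpack_dict_rows(rows, prefix, limit=None, na_value=None):
    """Model of unpacking a dict column (hypothetical helper, not a turicreate API)."""
    # Collect keys in first-seen order (an assumption); keep only those in `limit`.
    keys = []
    for row in rows:
        for key in row:
            if key not in keys and (limit is None or key in limit):
                keys.append(key)

    columns = {}
    for key in keys:
        # An empty prefix yields the bare key; otherwise "<prefix>.<key>".
        name = f"{prefix}.{key}" if prefix else str(key)
        values = []
        for row in rows:
            value = row.get(key)  # a key absent from a row becomes None
            if na_value is not None and value == na_value:
                value = None  # values equal to na_value become missing
            values.append(value)
        columns[name] = values
    return columns


wc = [{'a': 1}, {'b': 2}, {'a': 1, 'b': 2}]

full = unpack_dict_rows(wc, prefix='wc')
# {'wc.a': [1, None, 1], 'wc.b': [None, 2, 2]}
only_b = unpack_dict_rows(wc, prefix='wc', limit=['b'])
# {'wc.b': [None, 2, 2]}
no_prefix = unpack_dict_rows(wc, prefix='')
# {'a': [1, None, 1], 'b': [None, 2, 2]}
masked = unpack_dict_rows(wc, prefix='wc', na_value=2)
# {'wc.a': [1, None, 1], 'wc.b': [None, None, None]}
```

The first three results mirror the tables above; the last one illustrates how na_value turns matching values into missing values.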

To unpack an array column:

>>> import array
>>> sf = turicreate.SFrame({'id': [1,2,3],
...                       'friends': [array.array('d', [1.0, 2.0, 3.0]),
...                                   array.array('d', [2.0, 3.0, 4.0]),
...                                   array.array('d', [3.0, 4.0, 5.0])]})
>>> sf
+-----------------+----+
|     friends     | id |
+-----------------+----+
| [1.0, 2.0, 3.0] | 1  |
| [2.0, 3.0, 4.0] | 2  |
| [3.0, 4.0, 5.0] | 3  |
+-----------------+----+
[3 rows x 2 columns]
>>> sf.unpack('friends')
+----+-----------+-----------+-----------+
| id | friends.0 | friends.1 | friends.2 |
+----+-----------+-----------+-----------+
| 1  |    1.0    |    2.0    |    3.0    |
| 2  |    2.0    |    3.0    |    4.0    |
| 3  |    3.0    |    4.0    |    5.0    |
+----+-----------+-----------+-----------+
[3 rows x 4 columns]
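
For list/array columns, limit takes integer indexes rather than keys. The following plain-Python sketch models that case under stated assumptions: `unpack_list_rows` is a hypothetical helper (not a turicreate API), it operates on a list of lists, and padding short rows with None is an assumption made for illustration.

```python
def unpack_list_rows(rows, prefix, limit=None):
    """Model of unpacking a list/array column (hypothetical helper, not a turicreate API)."""
    # Without a limit, unpack every position up to the longest row.
    width = max((len(row) for row in rows), default=0)
    indexes = list(range(width)) if limit is None else list(limit)

    columns = {}
    for i in indexes:
        # Column names carry the original index: "<prefix>.<index>".
        name = f"{prefix}.{i}" if prefix else str(i)
        # Assumption: a row too short for index i contributes None.
        columns[name] = [row[i] if i < len(row) else None for row in rows]
    return columns


friends = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]

all_cols = unpack_list_rows(friends, prefix='friends')
# {'friends.0': [1.0, 2.0, 3.0], 'friends.1': [2.0, 3.0, 4.0], 'friends.2': [3.0, 4.0, 5.0]}
first_last = unpack_list_rows(friends, prefix='friends', limit=[0, 2])
# {'friends.0': [1.0, 2.0, 3.0], 'friends.2': [3.0, 4.0, 5.0]}
```

Each output column holds the values found at one index across all rows, which is why the unlimited result matches the table above column for column.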

To unpack the only column of an SFrame without naming it:

>>> sf = turicreate.SFrame([{'a':1,'b':2,'c':3},{'a':4,'b':5,'c':6}])
>>> sf.unpack()
+---+---+---+
| a | b | c |
+---+---+---+
| 1 | 2 | 3 |
| 4 | 5 | 6 |
+---+---+---+
[2 rows x 3 columns]