turicreate.SFrame.stack

SFrame.stack(self, column_name, new_column_name=None, drop_na=False, new_column_type=None)

Convert a “wide” column of an SFrame to one or two “tall” columns by stacking all values.

Stacking works only for columns of dict, list, or array type. If the column is of dict type, stacking creates two new columns: one holds the keys and the other holds the values. The remaining columns are repeated for each key/value pair.

If the column is of array or list type, stacking creates one new column. Each row holds one element of the array or list value, and the remaining columns from the same original row are repeated.

The returned SFrame includes the newly created column(s) and all columns other than the one that is stacked.
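The reshaping described above can be modeled in plain Python. The following is a minimal sketch, not turicreate's implementation: the helper name stack_rows is hypothetical, a list of dicts stands in for an SFrame, and it assumes that an empty or missing container yields a single row of None values (the drop_na=False behavior) and that dict keys are emitted in insertion order.

```python
def stack_rows(rows, column, new_names):
    """Pure-Python model of stacking one column (hypothetical helper).

    rows      : list of dicts, one dict per row (stand-in for an SFrame)
    column    : name of the dict- or list-valued column to stack
    new_names : a single string for a list column,
                or [key_name, value_name] for a dict column
    """
    out = []
    for row in rows:
        # Every column except the stacked one is repeated unchanged.
        rest = {k: v for k, v in row.items() if k != column}
        cell = row[column]
        if isinstance(new_names, str):
            # List column: one output row per element.
            items = [{new_names: x} for x in cell] if cell else [{new_names: None}]
        else:
            # Dict column: one output row per key/value pair.
            key_name, value_name = new_names
            if cell:
                items = [{key_name: k, value_name: v} for k, v in cell.items()]
            else:
                items = [{key_name: None, value_name: None}]
        for item in items:
            out.append({**rest, **item})
    return out


words = [
    {"topic": 1, "words": {"a": 3, "cat": 2}},
    {"topic": 2, "words": {"a": 1, "the": 2}},
    {"topic": 3, "words": {"the": 1, "dog": 3}},
    {"topic": 4, "words": {}},
]
stacked_words = stack_rows(words, "words", ["word", "count"])

friends = [{"topic": 3, "friends": [4, 5, 10, None]}]
stacked_friends = stack_rows(friends, "friends", "friend")

for r in stacked_words + stacked_friends:
    print(r)
```

The model places the repeated columns first and the new column(s) last, which matches the column order of the output tables in the examples on this page.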

Parameters:
column_name : str

The column to stack. This column must be of dict, list, or array type.

new_column_name : str | list of str, optional

The new column name(s). If the original column is of list/array type, new_column_name must be a string. If the original column is of dict type, new_column_name must be a list of two strings. If not given, column names are generated automatically.

drop_na : boolean, optional

If True, missing values and empty list/array/dict are all dropped from the resulting column(s). If False, missing values are maintained in stacked column(s).

new_column_type : type | list of types, optional

The new column type(s). If the original column is of list/array type, new_column_type must be a single type or a list of one type. If the original column is of dict type, new_column_type must be a list of two types. If not provided, the types are automatically inferred from the first 100 values of the SFrame.

Returns:
out : SFrame

A new SFrame that contains newly stacked column(s) plus columns in original SFrame other than the stacked column.

See also

unstack

Examples

Suppose ‘sf’ is an SFrame that contains a column of dict type:

>>> sf = turicreate.SFrame({'topic':[1,2,3,4],
...                       'words': [{'a':3, 'cat':2},
...                                 {'a':1, 'the':2},
...                                 {'the':1, 'dog':3},
...                                 {}]
...                      })
>>> sf
+-------+----------------------+
| topic |        words         |
+-------+----------------------+
|   1   |  {'a': 3, 'cat': 2}  |
|   2   |  {'a': 1, 'the': 2}  |
|   3   | {'the': 1, 'dog': 3} |
|   4   |          {}          |
+-------+----------------------+
[4 rows x 2 columns]

Stacking the ‘words’ column puts all keys in one column and all values in another:

>>> sf.stack('words', new_column_name=['word', 'count'])
+-------+------+-------+
| topic | word | count |
+-------+------+-------+
|   1   |  a   |   3   |
|   1   | cat  |   2   |
|   2   |  a   |   1   |
|   2   | the  |   2   |
|   3   | the  |   1   |
|   3   | dog  |   3   |
|   4   | None |  None |
+-------+------+-------+
[7 rows x 3 columns]

Observe that since topic 4 had no words, a row with missing values (None) is produced for it. To drop that row, pass drop_na=True to stack.
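For this example, the effect of drop_na=True can be pictured as a filter over the stacked rows. The sketch below is plain Python, not a turicreate call: the helper drop_missing is hypothetical, the rows are copied from the output table above, and it assumes that dropping amounts to removing rows in which every stacked column is None.

```python
# Rows copied from the stacked output shown above (None marks a missing value).
stacked = [
    {"topic": 1, "word": "a", "count": 3},
    {"topic": 1, "word": "cat", "count": 2},
    {"topic": 2, "word": "a", "count": 1},
    {"topic": 2, "word": "the", "count": 2},
    {"topic": 3, "word": "the", "count": 1},
    {"topic": 3, "word": "dog", "count": 3},
    {"topic": 4, "word": None, "count": None},
]


def drop_missing(rows, stacked_columns):
    """Keep only rows in which at least one stacked column has a value."""
    return [r for r in rows if any(r[c] is not None for c in stacked_columns)]


kept = drop_missing(stacked, ["word", "count"])
print(len(kept))                            # 6
print(sorted({r["topic"] for r in kept}))   # [1, 2, 3]
```

Under that assumption, topic 4 no longer appears in the result at all, because its only row carried no key or value.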

Suppose ‘sf’ is an SFrame that contains users and their friends, where the ‘friends’ column holds a list of friends for each user (in the example below, the ‘topic’ column plays the role of the user ID). Stacking the ‘friends’ column produces one row per user/friend pair:

>>> sf = turicreate.SFrame({'topic':[1,2,3],
...                       'friends':[[2,3,4], [5,6],
...                                  [4,5,10,None]]
...                      })
>>> sf
+-------+------------------+
| topic |     friends      |
+-------+------------------+
|   1   |     [2, 3, 4]    |
|   2   |      [5, 6]      |
|   3   | [4, 5, 10, None] |
+-------+------------------+
[3 rows x 2 columns]
>>> sf.stack('friends', new_column_name='friend')
+-------+--------+
| topic | friend |
+-------+--------+
|   1   |   2    |
|   1   |   3    |
|   1   |   4    |
|   2   |   5    |
|   2   |   6    |
|   3   |   4    |
|   3   |   5    |
|   3   |   10   |
|   3   |  None  |
+-------+--------+
[9 rows x 2 columns]