turicreate.SFrame.join

SFrame.join(right, on=None, how='inner', alter_name=None)

Merge two SFrames. Merges the current (left) SFrame with the given (right) SFrame using a SQL-style equi-join operation by columns.

Parameters:
right : SFrame

The SFrame to join.

on : None | str | list | dict, optional

The column name(s) representing the set of join keys. Each row that has the same value in this set of columns will be merged together.

  • If ‘None’ is given, the join uses every column name that appears in both SFrames as the set of join keys.
  • If a str is given, this is interpreted as a join using one column, where both SFrames have the same column name.
  • If a list is given, this is interpreted as a join using one or more column names, where each column name given exists in both SFrames.
  • If a dict is given, each dict key is taken as a column name in the left SFrame, and each dict value is taken as the name of the column in the right SFrame that it will be joined with, e.g. {'left_col_name': 'right_col_name'}.
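
The four accepted forms of `on` all reduce to a mapping from left column names to right column names. The following is a minimal pure-Python sketch of that reduction, written only to illustrate the rules listed above. The helper name `normalize_join_keys` and its error types are hypothetical and are not part of the turicreate API.

```python
def normalize_join_keys(left_columns, right_columns, on=None):
    """Return a dict mapping left join-key names to right join-key names.

    Hypothetical helper for illustration; not a turicreate function.
    """
    if on is None:
        # Use every column name that appears in both frames.
        keys = {name: name for name in left_columns if name in right_columns}
    elif isinstance(on, str):
        # One column with the same name on both sides.
        keys = {on: on}
    elif isinstance(on, list):
        # Several columns, each with the same name on both sides.
        keys = {name: name for name in on}
    elif isinstance(on, dict):
        # Explicit left-name -> right-name pairs.
        keys = dict(on)
    else:
        raise TypeError("on must be None, a str, a list, or a dict")

    # Every key must name a real column on its side (an assumption of
    # this sketch; the error type is not taken from the documentation).
    for left_name, right_name in keys.items():
        if left_name not in left_columns:
            raise KeyError("no column %r in the left frame" % left_name)
        if right_name not in right_columns:
            raise KeyError("no column %r in the right frame" % right_name)
    return keys


print(normalize_join_keys(['id', 'name'], ['id', 'sound']))
# {'id': 'id'}
print(normalize_join_keys(['id', 'name'], ['animal_id', 'sound'],
                          on={'id': 'animal_id'}))
# {'id': 'animal_id'}
```

The dict form is the only one that can join columns whose names differ between the two SFrames.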
how : {‘left’, ‘right’, ‘outer’, ‘inner’}, optional

The type of join to perform. ‘inner’ is default.

  • inner: Equivalent to a SQL inner join. Result consists of the rows from the two frames whose join key values match exactly, merged together into one SFrame.
  • left: Equivalent to a SQL left outer join. Result is the union between the result of an inner join and the rest of the rows from the left SFrame, merged with missing values.
  • right: Equivalent to a SQL right outer join. Result is the union between the result of an inner join and the rest of the rows from the right SFrame, merged with missing values.
  • outer: Equivalent to a SQL full outer join. Result is the union between the result of a left outer join and a right outer join.
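
To make the difference between the four join types concrete, here is a small pure-Python sketch that joins two lists of dicts on a single shared column. It is an illustration of the semantics described above, not turicreate's implementation: the function `join_rows` is hypothetical, it ignores column name conflicts, and the row order it produces is not meant to match the order an SFrame returns.

```python
def join_rows(left, right, key, how="inner"):
    """Equi-join two lists of dicts on one shared column name.

    Hypothetical helper for illustration; not a turicreate function.
    """
    if how not in ("inner", "left", "right", "outer"):
        raise ValueError("unknown join type: %r" % how)

    left_cols = {c for row in left for c in row}
    right_cols = {c for row in right for c in row}

    # Index the right rows by their join key value.
    right_index = {}
    for pos, row in enumerate(right):
        right_index.setdefault(row[key], []).append(pos)

    result = []
    matched_right = set()
    for lrow in left:
        positions = right_index.get(lrow[key], [])
        for pos in positions:
            # Keys match exactly: merge the two rows into one.
            matched_right.add(pos)
            result.append({**right[pos], **lrow})
        if not positions and how in ("left", "outer"):
            # Unmatched left row: right-only columns become missing values.
            result.append({**{c: None for c in right_cols}, **lrow})

    if how in ("right", "outer"):
        for pos, rrow in enumerate(right):
            if pos not in matched_right:
                # Unmatched right row: left-only columns become missing values.
                result.append({**{c: None for c in left_cols}, **rrow})
    return result


animals = [{'id': 1, 'name': 'dog'}, {'id': 2, 'name': 'cat'},
           {'id': 3, 'name': 'sheep'}, {'id': 4, 'name': 'cow'}]
sounds = [{'id': 1, 'sound': 'woof'}, {'id': 3, 'sound': 'baa'},
          {'id': 4, 'sound': 'moo'}, {'id': 5, 'sound': 'oink'}]

for how in ("inner", "left", "right", "outer"):
    print(how, len(join_rows(animals, sounds, 'id', how)))
# inner 3
# left 4
# right 4
# outer 5
```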
alter_name : None | dict, optional

User-provided names used to resolve column name conflicts when merging the two SFrames.

  • If ‘None’ is given, the default conflict resolution is used. For example, if ‘X’ is defined in the SFrame on the left side of the join and there is also a column called ‘X’ in the SFrame on the right, ‘X.1’ is used as the new column name when appending column ‘X’ from the right SFrame, in order to avoid a column name collision.

  • If a dict is given, each dict key should be a column name from the right SFrame, and each dict value should be the user-preferred column name that resolves the collision in place of the default behavior. In general, a dict value should not be an existing column name from the right SFrame. If a dict value would still cause a name conflict after the attempted resolution, an exception is thrown.
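
The naming rules above can be sketched in plain Python as follows. This is an illustrative reading of the text, not turicreate's code. The helper `resolve_right_names` is hypothetical, the choice of `ValueError` is an assumption (the text only says an exception is thrown), and the sketch does not cover the case where the default name such as ‘X.1’ is itself already in use.

```python
def resolve_right_names(left_columns, right_columns, join_keys, alter_name=None):
    """Return the output name for each non-key column of the right frame.

    Hypothetical helper for illustration; not a turicreate function.
    join_keys holds the right-side join key column names.
    """
    alter_name = alter_name or {}
    existing = set(left_columns) | set(right_columns)
    renamed = {}
    for name in right_columns:
        if name in join_keys:
            # Join key columns are merged, not appended a second time.
            continue
        if name in alter_name:
            new_name = alter_name[name]
            if new_name in existing:
                # The preferred name would collide again (assumed error type).
                raise ValueError("%r would cause another name conflict" % new_name)
        elif name in left_columns:
            # Default resolution described above: 'X' becomes 'X.1'.
            new_name = name + ".1"
        else:
            new_name = name
        renamed[name] = new_name
    return renamed


print(resolve_right_names(['id', 'X'], ['id', 'X', 'Y'], ['id']))
# {'X': 'X.1', 'Y': 'Y'}
print(resolve_right_names(['id', 'X'], ['id', 'X', 'Y'], ['id'],
                          alter_name={'X': 'X_right'}))
# {'X': 'X_right', 'Y': 'Y'}
```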

Returns:
out : SFrame

Examples

>>> animals = turicreate.SFrame({'id': [1, 2, 3, 4],
...                           'name': ['dog', 'cat', 'sheep', 'cow']})
>>> sounds = turicreate.SFrame({'id': [1, 3, 4, 5],
...                          'sound': ['woof', 'baa', 'moo', 'oink']})
>>> animals.join(sounds, how='inner')
+----+-------+-------+
| id |  name | sound |
+----+-------+-------+
| 1  |  dog  |  woof |
| 3  | sheep |  baa  |
| 4  |  cow  |  moo  |
+----+-------+-------+
[3 rows x 3 columns]
>>> animals.join(sounds, on='id', how='left')
+----+-------+-------+
| id |  name | sound |
+----+-------+-------+
| 1  |  dog  |  woof |
| 3  | sheep |  baa  |
| 4  |  cow  |  moo  |
| 2  |  cat  |  None |
+----+-------+-------+
[4 rows x 3 columns]
>>> animals.join(sounds, on=['id'], how='right')
+----+-------+-------+
| id |  name | sound |
+----+-------+-------+
| 1  |  dog  |  woof |
| 3  | sheep |  baa  |
| 4  |  cow  |  moo  |
| 5  |  None |  oink |
+----+-------+-------+
[4 rows x 3 columns]
>>> animals.join(sounds, on={'id':'id'}, how='outer')
+----+-------+-------+
| id |  name | sound |
+----+-------+-------+
| 1  |  dog  |  woof |
| 3  | sheep |  baa  |
| 4  |  cow  |  moo  |
| 5  |  None |  oink |
| 2  |  cat  |  None |
+----+-------+-------+
[5 rows x 3 columns]