Python Notebook Widget
The Python package also provides a notebook widget for using Embedding Atlas in your notebooks. The widget uses AnyWidget and supports Jupyter, Marimo, Colab, VSCode, and more.
Installation
pip install embedding-atlas
Example
from embedding_atlas.widget import EmbeddingAtlasWidget
# Create an Embedding Atlas widget without projection
# This widget will show table and charts only, not the embedding view.
EmbeddingAtlasWidget(df)
# Compute text embedding and projection of the embedding
from embedding_atlas.projection import compute_text_projection
compute_text_projection(
    df,
    text="description",
    x="projection_x", y="projection_y", neighbors="neighbors",
)
# Create an Embedding Atlas widget with the pre-computed projection
widget = EmbeddingAtlasWidget(
    df,
    text="description",
    x="projection_x", y="projection_y", neighbors="neighbors",
)
# Display the widget
widget
The widget embeds the Embedding Atlas UI into your notebook. You can make selections in the widget, and then use:
df = widget.selection()
to get the selection back as a data frame.
Reference
from embedding_atlas.widget import EmbeddingAtlasWidget
Below are the constructor options of the widget:
Create an Embedding Atlas widget.
- Args:
- data_frame:
A DataFrame/Arrow object to "register" with DuckDB.
- row_id:
The column name for row id (if not specified, a row id column will be added).
- x:
The column name for X axis in the embedding.
- y:
The column name for Y axis in the embedding.
- text:
The column name for the textual data.
- neighbors:
The column name containing precomputed K-nearest neighbors for each point. Each value in the column should be a dictionary with the format:
{ "ids": [id1, id2, ...], "distances": [distance1, distance2, ...] }.
"ids" should be an array of row ids of the neighbors (if row_id is specified, match the value in row_id, otherwise use the row index), sorted by distance.
"distances" should contain the corresponding distances to each neighbor.
- labels:
Labels for the embedding view. Set to string "automatic" to generate labels automatically, or "disabled" to disable auto labels. Automatic labels are generated by clustering the 2D density distribution and selecting representative keywords using TF-IDF ranking. You can also pass in a list of labels. Each label must contain x and y coordinates and text for the label content. Optionally, you may specify an integer level to roughly control the zoom level where the label appears, and priority for the label's priority. Higher priority labels have a better chance to appear when multiple labels overlap.
- stop_words:
Stop words for automatic label generation.
- point_size:
Override the default point size for the embedding view.
- show_table:
Whether to display the data table when the widget opens.
- show_charts:
Whether to display charts when the widget opens.
- show_embedding:
Whether to display the embedding view when the widget opens.
- connection (DuckDBPyConnection, optional):
A DuckDB connection. Defaults to duckdb.connect().
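The neighbors and labels formats described above can be sketched in plain Python. This is a minimal illustration, not the library's own implementation: the helper name build_neighbors is hypothetical, and the brute-force Euclidean distance, the exclusion of each point from its own neighbor list, and the use of plain dictionaries for labels are all assumptions. Row ids here are row indices, which corresponds to the case where row_id is not specified.

```python
import math


def build_neighbors(vectors, k):
    """Hypothetical helper: brute-force k-nearest neighbors per row.

    Returns one dictionary per row in the documented shape:
    {"ids": [...], "distances": [...]}, sorted by distance.
    Assumptions: Euclidean distance; a point is not its own neighbor.
    """
    result = []
    for i, vi in enumerate(vectors):
        # Distance from row i to every other row, paired with that row's index.
        candidates = [
            (math.dist(vi, vj), j) for j, vj in enumerate(vectors) if j != i
        ]
        candidates.sort()  # nearest first; ties fall back to the row index
        nearest = candidates[:k]
        result.append(
            {
                "ids": [j for _, j in nearest],
                "distances": [d for d, _ in nearest],
            }
        )
    return result


# Four toy 2D vectors; real embeddings would usually have many more dimensions.
vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
neighbors = build_neighbors(vectors, k=2)

# Hand-written labels: x, y, and text are required; level and priority are optional.
# Representing each label as a dict is an assumption.
labels = [
    {"x": 0.5, "y": 0.5, "text": "cluster A", "level": 0, "priority": 2},
    {"x": 5.0, "y": 5.0, "text": "outlier"},
]
```

To use values like these, the neighbors list would be stored as a column of the data frame and that column's name passed as neighbors, while the labels list would be passed directly as labels.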