Query data out of Rerun

Rerun lets you query recorded data from code. This page gives an overview of the query API, along with recipes for loading the data into popular packages such as Pandas, Polars, and DuckDB.

Starting a server with recordings

The first step to query data is to start a server and load it with a dataset containing your recording.

import rerun as rr

# Start a server with one or more .rrd files
with rr.server.Server(datasets={"my_dataset": ["recording.rrd"]}) as server:
    client = server.client()
    dataset = client.get_dataset("my_dataset")

The server can host multiple datasets. Each dataset maps to either a list of .rrd files or a directory (which will be scanned for .rrd files):

with rr.server.Server(datasets={
    # Explicit list of RRD files
    "dataset1": ["recording1.rrd", "recording2.rrd"],
    # Directory containing RRD files
    "dataset2": "/path/to/recordings_dir",
}) as server:
    client = server.client()
    # Access each dataset by name
    ds1 = client.get_dataset("dataset1")
    ds2 = client.get_dataset("dataset2")

When multiple recordings are loaded into a dataset, each gets mapped to a separate segment whose ID is the corresponding recording ID.
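When recordings are organized as one folder per dataset, the explicit-list form of the `datasets` mapping can be assembled programmatically. The following is a minimal sketch using only the standard library; `collect_datasets` is a hypothetical helper (not part of Rerun), and it assumes that each immediate subdirectory of a root folder should become one dataset named after that subdirectory.

```python
from pathlib import Path


def collect_datasets(root: str) -> dict[str, list[str]]:
    """Map each immediate subdirectory of `root` to its sorted .rrd files.

    Hypothetical helper: the one-folder-per-dataset layout is an assumption.
    """
    datasets: dict[str, list[str]] = {}
    for subdir in sorted(Path(root).iterdir()):
        if not subdir.is_dir():
            continue
        # Non-recursive: only .rrd files directly inside the subdirectory.
        rrd_files = sorted(str(path) for path in subdir.glob("*.rrd"))
        if rrd_files:  # skip directories that contain no recordings
            datasets[subdir.name] = rrd_files
    return datasets
```

The resulting dictionary has the same shape as the explicit-list mapping shown above, so it could be passed as the `datasets` argument of `rr.server.Server`.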

Inspecting the schema

The content of a dataset can be inspected using the schema() method:

schema = dataset.schema()
schema.index_columns()        # list of all index columns (timelines)
schema.component_columns()    # list of all component columns

Querying a dataset using `reader`

The primary means of querying data is the reader() method. In its simplest form, it is used as follows:

df = dataset.reader(index="frame_nr")

print(df)

The returned object is a datafusion.DataFrame. Rerun's query APIs rely heavily on DataFusion, which offers a rich set of data filtering, manipulation, and conversion tools.

When calling reader(), an index column must be specified. It can be any of the recording's timelines. Each row of the returned dataframe corresponds to a unique value of the index column. It is also possible to query the dataset using index=None, in which case only static data (logged with static=True) is returned.

By default, when performing a query on a dataset, data for all its segments is returned. An additional "rerun_segment_id" column is added to the dataframe to indicate which segment each row belongs to.
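To process each segment separately after a query, the rows can be grouped on the "rerun_segment_id" column. The sketch below works on plain Python dictionaries, one per row, which is the shape returned by PyArrow's `Table.to_pylist()`; `group_rows_by_segment` is a hypothetical helper, not a Rerun API.

```python
from collections import defaultdict


def group_rows_by_segment(rows, segment_column="rerun_segment_id"):
    """Group row dictionaries by the value of their segment ID column.

    Hypothetical helper; `rows` is a list of dicts, one per dataframe row.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[segment_column]].append(row)
    return dict(groups)
```

For example, the rows could come from `dataset.reader(index="frame_nr").to_arrow_table().to_pylist()`. For large results, grouping inside DataFusion or Pandas is likely to be more efficient than materializing Python objects.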

A commonly used parameter of the reader() method is fill_latest_at=True. When set, null values are filled with a latest-at value, that is, the most recent earlier value of the same column, similar to how the viewer displays data.
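The latest-at behavior can be pictured as a forward fill along the index. The following pure-Python sketch illustrates the idea on row dictionaries from a single segment, already sorted by the index column. It is an illustration of the semantics only, not Rerun's implementation, and `fill_latest_at_rows` is a hypothetical name.

```python
def fill_latest_at_rows(rows, columns):
    """Forward-fill None values in `columns`.

    Assumes `rows` belong to one segment and are sorted by the index column.
    Returns new row dictionaries; the input rows are not modified.
    """
    latest = {}  # most recent non-null value seen so far, per column
    filled = []
    for row in rows:
        new_row = dict(row)
        for column in columns:
            if new_row.get(column) is None:
                # No value at this index: reuse the latest one, if any.
                new_row[column] = latest.get(column)
            else:
                latest[column] = new_row[column]
        filled.append(new_row)
    return filled
```

Values that are null before the first logged value stay null, since there is no earlier value to fall back on.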

Querying a subset of a dataset

In general, datasets can be arbitrarily large, and it is often useful to query only a subset of the data. This is achieved using DatasetView objects:

# Filter by entity paths
dataset_view = dataset.filter_contents(["/world/robot/**", "/sensors/**"])

# Filter by segment IDs (recording IDs)
dataset_view = dataset.filter_segments(["recording_001", "recording_002"])

# Chain filters
dataset_view = dataset.filter_contents(["/world/**"]).filter_segments(["recording_001"])

DatasetView instances have the exact same reader() method as the original dataset:

df = dataset_view.reader(index="frame_nr")

print(df)
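The filtering and reading steps above can be wrapped in a small convenience function. This is a sketch under the assumption that `filter_contents()`, `filter_segments()`, and `reader()` behave as described on this page; `read_subset` is a hypothetical helper, not a Rerun API.

```python
def read_subset(dataset, index, entity_paths=None, segment_ids=None,
                fill_latest_at=False):
    """Apply optional content and segment filters, then query the result.

    `dataset` is expected to expose filter_contents(), filter_segments(),
    and reader() as described on this page.
    """
    view = dataset
    if entity_paths is not None:
        view = view.filter_contents(entity_paths)
    if segment_ids is not None:
        view = view.filter_segments(segment_ids)
    return view.reader(index=index, fill_latest_at=fill_latest_at)
```

A call such as `read_subset(dataset, "frame_nr", entity_paths=["/world/**"], segment_ids=["recording_001"])` would then be equivalent to the chained filter example followed by `reader(index="frame_nr")`.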

Filtering with DataFusion

DataFusion offers a rich set of filtering, projection, and joining capabilities. Check the DataFusion Python documentation for details.

For illustration, here are a few simple examples:

from datafusion import col

df = dataset.reader(index="frame_nr")

# Filter by index range
df = df.filter(col("frame_nr") >= 0).filter(col("frame_nr") <= 100)

# Filter by column not null
df = df.filter(col("/world/robot:Position3D:positions").is_not_null())

# Select specific columns
df = df.select("frame_nr", "/world/robot:Position3D:positions")
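The component column name used above, "/world/robot:Position3D:positions", appears to follow an entity path, archetype, component pattern separated by colons. That pattern is inferred from this single example, so treat the helpers below as a sketch and check the names reported by `schema.component_columns()` before relying on them. Both function names are hypothetical.

```python
def component_column(entity_path, archetype, component):
    """Build a column name such as '/world/robot:Position3D:positions'.

    Assumption: column names follow '<entity_path>:<archetype>:<component>'.
    """
    return f"{entity_path}:{archetype}:{component}"


def split_component_column(name):
    """Split a column name into (entity_path, archetype, component).

    Splits from the right, so only the last two colons act as separators.
    """
    entity_path, archetype, component = name.rsplit(":", 2)
    return entity_path, archetype, component
```

With these helpers, the selection above could be written as `df.select("frame_nr", component_column("/world/robot", "Position3D", "positions"))`.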

Converting to other formats

Likewise, DataFusion offers a rich set of tools to convert a dataframe to various formats.

Load data to a PyArrow `Table`

import rerun as rr

with rr.server.Server(datasets={"my_dataset": ["recording.rrd"]}) as server:
    dataset = server.client().get_dataset("my_dataset")
    table = dataset.reader(index="frame_nr").to_arrow_table()

Load data to a Pandas dataframe

import rerun as rr

with rr.server.Server(datasets={"my_dataset": ["recording.rrd"]}) as server:
    dataset = server.client().get_dataset("my_dataset")
    df = dataset.reader(index="frame_nr").to_pandas()

Load data to a Polars dataframe

import rerun as rr
import polars as pl

with rr.server.Server(datasets={"my_dataset": ["recording.rrd"]}) as server:
    dataset = server.client().get_dataset("my_dataset")
    df = pl.from_arrow(dataset.reader(index="frame_nr").to_arrow_table())

Load data to a DuckDB relation

import rerun as rr
import duckdb

with rr.server.Server(datasets={"my_dataset": ["recording.rrd"]}) as server:
    dataset = server.client().get_dataset("my_dataset")
    table = dataset.reader(index="frame_nr").to_arrow_table()
    rel = duckdb.arrow(table)
