Physical AI AV

Warning

Experimental Dataset Support

The Physical AI AV dataset integration is currently under active development and should be considered experimental. Features may be incomplete, APIs may change, and unexpected bugs are possible.

If you encounter any issues, please report them on our GitHub Issues page. Your feedback helps us improve!

The Physical AI AV dataset provides autonomous driving sensor data collected using the NVIDIA Hyperion 8 platform. It includes 7 f-theta (fisheye) cameras at ~30 fps, a 360-degree LiDAR at ~10 Hz, auto-labeled 3D bounding box detections, and high-rate egomotion data (67-100 Hz). The dataset features Draco-compressed LiDAR point clouds with per-point timestamps and dual egomotion sources (real-time and offline-smoothed).
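Because egomotion is sampled far faster (67-100 Hz) than the LiDAR (~10 Hz), a common preprocessing step is aligning ego poses to sensor timestamps. The sketch below uses synthetic timestamps and a 1-D position purely for illustration (real ego states are SE(3) poses, which need proper pose interpolation rather than per-axis linear interpolation):

```python
import numpy as np

# Illustrative only: synthetic timestamps in microseconds, not real dataset values.
# Egomotion is sampled at ~100 Hz; LiDAR sweeps arrive at ~10 Hz.
ego_ts = np.arange(0, 1_000_000, 10_000)   # 100 egomotion samples over 1 s
ego_x = 15.0 * ego_ts / 1e6                # ego x-position, moving at 15 m/s

lidar_ts = np.arange(50_000, 1_000_000, 100_000)  # 10 LiDAR sweep timestamps

# Interpolate the high-rate egomotion down to each LiDAR timestamp.
lidar_x = np.interp(lidar_ts, ego_ts, ego_x)
print(lidar_x[:3])  # ego x-position at the first three sweeps
```

The same idea applies to the per-point LiDAR timestamps mentioned above, which allow motion compensation within a single sweep.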

Overview

  • Download: Hugging Face

  • Code: NVlabs/physical_ai_av

  • License: Please refer to the dataset's official license terms.

  • Available splits: physical-ai-av_train, physical-ai-av_val, physical-ai-av_test

Available Modalities

Ego Vehicle

State of the ego vehicle including poses, velocity, and acceleration at 67-100 Hz. Two egomotion sources are available: real-time and offline-smoothed. See EgoStateSE3.

Map

Not available for this dataset.

Bounding Boxes

Auto-labeled 3D bounding box detections with 10 semantic classes and track tokens. See PhysicalAIAVBoxDetectionLabel and BoxDetectionsSE3.

Traffic Lights

Not available for this dataset.

Cameras

Includes 7 f-theta (fisheye) cameras at ~30 fps; see Camera:

  • FTCAM_F0 (front wide, 120° FOV)

  • FTCAM_TELE_F0 (front tele, 30° FOV)

  • FTCAM_R0 (cross right, 120° FOV)

  • FTCAM_L0 (cross left, 120° FOV)

  • FTCAM_R1 (rear right, 70° FOV)

  • FTCAM_L1 (rear left, 70° FOV)

  • FTCAM_TELE_B0 (rear tele, 30° FOV)
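When loading clips, it helps to relate these camera identifiers to the directory names that appear under camera/ in the download layout. The pairing below is inferred from the matching descriptions and FOV values; it is an assumption for illustration, not an official mapping from the devkit:

```python
# Hypothetical mapping from camera identifiers to on-disk directory names,
# inferred from matching descriptions and FOV values (an assumption, not an API).
CAMERA_DIRS = {
    "FTCAM_F0": "camera_front_wide_120fov",
    "FTCAM_TELE_F0": "camera_front_tele_30fov",
    "FTCAM_R0": "camera_cross_right_120fov",
    "FTCAM_L0": "camera_cross_left_120fov",
    "FTCAM_R1": "camera_rear_right_70fov",
    "FTCAM_L1": "camera_rear_left_70fov",
    "FTCAM_TELE_B0": "camera_rear_tele_30fov",
}
print(CAMERA_DIRS["FTCAM_F0"])
```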

Lidars

Includes 1 top-mounted 360-degree LiDAR; see Lidar:

  • LIDAR_TOP (top, 360° FOV)

Dataset Specific
class py123d.parser.registry.PhysicalAIAVBoxDetectionLabel

Semantic labels for Physical AI AV dataset obstacle detections (auto-labeled).

AUTOMOBILE = 0
PERSON = 1
BUS = 2
HEAVY_TRUCK = 3
OTHER_VEHICLE = 4
PROTRUDING_OBJECT = 5
RIDER = 6
STROLLER = 7
TRAILER = 8
ANIMAL = 9
to_default()

Inherited, see superclass.

Return type: DefaultBoxDetectionLabel
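The label values above can be exercised without the dataset itself. The snippet below mirrors the enum so it runs standalone (in practice, import the real class from py123d.parser.registry) and shows how integer class IDs round-trip to names:

```python
from enum import IntEnum

# Standalone mirror of PhysicalAIAVBoxDetectionLabel, duplicating the values
# documented above; the real class lives in py123d.parser.registry.
class PhysicalAIAVBoxDetectionLabel(IntEnum):
    AUTOMOBILE = 0
    PERSON = 1
    BUS = 2
    HEAVY_TRUCK = 3
    OTHER_VEHICLE = 4
    PROTRUDING_OBJECT = 5
    RIDER = 6
    STROLLER = 7
    TRAILER = 8
    ANIMAL = 9

# An integer class ID from a detection record maps back to a semantic name.
label = PhysicalAIAVBoxDetectionLabel(3)
print(label.name)  # HEAVY_TRUCK
```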

Download

The dataset can be downloaded from Hugging Face. For additional tools and documentation, see the Physical AI AV devkit.

The downloaded dataset should have the following structure:

$PHYSICAL_AI_AV_DATA_ROOT
├── clip_index.parquet
├── calibration/
│   ├── camera_intrinsics/
│   │   └── camera_intrinsics.chunk_XXXX.parquet
│   └── sensor_extrinsics/
│       └── sensor_extrinsics.chunk_XXXX.parquet
├── labels/
│   ├── egomotion/
│   │   └── {clip_id}.egomotion.parquet
│   ├── egomotion.offline/
│   │   └── {clip_id}.egomotion.offline.parquet
│   └── obstacle.offline/
│       └── {clip_id}.obstacle.offline.parquet
├── lidar/
│   └── lidar_top_360fov/
│       └── {clip_id}.lidar_top_360fov.parquet
└── camera/
    ├── camera_front_wide_120fov/
    ├── camera_front_tele_30fov/
    ├── camera_cross_left_120fov/
    ├── camera_cross_right_120fov/
    ├── camera_rear_left_70fov/
    ├── camera_rear_right_70fov/
    └── camera_rear_tele_30fov/
        ├── {clip_id}.{cam_name}.mp4
        └── {clip_id}.{cam_name}.timestamps.parquet
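Before running the conversion, it can save time to sanity-check that the top-level layout above is in place. The helper below is hypothetical (not part of py123d) and only checks the top-level entries, demonstrated here on an empty temporary directory:

```python
import tempfile
from pathlib import Path

# Hypothetical helper (not part of py123d): verify the top-level entries of a
# Physical AI AV data root against the layout documented above.
EXPECTED = ["clip_index.parquet", "calibration", "labels", "lidar", "camera"]

def missing_entries(root: Path) -> list[str]:
    """Return expected top-level entries that are absent under root."""
    return [name for name in EXPECTED if not (root / name).exists()]

# Demonstrated on an empty temporary directory, so everything is reported missing.
with tempfile.TemporaryDirectory() as tmp:
    print(missing_entries(Path(tmp)))
```

In real use, pass `Path(os.environ["PHYSICAL_AI_AV_DATA_ROOT"])` and expect an empty list.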

Installation

No additional installation steps are required beyond the standard py123d installation.

Conversion

To run the conversion, set the environment variable $PHYSICAL_AI_AV_DATA_ROOT to the dataset root. Alternatively, you can override the path directly on the command line:

py123d-conversion 'datasets=["physical-ai-av"]' \
    dataset_paths.physical_ai_av_data_root=$PHYSICAL_AI_AV_DATA_ROOT  # optional if env variable is set

Note

By default, the conversion stores camera data as JPEG binary and LiDAR data as IPC with LZ4 compression. You can adjust these options in the physical-ai-av.yaml converter configuration.

Dataset Issues

  • Auto-labeled detections: Bounding box labels are auto-generated and may be noisier than those in manually annotated datasets.

  • No map data: This dataset does not include HD map information.

Citation

n/a