Argoverse 2 - Sensor
Argoverse 2 (AV2) is a collection of three datasets. The Sensor Dataset contains 1000 logs of roughly 20 seconds each, with multi-view camera images, lidar point clouds, HD maps, ego-vehicle data, and 3D bounding boxes. It is intended for training 3D perception models for autonomous vehicles.
Overview
| | |
|---|---|
| Paper | Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting |
| Download | |
| Code | |
| License | MIT License |
| Available splits | |
Available Modalities
| Name | Available | Description |
|---|---|---|
| Ego Vehicle | ✓ | State of the ego vehicle, including poses and vehicle parameters, see |
| Map | (✓) | The HD maps are in 3D but may contain artifacts from the polyline-to-polygon conversion (see below). For more information, see |
| Bounding Boxes | ✓ | The bounding boxes are available with the |
| Traffic Lights | X | n/a |
| Cameras | ✓ | Includes 9 cameras, see |
| Lidars | ✓ | Includes 2 Lidars, see |
Dataset Specific
- class py123d.parser.registry.AV2SensorBoxDetectionLabel
Argoverse 2 Sensor dataset annotation categories.
- ANIMAL = 0
- ARTICULATED_BUS = 1
- BICYCLE = 2
- BICYCLIST = 3
- BOLLARD = 4
- BOX_TRUCK = 5
- BUS = 6
- CONSTRUCTION_BARREL = 7
- CONSTRUCTION_CONE = 8
- DOG = 9
- LARGE_VEHICLE = 10
- MESSAGE_BOARD_TRAILER = 11
- MOBILE_PEDESTRIAN_CROSSING_SIGN = 12
- MOTORCYCLE = 13
- MOTORCYCLIST = 14
- OFFICIAL_SIGNALER = 15
- PEDESTRIAN = 16
- RAILED_VEHICLE = 17
- REGULAR_VEHICLE = 18
- SCHOOL_BUS = 19
- SIGN = 20
- STOP_SIGN = 21
- STROLLER = 22
- TRAFFIC_LIGHT_TRAILER = 23
- TRUCK = 24
- TRUCK_CAB = 25
- VEHICULAR_TRAILER = 26
- WHEELCHAIR = 27
- WHEELED_DEVICE = 28
- WHEELED_RIDER = 29
- to_default()
Inherited, see superclass.
- Return type:
DefaultBoxDetectionLabel
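When working with raw AV2 annotations outside py123d, it can help to map category names to the integer values listed above. The sketch below is a hypothetical stand-in (AV2LabelMirror is not part of py123d) that rebuilds the same name-to-value table with Python's enum module. It assumes the category strings in annotations.feather use exactly these upper-case names; prefer the real AV2SensorBoxDetectionLabel whenever py123d is installed.

```python
from enum import IntEnum
from typing import List

# Category names in the documented order; list position equals the integer
# value (ANIMAL = 0 ... WHEELED_RIDER = 29).
_AV2_CATEGORIES: List[str] = [
    "ANIMAL", "ARTICULATED_BUS", "BICYCLE", "BICYCLIST", "BOLLARD",
    "BOX_TRUCK", "BUS", "CONSTRUCTION_BARREL", "CONSTRUCTION_CONE", "DOG",
    "LARGE_VEHICLE", "MESSAGE_BOARD_TRAILER", "MOBILE_PEDESTRIAN_CROSSING_SIGN",
    "MOTORCYCLE", "MOTORCYCLIST", "OFFICIAL_SIGNALER", "PEDESTRIAN",
    "RAILED_VEHICLE", "REGULAR_VEHICLE", "SCHOOL_BUS", "SIGN", "STOP_SIGN",
    "STROLLER", "TRAFFIC_LIGHT_TRAILER", "TRUCK", "TRUCK_CAB",
    "VEHICULAR_TRAILER", "WHEELCHAIR", "WHEELED_DEVICE", "WHEELED_RIDER",
]

# Hypothetical mirror of AV2SensorBoxDetectionLabel, built with the
# functional Enum API; start=0 makes the first member equal 0.
AV2LabelMirror = IntEnum("AV2LabelMirror", _AV2_CATEGORIES, start=0)


def category_to_id(category: str) -> int:
    """Map a category string such as 'REGULAR_VEHICLE' to its integer label.

    Raises KeyError for names that are not in the documented list.
    """
    return int(AV2LabelMirror[category.strip().upper()])


def id_to_category(label_id: int) -> str:
    """Inverse lookup: integer label back to its category name."""
    return AV2LabelMirror(label_id).name
```

Applied to a whole annotations table, the same lookup would typically be mapped over the category column once, so that downstream code only handles integers.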
Installation
The AV2 downloader uses boto3 to pull from the public Argoverse S3 bucket. Install the av2 extra, either from PyPI or, inside a source checkout, as an editable install:
pip install "py123d[av2]"
pip install -e ".[av2]"
boto3 is only required to download the dataset. Parsing a locally downloaded dataset needs no dependencies beyond the standard py123d install.
Download
The AV2 Sensor dataset lives on a publicly-readable AWS S3 bucket
(s3://argoverse/datasets/av2/sensor/). No AWS credentials are required.
Downloads run through the unified py123d-download CLI:
export AV2_DATA_ROOT=/path/to/argoverse
# Download a 5-log subset of the validation split (~1.25 GB) to $AV2_DATA_ROOT
py123d-download dataset=av2-sensor \
'dataset.downloader.splits=[av2-sensor_val]' \
dataset.downloader.num_logs=5
# Or the full dataset (~250 GB across 1000 logs)
py123d-download dataset=av2-sensor
# Preview the plan without downloading
py123d-download dataset=av2-sensor \
dataset.downloader.num_logs=3 \
dataset.downloader.dry_run=true
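Because the bucket is publicly readable, it can also be inspected directly with boto3 using unsigned (anonymous) requests, for example to pick log UUIDs before a download. This is a minimal sketch, not how py123d's downloader works: split_prefix and list_log_ids are hypothetical helpers, and the key layout datasets/av2/sensor/<split>/<log_id>/ is inferred from the bucket path above and the local layout below. boto3 is imported inside the function so the module still loads without the av2 extra.

```python
from typing import List

BUCKET = "argoverse"
SPLITS = {"train", "val", "test"}


def split_prefix(split: str) -> str:
    """S3 key prefix for one split of the AV2 Sensor dataset (assumed layout)."""
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    return f"datasets/av2/sensor/{split}/"


def list_log_ids(split: str, max_logs: int = 5) -> List[str]:
    """List up to max_logs log UUIDs in a split using anonymous S3 requests."""
    # Lazy imports: boto3/botocore come with the optional av2 extra and are
    # only needed when actually talking to S3.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    prefix = split_prefix(split)
    paginator = s3.get_paginator("list_objects_v2")
    log_ids: List[str] = []
    # Delimiter="/" groups keys by the next path component, so each
    # CommonPrefixes entry corresponds to one log directory.
    for page in paginator.paginate(Bucket=BUCKET, Prefix=prefix, Delimiter="/"):
        for entry in page.get("CommonPrefixes", []):
            log_ids.append(entry["Prefix"][len(prefix):].rstrip("/"))
            if len(log_ids) >= max_logs:
                return log_ids
    return log_ids
```

For example, list_log_ids("val", max_logs=3) would return three log UUIDs (this requires network access).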
The downloaded dataset has the following per-log structure:
$AV2_DATA_ROOT
└── sensor/
    ├── train/
    │   └── 00a6ffc1-6ce9-3bc3-a060-6006e9893a1a/
    │       ├── annotations.feather
    │       ├── calibration/
    │       │   ├── egovehicle_SE3_sensor.feather
    │       │   └── intrinsics.feather
    │       ├── city_SE3_egovehicle.feather
    │       ├── map/
    │       │   └── ...
    │       └── sensors/
    │           ├── cameras/...
    │           └── lidar/...
    ├── val/
    └── test/
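A quick sanity check after downloading is to confirm that every log directory contains the entries shown in this layout. The sketch below hard-codes those paths; iter_log_dirs and missing_entries are hypothetical helpers, and the require_annotations switch exists only in case some logs lack annotations.feather (whether any AV2 split does is an assumption to verify).

```python
import os
from pathlib import Path
from typing import Iterator, List

# Entries each log directory should contain, taken from the layout above.
REQUIRED_FILES = [
    "calibration/egovehicle_SE3_sensor.feather",
    "calibration/intrinsics.feather",
    "city_SE3_egovehicle.feather",
]
REQUIRED_DIRS = ["map", "sensors/cameras", "sensors/lidar"]


def iter_log_dirs(data_root: Path, split: str) -> Iterator[Path]:
    """Yield each log directory under <data_root>/sensor/<split>/, sorted by name."""
    split_dir = data_root / "sensor" / split
    if split_dir.is_dir():
        yield from sorted(p for p in split_dir.iterdir() if p.is_dir())


def missing_entries(log_dir: Path, require_annotations: bool = True) -> List[str]:
    """Return the expected files and directories that are absent from log_dir."""
    files = REQUIRED_FILES + (["annotations.feather"] if require_annotations else [])
    missing = [f for f in files if not (log_dir / f).is_file()]
    missing += [d for d in REQUIRED_DIRS if not (log_dir / d).is_dir()]
    return missing


if __name__ == "__main__":
    # Report every validation log; "ok" means nothing from the layout is missing.
    root = os.environ.get("AV2_DATA_ROOT")
    if root:
        for log_dir in iter_log_dirs(Path(root), "val"):
            print(log_dir.name, missing_entries(log_dir) or "ok")
```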
Conversion
Local mode, for data already downloaded to $AV2_DATA_ROOT:
py123d-conversion dataset=av2-sensor
Note
By default, the AV2 conversion stores only relative file paths to the sensor data in the logs, not the sensor data itself. To embed the sensor data instead, adapt the av2-sensor.yaml converter configuration.
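Because only relative paths are stored, converted logs stay usable only while the raw files remain reachable at the expected location. The hypothetical resolver below illustrates this under an assumption that py123d does not state here: that stored paths are relative to $AV2_DATA_ROOT. Check py123d's actual convention before relying on it.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_sensor_path(relative_path: str, data_root: Optional[str] = None) -> Path:
    """Turn a stored relative sensor path into an absolute one (hypothetical helper).

    Falls back to the AV2_DATA_ROOT environment variable when data_root is None.
    """
    root_str = data_root if data_root is not None else os.environ["AV2_DATA_ROOT"]
    root = Path(root_str).resolve()
    candidate = (root / relative_path).resolve()
    # Refuse paths such as "../other" that would point outside the dataset root.
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"{relative_path!r} escapes the data root {root}")
    return candidate
```

If the raw dataset is moved or deleted after conversion, such a lookup fails, which is the practical cost of the default path-only storage.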
Streaming mode: dataset=av2-sensor-stream attaches an Av2Downloader to the parser. At parser construction time, it fetches the selected logs from S3 into a temporary directory, and it removes that directory when the parser is garbage-collected. This is useful when the full ~250 GB dataset is too large for local disk and you only need a handful of logs for iteration:
# Stream the first log of the validation split only:
py123d-conversion dataset=av2-sensor-stream \
dataset.parser.downloader.num_logs=1 \
'dataset.parser.splits=[av2-sensor_val]'
# Stream specific log UUIDs:
py123d-conversion dataset=av2-sensor-stream \
'dataset.parser.downloader.log_ids={av2-sensor_val: [00a6ffc1-6ce9-3bc3-a060-6006e9893a1a]}'
# Persist downloads under a dedicated cache dir instead of a tempdir:
py123d-conversion dataset=av2-sensor-stream \
dataset.parser.downloader.num_logs=1 \
dataset.parser.downloader.output_dir=/mnt/scratch/av2_sensor_cache
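When scripting many runs, the log_ids override can be generated instead of typed. This sketch reproduces the exact string format of the command above and validates each ID as a canonical UUID. The helper names are hypothetical, and the multi-split, multi-ID form follows Hydra's dict and list override syntax rather than anything documented for py123d.

```python
import uuid
from typing import Dict, List


def _check_log_id(log_id: str) -> str:
    """Reject anything that is not a canonical lowercase UUID string."""
    # uuid.UUID raises ValueError for malformed input; the comparison also
    # rejects valid-but-non-canonical spellings (upper case, no hyphens).
    if str(uuid.UUID(log_id)) != log_id:
        raise ValueError(f"not a canonical log UUID: {log_id!r}")
    return log_id


def log_ids_override(log_ids: Dict[str, List[str]]) -> str:
    """Build the dataset.parser.downloader.log_ids override string."""
    entries = ", ".join(
        f"{split}: [{', '.join(_check_log_id(i) for i in ids)}]"
        for split, ids in log_ids.items()
    )
    return f"dataset.parser.downloader.log_ids={{{entries}}}"


def conversion_argv(log_ids: Dict[str, List[str]]) -> List[str]:
    """Argument list for subprocess.run; list form needs no shell quoting."""
    return ["py123d-conversion", "dataset=av2-sensor-stream", log_ids_override(log_ids)]
```

Passing the result to subprocess.run(conversion_argv(...), check=True) sidesteps the single quotes that the braces and spaces require in an interactive shell.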
Warning
Each AV2 Sensor log is ~250 MB (~1000 objects: annotations + calibration + map + per-camera JPEGs + per-lidar feathers). Even small num_logs values mean hundreds of megabytes of download traffic.
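The sizes quoted on this page are consistent with each other: 5 logs × ~250 MB ≈ 1.25 GB and 1000 logs × ~250 MB ≈ 250 GB. A small helper can turn that rule of thumb into a disk-space check before downloading. MB_PER_LOG and the 20% headroom are assumptions for illustration; real log sizes vary.

```python
import shutil

MB_PER_LOG = 250  # approximate size of one AV2 Sensor log, per the warning above


def estimated_download_gb(num_logs: int, mb_per_log: float = MB_PER_LOG) -> float:
    """Rough download volume in decimal gigabytes (1 GB = 1000 MB)."""
    if num_logs < 0:
        raise ValueError("num_logs must be non-negative")
    return num_logs * mb_per_log / 1000


def fits_on_disk(num_logs: int, target_dir: str = ".", headroom: float = 1.2) -> bool:
    """Compare free space at target_dir with the estimate plus a safety margin."""
    needed_bytes = estimated_download_gb(num_logs) * 1e9 * headroom
    return shutil.disk_usage(target_dir).free >= needed_bytes
```

For instance, estimated_download_gb(5) gives 1.25, matching the 5-log validation subset above.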
Note
The streaming variant overrides the default log_writer_config to force self-contained sensor payloads (camera_store_option: jpeg_binary, lidar_store_option: binary). Relative paths would be useless here, because the temporary source directory is deleted when the parser is garbage-collected.
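The temporary-directory lifecycle described above can be illustrated with the standard library's weakref.finalize, which runs a cleanup callback when its owner is garbage-collected. TempDownloadCache is a hypothetical stand-in, not py123d's Av2Downloader; its optional output_dir only loosely mirrors dataset.parser.downloader.output_dir. Calling close() shows why a stored path into the temporary directory stops resolving once cleanup has run.

```python
import shutil
import tempfile
import weakref
from pathlib import Path
from typing import Optional


class TempDownloadCache:
    """Hypothetical scratch directory tied to an owner object (not py123d code)."""

    def __init__(self, output_dir: Optional[str] = None) -> None:
        self._finalizer: Optional[weakref.finalize] = None
        if output_dir is not None:
            # Persistent cache: files outlive this object.
            self.path = Path(output_dir)
            self.path.mkdir(parents=True, exist_ok=True)
        else:
            # Fresh temp dir, deleted by weakref.finalize when this object is
            # garbage-collected (or at interpreter exit at the latest).
            self.path = Path(tempfile.mkdtemp(prefix="av2_sensor_"))
            self._finalizer = weakref.finalize(
                self, shutil.rmtree, self.path, ignore_errors=True
            )

    def close(self) -> None:
        """Run the cleanup now instead of waiting for garbage collection."""
        if self._finalizer is not None:
            self._finalizer()  # a finalizer runs at most once
```

weakref.finalize is used instead of __del__ because it is guaranteed to run at most once and does not keep the owner alive.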
Dataset Issues
Ego Vehicle: The vehicle parameters are partially estimated and may be inaccurate.
Citation
If you use this dataset in your research, please cite:
@inproceedings{Wilson2021NEURIPS,
author = {Benjamin Wilson and William Qi and Tanmay Agarwal and John Lambert and Jagjeet Singh and Siddhesh Khandelwal and Bowen Pan and Ratnesh Kumar and Andrew Hartnett and Jhony Kaesemodel Pontes and Deva Ramanan and Peter Carr and James Hays},
title = {Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting},
booktitle = {Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks (NeurIPS Datasets and Benchmarks 2021)},
year = {2021}
}