Camera

Camera Data

class py123d.datatypes.Camera[source]

A camera observation: image, extrinsic pose, timestamp, and model-specific metadata.

Public Data Attributes:

timestamp

The Timestamp of the image capture.

metadata

The BaseCameraMetadata associated with the camera.

image

The image captured by the camera, as a numpy array.

camera_to_global_se3

The extrinsic PoseSE3 of the camera in global coordinates.

Inherited from BaseModality

timestamp

Returns the timestamp associated with this modality data, if available.

metadata

Returns the metadata associated with this modality data.

modality_type

Convenience property to access the modality type directly from the modality data.

modality_id

Convenience property to access the modality id directly from the modality data.

modality_key

Convenience property to access the modality key directly from the modality data.

Public Methods:

project_points_global(points_global)

Project 3D points in global frame to image pixel coordinates.


property modality_id: str | SerialIntEnum | None

Convenience property to access the modality id directly from the modality data.

property modality_key: str

Convenience property to access the modality key directly from the modality data.

property modality_type: ModalityType

Convenience property to access the modality type directly from the modality data.

property timestamp: Timestamp

The Timestamp of the image capture.

property metadata: BaseCameraMetadata

The BaseCameraMetadata associated with the camera.

property image: ndarray[tuple[Any, ...], dtype[uint8]]

The image captured by the camera, as a numpy array.

property camera_to_global_se3: PoseSE3

The extrinsic PoseSE3 of the camera in global coordinates.

project_points_global(points_global)[source]

Project 3D points in global frame to image pixel coordinates.

Convenience method that transforms points from global to camera frame, then delegates to BaseCameraMetadata.project_to_image().

Parameters:

points_global (ndarray[tuple[Any, ...], dtype[float64]]) – (N, 3) array of 3D points in global coordinates.

Return type:

Tuple[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[bool]], ndarray[tuple[Any, ...], dtype[float64]]]

Returns:

A tuple of:

- pixel_coords: (N, 2) array of (u, v) pixel coordinates.
- in_fov_mask: (N,) boolean mask.
- depth: (N,) array of signed depths.
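The first step described for project_points_global, moving points from the global frame into the camera frame, can be sketched with plain NumPy. This is an illustration only, not py123d's implementation: the helper name global_to_camera and the 4x4 homogeneous matrix used in place of the PoseSE3 stored in camera_to_global_se3 are assumptions.

```python
import numpy as np


def global_to_camera(points_global: np.ndarray, camera_to_global: np.ndarray) -> np.ndarray:
    """Transform (N, 3) global points into the camera frame.

    `camera_to_global` is a hypothetical 4x4 homogeneous matrix standing in
    for the pose exposed as `camera_to_global_se3`.
    """
    rotation = camera_to_global[:3, :3]
    translation = camera_to_global[:3, 3]
    # Inverse of a rigid transform: p_cam = R^T (p_global - t).
    # With points stored as rows, (p - t) @ R applies R^T to each point.
    return (points_global - translation) @ rotation


# Camera placed at (10, 0, 0) in the global frame, axes aligned with the global axes.
camera_to_global = np.eye(4)
camera_to_global[:3, 3] = [10.0, 0.0, 0.0]

points_global = np.array([[10.0, 0.0, 5.0], [12.0, 1.0, 2.0]])
points_cam = global_to_camera(points_global, camera_to_global)
```

The resulting camera-frame points are what a projection routine such as BaseCameraMetadata.project_to_image() expects as input.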

Base Camera Metadata

class py123d.datatypes.BaseCameraMetadata[source]

Base class for camera metadata. Provides the shared interface for all camera models.

Public Data Attributes:

camera_model

The projection model of the camera.

camera_id

The camera ID, unique within a sensor rig.

camera_name

The camera name, according to the dataset naming convention.

camera_to_imu_se3

The static extrinsic pose of the camera relative to the IMU frame.

width

The width of the camera image in pixels.

height

The height of the camera image in pixels.

channel_type

The channel type of the camera image.

modality_type

Returns the type of the modality that this metadata describes.

modality_id

Returns the camera ID as the modality ID.

aspect_ratio

The aspect ratio (width / height) of the camera.

Inherited from BaseModalityMetadata

modality_type

Returns the type of the modality that this metadata describes.

modality_id

Optional identifier for the modality, e.g. sensor ID for sensor modalities.

modality_key

Returns a unique key for this modality, combining type and id if applicable.

Public Methods:

project_to_image(points_cam)

Project 3D points in camera frame to image pixel coordinates.

Inherited from BaseMetadata

to_dict()

Serialize the metadata instance to a plain Python dictionary.

from_dict(data_dict)

Construct a metadata instance from a plain Python dictionary.

Private Methods:

_compute_in_fov_mask(pixel_coords, depth[, eps])

Compute a boolean mask for points in front of the camera and within image bounds.


abstract property camera_model: CameraModel

The projection model of the camera.

abstract property camera_id: CameraID

The camera ID, unique within a sensor rig.

abstract property camera_name: str

The camera name, according to the dataset naming convention.

abstractmethod classmethod from_dict(data_dict)

Construct a metadata instance from a plain Python dictionary.

Parameters:

data_dict (Dict[str, Any]) – A dictionary containing the metadata fields.

Return type:

BaseMetadata

Returns:

A metadata instance.

property modality_key: str

Returns a unique key for this modality, combining type and id if applicable.

abstractmethod to_dict()

Serialize the metadata instance to a plain Python dictionary.

Return type:

Dict[str, Any]

Returns:

A dictionary representation using only default Python types.
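The to_dict() / from_dict() contract amounts to a round trip through plain Python types. A minimal sketch follows, assuming a hypothetical ExampleCameraMetadata dataclass that is not part of py123d; the field names are taken from the attributes documented above, but real metadata classes may serialize more fields.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class ExampleCameraMetadata:
    """Hypothetical stand-in for a metadata class; not a py123d type."""

    camera_name: str
    width: int
    height: int

    def to_dict(self) -> Dict[str, Any]:
        # Only default Python types are used, so the result is JSON-serializable.
        return {"camera_name": self.camera_name, "width": self.width, "height": self.height}

    @classmethod
    def from_dict(cls, data_dict: Dict[str, Any]) -> "ExampleCameraMetadata":
        return cls(
            camera_name=str(data_dict["camera_name"]),
            width=int(data_dict["width"]),
            height=int(data_dict["height"]),
        )


original = ExampleCameraMetadata(camera_name="front", width=1920, height=1080)
# Round trip through JSON to show that no custom types are needed.
restored = ExampleCameraMetadata.from_dict(json.loads(json.dumps(original.to_dict())))
```

Restricting to_dict() to default Python types is what makes the JSON step in the middle possible without a custom encoder.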

abstract property camera_to_imu_se3: PoseSE3

The static extrinsic pose of the camera relative to the IMU frame.

abstract property width: int

The width of the camera image in pixels.

abstract property height: int

The height of the camera image in pixels.

property channel_type: CameraChannelType

The channel type of the camera image. Defaults to RGB.

property modality_type: ModalityType

Returns the type of the modality that this metadata describes.

property modality_id: str | SerialIntEnum | None

Returns the camera ID as the modality ID.

property aspect_ratio: float

The aspect ratio (width / height) of the camera.

abstractmethod project_to_image(points_cam)[source]

Project 3D points in camera frame to image pixel coordinates.

Parameters:

points_cam (ndarray[tuple[Any, ...], dtype[float64]]) – (N, 3) array of 3D points in the camera coordinate frame.

Return type:

Tuple[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[bool]], ndarray[tuple[Any, ...], dtype[float64]]]

Returns:

A tuple of (pixel_coords, in_fov_mask, depth) with shapes (N, 2), (N,), and (N,) respectively.
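Because project_to_image is abstract, each camera model supplies its own projection. The sketch below shows what a pinhole variant might look like, including a mask for points in front of the camera and within image bounds. The function name project_pinhole, the intrinsics fx, fy, cx, cy, the +z optical axis, and the eps default are all assumptions for illustration; py123d's actual pinhole class may differ, for example by applying distortion.

```python
import numpy as np


def project_pinhole(points_cam, fx, fy, cx, cy, width, height, eps=1e-6):
    """Hypothetical pinhole projection returning (pixel_coords, in_fov_mask, depth)."""
    points_cam = np.asarray(points_cam, dtype=np.float64)
    depth = points_cam[:, 2]  # signed depth along the assumed +z optical axis
    # Avoid dividing by values near zero; such points are masked out below anyway.
    safe_depth = np.where(np.abs(depth) > eps, depth, eps)
    u = fx * points_cam[:, 0] / safe_depth + cx
    v = fy * points_cam[:, 1] / safe_depth + cy
    pixel_coords = np.stack([u, v], axis=-1)
    # In front of the camera and inside the image rectangle.
    in_fov_mask = (depth > eps) & (u >= 0) & (u < width) & (v >= 0) & (v < height)
    return pixel_coords, in_fov_mask, depth


points_cam = np.array([
    [0.0, 0.0, 5.0],    # straight ahead: lands on the principal point
    [1.0, 0.0, -2.0],   # behind the camera: negative depth
    [100.0, 0.0, 1.0],  # far off to the side: outside the image
])
pixel_coords, in_fov_mask, depth = project_pinhole(
    points_cam, fx=500.0, fy=500.0, cx=320.0, cy=240.0, width=640, height=480
)
visible_pixels = pixel_coords[in_fov_mask]
```

Returning all N rows together with a mask, rather than only the visible points, keeps the outputs aligned with the input array so callers can index other per-point data with the same mask.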

Camera ID

class py123d.datatypes.CameraID[source]

Enumeration of camera IDs. These are unique within a sensor rig and can be used as modality IDs for camera metadata.

PCAM_F0 = 0

Front pinhole camera.

PCAM_B0 = 1

Back pinhole camera.

PCAM_L0 = 2

Left pinhole camera, first from front to back.

PCAM_L1 = 3

Left pinhole camera, second from front to back.

PCAM_L2 = 4

Left pinhole camera, third from front to back.

PCAM_R0 = 5

Right pinhole camera, first from front to back.

PCAM_R1 = 6

Right pinhole camera, second from front to back.

PCAM_R2 = 7

Right pinhole camera, third from front to back.

PCAM_STEREO_L = 8

Left pinhole stereo camera.

PCAM_STEREO_R = 9

Right pinhole stereo camera.

FMCAM_L = 10

Left-facing fisheye MEI camera.

FMCAM_R = 11

Right-facing fisheye MEI camera.

FTCAM_F0 = 12

Front F-theta camera.

FTCAM_TELE_F0 = 13

Front telephoto F-theta camera.

FTCAM_TELE_B0 = 18

Back telephoto F-theta camera.

FTCAM_L0 = 14

Left F-theta camera, first from front to back.

FTCAM_L1 = 15

Left F-theta camera, second from front to back.

FTCAM_R0 = 16

Right F-theta camera, first from front to back.

FTCAM_R1 = 17

Right F-theta camera, second from front to back.
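The member names above follow a pattern in which the prefix indicates the camera model (PCAM for pinhole, FMCAM for fisheye MEI, FTCAM for F-theta). The sketch below re-declares a subset of the listed values with the standard library IntEnum to show lookups by value and by name. ExampleCameraID and model_prefix are hypothetical names; the real CameraID may behave differently, since the modality_id type hints suggest it derives from py123d's SerialIntEnum.

```python
from enum import IntEnum


class ExampleCameraID(IntEnum):
    """Subset of the documented values, re-declared with the stdlib IntEnum."""

    PCAM_F0 = 0
    PCAM_B0 = 1
    FMCAM_L = 10
    FTCAM_F0 = 12
    FTCAM_TELE_B0 = 18


def model_prefix(camera_id: ExampleCameraID) -> str:
    """Return the name prefix (PCAM, FMCAM, or FTCAM) of a camera ID."""
    return camera_id.name.split("_", 1)[0]


by_value = ExampleCameraID(18)        # lookup by integer value
by_name = ExampleCameraID["FMCAM_L"]  # lookup by member name
prefixes = {member.name: model_prefix(member) for member in ExampleCameraID}
```

Note that FTCAM_TELE_B0 has the value 18 even though it is listed between 13 and 14, so code should rely on the names or explicit values rather than on listing order.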

Camera Model

class py123d.datatypes.CameraModel[source]

Enumeration of camera projection models.

PINHOLE = 0

Standard pinhole camera model.

FISHEYE_MEI = 1

Fisheye camera using the MEI (unified omnidirectional) model, which is parameterized by a mirror parameter.

FTHETA = 2

F-theta polynomial camera model.

Camera Channel Type

class py123d.datatypes.CameraChannelType[source]

Enumeration of camera channel types.

RGB = 0

GRAYSCALE = 1