
The MVSEC Dataset

The Multi Vehicle Stereo Event Camera Dataset

Sensors mounted on multiple vehicles.

The Multi Vehicle Stereo Event Camera dataset is a collection of data designed for the development of novel 3D perception algorithms for event-based cameras. Stereo event data is collected from car, motorbike, hexacopter and handheld platforms, and fused with lidar, IMU, motion capture and GPS to provide ground truth pose and depth images. In addition, we provide images from a standard stereo frame-based camera pair for comparison with traditional techniques.

Event-based cameras are a new asynchronous sensing modality that measures changes in image intensity. When the log intensity at a pixel changes by more than a set threshold, the camera immediately returns the pixel location of the change, along with a timestamp with microsecond accuracy and the direction of the change (up or down). This allows for sensing with extremely low latency. In addition, the cameras have extremely high dynamic range and low power usage.
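The event model described above can be sketched in a few lines of Python. This is only an illustration, not how the DAVIS hardware or the dataset tools work: it compares two whole log-intensity frames at once, whereas a real sensor triggers each pixel asynchronously against its own stored reference level. The function name `generate_events`, the threshold value of 0.2 and the toy image are all assumptions made for this example.

```python
import numpy as np

def generate_events(prev_log, curr_log, t_us, threshold=0.2):
    """Return one (x, y, t, polarity) row per pixel whose log intensity
    changed by at least `threshold` between the two frames."""
    diff = curr_log - prev_log
    ys, xs = np.nonzero(np.abs(diff) >= threshold)
    # Polarity is +1 for a brightness increase and -1 for a decrease.
    polarity = np.where(diff[ys, xs] > 0, 1, -1)
    t = np.full(xs.shape, t_us, dtype=np.int64)
    return np.stack([xs, ys, t, polarity], axis=1)

# Toy 4x6 image with uniform intensity 100 (hypothetical values).
prev_intensity = np.full((4, 6), 100.0)
curr_intensity = prev_intensity.copy()
curr_intensity[1, 2] = 150.0  # brighter: log change of about +0.41
curr_intensity[3, 5] = 60.0   # darker: log change of about -0.51
curr_intensity[0, 0] = 105.0  # small change, stays below the threshold

events = generate_events(np.log(prev_intensity), np.log(curr_intensity), t_us=1000)
print(events)
```

Only the two pixels with a large enough log-intensity change produce events; the small change at pixel (0, 0) is ignored, which is why static or slowly varying scenes generate little data.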


Hexacopter Indoor. Hexacopter Outdoor. Handheld.
Daytime Driving. Nighttime Driving.
Data was collected from four different vehicles, in both indoor and outdoor environments, in day and night settings. All hexacopter sequences have motion capture ground truth from an indoor Vicon area and outdoor Qualisys area, while the other sequences have ground truth generated by fusing lidar information with IMU and GPS. The full list of sequences can be found below:
  • Indoor short
  • Indoor long
  • Outdoor afternoon
  • Outdoor evening
  • Indoor-outdoor
  • Outdoor-indoor
Outdoor Car
  • Pennovations day
  • Pennovations evening
  • West Philadelphia day
  • West Philadelphia evening
  • Motorcycle highway
Indoor Vicon motion capture area.
Outdoor Qualisys motion capture area.


Full sensor configuration. A number of different sensors and modalities are rigidly mounted to a stereo event camera pair, in order to generate accurate ground truth information, as well as to provide avenues for research in sensor fusion between modalities.

For events, two experimental DAVIS 346B cameras are mounted in a stereo (X axes aligned) configuration. Each camera has a resolution of 346x260 pixels, with a 4mm lens and roughly 70 degrees vertical field of view. The camera clocks are synchronized by a hardware trigger generated by the left camera and sent to the right camera. In addition to events, each camera also generates IMU and frame-based image measurements.

In addition, a Velodyne lidar and a stereo frame-based camera with IMU (VI Sensor) are mounted with the DAVIS cameras. When available, ground truth pose is also captured using an indoor (Vicon, left) or outdoor (Qualisys, right) motion capture system.

The full set of sensor characteristics can be found below:
DAVIS 346B
  • 346x260 pixels
  • APS (Active Pixel Sensor for frame-based images)
  • DVS (Dynamic Vision Sensor for events)
  • FOV: 50° vert., 65° horiz.
  • IMU: MPU 6150 at 1kHz
VI Sensor
  • Skybotix integrated VI-sensor
  • stereo camera: 2 x Aptina MT9V034
  • gray 2x752x480 at 20fps, global shutter
  • FOV: 57° vert., 2 x 80° horiz.
  • IMU: ADIS16488 at 200Hz
Velodyne Puck LITE
  • FOV: 30° vert., 360° horiz.
  • 16 channel
  • 20Hz
  • 100m range
GPS
  • 72-channel u-blox M8 engine
  • Position accuracy 2.0m CEP
Motion Capture
  • Indoor Vicon
      • 88 x 22 x 15 ft
      • 20 Vicon Vantage VP-16 Cameras
      • 100Hz pose updates
  • Outdoor Qualisys
      • 100 x 50 x 50 ft
      • 34 Qualisys Oqus 700 Cameras
      • 100Hz pose updates
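Because the sensors listed above run at different rates (for example 100Hz motion capture poses against 20Hz lidar scans), measurements usually have to be associated by timestamp before they can be compared. The following is a minimal sketch of nearest-timestamp matching, assuming sorted integer timestamps in microseconds. The helper `nearest_indices` and the synthetic timestamps are hypothetical and are not part of the dataset tools.

```python
import numpy as np

def nearest_indices(reference_t, query_t):
    """For each query timestamp, return the index of the closest entry in
    `reference_t`, which must be sorted and hold at least two values."""
    reference_t = np.asarray(reference_t)
    query_t = np.asarray(query_t)
    # Index of the first reference timestamp >= query, clipped so that
    # both the left and the right neighbour exist.
    right = np.clip(np.searchsorted(reference_t, query_t), 1, len(reference_t) - 1)
    left = right - 1
    pick_left = (query_t - reference_t[left]) <= (reference_t[right] - query_t)
    return np.where(pick_left, left, right)

# Synthetic one-second recording (assumed values, in microseconds).
pose_t = np.arange(100) * 10_000          # 100Hz pose updates
lidar_t = np.arange(20) * 50_000 + 3_000  # 20Hz scans, offset by 3ms

idx = nearest_indices(pose_t, lidar_t)
max_gap = np.max(np.abs(pose_t[idx] - lidar_t))
print(idx[:4], max_gap)
```

Nearest-neighbour matching is the simplest choice; for fast motion, interpolating the pose between the two neighbouring samples would reduce the residual time offset.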

Ground Truth

For most sequences, accurate pose and depths are provided from a fusion of the sensors onboard.


Reconstructed map with trajectory in green.
Depth map (red) overlaid on the APS image from the DAVIS; lighter is farther.
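A depth image of this kind can be obtained by projecting 3D points into the camera image. The sketch below uses a plain pinhole model and is not the authors' ground truth pipeline: it assumes the points have already been transformed into the camera frame (z pointing forward) and ignores lens distortion. The function `project_to_depth` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical values chosen for illustration; only the 346x260 image size comes from the sensor description.

```python
import numpy as np

def project_to_depth(points_cam, fx, fy, cx, cy, width, height):
    """Rasterize N x 3 camera-frame points into a depth image.
    Each pixel keeps the nearest depth; pixels with no point are NaN."""
    depth = np.full((height, width), np.inf)
    pts = points_cam[points_cam[:, 2] > 0]  # drop points behind the camera
    u = np.round(fx * pts[:, 0] / pts[:, 2] + cx).astype(int)
    v = np.round(fy * pts[:, 1] / pts[:, 2] + cy).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    # ufunc.at handles repeated pixel indices, so the minimum depth wins.
    np.minimum.at(depth, (v[inside], u[inside]), pts[inside, 2])
    depth[np.isinf(depth)] = np.nan
    return depth

points = np.array([
    [0.0, 0.0, 5.0],    # lands on the image center
    [0.0, 0.0, 8.0],    # same pixel but farther away, so it is occluded
    [1.0, 0.0, 10.0],   # to the right of the center
    [0.0, 0.0, -1.0],   # behind the camera
    [100.0, 0.0, 1.0],  # projects outside the image
])
depth = project_to_depth(points, fx=226.0, fy=226.0, cx=173.0, cy=130.0,
                         width=346, height=260)
print(depth[130, 173], depth[130, 196])
```

Since a 16 channel lidar is sparse, most pixels stay empty after a single scan; accumulating several scans using the estimated poses is one way to densify the result.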


Please cite the following paper when using this work in an academic publication:

Zhu, A. Z., Thakur, D., Ozaslan, T., Pfrommer, B., Kumar, V., & Daniilidis, K. (2018). The Multi Vehicle Stereo Event Camera Dataset: An Event Camera Dataset for 3D Perception. IEEE Robotics and Automation Letters, 3(3), 2032-2039.

An arXiv preprint is also available:

Zhu, A. Z., Thakur, D., Ozaslan, T., Pfrommer, B., Kumar, V., & Daniilidis, K. (2018). The Multi Vehicle Stereo Event Camera Dataset: An Event Camera Dataset for 3D Perception. arXiv preprint arXiv:1801.10202.