We introduce the first approach to solve the challenging problem of unsupervised 4D visual scene understanding for complex dynamic scenes with multiple interacting people from multi-view video. Our approach simultaneously estimates a detailed model that includes a per-pixel semantically and temporally coherent reconstruction, together with instance-level segmentation exploiting photo-consistency, semantic and motion information. We further leverage recent advances in 3D pose estimation to constrain the joint semantic instance segmentation and 4D temporally coherent reconstruction. This enables per person semantic instance segmentation of multiple interacting people in complex dynamic scenes. Extensive evaluation of the joint visual scene understanding framework against state-of-the-art methods on challenging indoor and outdoor sequences demonstrates a significant (≈40%) improvement in semantic segmentation, reconstruction and scene flow accuracy.


U4D: Unsupervised 4D Dynamic Scene Understanding
Armin Mustafa, Chris Russell and Adrian Hilton
ICCV 2019


Data used in this work can be found in the CVSSP 3D Data Repository.


				title = {U4D: Unsupervised 4D Dynamic Scene Understanding},
				author={Mustafa, A. and Russell, C. and Hilton, A.}



This research was supported by the Royal Academy of Engineering Research Fellowship RF-201718-17177, and the European Commission and EPSRC Platform Grant on Audio-Visual Media Research EP/P022529.