Recent advances in sensor technology have introduced low-cost RGB-D sensors, such as the Kinect, which capture colour and depth images simultaneously at video rates. This paper introduces a framework for representing general dynamic scenes from video plus depth acquisition. A hybrid representation is proposed which combines the advantages of prior surfel graph surface segmentation and modelling work with the higher-resolution surface reconstruction capability of volumetric fusion techniques. The contributions are (1) extension of a prior piecewise surfel graph modelling approach for improved accuracy and completeness, (2) combination of this surfel graph modelling with TSDF surface fusion to generate dense geometry, and (3) a scheme for validating the reconstructed 4D scene model against the input data and efficiently storing any unmodelled regions via residual depth maps. The approach allows arbitrary dynamic scenes to be represented efficiently with temporally consistent structure and enhanced levels of detail and completeness where possible, while gracefully falling back to raw measurements where no structure can be inferred. The representation is shown to facilitate creative manipulation of real scene data which would previously have required more complex capture setups or manual processing.



Supplementary Material



@ARTICLE{Malleson2018,
        AUTHOR = "Malleson, Charles and Guillemaut, Jean-Yves and Hilton, Adrian",
        TITLE = "Hybrid modelling of non-rigid scenes from RGBD cameras",
        JOURNAL = "IEEE Transactions on Circuits and Systems for Video Technology",
        YEAR = "2018"
}

The RGBD video sequences used in this paper will shortly be made available for research purposes.


This work was funded by the EU FP7 project SCENE and the EPSRC Audio-Visual Media Platform Grant EP/P022529/1.