City digital twins help train deep learning models to separate building facades

To automatically generate training data for deep convolutional neural network models that segment building facades, researchers from Osaka University used a three-dimensional city model and a game engine to create synthetic images from a city digital twin. They found that a model trained on these data mixed with some real data was competitive with a model trained on real data alone, revealing the potential of digital twin data to improve accuracy and replace costly manually annotated real data.

Fig. 1. Comparison of manually annotated datasets and automatically generated synthetic datasets. The conventional method requires images to be labeled by hand when the training set is produced, whereas our proposed system can automatically create synthetic data with instance annotations using digital assets from a city digital twin.

Researchers from Osaka University find that images of city digital twins, created using 3D models and game engines, can be combined with images of the real city to easily create training data for deep learning models that covers most modern architectural styles.

Osaka, Japan - Game engines were originally developed to build imaginary worlds for entertainment. However, these same engines can be used to build copies of real environments, that is, digital twins. Researchers from Osaka University have found a way to use the images that were automatically generated by digital city twins to train deep learning models that can efficiently analyze images of real cities and accurately separate the buildings that appear in them.

A convolutional neural network is a deep learning neural network designed for processing structured arrays of data such as images. Advances in such networks have fundamentally changed the way tasks like architectural segmentation are performed. However, an accurate deep convolutional neural network (DCNN) model needs a large volume of labeled training data, and labeling these data by hand can be slow and extremely expensive.

Fig. 2. Three-dimensional city model of our study area. (a) Example of a city digital twin with its real-world street-view counterpart (Wangan-doro Avenue, Tokyo; March 2021; latitude: 35.6283, longitude: 139.7782). (b) Aerial view of the city digital twin.

To create the synthetic digital city twin data, the investigators used a 3D city model from the PLATEAU platform, which contains 3D models of most Japanese cities at an extremely high level of detail. They loaded this model into the Unity game engine and created a camera setup on a virtual car, which drove around the city and acquired the virtual data images under various lighting and weather conditions. The Google Maps API was then used to obtain real street-level images of the same study area for the experiments.
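As a rough illustration of the real-image step, the sketch below builds request URLs for street-level images around the coordinates given in Fig. 2. It assumes the Google Street View Static API; the release only says "the Google Maps API", so the choice of endpoint, the image size, the heading step, and the helper names `street_view_url` and `panorama_urls` are all assumptions rather than the authors' setup. No request is sent; the code only constructs URLs.

```python
from urllib.parse import urlencode

# Assumption: the Street View Static API endpoint (the release does not name the exact API).
STREET_VIEW_ENDPOINT = "https://maps.googleapis.com/maps/api/streetview"


def street_view_url(lat, lng, heading, api_key, size="640x640", fov=90, pitch=0):
    """Build one image request URL for a camera at (lat, lng) facing `heading` degrees."""
    params = {
        "size": size,                 # output image size in pixels, WIDTHxHEIGHT
        "location": f"{lat},{lng}",   # latitude,longitude of the viewpoint
        "heading": heading,           # compass direction of the camera (0 = north)
        "fov": fov,                   # horizontal field of view in degrees
        "pitch": pitch,               # up/down camera angle in degrees
        "key": api_key,               # placeholder; a real key is needed to fetch images
    }
    return f"{STREET_VIEW_ENDPOINT}?{urlencode(params)}"


def panorama_urls(lat, lng, api_key, step_deg=90):
    """Build URLs that look around one viewpoint in steps of `step_deg` degrees."""
    return [street_view_url(lat, lng, h, api_key) for h in range(0, 360, step_deg)]


# Coordinates from the Fig. 2 caption (Wangan-doro Avenue, Tokyo).
urls = panorama_urls(35.6283, 139.7782, api_key="YOUR_API_KEY")
```

Requesting several headings per viewpoint is one simple way to capture facades on both sides of a street; the actual sampling strategy used in the study is not described here.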

The researchers found that the digital city twin data leads to better results than purely virtual data with no real-world counterpart. Furthermore, adding synthetic data to a real dataset improves segmentation accuracy. However, most importantly, the investigators found that when a certain fraction of real data is included in the digital city twin synthetic dataset, the segmentation accuracy of the DCNN is boosted significantly. In fact, its performance becomes competitive with that of a DCNN trained on 100% real data. “These results reveal that our proposed synthetic dataset could potentially replace all the real images in the training set,” says Tomohiro Fukuda, the corresponding author of the paper.
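The idea of including a fraction of real data in the synthetic set can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `build_hybrid_dataset`, the file names, and the 25% real fraction are hypothetical, and the release does not state which ratios were actually used.

```python
import random


def build_hybrid_dataset(synthetic, real, real_fraction, total, seed=0):
    """Sample a training set of `total` items with the given fraction of real images.

    `synthetic` and `real` are lists of sample identifiers (for example, file paths
    paired with annotations). All names and values here are illustrative.
    """
    if not 0.0 <= real_fraction <= 1.0:
        raise ValueError("real_fraction must be between 0 and 1")

    n_real = round(total * real_fraction)
    n_synthetic = total - n_real
    if n_real > len(real) or n_synthetic > len(synthetic):
        raise ValueError("not enough samples to satisfy the requested mix")

    rng = random.Random(seed)  # fixed seed so the mix is reproducible
    mixed = rng.sample(real, n_real) + rng.sample(synthetic, n_synthetic)
    rng.shuffle(mixed)         # interleave real and synthetic samples
    return mixed


# Hypothetical pools of samples.
synthetic_pool = [f"synthetic_{i:03d}.png" for i in range(100)]
real_pool = [f"real_{i:03d}.png" for i in range(50)]

# Hypothetical mix: 40 training images, 25% of them real.
train_set = build_hybrid_dataset(synthetic_pool, real_pool, real_fraction=0.25, total=40)
```

Sweeping `real_fraction` and retraining at each value is the kind of experiment that would show how much real data is needed before accuracy approaches that of a model trained on real data alone.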

Automatically separating out the individual building facades that appear in an image is useful for construction management and architecture design, large-scale measurements for retrofits and energy analysis, and even visualizing building facades that have been demolished. The system was tested on multiple cities, demonstrating the proposed framework’s transferability. The hybrid dataset of real and synthetic data yields promising prediction results for most modern architectural styles. This makes it a viable approach for training DCNNs for architectural segmentation tasks in the future – without the need for costly manual data annotation.

###

The article, “Automatic generation of synthetic datasets from a city digital twin for use in the instance segmentation of building facades,” was published in the Journal of Computational Design and Engineering at DOI: https://doi.org/10.1093/jcde/qwac086.

Fig. 3. Qualitative results for different types and sizes of buildings when Mask R-CNN is trained using HSRBFIA (Hybrid Collection of Synthetic and Real-world Building Facade Images and Annotations) datasets with different ratios of synthetic to real data: (a) low-rise houses in Osaka; (b) low-rise houses in Los Angeles; (c) high-rise houses in New York City; (d) complex facades in Shanghai. (The red dashed rectangles highlight parts of the street-view images that were prone to failure during facade instance segmentation.)

About Osaka University

Osaka University was founded in 1931 as one of the seven imperial universities of Japan and is now one of Japan's leading comprehensive universities with a broad disciplinary spectrum. This strength is coupled with a singular drive for innovation that extends throughout the scientific process, from fundamental research to the creation of applied technology with positive economic impacts. Its commitment to innovation has been recognized in Japan and around the world: it was named Japan's most innovative university in 2015 (Reuters 2015 Top 100) and one of the most innovative institutions in the world in 2017 (Innovative Universities and the Nature Index Innovation 2017). Now, Osaka University is leveraging its role as a Designated National University Corporation selected by the Ministry of Education, Culture, Sports, Science and Technology to contribute to innovation for human welfare, sustainable development of society, and social transformation.

Website: https://resou.osaka-u.ac.jp/en