Accurate detection and extraction of building information from aerial imagery is important for urban planning, land use analysis, and disaster management. This study develops and evaluates a deep learning methodology for building detection in satellite imagery. We compare three semantic segmentation models based on the U-Net architecture: a baseline U-Net trained from scratch, a U-Net with a pre-trained ResNet34 encoder, and a U-Net with custom architectural enhancements. Our approach combines data augmentation, transfer learning, and ensemble methods to improve model performance. The Inria Aerial Image Labeling Dataset was the primary source of training and validation data. We evaluated several loss functions, including dice loss, focal loss, and weighted cross-entropy, to address class imbalance and improve segmentation accuracy. Model performance was measured with pixel-wise accuracy, Intersection over Union (IoU), and F1-score. The best individual model achieved a dice score of 92 percent on the validation set, and ensembling further improved detection accuracy to 93 percent on the held-out test set. Post-processing with traditional computer vision methods was then applied to refine building polygon delineation. The results demonstrate that deep learning-based segmentation is effective for building detection in aerial imagery, with potential applications in urban planning, construction monitoring, and disaster response. Future research may explore building classification, change detection, and model optimization for real-time use in dynamic urban environments.
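As an illustration of the evaluation metrics and the ensembling step named in the abstract, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the convention of scoring two empty masks as 1.0, and the choice of averaging per-pixel probabilities with a 0.5 threshold are all assumptions made for this example, since the abstract does not specify how the ensemble was combined. For binary masks the dice score and the F1-score are the same quantity, 2TP / (2TP + FP + FN), while IoU is TP / (TP + FP + FN).

```python
from typing import List, Sequence, Tuple

Mask = Sequence[Sequence[int]]        # binary mask: 1 = building, 0 = background
ProbMap = Sequence[Sequence[float]]   # per-pixel building probability in [0, 1]


def confusion_counts(pred: Mask, target: Mask) -> Tuple[int, int, int, int]:
    """Count (tp, fp, fn, tn) over two binary masks of identical shape."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of rows")
    tp = fp = fn = tn = 0
    for pred_row, target_row in zip(pred, target):
        if len(pred_row) != len(target_row):
            raise ValueError("mask rows must have the same length")
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def pixel_accuracy(pred: Mask, target: Mask) -> float:
    """Fraction of pixels whose predicted label matches the ground truth."""
    tp, fp, fn, tn = confusion_counts(pred, target)
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 1.0  # assumed convention for empty input


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over Union for the building class."""
    tp, fp, fn, _ = confusion_counts(pred, target)
    union = tp + fp + fn
    return tp / union if union else 1.0  # assumed convention: both masks empty


def dice(pred: Mask, target: Mask) -> float:
    """Dice score, identical to the F1-score for binary masks."""
    tp, fp, fn, _ = confusion_counts(pred, target)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0  # assumed convention: both masks empty


def ensemble_average(prob_maps: Sequence[ProbMap], threshold: float = 0.5) -> List[List[int]]:
    """Average per-pixel probabilities across models, then threshold to a mask.

    Probability averaging is an assumed ensembling rule chosen for illustration.
    """
    if not prob_maps:
        raise ValueError("at least one probability map is required")
    n = len(prob_maps)
    rows, cols = len(prob_maps[0]), len(prob_maps[0][0])
    return [
        [1 if sum(m[r][c] for m in prob_maps) / n >= threshold else 0 for c in range(cols)]
        for r in range(rows)
    ]


if __name__ == "__main__":
    predicted = [[1, 1, 0], [0, 1, 0]]
    truth = [[1, 0, 0], [0, 1, 1]]
    print("accuracy:", pixel_accuracy(predicted, truth))
    print("IoU:", iou(predicted, truth))
    print("dice / F1:", dice(predicted, truth))
```

Because both scores are built from the same counts, they are related by Dice = 2·IoU / (1 + IoU), so a dice score is always at least as large as the IoU for the same prediction. This is worth keeping in mind when comparing figures reported under different metrics.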
Published in | American Journal of Computer Science and Technology (Volume 7, Issue 4) |
DOI | 10.11648/j.ajcst.20240704.16 |
Page(s) | 183-194 |
Creative Commons | This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited. |
Copyright | Copyright © The Author(s), 2024. Published by Science Publishing Group |
Building Detection, Aerial Imagery, Semantic Segmentation, U-Net, Deep Learning, Ensemble Methods, Satellite Image Analysis, Urban Planning
APA Style
Singh, S., Wiles, C., Bilal, A. (2024). Novel Building Detection and Location Intelligence in Aerial Satellite Imagery. American Journal of Computer Science and Technology, 7(4), 183-194. https://doi.org/10.11648/j.ajcst.20240704.16
ACS Style
Singh, S.; Wiles, C.; Bilal, A. Novel Building Detection and Location Intelligence in Aerial Satellite Imagery. Am. J. Comput. Sci. Technol. 2024, 7(4), 183-194. doi: 10.11648/j.ajcst.20240704.16
@article{10.11648/j.ajcst.20240704.16,
  author = {Sandeep Singh and Christian Wiles and Ahmed Bilal},
  title = {Novel Building Detection and Location Intelligence in Aerial Satellite Imagery},
  journal = {American Journal of Computer Science and Technology},
  volume = {7},
  number = {4},
  pages = {183-194},
  doi = {10.11648/j.ajcst.20240704.16},
  url = {https://doi.org/10.11648/j.ajcst.20240704.16},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajcst.20240704.16},
  year = {2024}
}
TY  - JOUR
T1  - Novel Building Detection and Location Intelligence in Aerial Satellite Imagery
AU  - Sandeep Singh
AU  - Christian Wiles
AU  - Ahmed Bilal
Y1  - 2024/12/18
PY  - 2024
N1  - https://doi.org/10.11648/j.ajcst.20240704.16
DO  - 10.11648/j.ajcst.20240704.16
T2  - American Journal of Computer Science and Technology
JF  - American Journal of Computer Science and Technology
JO  - American Journal of Computer Science and Technology
SP  - 183
EP  - 194
PB  - Science Publishing Group
SN  - 2640-012X
UR  - https://doi.org/10.11648/j.ajcst.20240704.16
VL  - 7
IS  - 4
ER  -