Multi-Modal Image Fusion for Enhanced Object Detection Using Generative Adversarial Networks

Dadi Sanjana
Sk. Khaja Shareef
Srinath Doss

Abstract

This research aims to improve object detection accuracy through multi-modal image fusion using Generative Adversarial Networks (GANs). Current object detection systems often perform poorly under low light, occlusion, and noise because they rely on single-modal data such as RGB images. The methodology integrates data from multiple modalities, such as thermal, depth, and infrared images, using a GAN to generate a fused image that retains complementary features from each modality. This fusion is expected to enhance the feature extraction capability of object detection algorithms. In the proposed GAN architecture, the generator learns to fuse the multi-modal data while the discriminator ensures the quality of the fused image. Initial findings indicate a significant improvement in detection accuracy, with an increase of up to 15% in challenging conditions compared with single-modal approaches. The system is robust across a variety of environments and achieves better object localization and classification. The study suggests that multi-modal image fusion can be a key component of real-time, robust object detection systems, particularly in applications such as autonomous driving, surveillance, and medical imaging.
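
The abstract does not specify the network layers or the loss function, so the following is only a minimal sketch of how a fusion generator's objective might be set up, not the authors' method. It assumes a common formulation: an adversarial term that rewards fooling the discriminator, plus a content term that keeps the fused image close to each source modality. The names `fuse`, `content_loss`, `generator_loss`, the weight `lam`, and the toy pixel values are all hypothetical, and a fixed weighted average stands in for a learned generator so the example runs with the standard library alone.

```python
import math


def fuse(modalities, weights):
    """Stand-in for a learned generator (assumption): a per-pixel weighted
    average of co-registered, flattened modality images."""
    total = sum(weights)
    norm = [w / total for w in weights]
    n_pixels = len(modalities[0])
    return [
        sum(w * img[i] for w, img in zip(norm, modalities))
        for i in range(n_pixels)
    ]


def content_loss(fused, modalities):
    """Mean squared error between the fused image and each source image,
    averaged over the sources."""
    per_source = [
        sum((f - p) ** 2 for f, p in zip(fused, img)) / len(fused)
        for img in modalities
    ]
    return sum(per_source) / len(per_source)


def generator_loss(d_score, fused, modalities, lam=10.0):
    """Adversarial term plus weighted content term (assumed formulation).

    d_score is the discriminator's probability, in (0, 1), that the fused
    image is real; lam is a hypothetical trade-off weight.
    """
    adversarial = -math.log(d_score)
    return adversarial + lam * content_loss(fused, modalities)


# Toy 4-pixel "images" with intensities in [0, 1] (made-up values).
rgb = [0.2, 0.4, 0.6, 0.8]
thermal = [0.8, 0.6, 0.4, 0.2]

fused = fuse([rgb, thermal], weights=[1.0, 1.0])
loss = generator_loss(d_score=0.5, fused=fused, modalities=[rgb, thermal])
print(fused, loss)
```

In a real system the generator and discriminator would be trainable networks and the content term would more likely compare gradients or deep features than raw pixels; the sketch only shows how the two competing objectives described above can combine into a single generator loss.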

Article Details

How to Cite
[1]
Dadi Sanjana, Sk. Khaja Shareef, and Srinath Doss, “Multi-Modal Image Fusion for Enhanced Object Detection Using Generative Adversarial Networks”, Int. J. Comput. Eng. Res. Trends, vol. 11, no. 7, pp. 1–12, Jul. 2024.
Section
Research Articles
