Stylistic Image Captioning with Adversarial Learning: A Novel Approach

Sushma Jaiswal; Harikumar Pallthadka; Rajesh P. Chinchewadi; Tarun Jaiswal

doi:10.22362/ijcert/2024/v11/i1/v11i101


Published: Jan 3, 2024


Keywords:

CNN, LSTM, Image Caption, BLSTM, Attention-GAN

Sushma Jaiswal

Guru Ghasidas Central University, Bilaspur (C.G.) and Post-Doctoral Research Fellow, Manipur International University, Imphal, Manipur

Harikumar Pallthadka

Manipur International University, Imphal, Manipur, vc@miu.edu.in, https://orcid.org/0000-0002-0705-9035.

Rajesh P. Chinchewadi

Manipur International University, Imphal, Manipur

Tarun Jaiswal

National Institute of Technology, Raipur

Abstract

This paper presents "Attention-GAN," an image captioning model that combines an attention mechanism with Generative Adversarial Networks (GANs) to produce stylized image captions. Attention-GAN has two main parts. The first is an attention-based caption generator that learns to align visual regions with caption segments, which helps the model focus on salient visual content and produce meaningful, context-aware captions. The second is an adversarial training process that adds stylistic diversity to the caption generator's output. Adversarial training yields more nuanced and varied stylized descriptions, so the captions convey both the image's content and stylistic variation. This dual-component approach combines the precision of attention-based modelling with the diversity of adversarial learning, producing more engaging and varied captions. In extensive experiments on benchmark datasets, Attention-GAN generates contextually relevant and stylistically appealing captions. Quantitative and qualitative analyses show that the model can create captions that match image content while exhibiting varied stylistic nuances. Attention-GAN is a promising approach to image captioning that can help bridge the gap between factual description and creative expression in computer vision and natural language processing applications.
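The attention step described in the abstract can be illustrated with a minimal soft-attention sketch. This is not the authors' implementation: the dot-product scoring function, the names `attend`, `region_features`, and `decoder_state`, and the toy values are all assumptions made for illustration. It only shows the general idea of weighting image regions by their relevance to the caption generator's current state.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, decoder_state):
    """Soft attention over image regions (illustrative, hypothetical names).

    region_features: one feature vector per image region.
    decoder_state: the caption generator's current hidden state.
    Returns (weights, context), where context is the weighted sum of regions.
    """
    # Assumption: dot-product scoring; the abstract does not state the
    # scoring function the paper actually uses.
    scores = [
        sum(f * h for f, h in zip(region, decoder_state))
        for region in region_features
    ]
    weights = softmax(scores)
    dim = len(region_features[0])
    # Context vector: attention-weighted average of the region features.
    context = [
        sum(w * region[d] for w, region in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context


# Toy example: three regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
state = [2.0, 0.0]
weights, context = attend(regions, state)
```

In this toy example the first region has the largest dot product with the state, so it receives the largest weight. In a full model the context vector would be fed to the decoder at each word step, and a discriminator would score the resulting captions during adversarial training.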

How to Cite

[1]

Sushma Jaiswal, Harikumar Pallthadka, Rajesh P. Chinchewadi, and Tarun Jaiswal, “Stylistic Image Captioning with Adversarial Learning: A Novel Approach”, Int. J. Comput. Eng. Res. Trends, vol. 11, no. 1, pp. 1–8, Jan. 2024.

Issue

Vol. 11 No. 1 (2024): January (2024) Issue

Section

Research Articles

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

IJCERT Policy:

The published work presented in this paper is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. This means that the content of this paper can be shared, copied, and redistributed in any medium or format, as long as the original author is properly attributed. Additionally, any derivative works based on this paper must also be licensed under the same terms. This licensing agreement allows for broad dissemination and use of the work while maintaining the author's rights and recognition.

By submitting this paper to IJCERT, the author(s) agree to these licensing terms and confirm that the work is original and does not infringe on any third-party copyright or intellectual property rights.

