Advancing Sentiment Prediction for Code-Mixed Tweets with Transformer Models

U. Pranitha
Pinagadi Venkateswararao

Abstract

Code-mixed social media content, especially content that mixes Indian languages with English, such as Tamil-English (TA-EN), presents unique challenges for sentiment analysis because of irregular grammar, transliteration, and frequent language switching. Traditional models, and even multilingual ones, often underperform on such linguistically complex data.
This paper aims to improve the accuracy and efficiency of sentiment prediction on TA-EN code-mixed tweets using a Transformer-based architecture tailored to multilingual and structurally mixed inputs.
We propose a sentiment classification framework built on the XLM-R Transformer, enhanced with adapter-based fine-tuning and auxiliary linguistic features: language switch count, token entropy, and mixing ratio. The system is evaluated on the DravidianCodeMix-Tamil-English dataset using a stratified train-validation-test split and 5-fold cross-validation. Training uses a weighted cross-entropy loss, AdamW optimization, and a cosine learning-rate schedule with warm-up.
The proposed model achieved an accuracy of 81.3% and a macro-F1 score of 0.784, outperforming benchmarks including mBERT (72.4%), IndicBERT (75.2%), and FastFormer (78.1%). Inference latency was 5.0 ms/sample, which keeps the model practical to deploy. Ablation studies confirmed the additive benefit of the adapter layers and the linguistic features.
This work demonstrates that combining multilingual contextual embeddings with structural language cues substantially improves code-mixed sentiment classification. The approach is both accurate and computationally efficient, making it well suited to real-time sentiment analysis in multilingual social media monitoring and opinion mining applications.
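
The three auxiliary linguistic features named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give their formulas, so the definitions here are assumptions: the switch count is the number of adjacent token pairs whose language tags differ, the mixing ratio is the share of tokens outside the dominant language, and token entropy is the Shannon entropy (in bits) of the token frequency distribution within a tweet. The function name `code_mixing_features` and the tag labels `"ta"` and `"en"` are hypothetical, and the per-token language tags are assumed to come from a separate language identification step.

```python
import math
from collections import Counter


def code_mixing_features(tokens, lang_tags):
    """Compute illustrative code-mixing features for one tweet.

    tokens:    list of token strings
    lang_tags: list of per-token language tags (e.g. "ta", "en"),
               same length as tokens

    NOTE: these definitions are assumptions made for illustration;
    the paper's exact formulas are not stated in the abstract.
    """
    if len(tokens) != len(lang_tags):
        raise ValueError("tokens and lang_tags must have the same length")
    n = len(tokens)
    if n == 0:
        return {"switch_count": 0, "mixing_ratio": 0.0, "token_entropy": 0.0}

    # Number of positions where the language changes between neighbours.
    switch_count = sum(1 for a, b in zip(lang_tags, lang_tags[1:]) if a != b)

    # Fraction of tokens that do not belong to the dominant language.
    tag_counts = Counter(lang_tags)
    mixing_ratio = 1.0 - max(tag_counts.values()) / n

    # Shannon entropy (bits) of the token frequency distribution.
    token_counts = Counter(tokens)
    token_entropy = -sum(
        (c / n) * math.log2(c / n) for c in token_counts.values()
    )

    return {
        "switch_count": switch_count,
        "mixing_ratio": mixing_ratio,
        "token_entropy": token_entropy,
    }


if __name__ == "__main__":
    # Toy transliterated TA-EN example with hand-assigned tags.
    feats = code_mixing_features(
        ["padam", "super", "super", "da"],
        ["ta", "en", "en", "ta"],
    )
    print(feats)
```

In a setup like the one the abstract describes, such a feature vector would typically be concatenated with the pooled Transformer representation before the classification layer; how the authors combine them is not specified here.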

How to Cite
U. Pranitha and Pinagadi Venkateswararao, “Advancing Sentiment Prediction for Code-Mixed Tweets with Transformer Models”, Int. J. Comput. Eng. Res. Trends, vol. 12, no. 7, pp. 1–12, Jul. 2025.
Section
Research Articles

References

E. Hashmi, S. Y. Yayilgan, and S. Shaikh, “Augmenting sentiment prediction capabilities for code-mixed tweets with multilingual transformers,” Soc. Netw. Anal. Min., vol. 14, no. 1, p. 86, 2024.

K. K. Sampath and M. Supriya, “Transformer based sentiment analysis on code mixed data,” Procedia Comput. Sci., vol. 233, pp. 682–691, 2024.

Mamta and A. Ekbal, “Transformer based multilingual joint learning framework for code-mixed and English sentiment analysis,” J. Intell. Inf. Syst., vol. 62, no. 1, pp. 231–253, 2024.

M. K. Nazir, C. N. Faisal, M. A. Habib, and H. Ahmad, “Leveraging multilingual transformer for multiclass sentiment analysis in code-mixed data of low-resource languages,” IEEE Access, 2025.

M. Krasitskii, O. Kolesnikova, L. C. Hernandez, G. Sidorov, and A. Gelbukh, “Advancing sentiment analysis in Tamil-English code-mixed texts: Challenges and transformer-based solutions,” arXiv preprint arXiv:2503.23295, 2025.

C. B. Pednekar and M. Prakash, “SenTAS: Advancing sentiment analysis in code-mixed Marathi text through multi-head attention and convolutional BiLSTM,” Int. J. Comput., vol. 18, no. 1, pp. 1–15, 2025.

S. S. Almalki, “Sentiment analysis and emotion detection using transformer models in multilingual social media data,” Int. J. Adv. Comput. Sci. Appl., vol. 16, no. 3, 2025.

M. A. Jahin, M. S. H. Shovon, M. F. Mridha, M. R. Islam, and Y. Watanobe, “A hybrid transformer and attention based recurrent neural network for robust and interpretable sentiment analysis of tweets,” Sci. Rep., vol. 14, no. 1, p. 24882, 2024.

S. Patankar and M. Phadke, “A CNN-transformer framework for emotion recognition in code-mixed English–Hindi data,” Discover Artif. Intell., vol. 5, no. 1, p. 160, 2025.

A. Sherif and C. Sabty, “Sentiment analysis for Egyptian Arabic-English code-switched data using traditional neural models and advanced language models,” in Proc. Int. Conf. Speech Comput. (SPECOM), Cham, Switzerland: Springer, pp. 54–69, Nov. 2024.

M. S. I. Sajol, A. J. Hasan, M. S. Islam, and M. S. Rahman, “Transforming social media analysis: TweetEval benchmarking with advanced transformer models,” in 2024 8th Int. Symp. Multidiscip. Stud. Innov. Technol. (ISMSIT), pp. 1–6, IEEE, 2024.

G. Bandarupalli, “Enhancing sentiment analysis in multilingual social media data using transformer-based NLP models: A synthetic computational study,” Authorea Preprints, 2025.

M. A. Hider, S. Ahsan, J. Hossain, and M. M. Hoque, “Emotion classification in Bengali-English code-mixed data using transformers,” in 2024 27th Int. Conf. Comput. Inf. Technol. (ICCIT), pp. 3529–3535, IEEE, Dec. 2024.

I. Prathap, D. Gupta, and A. R. Nair, “Enhancing hate speech detection in Tamil code-mix content: A deep learning approach with multilingual embeddings,” in 2024 5th IEEE Glob. Conf. Adv. Technol. (GCAT), pp. 1–6, Oct. 2024.

A. Kumar, A. Pandey, S. Ahlawat, and Y. Prasad, “On enhancing code-mixed sentiment and emotion classification using FNet and FastFormer,” unpublished.

M. P. K. T., G. Shrinithi, P. Nithish, and A. C. Pranesh, “Comparative analysis of transformer models for sentiment classification in code-mixed Indic languages,” Int. J. Eng. Res. Sustain. Technol. (IJERST), vol. 3, no. 1, pp. 1–9, 2025.

G. V. Singh, S. Ghosh, M. Firdaus, A. Ekbal, and P. Bhattacharyya, “Predicting multi-label emojis, emotions, and sentiments in code-mixed texts using an emojifying sentiments framework,” Sci. Rep., vol. 14, no. 1, p. 12204, 2024.

S. K. Singh, A. Sharma, D. Singh, S. Pandit, and U. Saghir, “Sentiment analysis of English-Hindi code-mixed text using mBERT model,” in 2025 3rd Int. Conf. Invent. Comput. Informat. (ICICI), pp. 552–556, IEEE, Jun. 2025.

A. Albladi, M. Islam, and C. Seals, “Sentiment analysis of Twitter data using NLP models: A comprehensive review,” IEEE Access, 2025.

S. Chanda, A. Mishra, and S. Pal, “Sentiment analysis of code-mixed Dravidian languages leveraging pretrained model and word-level language tag,” Nat. Lang. Process., vol. 31, no. 2, pp. 477–499, 2025.

B. Chakravarthi, V. R. M. S. Chakravarthi, A. R. Madasamy, D. S. Bhandari, M. Arcan, and J. P. McCrae, “Overview of the track on sentiment analysis for Dravidian languages in code-mixed text,” in Proc. Forum Information Retrieval Evaluation (FIRE), 2020, pp. 21–24. Available: https://github.com/bharathichezhiyan/DravidianCodeMix-Dataset