Advancing Sentiment Prediction for Code-Mixed Tweets with Transformer Models
Abstract
Code-mixed social media content, especially involving Indian languages like Tamil-English (TA-EN), presents unique challenges for sentiment analysis due to irregular grammar, transliteration, and frequent language switching. Traditional and even multilingual models often underperform on such linguistically complex data.
This paper aims to enhance the accuracy and efficiency of sentiment prediction on TA-EN code-mixed tweets using a Transformer-based architecture tailored for multilingual and structurally mixed inputs.
We propose a novel sentiment classification framework built on the XLM-R Transformer, enhanced with adapter-based fine-tuning and the integration of auxiliary linguistic features such as language switch count, token entropy, and mixing ratio. The system is evaluated on the DravidianCodeMix Tamil-English dataset using a stratified train-validation-test split and 5-fold cross-validation. Key implementation choices include a weighted cross-entropy loss, AdamW optimization, and a cosine learning-rate schedule with warm-up. The proposed model achieved an accuracy of 81.3% and a macro-F1 score of 0.784, significantly outperforming benchmarks including mBERT (72.4%), IndicBERT (75.2%), and FastFormer (78.1%). Inference latency was maintained at 5.0 ms/sample, ensuring practical deployability. Ablation studies confirmed the additive benefit of the adapter layers and the linguistic features. This work demonstrates that combining multilingual contextual embeddings with structural language cues substantially improves code-mixed sentiment classification. The approach is both accurate and computationally efficient, making it well suited for real-time sentiment analysis in multilingual social media monitoring and opinion mining applications.
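The three auxiliary linguistic features named in the abstract can be derived directly from a language-tagged token sequence. The Python sketch below illustrates one plausible way to compute them; the helper names and the assumption that each token has already been labelled as Tamil ("ta"), English ("en"), or other are ours, not the authors' released code.

```python
import math
from collections import Counter
from typing import List

def switch_count(tags: List[str]) -> int:
    """Number of positions where the language label changes between adjacent tokens."""
    return sum(1 for prev, cur in zip(tags, tags[1:]) if prev != cur)

def token_entropy(tags: List[str]) -> float:
    """Shannon entropy (in bits) of the language-label distribution over tokens."""
    counts = Counter(tags)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def mixing_ratio(tags: List[str], matrix_lang: str = "ta") -> float:
    """Fraction of tokens outside the (assumed) matrix language, here Tamil."""
    return sum(1 for t in tags if t != matrix_lang) / len(tags)

# Example: per-token language tags for a transliterated Tamil-English tweet.
tags = ["ta", "ta", "en", "en", "ta", "en", "other"]
features = [switch_count(tags), token_entropy(tags), mixing_ratio(tags)]
print(features)  # [4, 1.448..., 0.571...]
```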
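The overall training setup described in the abstract (XLM-R backbone, adapter-based fine-tuning, auxiliary features, weighted cross-entropy, AdamW, and a cosine schedule with warm-up) can likewise be sketched in PyTorch. This is a simplified, assumption-laden illustration rather than the authors' implementation: it places a single bottleneck adapter on top of a frozen xlm-roberta-base encoder instead of per-layer adapters, and all hyperparameter values (class weights, learning rate, warm-up and total steps) are placeholders rather than the paper's reported settings.

```python
import torch
import torch.nn as nn
from transformers import XLMRobertaModel, get_cosine_schedule_with_warmup

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, GELU, up-project, residual connection."""
    def __init__(self, hidden: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.up = nn.Linear(bottleneck, hidden)
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))

class CodeMixedSentimentModel(nn.Module):
    """XLM-R encoder + adapter; the three auxiliary features (switch count, entropy,
    mixing ratio) are concatenated to the [CLS] representation before classification."""
    def __init__(self, num_labels: int = 3, num_aux: int = 3):
        super().__init__()
        self.encoder = XLMRobertaModel.from_pretrained("xlm-roberta-base")
        for p in self.encoder.parameters():   # freeze backbone; train adapter + head only
            p.requires_grad = False
        self.adapter = Adapter(self.encoder.config.hidden_size)
        self.classifier = nn.Linear(self.encoder.config.hidden_size + num_aux, num_labels)

    def forward(self, input_ids, attention_mask, aux_feats):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = self.adapter(out.last_hidden_state[:, 0])  # adapted [CLS] token
        return self.classifier(torch.cat([cls, aux_feats], dim=-1))

# Training objects: class-weighted cross-entropy, AdamW, cosine schedule with warm-up.
model = CodeMixedSentimentModel()
class_weights = torch.tensor([1.0, 2.5, 1.8])            # placeholder inverse-frequency weights
loss_fn = nn.CrossEntropyLoss(weight=class_weights)
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=2e-5, weight_decay=0.01)
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=500, num_training_steps=10_000)
```

Injecting the auxiliary features only at the classification head keeps the multilingual backbone untouched, which is one simple way to combine contextual embeddings with structural language cues at low computational cost.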
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
IJCERT Policy:
The published work presented in this paper is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. This means that the content of this paper can be shared, copied, and redistributed in any medium or format, as long as the original author is properly attributed. Additionally, any derivative works based on this paper must also be licensed under the same terms. This licensing agreement allows for broad dissemination and use of the work while maintaining the author's rights and recognition.
By submitting this paper to IJCERT, the author(s) agree to these licensing terms and confirm that the work is original and does not infringe on any third-party copyright or intellectual property rights.
References
E. Hashmi, S. Y. Yayilgan, and S. Shaikh, “Augmenting sentiment prediction capabilities for code-mixed tweets with multilingual transformers,” Soc. Netw. Anal. Min., vol. 14, no. 1, p. 86, 2024.
K. K. Sampath and M. Supriya, “Transformer based sentiment analysis on code mixed data,” Procedia Comput. Sci., vol. 233, pp. 682–691, 2024.
Mamta and A. Ekbal, “Transformer based multilingual joint learning framework for code-mixed and English sentiment analysis,” J. Intell. Inf. Syst., vol. 62, no. 1, pp. 231–253, 2024.
M. K. Nazir, C. N. Faisal, M. A. Habib, and H. Ahmad, “Leveraging multilingual transformer for multiclass sentiment analysis in code-mixed data of low-resource languages,” IEEE Access, 2025.
M. Krasitskii, O. Kolesnikova, L. C. Hernandez, G. Sidorov, and A. Gelbukh, “Advancing sentiment analysis in Tamil-English code-mixed texts: Challenges and transformer-based solutions,” arXiv preprint arXiv:2503.23295, 2025.
C. B. Pednekar and M. Prakash, “SenTAS: Advancing sentiment analysis in code-mixed Marathi text through multi-head attention and convolutional BiLSTM,” Int. J. Comput., vol. 18, no. 1, pp. 1–15, 2025.
S. S. Almalki, “Sentiment analysis and emotion detection using transformer models in multilingual social media data,” Int. J. Adv. Comput. Sci. Appl., vol. 16, no. 3, 2025.
M. A. Jahin, M. S. H. Shovon, M. F. Mridha, M. R. Islam, and Y. Watanobe, “A hybrid transformer and attention based recurrent neural network for robust and interpretable sentiment analysis of tweets,” Sci. Rep., vol. 14, no. 1, p. 24882, 2024.
S. Patankar and M. Phadke, “A CNN-transformer framework for emotion recognition in code-mixed English–Hindi data,” Discover Artif. Intell., vol. 5, no. 1, p. 160, 2025.
A. Sherif and C. Sabty, “Sentiment analysis for Egyptian Arabic-English code-switched data using traditional neural models and advanced language models,” in Proc. Int. Conf. Speech Comput. (SPECOM), Cham, Switzerland: Springer, pp. 54–69, Nov. 2024.
M. S. I. Sajol, A. J. Hasan, M. S. Islam, and M. S. Rahman, “Transforming social media analysis: TweetEval benchmarking with advanced transformer models,” in 2024 8th Int. Symp. Multidiscip. Stud. Innov. Technol. (ISMSIT), pp. 1–6, IEEE, 2024.
G. Bandarupalli, “Enhancing sentiment analysis in multilingual social media data using transformer-based NLP models: A synthetic computational study,” Authorea Preprints, 2025.
M. A. Hider, S. Ahsan, J. Hossain, and M. M. Hoque, “Emotion classification in Bengali-English code-mixed data using transformers,” in 2024 27th Int. Conf. Comput. Inf. Technol. (ICCIT), pp. 3529–3535, IEEE, Dec. 2024.
I. Prathap, D. Gupta, and A. R. Nair, “Enhancing hate speech detection in Tamil code-mix content: A deep learning approach with multilingual embeddings,” in 2024 5th IEEE Glob. Conf. Adv. Technol. (GCAT), pp. 1–6, Oct. 2024.
A. Kumar, A. Pandey, S. Ahlawat, and Y. Prasad, “On enhancing code-mixed sentiment and emotion classification using FNet and FastFormer,” unpublished.
M. P. K. T., G. Shrinithi, P. Nithish, and A. C. Pranesh, “Comparative analysis of transformer models for sentiment classification in code-mixed Indic languages,” Int. J. Eng. Res. Sustain. Technol. (IJERST), vol. 3, no. 1, pp. 1–9, 2025.
G. V. Singh, S. Ghosh, M. Firdaus, A. Ekbal, and P. Bhattacharyya, “Predicting multi-label emojis, emotions, and sentiments in code-mixed texts using an emojifying sentiments framework,” Sci. Rep., vol. 14, no. 1, p. 12204, 2024.
S. K. Singh, A. Sharma, D. Singh, S. Pandit, and U. Saghir, “Sentiment analysis of English-Hindi code-mixed text using mBERT model,” in 2025 3rd Int. Conf. Invent. Comput. Informat. (ICICI), pp. 552–556, IEEE, Jun. 2025.
A. Albladi, M. Islam, and C. Seals, “Sentiment analysis of Twitter data using NLP models: A comprehensive review,” IEEE Access, 2025.
S. Chanda, A. Mishra, and S. Pal, “Sentiment analysis of code-mixed Dravidian languages leveraging pretrained model and word-level language tag,” Nat. Lang. Process., vol. 31, no. 2, pp. 477–499, 2025.
B. Chakravarthi, V. R. M. S. Chakravarthi, A. R. Madasamy, D. S. Bhandari, M. Arcan, and J. P. McCrae, “Overview of the track on sentiment analysis for Dravidian languages in code-mixed text,” in Proc. Forum for Information Retrieval Evaluation (FIRE), 2020, pp. 21–24. (https://github.com/bharathichezhiyan/DravidianCodeMix-Dataset)