A Hybrid Framework for Detecting Automated Spammers on Twitter: Integrating Machine Learning and Heuristic Approaches
Abstract
Twitter's open platform has become a hotspot for automated spammers who exploit its large user base to spread malicious and misleading content. This paper proposes a hybrid approach to detecting automated spammers that integrates machine learning models with heuristic rules to form a robust, adaptive detection framework. The methodology draws on two groups of features: behavioral attributes, such as account age and retweet frequency, and content-based metrics, such as hashtag density and sentiment polarity. The hybrid model combines a Gradient Boosting Classifier, which adapts to patterns learned from data, with manually defined heuristic rules, allowing it to respond to the changing nature of spam tactics. The system was evaluated on a dataset of 50,000 Twitter accounts, evenly split between spam and legitimate users. In these experiments the hybrid approach outperformed traditional models, achieving a Precision of 91.2%, a Recall of 88.9%, an F1-Score of 90.0%, and an Accuracy of 91.5%. Standalone models such as Logistic Regression and Support Vector Machines scored significantly lower on these metrics. The results indicate that the hybrid approach classifies spam accounts more accurately while keeping false positives low. The framework's scalability to real-time applications and its generalization to other platforms remain areas for future work. Integrating graph-based features and exploring unsupervised techniques are further promising directions for detecting previously unseen spam patterns. Overall, this research offers a robust approach to mitigating the growing threat of automated spammers on social media platforms.
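The abstract does not state how the classifier output and the heuristic rules are combined, so the following is only a minimal sketch of one possible combination, not the authors' method. The feature names mirror those listed above; the rule thresholds, the `rule_boost` weight, and the function names are hypothetical. The `model_probability` argument is assumed to be a spam probability produced by any trained probabilistic classifier, for example the output of scikit-learn's `GradientBoostingClassifier.predict_proba`.

```python
from dataclasses import dataclass


@dataclass
class AccountFeatures:
    """Illustrative per-account features named in the abstract."""
    account_age_days: float
    retweets_per_day: float
    hashtag_density: float     # fraction of tokens that are hashtags, in [0, 1]
    sentiment_polarity: float  # assumed range [-1, 1]; unused by the rules below


def hashtag_density(text: str) -> float:
    """Fraction of whitespace-separated tokens that are hashtags."""
    tokens = text.split()
    if not tokens:
        return 0.0
    hashtags = sum(1 for t in tokens if t.startswith("#") and len(t) > 1)
    return hashtags / len(tokens)


def heuristic_flags(f: AccountFeatures) -> int:
    """Count how many hand-written rules fire (thresholds are hypothetical)."""
    rules = [
        # Very new account that retweets at a high rate.
        f.account_age_days < 7 and f.retweets_per_day > 100,
        # More than half of the tokens are hashtags.
        f.hashtag_density > 0.5,
    ]
    return sum(rules)


def hybrid_is_spam(
    f: AccountFeatures,
    model_probability: float,
    threshold: float = 0.5,
    rule_boost: float = 0.2,
) -> bool:
    """Raise the model's spam probability by a fixed amount per fired rule.

    This additive scheme is an assumption made for illustration only.
    """
    score = min(1.0, model_probability + rule_boost * heuristic_flags(f))
    return score >= threshold
```

In this sketch an account that the classifier scores below the threshold can still be flagged when several rules fire, which is one way hand-written rules could cover spam tactics the trained model has not yet seen.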
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
IJCERT Policy:
The published work presented in this paper is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. This means that the content of this paper can be shared, copied, and redistributed in any medium or format, as long as the original author is properly attributed. Additionally, any derivative works based on this paper must also be licensed under the same terms. This licensing agreement allows for broad dissemination and use of the work while maintaining the author's rights and recognition.
By submitting this paper to IJCERT, the author(s) agree to these licensing terms and confirm that the work is original and does not infringe on any third-party copyright or intellectual property rights.