Speech-Based Emotion Recognition and PTSD Detection through Machine and Deep Learning


K Islam
Z ElSayed

Abstract

This study investigates the potential of machine and deep learning algorithms for Speech Emotion Recognition (SER) and Post-Traumatic Stress Disorder (PTSD) detection through speech analysis. Unlike traditional PTSD diagnostic methods, which are often subjective and time-consuming, these algorithms enable automated early detection by identifying characteristic speech patterns. Using the RAVDESS Emotional Speech Audio dataset alongside PTSD-specific recordings, the study applies preprocessing techniques such as noise reduction and normalization to improve the quality of the speech data. Feature extraction focuses on acoustic, linguistic, and temporal features that capture variations in pitch, intonation, and speech rate. Both machine learning models, including Support Vector Machines (SVMs) and Random Forests, and deep learning models, such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks, are developed and compared. Experimental results indicate that the deep learning models achieve up to 91% accuracy in SER and 89% accuracy in PTSD detection, significantly outperforming the traditional machine learning methods. The findings also demonstrate the value of multimodal integration, combining speech, text, and physiological data, in improving diagnostic capability. However, the study acknowledges limitations in generalizability across diverse populations and the practical challenges of deploying these models in real-world settings. Future work will focus on expanding the datasets to cover a wider range of demographic and cultural variation, enhancing real-time monitoring capabilities, and improving model interpretability to ensure reliable performance across contexts.
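The abstract describes a pipeline of preprocessing (noise reduction, normalization), acoustic feature extraction (pitch, intonation, speech rate), and classifier training. The paper does not publish its code, so the following is only a minimal, hypothetical Python sketch of such a pipeline for the RAVDESS SER task, assuming librosa and scikit-learn are available; it pools MFCC, pitch, and zero-crossing statistics per utterance and trains an SVM baseline. The CNN/LSTM models compared in the study would instead consume frame-level spectrogram or feature sequences rather than these pooled statistics.

# Illustrative sketch (not the authors' implementation): acoustic features
# from RAVDESS-style .wav files plus an SVM baseline for emotion recognition.
import glob
import os
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def extract_features(path, sr=22050, n_mfcc=13):
    """Return a fixed-length acoustic feature vector for one utterance."""
    y, sr = librosa.load(path, sr=sr)
    y, _ = librosa.effects.trim(y)                    # crude silence removal
    y = librosa.util.normalize(y)                     # amplitude normalization
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    pitch = librosa.yin(y, fmin=50, fmax=400, sr=sr)  # fundamental-frequency track
    zcr = librosa.feature.zero_crossing_rate(y)       # rough voicing/rate proxy
    return np.concatenate([
        mfcc.mean(axis=1), mfcc.std(axis=1),
        [pitch.mean(), pitch.std()],
        [zcr.mean()],
    ])

# RAVDESS file names encode the emotion label as the third hyphen-separated field.
files = sorted(glob.glob("ravdess/**/*.wav", recursive=True))  # hypothetical data path
X = np.array([extract_features(f) for f in files])
y = np.array([int(os.path.basename(f).split("-")[2]) for f in files])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
scaler = StandardScaler().fit(X_train)
clf = SVC(kernel="rbf", C=10).fit(scaler.transform(X_train), y_train)
print("SER accuracy:", accuracy_score(y_test, clf.predict(scaler.transform(X_test))))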

Article Details

How to Cite
[1] K Islam and Z ElSayed, “Speech-Based Emotion Recognition and PTSD Detection through Machine and Deep Learning”, Int. J. Comput. Eng. Res. Trends, vol. 11, no. 3, pp. 46–53, Mar. 2024.
Section: Research Articles
