A comparative analysis of machine learning algorithms for hate speech detection in social media

Esraa Omran; Estabraq Al Tararwah; Jamal Al Qundus

doi:10.30935/ojcmt/13603

Research Article


Esraa Omran¹ *, Estabraq Al Tararwah², Jamal Al Qundus³


¹ Center for Applied Mathematics and Bioinformatics, Department of Computer Science, Gulf University for Science and Technology, Kuwait City, KUWAIT
² Gulf University for Science and Technology, Kuwait City, KUWAIT
³ Faculty of Information Technology, Middle East University, Amman, JORDAN
* Corresponding Author

Online Journal of Communication and Media Technologies, 13(4), October 2023, e202348, https://doi.org/10.30935/ojcmt/13603

Published Online: 22 August 2023, Published: 01 October 2023

OPEN ACCESS


ABSTRACT

Detecting and mitigating hate speech on social media, particularly on platforms such as Twitter, is a task with significant societal impact. This study presents a comparative analysis of machine learning algorithms for hate speech detection, with the primary goal of identifying an algorithmic combination that is simple, easy to implement, computationally efficient, and accurate. After pre-processing the data, the study evaluates several algorithms to determine their suitability for the task. The findings show that the combination of naïve Bayes and decision tree algorithms achieves an accuracy of 0.887 and an F1-score of 0.885. The study thus identifies a reliable combination that meets the criteria of simplicity, ease of implementation, fast processing, and strong performance, offering guidance for researchers and practitioners working on hate speech detection in social media. By describing the strengths and limitations of various algorithmic combinations, the research also contributes to a better understanding of hate speech detection and to the development of more robust solutions for a safer, more inclusive digital environment.
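The abstract reports an accuracy and an F1-score for a combination of naïve Bayes and decision tree classifiers, but it does not say how the two models are combined or how F1 is averaged across classes. The sketch below illustrates one possible reading under two assumptions of ours: the models are combined by soft voting (averaging each model's class probabilities), and F1 is averaged with weights equal to each class's support. The function names `soft_vote`, `accuracy`, and `weighted_f1`, the `weight_nb` parameter, and the labels `hate`, `offensive`, and `neither` are hypothetical and used for illustration only; this is not the authors' implementation.

```python
from collections import Counter


def soft_vote(prob_nb, prob_dt, weight_nb=0.5):
    """Combine two classifiers' class probabilities by weighted averaging.

    Assumption: soft voting is one common way to combine naive Bayes and a
    decision tree; the abstract does not state which rule the study used.
    """
    labels = sorted(set(prob_nb) | set(prob_dt))
    combined = {
        label: weight_nb * prob_nb.get(label, 0.0)
        + (1.0 - weight_nb) * prob_dt.get(label, 0.0)
        for label in labels
    }
    # Highest averaged probability wins; ties go to the alphabetically first label.
    return max(labels, key=combined.get)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support.

    Assumption: the abstract does not say which F1 averaging was reported.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / n  # n = tp + fn, always > 0 for labels present in y_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += f1 * n / total
    return score


if __name__ == "__main__":
    # Hypothetical per-model probabilities for a single post.
    nb = {"hate": 0.6, "offensive": 0.3, "neither": 0.1}
    dt = {"hate": 0.2, "offensive": 0.7, "neither": 0.1}
    print(soft_vote(nb, dt))  # offensive

    # Hypothetical gold labels and combined-model predictions.
    gold = ["hate", "offensive", "offensive", "neither", "offensive", "neither"]
    pred = ["offensive", "offensive", "offensive", "neither", "hate", "neither"]
    print(round(accuracy(gold, pred), 3), round(weighted_f1(gold, pred), 3))  # 0.667 0.667
```

Moving `weight_nb` away from 0.5 lets one model dominate the vote. On real data, scikit-learn offers equivalent building blocks, for example `VotingClassifier` with `voting="soft"` and `f1_score` with `average="weighted"`.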

Keywords: hate speech detection, machine learning, social media analysis, text classification

CITATION (APA)

Omran, E., Al Tararwah, E., & Al Qundus, J. (2023). A comparative analysis of machine learning algorithms for hate speech detection in social media. Online Journal of Communication and Media Technologies, 13(4), e202348. https://doi.org/10.30935/ojcmt/13603


