SENTIMENT CATEGORIZATION IN TAMIL: A COMPARATIVE STUDY OF LSTM, BILSTM, AND XGBOOST MODELS

Authors

  • Dr. A. Pandian, SRM Institute of Science & Technology
  • Dr. Kalaivani Chellappan, Universiti Kebangsaan Malaysia
  • Dr. Ravie Chandren Muniyandi, Universiti Kebangsaan Malaysia

Keywords:

Sentiment Analysis; Natural Language Processing; Text Preprocessing; TF-IDF Vectorization; Classification Models

Abstract

Sentiment analysis (SA) has become increasingly important for understanding user-generated content on social media platforms. This study presents a comparative assessment of three machine learning models for sentiment categorization in Tamil text: Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BiLSTM), and Extreme Gradient Boosting (XGBoost). The dataset consists of Tamil sentences classified into several sentiment categories, including Positive, Negative, Mixed Feelings, Not Tamil, and Unknown State. Our approach involves a comprehensive preprocessing pipeline, including text cleaning, tokenization, stop word removal, and TF-IDF vectorization, to prepare the data for model training. Each model is trained and evaluated using a consistent methodology to ensure a reliable comparison. The models are evaluated using accuracy, precision, recall, and F1-score. Results indicate that the LSTM model, while achieving high accuracy on the training set, showed limitations in generalizing to the test set due to class imbalance. The BiLSTM model, which captures contextual information from both directions, outperformed the LSTM in terms of F1-score. The XGBoost model demonstrated competitive performance, offering advantages in training speed and interpretability. This study contributes to sentiment analysis in low-resource languages such as Tamil and highlights the importance of selecting models carefully, taking into account specific application demands and efficiency trade-offs.
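
To make the described pipeline concrete, the following is a minimal sketch (not the authors' code) of one of the compared configurations: basic text cleaning, TF-IDF vectorization, an XGBoost classifier, and evaluation with accuracy, precision, recall, and F1-score. The file name "tamil_sentiment.csv", its column names, and all hyperparameters are illustrative assumptions, not values taken from the study.

```python
# Illustrative sketch only: TF-IDF features + XGBoost for Tamil sentiment
# categorization, evaluated with accuracy/precision/recall/F1.
import re

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBClassifier


def clean_text(text: str) -> str:
    """Basic cleaning: lowercase, strip URLs, punctuation, and extra whitespace."""
    text = text.lower()
    text = re.sub(r"http\S+", " ", text)   # remove URLs
    text = re.sub(r"[^\w\s]", " ", text)   # remove punctuation
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical dataset: one Tamil sentence per row with a sentiment label
# (e.g. Positive, Negative, Mixed Feelings, Not Tamil, Unknown State).
df = pd.read_csv("tamil_sentiment.csv")    # assumed columns: "text", "label"
df["text"] = df["text"].astype(str).map(clean_text)

labels = LabelEncoder().fit_transform(df["label"])
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], labels, test_size=0.2, stratify=labels, random_state=42
)

# TF-IDF vectorization; ngram range and vocabulary size are placeholders.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_features=50_000)
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

# XGBoost classifier; hyperparameters are placeholders, not tuned values.
model = XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.1,
                      eval_metric="mlogloss")
model.fit(X_train_tfidf, y_train)

# Report per-class precision, recall, and F1, plus overall accuracy.
print(classification_report(y_test, model.predict(X_test_tfidf)))
```

The LSTM and BiLSTM variants would replace the TF-IDF/XGBoost stage with tokenized, padded sequences fed to recurrent layers, while the preprocessing and evaluation steps stay the same.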

Published

2025-02-24

Issue

Section

Articles