Twitter Sentiment Analysis: A Comparative Evaluation of Linear and Tree-Based Methods

Authors

  • Ali Ahmed, Department of Computer Science and Information Technology, Superior University, Lahore, Pakistan
  • Dr. Jawad Ahmed, Faculty of Computer Science and Information Technology, Superior University, Lahore, Pakistan
  • Dr. Saleem Mustafa, Faculty of Computer Science and Information Technology, Superior University, Lahore, Pakistan

Keywords:

Twitter sentiment analysis, Sentiment140 dataset, Logistic Regression, LightGBM, Random Forest, TF-IDF, computational scalability

Abstract

Twitter sentiment analysis is challenging because tweets are noisy, the resulting feature spaces are high-dimensional, and large corpora are computationally demanding. This study evaluates three machine learning models, Logistic Regression (LR), LightGBM (LGBM), and Random Forest (RF), on the Sentiment140 dataset (1.6 million tweets) to identify well-suited approaches for large-scale sentiment classification. A preprocessing pipeline comprising text cleaning, stemming, and TF-IDF vectorization was applied to reduce the linguistic noise inherent in social media content. Stratified sampling ensured balanced training and testing splits.
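
As a rough illustration of the preprocessing steps named above, the following minimal sketch uses only the Python standard library. It is not the authors' pipeline: the function names (`clean_tweet`, `stratified_split`, `tfidf`), the cleaning rules, the 80/20 split, and the smoothed TF-IDF formula are all assumptions made for this example, and stemming is left out because it would normally rely on an external stemmer.

```python
import math
import random
import re
from collections import Counter, defaultdict

# Hypothetical cleaning rules; the study's exact rules are not given here.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def clean_tweet(text):
    """Lowercase, strip URLs, @mentions, and non-letters, then tokenize."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    return text.split()


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx), keeping the class ratio in both parts."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def tfidf(docs):
    """One smoothed TF-IDF variant: tf * (ln((1 + n) / (1 + df)) + 1)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


# Toy data; Sentiment140 encodes negative as 0 and positive as 4.
tweets = ["@user I LOVE this!! http://t.co/abc :)", "worst day ever..."]
tokens = [clean_tweet(t) for t in tweets]
weights = tfidf(tokens)
train_idx, test_idx = stratified_split([0] * 10 + [4] * 10)
```

Each tweet becomes a sparse mapping from term to weight, which is the kind of sparse, high-dimensional representation that the models are then trained on.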

Results showed that LR achieved the highest test accuracy (77.67%), outperforming LGBM (76.98%) despite LGBM’s superior probabilistic calibration (log loss: 0.4837). RF failed to complete training within 8 hours because of its computational inefficiency on high-dimensional TF-IDF features, highlighting its impracticality for large text datasets. The findings underscore that linear models such as LR excel in sparse, high-dimensional spaces, while gradient-boosted trees (LGBM) require careful hyperparameter tuning to balance speed and accuracy.
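
The sketch below illustrates how accuracy and log loss can rank two classifiers differently, as reported above for LR and LGBM. It is not taken from the study's code: the labels and probabilities are made up for illustration, and `model_a` and `model_b` are hypothetical stand-ins rather than the trained LR and LGBM models.

```python
import math


def accuracy(y_true, p_pos, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, p_pos))
    return hits / len(y_true)


def log_loss(y_true, p_pos, eps=1e-15):
    """Mean negative log-likelihood of the true labels (binary case)."""
    total = 0.0
    for y, p in zip(y_true, p_pos):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -math.log(p) if y == 1 else -math.log(1 - p)
    return total / len(y_true)


# Made-up binary labels (1 = positive) and predicted P(positive).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical model A: right on 7 of 8, but badly overconfident on its one miss.
model_a = [0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.999]

# Hypothetical model B: right on 6 of 8, with moderate probabilities throughout.
model_b = [0.7, 0.7, 0.7, 0.45, 0.3, 0.3, 0.3, 0.55]

for name, probs in (("A", model_a), ("B", model_b)):
    print(name, accuracy(y_true, probs), round(log_loss(y_true, probs), 4))
```

Here model A has the higher accuracy while model B has the lower log loss, because log loss penalizes confident mistakes heavily. This is why the two metrics are reported side by side rather than treated as interchangeable.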

This study emphasizes the importance of selecting models according to task priorities: LR for interpretability and LGBM for probabilistic reliability. RF’s failure illustrates the critical role of scalability in real-world NLP applications. In practical terms, simpler models can rival complex ensembles in text classification while reducing computational costs. Future work should explore hybrid approaches, hyperparameter optimization, and transformer-based embeddings (e.g., BERT) to enhance performance. The methodology provides a reproducible framework for efficient sentiment analysis, guiding researchers and practitioners in balancing accuracy, speed, and resource constraints.

Published

2025-09-30