Twitter Sentiment Analysis: A Comparative Evaluation of Linear and Tree-Based Methods
Keywords:
Twitter sentiment analysis, Sentiment140 dataset, Logistic Regression, LightGBM, Random Forest, TF-IDF, computational scalability
Abstract
Twitter sentiment analysis faces challenges from noisy text, high-dimensional data, and computational requirements. This study evaluates three machine learning models on the Sentiment140 dataset (1.6 million tweets): Logistic Regression (LR), LightGBM (LGBM), and Random Forest (RF). The goal is to identify suitable approaches for large-scale sentiment classification. A preprocessing pipeline comprising text cleaning, stemming, and TF-IDF vectorization was applied to reduce the linguistic noise inherent in social media content. Stratified sampling ensured balanced training and testing splits.
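The cleaning and stratified-splitting steps described above can be illustrated with a minimal sketch that uses only the Python standard library. The abstract does not specify the exact cleaning rules, split ratio, or random seed, so the regular expressions, the function names `clean_tweet` and `stratified_split`, the 20% test fraction, and the seed are all assumptions made for illustration. Stemming and TF-IDF vectorization would follow these steps, typically with a library such as NLTK or scikit-learn, and are omitted here.

```python
import random
import re
from collections import defaultdict

# Hypothetical cleaning rules; the study's exact rules are not given.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, @mentions, and non-letter characters."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text)
    # Collapse the runs of whitespace left behind by the substitutions.
    return " ".join(text.split())


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_indices, test_indices) preserving each label's share.

    test_fraction and seed are illustrative defaults, not values from the study.
    """
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    train, test = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)
```

For example, `clean_tweet("@bob I LOVE this!!! http://t.co/abc #happy")` yields `"i love this happy"`, and splitting a list of ten negative and ten positive labels places two of each class in the test set.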
Results showed that LR achieved the highest test accuracy (77.67%), outperforming LGBM (76.98%), although LGBM showed superior probabilistic calibration (log loss: 0.4837). RF failed to complete training within 8 hours because of its computational inefficiency on high-dimensional TF-IDF features, highlighting its impracticality for large text datasets. The findings underscore that linear models such as LR excel in sparse, high-dimensional spaces, while gradient-boosted trees (LGBM) require careful hyperparameter tuning to balance speed and accuracy.
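Accuracy and log loss measure different things, which is why one model can lead on accuracy while another leads on log loss. The sketch below computes both metrics for binary labels in plain Python. The two sets of predicted probabilities are toy values invented for illustration and are not outputs of the models in this study.

```python
import math


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples whose thresholded probability matches the label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == y) for p, y in zip(preds, y_true)) / len(y_true)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Toy data (assumed): model_a is always right but barely confident,
# model_b is confident and right on three examples and wrong on one.
y_true = [1, 1, 0, 0]
model_a = [0.51, 0.51, 0.49, 0.49]
model_b = [0.95, 0.95, 0.05, 0.60]

acc_a, acc_b = accuracy(y_true, model_a), accuracy(y_true, model_b)
ll_a, ll_b = log_loss(y_true, model_a), log_loss(y_true, model_b)
```

Here `model_a` has the higher accuracy, while `model_b` has the lower (better) log loss because its probabilities are closer to the true labels on average.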
This study emphasizes the importance of selecting models according to task priorities: LR for interpretability and LGBM for probabilistic reliability. RF's failure illustrates the critical role of scalability in real-world NLP applications. In practical terms, simpler models can rival complex ensembles in text classification while reducing computational costs. Future work should explore hybrid approaches, hyperparameter optimization, and transformer-based embeddings (e.g., BERT) to enhance performance. The methodology provides a reproducible framework for efficient sentiment analysis, guiding researchers and practitioners in balancing accuracy, speed, and resource constraints.
License
Copyright (c) 2025 Journal of Emerging Technology and Digital Transformation

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.






