
Social Media Sentiment Analysis with Machine Learning Techniques

Updated on Jul 30, 2025 | 8 min read | 1.98K+ views

Table of Contents

What Should You Know to Build This Project Successfully?
Behind the Scenes: Tools That Power Social Media Sentiment Analysis
How Long Will It Take and What Can You Expect?
Smart Insights: Techniques That Power Social Media Sentiment Analysis
How to Build a Social Media Sentiment Analysis Model
Final Conclusion

Understanding how people feel about a topic in real time can shape products, politics, and public opinion.

In this project, you’ll perform social media sentiment analysis using real-world posts. You’ll clean raw text data, extract meaningful features, and train powerful models like Naïve Bayes and SVM to classify sentiments as positive, negative, or neutral.

What Should You Know to Build This Project Successfully?

Before starting your social media sentiment analysis project, it’s important to be familiar with these key concepts and tools:

Python programming (You’ll use Python throughout for data processing, visualization, and modeling.)
Pandas and NumPy (These libraries help you load and inspect the tabular post data, perform calculations, and structure your dataset for modeling.)
Matplotlib or Seaborn (visualizing sentiment distributions and trends)
Scikit‑learn basics (training classifiers like Naïve Bayes or SVM, making predictions, and evaluating models using accuracy, precision, recall, and F1 score)
Intro to NLP concepts like tokenization, stopword removal, stemming, and vectorization (TF-IDF or word embeddings)
Optional: Familiarity with deep learning basics (if you want to implement LSTM using TensorFlow or Keras)

Behind the Scenes: Tools That Power Social Media Sentiment Analysis

To build this social media sentiment analysis project, you’ll use a solid mix of Python libraries focused on natural language processing, machine learning, and data visualization:

Tool / Library	Purpose
Python	Core language for scripting and automation
Google Colab	Cloud-based platform to run notebooks without setup
Pandas	Loads, cleans, and processes text datasets efficiently
NumPy	Supports numerical operations during preprocessing and modeling
Matplotlib / Seaborn	Visualizes sentiment distributions, word frequencies, and trends
Scikit-learn	Trains and evaluates models like Naïve Bayes and SVM
NLTK / spaCy	Performs tokenization, stopword removal, and lemmatization
VADER	Quickly classifies sentiment using a rule-based lexicon

How Long Will It Take and What Can You Expect?

You can complete this social media sentiment analysis project in 4 to 5 hours. It’s ideal for beginners who have some hands-on experience with Python and want to dive into real-world natural language processing tasks.

Smart Insights: Techniques That Power Social Media Sentiment Analysis

To build an effective sentiment analysis model for social media, you’ll apply essential techniques that help convert raw text into meaningful insights:

Text Cleaning & Preprocessing: Remove noise like URLs, emojis, stopwords, and special characters to clean up user posts.
TF-IDF Vectorization: Transform text data into numerical features that machine learning models can understand.
Machine Learning Models (Naïve Bayes, SVM): Train multiple models to classify posts as positive, negative, or neutral and compare their performance.

How to Build a Social Media Sentiment Analysis Model

Let’s build this project from scratch with clear, step-by-step guidance:

Load the Social Media Dataset
Clean and Preprocess the Data
Feature Extraction with TF-IDF
Define Features and Target
Split Data into Train and Test Sets
Train Sentiment Classifiers
Evaluate Model Accuracy

Without any further delay, let’s get started!

Step 1: Download the Dataset

Download the dataset from Kaggle, extract the ZIP file, and use the downloaded dataset file for the project.

Now that you’ve downloaded the dataset, let’s move on to the next step, uploading and loading it into Google Colab.

Step 2: Upload and Read the Dataset in Google Colab

Upload the dataset file to Google Colab using the code below:

from google.colab import files
uploaded = files.upload()

Once uploaded, use the following Python code to install and import the required libraries, then read and inspect the data:

# Install necessary libraries
!pip install pandas scikit-learn nltk spacy vaderSentiment


# Import libraries
import pandas as pd
import nltk
import re
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.metrics import classification_report, accuracy_score
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# Load dataset
df = pd.read_csv('social.csv')

# Basic overview
print(df.head())

# Check sentiment distribution
print(df['Sentiment Label'].value_counts())

Output:

                                Post ID  \
0  aa391375-7355-44b7-bcbf-97fb4e5a2ba3
1  1c9ec98d-437a-48d9-9cba-bd5ad853c59a
2  170e5b5b-1d9a-4d02-a957-93c4dbb18908
3  aec53496-60ee-4a06-8821-093a04dc8770
4  4eacddb7-990d-4056-8784-7e1d5c4d1404

                                        Post Content Sentiment Label  \
0  Word who nor center everything better politica...         Neutral
1  Begin administration population good president...        Positive
2  Thousand total sign. Agree product relationshi...        Positive
3  Individual from news third. Oil forget them di...         Neutral
4  Time adult letter see reduce. Attention sudden...        Negative

   Number of Likes  Number of Shares  Number of Comments  User Follower Count  \
0              157               243                  64                 4921
1              166                49                 121                  612
2              185               224                 179                 9441
3              851               369                  39                 6251
4              709               356                  52                 1285

    Post Date and Time Post Type Language
0  2024-01-10 00:14:21     video       fr
1  2024-02-03 00:20:11     image       es
2  2024-07-25 14:20:23     video       de
3  2024-02-20 09:15:09      text       de
4  2024-03-01 04:17:35     image       de

Sentiment Label
Neutral     682
Negative    675
Positive    643
Name: count, dtype: int64

Step 3: Text Cleaning and Lemmatization for Social Media Posts

To prepare social media posts for sentiment analysis, we clean the text by removing links, mentions, hashtags, special characters, and stopwords. We also apply lemmatization using spaCy to reduce words to their base forms.

Here is the code for this step:

import nltk
import spacy
import re

from nltk.corpus import stopwords
nltk.download('stopwords')

# Load stopwords and spaCy model
stop_words = set(stopwords.words('english'))
nlp = spacy.load('en_core_web_sm')

# Text cleaning function
def clean_text(text):
    # Remove URLs, mentions, hashtags, non-alphabetic characters
    text = re.sub(r"http\S+|@\w+|#\w+|[^A-Za-z\s]", '', text.lower())
    doc = nlp(text)
    # Lemmatize and remove stopwords
    return ' '.join([token.lemma_ for token in doc if token.text not in stop_words and token.is_alpha])

# Apply cleaning function
df['clean_text'] = df['Post Content'].astype(str).apply(clean_text)

This step adds a new column, clean_text, which contains the cleaned and lemmatized version of each original post, ready for vectorization and modeling.
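
To see what the regular expression inside clean_text strips before spaCy runs, here is a minimal sketch that uses only the standard library. The helper name strip_noise and the sample post are made up for illustration, and the lemmatization and stopword steps are deliberately left out.

```python
import re

# Same pattern as in clean_text: URLs, @mentions, #hashtags, then any
# character that is not an ASCII letter or whitespace.
NOISE = re.compile(r"http\S+|@\w+|#\w+|[^A-Za-z\s]")

def strip_noise(text):
    """Lowercase the post, drop noise, and collapse leftover whitespace."""
    cleaned = NOISE.sub("", text.lower())
    return " ".join(cleaned.split())

# Hypothetical post, not taken from the dataset
sample = "Loving the new phone!!! \U0001F60D @brand #launch https://example.com/x 10/10"
print(strip_noise(sample))
```

Note that the pattern discards emojis, digits, and the whole hashtag (including its word), so any sentiment signal carried by those elements is lost at this stage.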

Step 4: Feature Extraction Using TF-IDF

To convert cleaned text into numerical features for machine learning models, we use TF-IDF (Term Frequency–Inverse Document Frequency). It helps our sentiment classifier focus on the most meaningful terms.

Here is the code for this step:

from sklearn.feature_extraction.text import TfidfVectorizer

# Initialize TF-IDF Vectorizer with top 5000 features
tfidf = TfidfVectorizer(max_features=5000)

# Transform the cleaned text into TF-IDF vectors
X = tfidf.fit_transform(df['clean_text'])

# Define the target variable
y = df['Sentiment Label']

This step transforms each post into a feature vector over at most 5,000 terms (max_features keeps the terms that occur most often across the corpus), preparing the data for model training.
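
To make the weighting less of a black box, the sketch below computes TF-IDF by hand with the standard library. It assumes whitespace tokenization, raw term counts, the smoothed IDF formula ln((1 + n) / (1 + df)) + 1 that scikit-learn documents as its default, and L2 normalization of each row. The function name tfidf_vectors and the three toy documents are illustrative only, and the output is not guaranteed to match TfidfVectorizer on real data because the tokenizers differ.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, rows) with L2-normalized TF-IDF weights per document."""
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency (assumed default variant)
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        raw = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(w * w for w in raw)) or 1.0
        rows.append([w / norm for w in raw])
    return vocab, rows

# Toy corpus, not taken from the dataset
docs = ["good phone good camera", "bad phone", "good service"]
vocab, rows = tfidf_vectors(docs)
for term, weight in zip(vocab, rows[0]):
    print(f"{term:8s} {weight:.3f}")
```

In the first toy document, "camera" appears in only one document and so gets a higher IDF than "phone", which appears in two; "good" still ends up with the largest weight because it occurs twice in that document.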

Step 5: Splitting Data into Training and Testing Sets

To evaluate how well our sentiment analysis model performs, we split the dataset into a training set (used to train the model) and a testing set (used to evaluate it). We use stratified sampling to maintain the proportion of sentiment labels in both sets.

Here is the code for this step:

from sklearn.model_selection import train_test_split

# Split the data (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# X_train and X_test are SciPy sparse matrices, so use shape[0] instead of len()
print("Training set size:", X_train.shape[0])
print("Testing set size:", X_test.shape[0])

Output:

Training set size: 1600
Testing set size: 400
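
The idea behind stratified sampling can be sketched without scikit-learn: shuffle the row indices of each class separately and move the same fraction of every class into the test set. The function stratified_split below is a hypothetical helper written for illustration, not the algorithm train_test_split uses internally, and its rounding may differ from scikit-learn's in edge cases.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx), sampling test_size of each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx

# Class counts taken from the value_counts() output in Step 2
labels = ["Neutral"] * 682 + ["Negative"] * 675 + ["Positive"] * 643
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
print(Counter(labels[i] for i in test_idx))
```

With these class counts the sketch places 136 Neutral, 135 Negative, and 129 Positive posts in the test set, which is the same per-class split that appears in the support column of the classification reports in Step 6.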

Step 6: Training Sentiment Classifiers (Naïve Bayes & SVM)

Now that the data is ready, we’ll train two popular machine learning classifiers to predict sentiment: Naïve Bayes and Support Vector Machine (SVM). After training, we’ll evaluate both using classification metrics.

Here is the code for this step:

from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.metrics import classification_report

# Naïve Bayes
nb = MultinomialNB()
nb.fit(X_train, y_train)
y_pred_nb = nb.predict(X_test)

# SVM
svm = LinearSVC()
svm.fit(X_train, y_train)
y_pred_svm = svm.predict(X_test)

# Evaluation
print("NB Results:\n", classification_report(y_test, y_pred_nb))
print("SVM Results:\n", classification_report(y_test, y_pred_svm))

Output:

NB Results:

precision recall f1-score support

Negative 0.39 0.40 0.39 135
Neutral 0.36 0.43 0.39 136
Positive 0.30 0.23 0.26 129

accuracy 0.35 400
macro avg 0.35 0.35 0.35 400
weighted avg 0.35 0.35 0.35 400

SVM Results:

precision recall f1-score support

Negative 0.37 0.39 0.38 135
Neutral 0.34 0.35 0.34 136
Positive 0.34 0.32 0.33 129

accuracy 0.35 400
macro avg 0.35 0.35 0.35 400
weighted avg 0.35 0.35 0.35 400

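Both models are now trained and evaluated. The classification report includes per-class precision, recall, and F1-score, along with overall accuracy. Both models reach an accuracy of 0.35, which is only marginally above the roughly 0.33 you would expect from random guessing across three nearly balanced classes.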
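
A quick way to put these scores in context is to compare them with trivial baselines. The sketch below uses only the numbers printed in the reports above (the support column and the 0.35 accuracy); the variable names are illustrative.

```python
# Test-set class counts, copied from the "support" column above
support = {"Negative": 135, "Neutral": 136, "Positive": 129}
total = sum(support.values())

# Accuracy of always predicting the most common class
majority_label = max(support, key=support.get)
majority_baseline = support[majority_label] / total

# Expected accuracy of guessing uniformly among the classes
uniform_baseline = 1 / len(support)

reported_accuracy = 0.35  # both Naive Bayes and SVM in the reports above
print(f"majority baseline ({majority_label}): {majority_baseline:.2f}")
print(f"uniform baseline: {uniform_baseline:.2f}")
print(f"lift over majority: {reported_accuracy - majority_baseline:+.2f}")
```

If a classifier barely beats the majority baseline, as is the case here, the features are carrying very little usable signal, and tuning the model alone is unlikely to help much.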

Step 7: Visualizing Sentiment Distribution

Before diving deeper, it's useful to understand the balance of sentiment classes in the dataset. Here's a quick plot showing the distribution of sentiment labels.

Here is the code:

import seaborn as sns
import matplotlib.pyplot as plt

sns.countplot(data=df, x='Sentiment Label')
plt.title("Sentiment Distribution")
plt.show()

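Output: count plot of the three sentiment labels (image not included).

This plot helps you check whether the dataset is balanced or skewed toward certain sentiments, which can affect model performance. Here the classes are close to balanced: 682 Neutral, 675 Negative, and 643 Positive posts.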

Step 8: Evaluating Model with a Confusion Matrix

To better understand how well the SVM classifier performed, you can visualize its predictions using a confusion matrix.

It shows the number of correct and incorrect classifications for each sentiment class.

Here is the code for this step:

import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

cm = confusion_matrix(y_test, y_pred_svm, labels=svm.classes_)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=svm.classes_)
disp.plot(cmap='Blues')
plt.title("SVM Confusion Matrix")
plt.show()

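Output: confusion matrix plot for the SVM classifier (image not included).

The SVM model performs only slightly better than chance and confuses all three sentiment classes:

Correctly predicts: 52 of 135 Negative, 47 of 136 Neutral, and 41 of 129 Positive posts
Misclassifies more than 60% of the posts in every class
Needs improvement in distinguishing between sentiment classes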
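
The diagonal counts quoted above can be tied back to the classification report in Step 6: dividing each count by the number of true posts in that class gives the per-class recall, and the sum of the diagonal divided by the test-set size gives the accuracy. A small sketch, using only numbers that appear in this article:

```python
# Correct predictions per class (the confusion-matrix diagonal quoted above)
correct = {"Negative": 52, "Neutral": 47, "Positive": 41}
# True posts per class in the test set (the "support" column in Step 6)
support = {"Negative": 135, "Neutral": 136, "Positive": 129}

recall = {label: correct[label] / support[label] for label in correct}
accuracy = sum(correct.values()) / sum(support.values())

for label, value in recall.items():
    print(f"{label:9s} recall = {value:.2f}")
print(f"accuracy = {accuracy:.2f}")
```

The results (0.39, 0.35, and 0.32 recall, 0.35 accuracy) match the SVM report, which confirms that the matrix and the report describe the same predictions.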

To improve this and further build your sentiment analysis skills, you can:

Use more training data
Try deep learning models like LSTM
Add sentiment-rich features like emojis, hashtags, or sentiment lexicons

This analysis gives you a clear direction for enhancing your model’s accuracy.
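
As one possible starting point for the sentiment-rich features suggested above, the sketch below counts lexicon words, emojis, and hashtags in a raw post. Everything in it is an assumption made for illustration: the word lists and emoji sets are tiny hand-made examples, and extra_features is a hypothetical helper, not part of the original notebook.

```python
import re

# Tiny hand-made word lists, purely illustrative; a real project would use
# an established sentiment lexicon instead.
POSITIVE_WORDS = {"good", "great", "love", "happy"}
NEGATIVE_WORDS = {"bad", "awful", "hate", "sad"}
POSITIVE_EMOJIS = {"\U0001F600", "\U0001F60D"}  # grinning face, heart eyes
NEGATIVE_EMOJIS = {"\U0001F620", "\U0001F622"}  # angry face, crying face

def extra_features(post):
    """Count lexicon hits, emoji hits, and hashtags in a raw post."""
    words = re.findall(r"[a-z']+", post.lower())
    return {
        "pos_words": sum(w in POSITIVE_WORDS for w in words),
        "neg_words": sum(w in NEGATIVE_WORDS for w in words),
        "pos_emojis": sum(ch in POSITIVE_EMOJIS for ch in post),
        "neg_emojis": sum(ch in NEGATIVE_EMOJIS for ch in post),
        "hashtags": len(re.findall(r"#\w+", post)),
    }

# Hypothetical post, not taken from the dataset
print(extra_features("I love this #phone, great camera \U0001F60D"))
```

These counts have to be computed on the raw Post Content column, because the cleaning step in Step 3 removes emojis and hashtags. They could then be appended to the TF-IDF matrix as extra columns, for example with scipy.sparse.hstack.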

Final Conclusion

In this project, you built a complete social media sentiment analysis pipeline using text preprocessing, TF-IDF, and two classifiers, Naïve Bayes and SVM. Both models reached about 35% accuracy on the test set, which is close to chance for three classes, so treat these results as a baseline to improve on rather than a finished model. This project gave you practical experience in NLP and classification that you can now build on.

Colab Link:
https://colab.research.google.com/drive/1NJ9H956op_L6nyLVuEzSA44yP1uMAiiD?usp=sharing

Frequently Asked Questions (FAQs)

1. What is social media sentiment analysis?

Social media sentiment analysis is the process of analyzing posts, tweets, and comments to detect opinions such as positive, negative, or neutral about a topic or brand.

2. How does social media sentiment analysis work?

It uses natural language processing (NLP) and machine learning to clean text, extract features, and classify sentiments based on patterns in the data.

3. How do you perform social media sentiment analysis?

Start by collecting social media data, cleaning and pre-processing it, applying sentiment scoring or training classifiers like Naïve Bayes or SVM, and then evaluating the results.

4. How accurate is the sentiment analysis using SVM and Naïve Bayes in this project?

In this project, SVM and Naïve Bayes both reached an accuracy of about 0.35 on the test set, only slightly above chance for three classes. Both models leave substantial room for improvement across all sentiment classes.

5. Can this sentiment analysis model be applied to real-time social media data?

Yes, with slight modifications such as integrating the Twitter API or other social media streams, the trained model can analyze sentiments from real-time user posts.

Rohit Sharma

844 articles published

Rohit Sharma is the Head of Revenue & Programs (International), with over 8 years of experience in business analytics, EdTech, and program management. He holds an M.Tech from IIT Delhi and specializes...
