NLP For WhatsApp Chats

Project Overview

Natural Language Processing (NLP) is a field of Artificial Intelligence that focuses on enabling systems to understand and process human languages. In this article, I will use NLP to analyze my WhatsApp chats. For privacy reasons, the participants in my chats are referred to as Person 1, Person 2 and so on.

Project Details

Get The WhatsApp Data for NLP

If you have never exported your WhatsApp chats before, don’t worry, it’s very easy. For NLP of WhatsApp chats, you first need to extract the chats from your smartphone: open any chat in WhatsApp and select the export chat option. Loaded as a list of lines, the exported text file will look like this:

["[02/07/2017, 5:47:33 pm] Person_1: Hey there! This is the first message",

"[02/07/2017, 5:48:24 pm] Person_1: This is the second message",

"[02/07/2017, 5:48:44 pm] Person_1: Third…",

"[02/07/2017, 8:10:52 pm] Person_2: Hey Person_1! This is the fourth message",

"[02/07/2017, 8:14:11 pm] Person_2: Fifth …etc"]

I will use two different approaches for the NLP of WhatsApp chats: the first focuses on the fundamentals of NLP, and the second uses the datetime stamp at the start of every message.
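To make use of the datetime stamp, each exported line first has to be split into its timestamp, sender and message. The following is a minimal sketch using only the standard library. The names `LINE_RE`, `STAMP_FORMAT` and `parse_line` are illustrative and not part of the original project, and the day/month order of the date is an assumption, since the sample date 02/07/2017 is ambiguous.

```python
import re
from datetime import datetime

# Assumed layout, taken from the sample lines: "[date, time am/pm] Person: message"
LINE_RE = re.compile(r"^\[(?P<stamp>[^\]]+)\] (?P<person>[^:]+): (?P<text>.*)$")

# Assumption: day comes before month; swap %d and %m if your export is month-first
STAMP_FORMAT = "%d/%m/%Y, %I:%M:%S %p"


def parse_line(line):
    """Return (datetime, person, text), or None if the line has no leading stamp."""
    match = LINE_RE.match(line)
    if match is None:
        return None  # e.g. the continuation of a multi-line message
    stamp = datetime.strptime(match.group("stamp"), STAMP_FORMAT)
    return stamp, match.group("person"), match.group("text")


sample = "[02/07/2017, 5:47:33 pm] Person_1: Hey there! This is the first message"
print(parse_line(sample))
```

Lines that do not start with a bracketed stamp return `None`, so the caller can decide whether to skip them or append them to the previous message.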

Formatting WhatsApp Chats for NLP

To analyze our WhatsApp conversations, they first need to be formatted as data. This takes a few basic steps: we build a dictionary with one key per person (two here), where each value is a list of that person’s tokenized sentences.

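from collections import defaultdict
import nltk  # sent_tokenize needs NLTK's Punkt tokenizer data to be downloaded

# content: list of lines read from the exported chat file
ppl = defaultdict(list)  # maps each person to a list of tokenized sentences
for line in content:
    try:
        # field 2 of the split looks like "33 pm] Person_1"; [7:] drops "33 pm] "
        person = line.split(':')[2][7:]
        # rejoin the remaining fields so colons inside the message survive
        text = nltk.sent_tokenize(':'.join(line.split(':')[3:]))
        ppl[person].extend(text)  # defaultdict creates the list on first use
    except IndexError:
        print(line)  # in case reading a line fails, examine why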

ppl = {'Person_1': ['This is message 1', 'Another message',
                    'Hi Person_2', ... , 'My last tokenised message in the chat'],
       'Person_2': ['Hello Person_1!', "How's it going?", 'Another message', ...]}

Classification of Dialogues

The tokenized conversations are classified by training a Naive Bayes classification model on a training set of pre-categorized, chat-style conversations.

The trained model can be tested with a test set or even with user input. It can classify any tokenized sentence into categories such as Greetings, Statements, Emotions, Questions, etc.

classifier.classify(extract_features('Hi there!'))
'Greet'
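The text does not show how `classifier` and `extract_features` are built. As a rough illustration of the idea, here is a from-scratch multinomial Naive Bayes with add-one smoothing. It is a stand-in, not the author's model and not a library API: the class `TinyNaiveBayes`, the feature scheme and the tiny `TRAINING` set are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def extract_features(sentence):
    """Assumed feature scheme: lowercased words plus a marker for '?'."""
    words = [w.strip(".,!?'\"").lower() for w in sentence.split()]
    features = [w for w in words if w]
    if "?" in sentence:
        features.append("<question_mark>")
    return features


class TinyNaiveBayes:
    """Minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self, labelled_sentences):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for sentence, label in labelled_sentences:
            self.label_counts[label] += 1
            for feature in extract_features(sentence):
                self.feature_counts[label][feature] += 1
                self.vocab.add(feature)

    def classify(self, features):
        total = sum(self.label_counts.values())
        scores = {}
        for label, count in self.label_counts.items():
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            score = math.log(count / total)  # log prior
            for feature in features:
                if feature in self.vocab:  # ignore words never seen in training
                    seen = self.feature_counts[label][feature]
                    score += math.log((seen + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical, hand-labelled training data (far too small for real use)
TRAINING = [
    ("Hi there!", "Greet"), ("Hello, good morning", "Greet"), ("Hey everyone", "Greet"),
    ("I am going to the shop", "Statement"), ("The train was late today", "Statement"),
    ("We watched a film", "Statement"),
    ("Are you coming tonight?", "Question"), ("What time is it?", "Question"),
    ("Do you want pizza?", "Question"),
    ("Haha that is so funny", "Emotion"), ("Wow I love it", "Emotion"),
    ("Ugh so annoying", "Emotion"),
]

classifier = TinyNaiveBayes(TRAINING)
print(classifier.classify(extract_features("Hi there!")))  # Greet
```

A real model would be trained on a much larger pre-categorized corpus and evaluated on a held-out test set, as described above.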

Now let’s run the model on the WhatsApp data to count the occurrences of each category in the tokenized conversations:

import matplotlib.pyplot as plt

# df is assumed to hold the category counts, one row per person
ax = df.T.plot(kind='bar', figsize=(10, 7),
               legend=True, fontsize=16, color=['y', 'g'])
ax.set_title("Frequency of Message Categories", fontsize=18)
ax.set_xlabel("Message Category", fontsize=14)
ax.set_ylabel("Frequency", fontsize=14)
# plt.savefig('plots/cat_message')  # uncomment to save
plt.show()
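The counting step that feeds the plot is not shown in the text. One possible sketch, using only the standard library: `count_categories` and `toy_classify` are hypothetical names, and `toy_classify` is a trivial stand-in for the trained model so that the example runs on its own.

```python
from collections import Counter


def count_categories(ppl, classify):
    """Count predicted categories per person.

    ppl:      dict mapping person -> list of tokenized sentences
    classify: any callable that maps a sentence to a category label
    """
    return {
        person: Counter(classify(sentence) for sentence in sentences)
        for person, sentences in ppl.items()
    }


def toy_classify(sentence):
    """Hypothetical stand-in for the trained classifier."""
    return "Question" if sentence.endswith("?") else "Statement"


ppl = {
    "Person_1": ["This is message 1", "Another message"],
    "Person_2": ["Hello Person_1!", "How's it going?"],
}
counts = count_categories(ppl, toy_classify)
print(counts["Person_2"])
```

The nested counts can then be loaded into a table structure of your choice before plotting.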


NLP for WhatsApp Chats Emotions

We all use emojis, not only on WhatsApp but on every other chat platform. Now let’s see which emojis are used most in the conversations.

Person_1's emojis:
 😏🕺🏼🍻😮🤤😭😏💁🏼😏👏🙏🐳🐋😏😱🙄😳☺😭🚀💫⭐✨💥🍕🍕😏😊😘🙄💭😭😭😭😭😏✅😱😏😭🙄😘😘😘😘😭😭😭😭😭😭🍸😘😘😅😘😭👏💪😭🙅♂🙆♂🙋♂💁♂😘🎉🎉🎉🎉🎉🎉🎉🎉🎉😊😘🙄😴😉🕺🏼😭😎😭🙄😘😘😘👏😩😭😭😭😭😭😭😭😭😭😭😭😭😭😭😭😭😭😭😭😭📞🎉😘😀😚😱👏🏏😏🚂🤓👏🙄🙌😘😘😏😭😭🙌😏😔😭😘🤰🏼😘🙄🙄😰🙋🏼♀😭🙄😍🤓👏😭😭😭😭😭😘🍕💩☹🙋🏼♀😘😴🚲😘😘😘😭☹😗😙😚😚🤔🤝🍻🎂✈😘👌😰😘🔺🔥😩😘💨😚😱😢😭😭😭😭😭😭😭😭😭😭😗🤔🤔🤔🤔🤔🤔🤔👀👏😇😗😚😘🙄☹😘😩😚😇⚡💥🔥☹😭😩😭😰😱😅😅😍😞👏👏👏👏👏👏😘😘😊😘😘😍😘🙄😏😘😘🙄😘👀😘😘👀😘😘😘🥕😘😘😘😘😘😘😘😘😭😘😘🖕🏻😘🌇😘😘😘🙄😪🤧😘🥚😘😘😘😘😘😘😘😘😘😘😱😘😭😭😘🆘❌‼⭕♨🚫⛔🚷🖍📌📍✂📕📮🔻☎⏰🚨🚒🚗🥊🏓🍷🌶🍅🍎☄🌹🎒👠⛑😎😘😘😘😘😙👀🙄😭😭😭😭😭😭😘😘😘🥚😘🙄🙄😘 

Most common: [('😘', 77), ('😭', 68), ('🙄', 16), ('👏', 13), ('😏', 11), ('🎉', 10), ('🤔', 8), ('🏼', 6), ('😱', 6), ('😚', 6)]


Person_2's emojis:
 😁🙂🤓😅😀👍😂😬👻😁😂✌😴😬😬🙄🎉✌😂😪😒😬😐😬😁😬😁😏🤢😁😒😁😏😘😒😅😂💪👊😬😏💁♂😴😬😅😏😆🐬🙁😬🐬😁😁✌😁😁👊👮😕✌😁😁😐✌😱😩😬✌✌😂😘💇♂😁😁😁😅🙂😬🙁😁😁😕😴😁😏😁😘😅😴🙂🎉🎉🎉😁🚀🚀🚀😁😱✌🍕🍕😏👍😂😁😑😘🙄😁😘😬😂😁🎉🎉🎉✌☺😑😁😬🙂😱😂✌☺😁👊😁👊👍😏💁🏼😅😁😁😁😕✌🤓😂😘😁😁✌✌😘🙁😘😁🎉✌😘😘😘😘😅😁😁😁😁😂🙁😏😔✌😘😁😐😁✌🙂👍😘😬😁✌😂🙋🏼😎😁🤓💩😂😘😐😏✌🙂✌😘✌😁🤔✌🏋🏼♀😬🙂😁👊😁✌😁😁😏🤜🤛☹⚡😬🎯💪😁☹😞👋🙂😘😴😁😁🎉😁✌🙂😘😬✌👍😁💃👍👍👍👍😢☹🙁🙁👋😏😬😁✌😘🙁👍🙌🤓😏🎉💁♂😁😑😁😁😁🎉😁☹😕😢😬✌😞😬✌😬👍😁😏😁👍👍👊😁😧😘😪😁🎉🎉🎉😕👍😁👉😁👊😏😁😁😂😂😂🤳👌😁👌🙋🏼♀👋😐😐😁🙁😕👊😁🤔🤗🤙👍😬🤔🎉🎅🏻👍😁😁😁🤚😘🤚👍👊🙁🙁🙁🙄😘🙋🏼♀🤣😘🎉😬🙁😖💁♂😂😒🎉😗👏🤔🤐🙄👊😘😉😘🙂☹💰😏🎉😑😬👍👍👎🙋♂💁♂😁😁🙂☹🤔🦄🦄😬😆😴😁😁😁😍🏄♀👀😁🏄♀👍😬👊😬🤔😁🙄👌👍😫☹🤗😩👀😁💰🤔👍😁😰😳😣😟😘👀🤗🙂😅👍🤔🙂😁😁😣🕺😮🙂☹☹😑🤘☹😬🍳😘😬😘🤘🙋♂🙁🍓😢😁😂😂😂😁😘🐑😚😚😚🤞😁🙄😁🙋♂😴😘👍😁👊😑😒👍😑😬👍👍👍😕☹😟💇♀👏🎉😏😁😚🤔👍👍😁😏👍😁😚😁🎉😬🙂😬😁🔥🤝☹🙌😏💁♂😁😁😁😁😁🙁😭🙂😬😘🙂😁😬👍☺🙁😂👀👌🙌😁💁🏼♀😁😬👍😕🙂😗😁😕🙁👀😁👏🎉😩😕🙁😊😴🤞😚😩😩😩😁😬👍👍😬😚😁😱👻👽😑😁😴🤒😁🙁👊🤓☹😁🤙😁👽👊😊🤙😁☹🙄😇🙂😁😩☹😚😏👍🙁👋😟😁☹😚🤔😧🙁☹🙃🙂👋🙂👍👍😁🤙👍💰🙂😢🤙💰😚👍🤔🤣🤣🤣🎉😢😏😬🤓👊💁♂😁😁😁👍🔥🤙😁👉😗😁⚡💆♀⚡👏😚😘🤔☹🤝😢😳😳😉👍☺👊☹⚡⚡⚡☹☹☹👍☹😚🔥🔥😢💰😁😬👊🤔👻🙌💁🏼♀😒😫👍👊😇🙂🤔🤙☹😪😉👍😁💪😭😁💩🤤😚☹☹👊🤙😚😘🙏🤥😁👍👍😚🤗😁🙄🙄😁👍😁😯😚👍🙄🙌🤔😁😘👍👊😱😏👍😘😁🎉😭😁😚😘😴👍😏🤔🤔😏🤢😘😭😭😚😬👍😘👊👌😘😁😁😚👋😁✋☝😭🤔👍😘🤙💁🏼♀😘😘👍👀👋😘😘😘😘😘😘😘😘🙁🙁👍😘😁😚👊👍😬👍👍🎉👍😋😘😘😘😘😘😘😘😘☹😘😁👍😁🤙👏👍😚😘😘😘😘😘😘👍💁🏼♀👍😘😏🤔👍👍👍😘👍😁😘👊👍👍👍👍☹👍👍👍👍😘👍😴🤙😘😘😘😘😘😘😘😘😕👊👍👍😁😘😚👆💁♀😴😘👊😥👊👍😅🙂👊🤙😘😘😘😘😘😘😘😘😲😘👍🤔😫🤣🍳😎😚😢😯💃👍🙄👍👍💇♂👊😚😚😘👍🙄😘😚😘😢🛎😚🙏😂😘😘😘👌👍🤷♂😂👍😕👍😘😘😘😘👏👊😅😉💤👍😁😚👍🤙🤓🤗😘😁💃😏😘😘😬💁♂😂☹😁👍😘 

Most common: [('😁', 138), ('😘', 103), ('👍', 91), ('😬', 42), ('👊', 29), ('☹', 29), ('😚', 28), ('✌', 27), ('😏', 25), ('🙂', 24)]
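Counts like the ones above could be produced along the following lines. This is a sketch, not the author's code: `EMOJI_RE` covers only two common Unicode blocks and is a rough approximation of "emoji", and it counts individual code points. That would also explain why a skin tone modifier such as '🏼' shows up as its own entry in Person_1's list.

```python
import re
from collections import Counter

# Rough approximation: pictographs/emoticons plus the older symbol blocks.
# Some emojis outside these ranges will be missed.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def emoji_counts(sentences):
    """Count every emoji code point found in a list of sentences."""
    counter = Counter()
    for sentence in sentences:
        counter.update(EMOJI_RE.findall(sentence))
    return counter


messages = ["Nice 😘😘", "ok 😭 👍🏼"]
print(emoji_counts(messages).most_common(3))
```

Calling `most_common(10)` on each person's counter gives a top-ten list in the same shape as the output shown above.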

Sentiment Against Time

Plotting sentiment against datetime is not as easy as it looks. There are many different sentiments on the same day, so the first step is to group the messages by date and calculate the mean sentiment for each day. A rolling mean over the daily values then smooths the curve.
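The daily averaging and the rolling mean can be sketched with the standard library alone. The sentiment scores are taken as given here, because the text does not say which sentiment analyzer is used; `daily_mean`, `rolling_mean` and the sample scores are illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime


def daily_mean(scored_messages):
    """scored_messages: iterable of (datetime, score) pairs.

    Returns a date-sorted list of (date, mean score for that date).
    """
    by_day = defaultdict(list)
    for stamp, score in scored_messages:
        by_day[stamp.date()].append(score)
    return [(day, sum(scores) / len(scores)) for day, scores in sorted(by_day.items())]


def rolling_mean(values, window):
    """Trailing mean; positions with fewer than `window` values yield None."""
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            result.append(sum(chunk) / window)
    return result


# Made-up scores in [-1, 1], purely for illustration
scored = [
    (datetime(2017, 7, 2, 17, 47), 0.5),
    (datetime(2017, 7, 2, 20, 10), -0.25),
    (datetime(2017, 7, 3, 9, 0), 0.75),
    (datetime(2017, 7, 4, 9, 0), 0.25),
]
daily = daily_mean(scored)
smoothed = rolling_mean([mean for _, mean in daily], window=2)
print(daily)
print(smoothed)
```

Note that this rolling mean runs over consecutive entries, so days without any messages are skipped rather than filled in.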

[Figure: rolling mean of the daily sentiment]

Frequency of Chats

Now let’s have a look at the frequency of WhatsApp chats, which is not part of NLP for WhatsApp but of time series analysis. We can use time series here to see the frequency of chats. First, we need to create a colour palette ordered by the total number of messages for each day.
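A possible way to build such a palette is sketched below. It assumes that "each day" means each day of the week, which the text does not confirm, and it uses shades of blue where darker means more messages. The names `messages_per_weekday` and `palette_by_volume` are hypothetical.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def messages_per_weekday(timestamps):
    """Count messages per weekday name from a list of datetimes."""
    return Counter(WEEKDAYS[stamp.weekday()] for stamp in timestamps)


def palette_by_volume(counts, dark=40, light=230):
    """Map each key to a hex colour, the busiest key getting the darkest shade."""
    ranked = [key for key, _ in counts.most_common()]  # busiest first
    palette = {}
    for rank, key in enumerate(ranked):
        if len(ranked) > 1:
            level = round(dark + (light - dark) * rank / (len(ranked) - 1))
        else:
            level = dark
        palette[key] = f"#{level:02x}{level:02x}ff"
    return palette


stamps = [
    datetime(2017, 7, 2, 17, 47),  # a Sunday
    datetime(2017, 7, 2, 20, 10),
    datetime(2017, 7, 3, 9, 0),    # a Monday
]
per_day = messages_per_weekday(stamps)
print(palette_by_volume(per_day))
```

The resulting dictionary can be passed to most plotting libraries as a per-category colour mapping.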


Project Information

Category: python d-s ml-ai-nlp

Completed: 2023

Technologies Used

Python
Machine Learning
NLP
