A web scraper that exports your entire WhatsApp chat history.

Eddy Harrington

Last update: Jan 6, 2023

WhatSoup 🍲

A web scraper that exports your entire WhatsApp chat history.

Overview
Demo
Prerequisites
Instructions
Frequently Asked Questions

Overview

Problem

WhatsApp's built-in chat export has several limitations:

Exports are limited to a maximum of 40,000 messages
Exports skip the text portion of media messages by replacing the entire message with <Media omitted> instead of, for example, <Media omitted> My favorite selfie of us 😻🐶🤳
Exports are limited to the .txt file format

Solution

WhatSoup solves these problems by loading the entire chat history in a browser, scraping the chat messages (text only, no media), and exporting them to .txt, .csv, or .html files.

Example output:

WhatsApp Chat with Bob Ross.txt

02/14/2021, 02:04 PM - Eddy Harrington: Hey Bob 👋 Let's move to Signal!
02/14/2021, 02:05 PM - Bob Ross: You can do anything you want. This is your world.
02/15/2021, 08:30 AM - Eddy Harrington: How about we use WhatSoup 🍲 to backup our cherished chats?
02/15/2021, 08:30 AM - Bob Ross: However you think it should be, that’s exactly how it should be.
02/15/2021, 08:31 AM - Eddy Harrington: You're the best, Bob ❤
02/19/2021, 11:24 AM - Bob Ross: <Media omitted> My latest happy 🌲 painting for you.
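
The example lines above follow a `date, time - sender: message` layout. As a minimal sketch (not part of WhatSoup; `parse_line` and `Message` are hypothetical names), a line in that layout could be read back into structured fields like this. It assumes the en-US timestamp format shown above and treats any line that does not match as unparseable, for example the continuation of a multi-line message.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Matches "MM/DD/YYYY, HH:MM AM/PM - Sender: text" as in the example output.
LINE_PATTERN = re.compile(
    r"^(?P<stamp>\d{2}/\d{2}/\d{4}, \d{2}:\d{2} [AP]M) - "
    r"(?P<sender>[^:]+): (?P<text>.*)$"
)


class Message(NamedTuple):
    sent_at: datetime
    sender: str
    text: str


def parse_line(line: str) -> Optional[Message]:
    """Parse one exported line, or return None if it does not match the layout."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    sent_at = datetime.strptime(match["stamp"], "%m/%d/%Y, %I:%M %p")
    return Message(sent_at, match["sender"], match["text"])
```

A sender name that itself contains a colon would be split at the first colon, so this is only a starting point.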

Demo

Prerequisites

You have a WhatsApp account
You have Chrome browser installed
You have some familiarity with setting up and running Python scripts
Your terminal supports Unicode (UTF-8) characters (for chat emojis)

Instructions

Make sure the language in your WhatsApp settings is set to English. This needs to be done on your phone (instructions here). You can change it back afterwards, but for now the script relies on certain HTML elements/attributes that contain English characters/words.

Clone the repo:

git clone https://github.com/eddyharrington/WhatSoup.git

Create a virtual environment:

# Windows
python -m venv env

# Linux & Mac
python3 -m venv env

Activate the virtual environment:

# Windows
env\Scripts\activate

# Linux & Mac
source env/bin/activate

Install the dependencies:

# Windows
pip install -r requirements.txt

# Linux & Mac
python3 -m pip install -r requirements.txt

Set up your environment

Download ChromeDriver and extract it to a local folder (such as the env folder)
Get your Chrome browser Profile Path by opening Chrome and entering chrome://version into the URL bar

Create a .env file with entries for DRIVER_PATH and CHROME_PROFILE that specify the paths to your ChromeDriver and your Chrome profile from the steps above:

# Windows
DRIVER_PATH = 'C:\path-to-your-driver\chromedriver.exe'
CHROME_PROFILE = 'C:\Users\your-username\AppData\Local\Google\Chrome\User Data'

# Linux & Mac
DRIVER_PATH = '/Users/your-username/path-to-your-driver/chromedriver'
CHROME_PROFILE = '/Users/your-username/Library/Application Support/Google/Chrome/Default'
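
A missing or misnamed entry can surface later as a less obvious error, so it may help to check both values before starting the browser. The sketch below is not WhatSoup's own code: `missing_settings` and `check_settings` are hypothetical helpers, and it assumes the .env values have already been loaded into the process environment (for example by a dotenv loader).

```python
import os
from typing import List, Mapping

# The two entries the .env file above is expected to define.
REQUIRED_SETTINGS = ("DRIVER_PATH", "CHROME_PROFILE")


def missing_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required settings that are unset or empty."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


def check_settings(env: Mapping[str, str] = os.environ) -> None:
    """Stop early with a readable message if a required setting is missing."""
    missing = missing_settings(env)
    if missing:
        raise SystemExit("Missing .env entries: " + ", ".join(missing))
```

Calling `check_settings()` at the top of a script would stop with a message naming the missing entries instead of failing deeper inside the browser setup.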

Run the script
```
# Windows
python whatsoup.py

# Linux & Mac
python3 whatsoup.py
```
Note for Mac users: you may get blocked when trying to run the script the first time with a message about chromedriver not being from an identified developer. This is normal. Follow these instructions to grant chromedriver an exception, then re-run the script.

Frequently Asked Questions

Does it download pictures / media?

No.

How large of chats can I load/export?

The most demanding part of the process is loading the entire chat in the browser, in which performance heavily depends on how much memory your computer has and how well Chrome handles the large DOM load. For reference, my largest chat (~50k messages) uses about 10GB of RAM. If you load more than the current record let me know and add yourself to the leader board.

WhatSoup Largest Chat Leader Board

#	Name	Date	Message Count	Time
🥇	Eddy	2021-02-28	47,550	28139 sec / 7.8 hrs
🥈	?	?	?	?
🥉	?	?	?	?

How long does it take to load/export?

It depends on the chat size and how fast your computer is, but the table below gives a ballpark range. For large chats, I recommend turning your PC's sleep/power settings to OFF and running the script in the evening or before bed so it loads overnight.

# of msgs in chat history	Load time
500	1 min
5,000	12 min
10,000	35 min
25,000	3.5 hrs
50,000	8 hrs
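
For chat sizes between the rows above, a rough estimate can be read off by linear interpolation. This is only a sketch built on the ballpark figures in the table (3.5 hrs and 8 hrs converted to minutes); `estimate_load_minutes` is a hypothetical helper, and extending the last segment beyond 50,000 messages is an assumption, not a measurement.

```python
# (message count, load time in minutes) taken from the table above.
BALLPARK_MINUTES = [
    (500, 1.0),
    (5_000, 12.0),
    (10_000, 35.0),
    (25_000, 210.0),
    (50_000, 480.0),
]


def estimate_load_minutes(message_count: int) -> float:
    """Interpolate linearly between the ballpark rows."""
    if message_count <= 0:
        return 0.0
    first_count, first_minutes = BALLPARK_MINUTES[0]
    if message_count <= first_count:
        # Below the first row, scale the first row down proportionally.
        return first_minutes * message_count / first_count
    # Find the segment containing message_count; if none does, the loop
    # ends on the last segment, which is then extended as a straight line.
    for (n0, m0), (n1, m1) in zip(BALLPARK_MINUTES, BALLPARK_MINUTES[1:]):
        if message_count <= n1:
            break
    return m0 + (message_count - n0) * (m1 - m0) / (n1 - n0)
```

For example, `estimate_load_minutes(7_500)` falls halfway between the 5,000 and 10,000 rows and returns 23.5 minutes.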

Why is it so slow?!

Basically, browsers become easily bottlenecked when loading massive amounts of rich data in WhatsApp, which is a WebSocket application and is constantly sending/receiving information and changing the HTML/DOM.

I'm open to ideas but most of the things I tried didn't help performance:

Chrome vs Firefox ❌
Headless browsing ❌
Disabling images ❌
Removing elements from DOM ❌
Changing 'experimental' browser settings to allocate more memory ❌

Can I...

Use Firefox instead of Chrome? Yes, not out of the box though. There are a few Selenium differences and nuances to get it working, which I can share if there's interest. TODO.
Use headless? Yes, but I only got this to work with Firefox and not Chrome.
Use WhatSoup to scrape a local WhatsApp HTML file? Yes, you'd just need to bypass a few functions from main() and load the HTML file into Selenium's driver, then run the scraping/exporting functions like the below. If there's enough interest I can look into adding this to WhatSoup myself. TODO.
```
# Load and scrape data from a local HTML file
def local_scrape(driver):
    # driver.get() expects a URL, so the local file is addressed with file://
    driver.get('file:///C:/your-WhatSoup-dir/source.html')
    scraped = scrape_chat(driver)
    scrape_is_exported("source", scraped)
```
Contribute to WhatSoup? Please do!
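
For the local HTML file case above, the browser needs a `file://` URL rather than a bare file system path. As a small sketch using only the standard library (`local_html_url` is a hypothetical helper, not part of WhatSoup), an absolute Windows or POSIX path can be converted like this:

```python
from pathlib import PurePosixPath, PureWindowsPath


def local_html_url(path: str) -> str:
    """Return a file:// URL for an absolute Windows-style or POSIX-style path."""
    windows_path = PureWindowsPath(path)
    # A drive letter such as 'C:' marks a Windows-style path.
    pure_path = windows_path if windows_path.drive else PurePosixPath(path)
    # as_uri() percent-encodes characters such as spaces and raises
    # ValueError when the path is not absolute.
    return pure_path.as_uri()
```

The result can then be passed to `driver.get()` in place of a hard-coded URL.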

Comments

Unable to locate element: {"method":"css selector","selector":"._3Tw1q"}

Hello, I'm having a problem running it on Windows:

C:\Windows\system32>python C:\Users\Kanna\WhatSoup\whatsoup.py
[9512:4340:0227/163848.421:ERROR:upgrade_util_win.cc(73)] IProcessLauncher::LaunchCmdElevated failed; hr = 80004002
[9512:2152:0227/163848.451:ERROR:login_database.cc(654)] Password store database is too new, kCurrentVersionNumber=28, GetCompatibleVersionNumber=29
[9512:2152:0227/163848.451:ERROR:password_store_default.cc(39)] Could not create/open login database.
DevTools listening on ws://127.0.0.1:55297/devtools/browser/6e5e80ed-295c-4c29-85e4-45131568fd88
Success! WhatsApp finished loading and is ready.
Traceback (most recent call last):
  File "C:\Users\Kanna\WhatSoup\whatsoup.py", line 1008, in <module>
    main()
  File "C:\Users\Kanna\WhatSoup\whatsoup.py", line 29, in main
    chats = get_chats(driver)
  File "C:\Users\Kanna\WhatSoup\whatsoup.py", line 183, in get_chats
    name_of_chat = selected_chat.find_element_by_class_name(
  File "C:\Users\Kanna\AppData\Local\Programs\Python\Python39\lib\site-packages\selenium\webdriver\remote\webelement.py", line 398, in find_element_by_class_name
    return self.find_element(by=By.CLASS_NAME, value=name)
  File "C:\Users\Kanna\AppData\Local\Programs\Python\Python39\lib\site-packages\selenium\webdriver\remote\webelement.py", line 658, in find_element
    return self._execute(Command.FIND_CHILD_ELEMENT,
  File "C:\Users\Kanna\AppData\Local\Programs\Python\Python39\lib\site-packages\selenium\webdriver\remote\webelement.py", line 633, in _execute
    return self._parent.execute(command, params)
  File "C:\Users\Kanna\AppData\Local\Programs\Python\Python39\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\Kanna\AppData\Local\Programs\Python\Python39\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"._3Tw1q"}
  (Session info: chrome=87.0.4280.66)

It opens chrome and opens WhatsApp web, but it does nothing to the page itself
bug

opened by kannadivinorum 4
Start script with an input argument to scrape only desired chat without loading up all users
Hi, I was wondering whether I could directly load the chat for a specific user when I already know the name of the person/group. I have a lot of old chats/groups, and the script mostly breaks down with exceptions while loading the contacts:

raise exception_class(message, screen, stacktrace) selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"span"} (Session info: chrome=90.0.4430.212)
opened by ahrizvi 1
anyone that has a good fork?

Has anyone who forked the project been able to solve the problems? My error says "executable_path has been deprecated, please pass in a Service object".

opened by joshiors 1
TypeError: argument of type 'NoneType' is not iterable

Although I have followed all the steps mentioned, I am still getting this error.

  File "whatsoup.py", line 1104, in <module>
    main()
  File "whatsoup.py", line 21, in main
    driver = setup_selenium()
  File "whatsoup.py", line 90, in setup_selenium
    executable_path=DRIVER_PATH, options=options)
  File "C:\Users\pk199\Desktop\final-project\Other\WhatsApp-Scrape\WhatSoup\env\lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 73, in __init__
    self.service.start()
  File "C:\Users\pk199\Desktop\final-project\Other\WhatsApp-Scrape\WhatSoup\env\lib\site-packages\selenium\webdriver\common\service.py", line 76, in start
    stdin=PIPE)
  File "C:\Users\pk199\AppData\Local\Programs\Python\Python37\lib\subprocess.py", line 756, in __init__
    restore_signals, start_new_session)
  File "C:\Users\pk199\AppData\Local\Programs\Python\Python37\lib\subprocess.py", line 1100, in _execute_child
    args = list2cmdline(args)
  File "C:\Users\pk199\AppData\Local\Programs\Python\Python37\lib\subprocess.py", line 511, in list2cmdline
    needquote = (" " in arg) or ("\t" in arg) or not arg
TypeError: argument of type 'NoneType' is not iterable.

opened by Purushottam-BCA 0
Not scraping the text

Hey there, I have only one problem: the script runs through everything, but when I select CSV or any other format, it creates the file and the file has no content when I open it. Please help me.

opened by amitvyas17 0

Message: no such element: Unable to locate element: {"method":"css selector","selector":"span"}

I ran into this issue on Windows as well as osx
My chrome version is 89.0.4389.82
Python version : Python 3.8.2
Here is the trace:

❯ python3 whatsoup.py
Success! WhatsApp finished loading and is ready.
Traceback (most recent call last):
  File "whatsoup.py", line 1099, in <module>
    main()
  File "whatsoup.py", line 30, in main
    chats = get_chats(driver)
  File "whatsoup.py", line 212, in get_chats
    last_chat_msg = last_chat_msg_element.find_element_by_tag_name(
  File "/Users/xxx/opt/WhatSoup/env/lib/python3.8/site-packages/selenium/webdriver/remote/webelement.py", line 305, in find_element_by_tag_name
    return self.find_element(by=By.TAG_NAME, value=name)
  File "/Users/xxx/opt/WhatSoup/env/lib/python3.8/site-packages/selenium/webdriver/remote/webelement.py", line 658, in find_element
    return self._execute(Command.FIND_CHILD_ELEMENT,
  File "/Users/xxx/opt/WhatSoup/env/lib/python3.8/site-packages/selenium/webdriver/remote/webelement.py", line 633, in _execute
    return self._parent.execute(command, params)
  File "/Users/xxx/opt/WhatSoup/env/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/Users/xxx/opt/WhatSoup/env/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"span"}
  (Session info: chrome=89.0.4389.82)

The script opens Chrome, starts going through messages, and crashes randomly at different messages. Language is set to English.

bug

opened by oddtazz 13

Language/locale differences from en-US will raise an exception at various points
Issue

Various exceptions are raised when WhatsApp settings are set to anything other than English because there are a few areas in WhatSoup that depend on English characters/words. The date/time formats for non-English settings are likely different as well and also need to be revised with a more flexible solution such as dateutil.

Temporary workaround

Set WhatsApp settings on the phone to use English as the language before running the script. It can be changed back after scraping/exporting a chat.

Issue details

WhatSoup areas that depend on English language/locale:

Identifying 'Search results' element after searching for a specific chat

Loading all messages in a selected chat, has an xpath containing 'Message list'

Finding sender when a message does not contain text, has a condition for 'Voice message'

Determining if vCard/VCF media is in a message, has conditions for 'Message' and 'Add to a group'

Date/time string formatting expects the format MM/DD/YYYY HH:MM AM/PM throughout, but there are variations such as YYYY-MM-DD, A.M. / P.M., etc.
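
One possible direction for the date/time item, sketched here with the standard library rather than dateutil: normalize the string, then try a list of candidate formats in order. `parse_timestamp` and `CANDIDATE_FORMATS` are hypothetical names, the format list is an assumption that would need to be extended per locale, and the sketch cannot tell MM/DD from DD/MM on its own.

```python
from datetime import datetime

# Candidate layouts, tried in order; extend as new variations turn up.
CANDIDATE_FORMATS = (
    "%m/%d/%Y %I:%M %p",  # MM/DD/YYYY HH:MM AM/PM
    "%Y-%m-%d %H:%M",     # YYYY-MM-DD with a 24-hour clock
    "%Y-%m-%d %I:%M %p",  # YYYY-MM-DD with AM/PM
)


def parse_timestamp(raw: str) -> datetime:
    """Parse a chat timestamp by trying each candidate format in turn."""
    cleaned = raw.replace(",", " ")
    # Normalize dotted meridiem markers such as 'A.M.' or 'p.m.'.
    for dotted, plain in (("A.M.", "AM"), ("P.M.", "PM"), ("a.m.", "AM"), ("p.m.", "PM")):
        cleaned = cleaned.replace(dotted, plain)
    cleaned = " ".join(cleaned.split())  # collapse repeated whitespace
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
    raise ValueError(f"Unrecognized timestamp: {raw!r}")
```

Raising on an unknown layout, instead of guessing, keeps a wrong date from silently ending up in an export.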

Identifying search results

# Look for the unique class that holds 'Search results.'
WebDriverWait(driver, 5).until(expected_conditions.presence_of_element_located(
    (By.XPATH, "//*[@id='pane-side']/div[1]/div/div[contains(@aria-label,'Search results.')]")))

Loading all messages

# Set focus to chat window (xpath == div element w/ aria-label set to 'Message list. Press right arrow key...')
message_list_element = driver.find_element_by_xpath(
    "//*[@id='main']/div[3]/div/div/div[contains(@aria-label,'Message list')]")

Finding sender when a message does not contain text

# Last char in aria-label is always colon after the senders name
if span.get('aria-label') != 'Voice message':
    return span.get('aria-label')[:-1]

Determining if vCard/VCF media is in a message

# Check if 'Message' is in the title (full title would be for example 'Message Bob Ross')
if 'Message' in button.get('title'):
    # Next sibling should always be the 'Add to a group' button
    if button.nextSibling:
        if button.nextSibling.get('title') == 'Add to a group':
            return True
bug
opened by eddyharrington 1
