Facebook Group Scraping Using Beautiful Soup & Selenium

Fatima Ghadieh

Last update: Aug 12, 2022

Overview

Notes

The scraper should only be used for educational purposes
Kindly refrain from scraping sensitive or private information
It is strongly recommended to scrape public rather than private groups
Ask for consent from the group administrator and/or group members before running any code
I am not responsible for any misuse of the code in any shape or form

Extract Facebook group posts that are related to a specific topic and write them to a .json file. This project was created to gather the data needed to build a chatbot for a university's website.

Input

User's Facebook login credentials
Facebook group URL
Number of scrolls (more scrolls load more posts, so this sets how many posts are collected)
Path to the ChromeDriver executable
Optional: a specific topic to search for
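The inputs above could be kept together in one small file (the Updates list mentions a planned inputs-only file). The following is a minimal sketch, assuming a JSON inputs file; the class name, the field names, and the validation rules are hypothetical and not taken from the project's code:

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScraperInputs:
    """Hypothetical container for the scraper's inputs (names are illustrative)."""
    email: str
    password: str
    group_url: str
    num_scrolls: int
    chromedriver_path: str
    topic: Optional[str] = None  # None means "scrape without a topic search"

    def __post_init__(self) -> None:
        # Assumption: group URLs look like https://www.facebook.com/groups/<id or name>
        if not self.group_url.startswith("https://www.facebook.com/groups/"):
            raise ValueError(f"not a Facebook group URL: {self.group_url!r}")
        if self.num_scrolls < 0:
            raise ValueError("num_scrolls must be non-negative")


def load_inputs(text: str) -> ScraperInputs:
    """Parse the JSON text of an inputs file into a ScraperInputs object."""
    return ScraperInputs(**json.loads(text))
```

Because such a file holds a password, it is worth keeping it out of version control.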

What the Scraper Does

Logs into Facebook using the user's credentials
Opens the group specified by the user
Searches the group for the topic, if one is given
Extracts all posts and their comments
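The navigation and scrolling steps can be sketched as follows. This is a minimal sketch, not the project's code: it assumes the driver is already logged in, that the number-of-scrolls input drives a simple scroll-and-wait loop, and that in-group search results live at a `<group URL>/search/?q=<topic>` path (an assumption about Facebook's URL layout). The function names are hypothetical. The returned HTML would then be handed to Beautiful Soup for the extraction step.

```python
import time
from typing import Optional
from urllib.parse import quote_plus


def group_search_url(group_url: str, topic: Optional[str]) -> str:
    """Return the group URL, or its in-group search URL when a topic is given.

    Assumption: Facebook serves group search results at <group>/search/?q=<topic>.
    """
    base = group_url.rstrip("/")
    if not topic:
        return base + "/"
    return f"{base}/search/?q={quote_plus(topic)}"


def load_group_page(driver, group_url: str, topic: Optional[str],
                    num_scrolls: int, pause: float = 2.0) -> str:
    """Open the group (or topic search), scroll num_scrolls times, return the HTML.

    `driver` is a logged-in Selenium WebDriver; only get(), execute_script()
    and page_source are used.
    """
    driver.get(group_search_url(group_url, topic))
    for _ in range(num_scrolls):
        # Scrolling to the bottom makes Facebook lazy-load another batch of posts.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the new batch time to render
    return driver.page_source
```

A fixed pause is the simplest choice; a slow connection may need a longer `pause` so each batch finishes loading before the next scroll.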

Scraper Output

A .json file that includes:

Each post
The comments replying to it

Format of file:

{
    "tag": "Topic 1",
    "patterns": ["Post text"],
    "responses": [
        "Comment 1",
        "Comment 2",
        "Comment 3"
    ]
}
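Producing entries in this format takes only the standard `json` module. The following is a minimal sketch, assuming the post text and comment texts have already been extracted as strings and that multiple entries are stored as a JSON list (the wrapper around several entries is not specified above). `make_record` and `write_records` are hypothetical names:

```python
import json
from typing import Dict, Iterable, List, Union

Record = Dict[str, Union[str, List[str]]]


def make_record(tag: str, post_text: str, comments: Iterable[str]) -> Record:
    """Build one entry: the post becomes the pattern, its comments the responses."""
    return {
        "tag": tag,
        "patterns": [post_text.strip()],
        # Skip comments that contain no text after stripping whitespace.
        "responses": [c.strip() for c in comments if c.strip()],
    }


def write_records(path: str, records: List[Record]) -> None:
    """Write all entries to a UTF-8 .json file, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=4)
```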

Setup Requirements

Make sure Google Chrome is installed
Install the ChromeDriver version that matches your Chrome version and place it in the same directory as the script
Enter the inputs required by the code
Run the code
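Selenium 4 deprecated passing the driver path directly to `webdriver.Chrome` (the source of the `DeprecationWarning` quoted in the comments below), and newer 4.x releases no longer accept it; a `Service` object is used instead. The following is a minimal sketch, assuming ChromeDriver sits next to the script as described above. `chromedriver_path` and `make_driver` are hypothetical helpers, and the `--disable-notifications` flag is an optional choice, not something the project is known to use:

```python
import sys
from pathlib import Path


def chromedriver_path(script_dir: str) -> Path:
    """Return the ChromeDriver binary expected next to the script, or raise."""
    name = "chromedriver.exe" if sys.platform.startswith("win") else "chromedriver"
    path = Path(script_dir) / name
    if not path.is_file():
        raise FileNotFoundError(f"ChromeDriver not found at {path}")
    return path


def make_driver(script_dir: str):
    """Start Chrome through a Service object (Selenium 4 style)."""
    # Imported here so chromedriver_path() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    options.add_argument("--disable-notifications")  # block site notification prompts
    service = Service(executable_path=str(chromedriver_path(script_dir)))
    return webdriver.Chrome(service=service, options=options)
```

Inside the script, `make_driver(str(Path(__file__).parent))` would start a browser using the driver placed in the script's directory.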

Planned Updates

Scrape comments found in "view more comments"
Add a file for inputs only
Add comments to the code
Add an option to scrape the general group discussions and not specific topics

Web-scraping - A bot using Python with BeautifulSoup that scraps IRS website by form number and returns the results as json

Web-scraping - A bot using Python with BeautifulSoup that scraps IRS website (prior form publication) by form number and returns the results as json. It provides the option to download pdfs over a range of years.

1 Jan 4, 2022

Amazon web scraping using Scrapy Framework

Amazon-web-scraping-using-Scrapy-Framework Scrapy Scrapy is an application framework for crawling web sites and extracting structured data which can b

1 Jan 25, 2022

A training task for web scraping using python multithreading and a real-time-updated list of available proxy servers.

Parallel web scraping The project is a training task for web scraping using python multithreading and a real-time-updated list of available proxy serv

1 Feb 10, 2022

This is a Python script. It works by using Selenium to locate the relevant elements and then triggering click events to automate the browser.

5 Nov 19, 2021

Docker containerized Python Flask API that uses selenium to scrape and interact with websites

0 Jan 22, 2022

Web Scraping Framework

Grab Framework Documentation Installation $ pip install -U grab See details about installing Grab on different platforms here http://docs.grablib.

2.3k Jan 4, 2023

Visual scraping for Scrapy

Portia Portia is a tool that allows you to visually scrape websites without any programming knowledge required. With Portia you can annotate a web pag

8.7k Jan 5, 2023

Scrapy, a fast high-level web crawling & scraping framework for Python.

Scrapy Overview Scrapy is a fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pag

45.5k Jan 7, 2023

Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.

Pattern Pattern is a web mining module for Python. It has tools for: Data Mining: web services (Google, Twitter, Wikipedia), web crawler, HTML DOM par

Computational Linguistics Research Group

8.4k Jan 8, 2023

Comments

TimeoutException, etc

Hi - I keep getting the following error:

/Applications/RESOURCES/Facebook-Group-Post-Scraper/Facebook Group Parser.py:27: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
  driver = webdriver.Chrome('/Applications/RESOURCES/Facebook-Group-Post-Scraper/chromedriver', options=chrome_options) #USER INPUT
Traceback (most recent call last):
  File "/Applications/RESOURCES/Facebook-Group-Post-Scraper/Facebook Group Parser.py", line 41, in <module>
    login = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"button[type = 'submit']"))).click()
  File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/support/wait.py", line 89, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: 
Stacktrace:
0   chromedriver                        0x00000001035bc159 chromedriver + 5120345
1   chromedriver                        0x0000000103549b13 chromedriver + 4651795
2   chromedriver                        0x0000000103139e68 chromedriver + 392808
3   chromedriver                        0x000000010316f181 chromedriver + 610689
4   chromedriver                        0x000000010316f341 chromedriver + 611137
5   chromedriver                        0x00000001031a1a74 chromedriver + 817780
6   chromedriver                        0x000000010318cb6d chromedriver + 732013
7   chromedriver                        0x000000010319f637 chromedriver + 808503
8   chromedriver                        0x000000010318ca33 chromedriver + 731699
9   chromedriver                        0x00000001031625dd chromedriver + 558557
10  chromedriver                        0x00000001031634f5 chromedriver + 562421
11  chromedriver                        0x000000010357938d chromedriver + 4846477
12  chromedriver                        0x000000010359321c chromedriver + 4952604
13  chromedriver                        0x0000000103598a12 chromedriver + 4975122
14  chromedriver                        0x0000000103593b4a chromedriver + 4954954
15  chromedriver                        0x000000010356e5b0 chromedriver + 4801968
16  chromedriver                        0x00000001035adf78 chromedriver + 5062520
17  chromedriver                        0x00000001035ae0ff chromedriver + 5062911
18  chromedriver                        0x00000001035c3545 chromedriver + 5150021
19  libsystem_pthread.dylib             0x00007fff742fc2eb _pthread_body + 126
20  libsystem_pthread.dylib             0x00007fff742ff249 _pthread_start + 66
21  libsystem_pthread.dylib             0x00007fff742fb40d thread_start + 13

My setup: macOS 10.14.6, Python 3.9.10, Chrome 99.0.4844.51 (Official Build) (x86_64), ChromeDriver 99.0.4844.51

Thanks!

Opened by rfxwda

Feature request: general group discussions & "view more comments"
Thanks for making this! I was wondering if you have any estimate for these items in the todo list:

[ ] Scrape comments found in "view more comments"

[ ] Add an option to scrape the general group discussions and not specific topics
Opened by stianhoiland

Owner

Fatima Ghadieh

Third-year ECE student at AUB looking for undergraduate Machine Learning internships

Web-Scraping using Selenium Master

Web-Scraping using Selenium What is the need of Selenium? Some websites don't like to be scrapped and in that case you need to disguise your webscrapi

1 Oct 26, 2021

Here I provide the source code for doing web scraping using the python library, it is Selenium.

1 Nov 13, 2021

Web Scraping images using Selenium and Python

Web Scraping images using Selenium and Python A propos de ce document This is a markdown document about Web scraping images and videos using Selenium

3 Jul 1, 2022

A database scraper created with mechanical soup and sqlite

WebscrapingDatabases a database scraper created with mechanical soup and sqlite author: Mariya Sha Watch on YouTube: This repository was created to su

30 Aug 8, 2022

Use Flask API to wrap Facebook data. Grab the wapper of Facebook public pages without an API key.

Facebook Scraper Use Flask API to wrap Facebook data. Grab the wapper of Facebook public pages without an API key. (Currently working 2021) Setup Befo

2 Dec 27, 2021

A Python Oriented tool to Scrap WhatsApp Group Link using Google Dork it Scraps Whatsapp Group Links From Google Results And Gives Working Links.

WaGpScraper A Python Oriented tool to Scrap WhatsApp Group Link using Google Dork it Scraps Whatsapp Group Links From Google Results And Gives Working

27 Dec 18, 2022

Using Selenium with Python to Web Scrap Popular Youtube Tech Channels.

Google Maps crawler using Selenium

A package that provides you Latest Cyber/Hacker News from website using Web-Scraping.

A simple django-rest-framework api using web scraping