Facebook Group Scraping Using Beautiful Soup & Selenium

Overview

Notes

  • The scraper should only be used for educational purposes
  • Kindly refrain from scraping sensitive or private information
  • It is highly recommended to scrape public (and not private) groups
  • Ask for consent from the group administrator and/or group members before running any code
  • I am not responsible for any misuse of the code in any shape or form

Extracts Facebook group posts related to a specific topic, along with their comments, and writes them to a .json file. The project was created to gather the data needed to build a chatbot for a university's website.

Input

  • User's Credentials
  • Facebook Group URL
  • Number of Scrolls
    • Number of posts you want to collect
  • Directory of the Chromedriver
  • Optional: Specific topic to be searched
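
A minimal sketch of how these inputs could be grouped and checked in one place. The class name `ScraperInputs`, its field names, and the validation rules are hypothetical and are not taken from the project's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScraperInputs:
    """Hypothetical container for the inputs listed above."""
    email: str
    password: str
    group_url: str
    scrolls: int
    chromedriver_path: str
    topic: Optional[str] = None  # optional: topic to search for

    def __post_init__(self) -> None:
        # Fail early on values that cannot work
        if self.scrolls < 1:
            raise ValueError("scrolls must be at least 1")
        if "facebook.com/groups/" not in self.group_url:
            raise ValueError("group_url should point to a Facebook group")


# Placeholder values for illustration only
inputs = ScraperInputs(
    email="user@example.com",
    password="not-a-real-password",
    group_url="https://www.facebook.com/groups/example-group",
    scrolls=5,
    chromedriver_path="./chromedriver",
)
```

Keeping the checks in `__post_init__` means a bad value is reported before the browser is started.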

What the Scraper Does

  • Logs into Facebook using the User's Credentials
  • Enters the group specified by the User
  • Searches for the topic
  • Extracts all posts & their comments
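
Group pages typically load more posts as the page is scrolled, which is presumably why the number of scrolls is an input. A minimal sketch of that step, assuming `driver` is a Selenium WebDriver that is already on the group page. The function name `scroll_page` and the `pause` default are illustrative, not the project's own code.

```python
import time


def scroll_page(driver, scrolls, pause=2.0):
    """Scroll to the bottom of the page `scrolls` times so more posts load."""
    for _ in range(scrolls):
        # execute_script runs JavaScript in the current browser page
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Give the page time to load the next batch of posts
        time.sleep(pause)


# Usage, once a driver is logged in and on the group page:
# scroll_page(driver, scrolls=5)
```

More scrolls generally means more posts in the page source, at the cost of a longer run.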

Scraper Output

A .json file that includes:

  • Each post
  • The comments replying to it

Format of file:

{
    "tag": "Topic 1",
    "patterns": ["Post text"],
    "responses": [
        "Comment 1",
        "Comment 2",
        "Comment 3"
    ]
}
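
A minimal sketch of producing an entry in this layout with Python's standard `json` module. The helper names `build_record` and `write_records` are hypothetical, and writing the entries as a JSON list is an assumption; the project may structure the file differently.

```python
import json


def build_record(topic, post_text, comments):
    """Build one entry in the tag/patterns/responses layout shown above."""
    return {
        "tag": topic,
        "patterns": [post_text],
        "responses": list(comments),
    }


def write_records(records, path):
    """Write all entries to a .json file, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=4)


record = build_record("Topic 1", "Post text", ["Comment 1", "Comment 2", "Comment 3"])
# write_records([record], "posts.json")  # "posts.json" is a placeholder file name
```

Passing `ensure_ascii=False` keeps posts written in non-Latin scripts legible in the output file.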

Setup Requirements

  1. Make sure Chrome is installed
  2. Install ChromeDriver (its version should match your Chrome version) and place it in the same directory as the script
  3. Enter inputs required by the code
  4. Run the code
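
In Selenium 4, passing the ChromeDriver path directly to `webdriver.Chrome` is deprecated in favour of a `Service` object. A minimal sketch of step 2 under that assumption; the helper names and the script path are hypothetical, and Selenium is imported lazily so the path helper runs without it.

```python
from pathlib import Path


def chromedriver_beside(script_file, name="chromedriver"):
    """Return the ChromeDriver path expected in the script's own directory."""
    return Path(script_file).parent / name


def make_driver(driver_path):
    """Start Chrome through a Service object (Selenium 4 style)."""
    # Imported here so the path helper above works without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    return webdriver.Chrome(service=Service(str(driver_path)), options=Options())


# Placeholder script path for illustration
driver_path = chromedriver_beside("scraper/facebook_group_parser.py")
# driver = make_driver(driver_path)  # requires Chrome, ChromeDriver and Selenium 4
```

On Windows the executable is usually named `chromedriver.exe`, so pass that as `name`.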

Planned Updates

  • Scrape comments found in "view more comments"
  • Add a file for inputs only
  • Add comments to the code
  • Add an option to scrape the general group discussions and not specific topics
Comments
  • TimeoutException, etc

    Hi - I keep getting the following error:

    /Applications/RESOURCES/Facebook-Group-Post-Scraper/Facebook Group Parser.py:27: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
      driver = webdriver.Chrome('/Applications/RESOURCES/Facebook-Group-Post-Scraper/chromedriver', options=chrome_options) #USER INPUT
    Traceback (most recent call last):
      File "/Applications/RESOURCES/Facebook-Group-Post-Scraper/Facebook Group Parser.py", line 41, in <module>
        login = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"button[type = 'submit']"))).click()
      File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/support/wait.py", line 89, in until
        raise TimeoutException(message, screen, stacktrace)
    selenium.common.exceptions.TimeoutException: Message: 
    Stacktrace:
    0   chromedriver                        0x00000001035bc159 chromedriver + 5120345
    1   chromedriver                        0x0000000103549b13 chromedriver + 4651795
    2   chromedriver                        0x0000000103139e68 chromedriver + 392808
    3   chromedriver                        0x000000010316f181 chromedriver + 610689
    4   chromedriver                        0x000000010316f341 chromedriver + 611137
    5   chromedriver                        0x00000001031a1a74 chromedriver + 817780
    6   chromedriver                        0x000000010318cb6d chromedriver + 732013
    7   chromedriver                        0x000000010319f637 chromedriver + 808503
    8   chromedriver                        0x000000010318ca33 chromedriver + 731699
    9   chromedriver                        0x00000001031625dd chromedriver + 558557
    10  chromedriver                        0x00000001031634f5 chromedriver + 562421
    11  chromedriver                        0x000000010357938d chromedriver + 4846477
    12  chromedriver                        0x000000010359321c chromedriver + 4952604
    13  chromedriver                        0x0000000103598a12 chromedriver + 4975122
    14  chromedriver                        0x0000000103593b4a chromedriver + 4954954
    15  chromedriver                        0x000000010356e5b0 chromedriver + 4801968
    16  chromedriver                        0x00000001035adf78 chromedriver + 5062520
    17  chromedriver                        0x00000001035ae0ff chromedriver + 5062911
    18  chromedriver                        0x00000001035c3545 chromedriver + 5150021
    19  libsystem_pthread.dylib             0x00007fff742fc2eb _pthread_body + 126
    20  libsystem_pthread.dylib             0x00007fff742ff249 _pthread_start + 66
    21  libsystem_pthread.dylib             0x00007fff742fb40d thread_start + 13
    

    My stats: Mac OS 10.14.6 Python 3.9.10 Chrome 99.0.4844.51 (Official Build) (x86_64) ChromeDriver 99.0.4844.51

    Thanks!

    opened by rfxwda 0
  • Feature request: general group discussions & "view more comments"

    Thanks for making this! I was wondering if you have any estimate for these items in the todo list:

    • [ ] Scrape comments found in "view more comments"
    • [ ] Add an option to scrape the general group discussions and not specific topics
    opened by stianhoiland 1
Owner
Fatima Ghadieh
Third-year ECE student at AUB looking for undergraduate Machine Learning internships