Data Scraping for Glassdoor
This is a Python script that scrapes company overviews and reviews from Glassdoor. Please use it carefully and respect Glassdoor's Terms of Service, which explicitly prohibit web scraping.
Built With
- Python
- ChromeDriver
Getting Started
Download the SeleniumGlassdor.py file and change the chromedriver path in it to match the location on your machine. Supply your own file containing the list of Glassdoor URLs for the companies you want to scrape; the company URL CSV file is also attached here. That file was generated with Selenium as well, by searching Google for 'glassdoor' + company name and extracting the URL from the first result. I can also upload the file on request.
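The setup steps above can be sketched roughly as follows. This is a minimal illustration, not the code in SeleniumGlassdor.py: the helper names (read_company_urls, google_search_url, make_browser) are hypothetical, and the CSV layout is an assumption (one company per row, with the Glassdoor URL in whichever cell starts with "http").

```python
import csv
from urllib.parse import quote_plus


def read_company_urls(csv_path):
    """Return the first URL-looking cell of each row in the CSV file.

    Assumption: each row describes one company and one of its cells
    holds the Glassdoor URL. Rows without a URL (e.g. a header row)
    are skipped.
    """
    urls = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            for cell in row:
                cell = cell.strip()
                if cell.startswith("http"):
                    urls.append(cell)
                    break  # keep at most one URL per row
    return urls


def google_search_url(company_name):
    """Build the Google query URL for 'glassdoor' + company name."""
    return "https://www.google.com/search?q=" + quote_plus(f"glassdoor {company_name}")


def make_browser(chromedriver_path):
    """Start Chrome with the chromedriver found at the given path.

    Selenium is imported here so the helpers above stay usable
    without it. Service(executable_path=...) is the Selenium 4 way
    to point at a specific chromedriver binary.
    """
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    return webdriver.Chrome(service=Service(executable_path=chromedriver_path))
```

A typical run would call read_company_urls on your CSV file, create one browser with make_browser, and then visit each URL in turn with browser.get(url).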
Prerequisites
Install Selenium before running the script.
- selenium
pip install selenium
For the other sections
If you want to scrape data from other sections, such as jobs or salaries, first extract the section URL with the calls below, then download the section with a method similar to the one used for reviews.
With Selenium 4.3 or later, find_element_by_xpath is no longer available; import By from selenium.webdriver.common.by and use find_element instead:
- reviewsUrl = browser.find_element(By.XPATH, "//a[@data-label='Reviews']").get_attribute('href')
- jobsUrl = browser.find_element(By.XPATH, "//a[@data-label='Jobs']").get_attribute('href')
- salariesUrl = browser.find_element(By.XPATH, "//a[@data-label='Salaries']").get_attribute('href')
- interviewsUrl = browser.find_element(By.XPATH, "//a[@data-label='Interviews']").get_attribute('href')
- benefitsUrl = browser.find_element(By.XPATH, "//a[@data-label='Benefits']").get_attribute('href')
- photosUrl = browser.find_element(By.XPATH, "//a[@data-label='Photos']").get_attribute('href')
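The six lookups above differ only in the data-label value, so they can be collected in one loop. The sketch below is an illustration under assumptions: collect_section_urls and section_xpath are hypothetical helper names, and the company page is assumed to still expose its navigation links with the data-label attributes listed above, which may change whenever Glassdoor updates its markup. It calls find_elements with the locator string "xpath" (the value of By.XPATH in Selenium 4), so a missing section yields None rather than an exception.

```python
SECTION_LABELS = ["Reviews", "Jobs", "Salaries", "Interviews", "Benefits", "Photos"]


def section_xpath(label):
    """XPath for the navigation link of one section, e.g. 'Jobs'."""
    return f"//a[@data-label='{label}']"


def collect_section_urls(browser, labels=SECTION_LABELS):
    """Map each section label to its href, or None if the link is absent.

    `browser` is expected to be a Selenium WebDriver that has already
    loaded a company overview page.
    """
    urls = {}
    for label in labels:
        # find_elements returns an empty list when nothing matches,
        # so sections missing from a page do not raise.
        matches = browser.find_elements("xpath", section_xpath(label))
        urls[label] = matches[0].get_attribute("href") if matches else None
    return urls
```

After browser.get(company_url), calling collect_section_urls(browser) returns a dictionary such as {'Reviews': ..., 'Jobs': ...}, and each non-empty URL can then be opened and scraped in the same way as the reviews page.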
Contributing
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
License
Distributed under the MIT License. See LICENSE.txt for more information.
Contact
Houping - [email protected]