Web Crawler for "sina weibo"
A web crawler for recording posts in "sina weibo"
Introduction
This script collects attributes of posts in "sina weibo". Users can record posts from different lists (also called flows or collections), such as search results. The supported lists are described in the "Functions" section.
Functions
Scripts currently available:
Name | Description |
---|---|
search.py | Search for a string within a specific time interval and record all posts in the search result. Parameters (edit these at the head of the script): search_string: the string to search for; all posts containing this string are recorded, 50 pages at most. start_time: only posts published after this time are recorded (accurate to the hour). end_time: only posts published before this time are recorded (accurate to the hour). rest_time: the interval between two requests, in seconds. Results are saved in Python pickle format at results/weibo-{search_string}-{start_time}-{end_time}.pkl. The start_time and end_time in the filename are Unix timestamps in seconds. |
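
The naming scheme for result files can be illustrated with a minimal sketch. The helper names `result_path` and `load_results` are hypothetical and not part of search.py, and the structure of the pickled object is not documented here, so the loader simply returns whatever was stored. Naive datetimes are assumed to be in local time.

```python
import pickle
from datetime import datetime
from pathlib import Path


def result_path(search_string, start_time, end_time, root="results"):
    """Build the expected result path (hypothetical helper, not in search.py).

    start_time and end_time are datetime objects; they are converted to
    whole-second Unix timestamps, as described for the filename.
    """
    start_ts = int(start_time.timestamp())
    end_ts = int(end_time.timestamp())
    return Path(root) / f"weibo-{search_string}-{start_ts}-{end_ts}.pkl"


def load_results(path):
    """Load a pickled result file and return the stored object unchanged."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Only unpickle files you produced yourself, since loading a pickle from an untrusted source can execute arbitrary code.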
Installation
- Run `pip install -r requirements.txt`.
- Refer to the "Functions" section to find the script you need.
- Edit parameters at the head of the script.
- Run the script with Python.
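
As a rough illustration of the parameters edited at the head of search.py, the sketch below sets example values and shows how a request loop might respect rest_time and the 50-page limit. The values, the datetime types, and the `crawl` and `fetch_page` names are assumptions made for this example; the actual script may be organized differently.

```python
import time
from datetime import datetime

# Example parameter values (assumed types; check the head of search.py).
search_string = "python"
start_time = datetime(2020, 1, 1, 0)  # accurate to the hour
end_time = datetime(2020, 1, 2, 0)    # accurate to the hour
rest_time = 5                         # seconds between two requests

MAX_PAGES = 50  # the search records 50 pages at most


def crawl(fetch_page, rest_time, max_pages=MAX_PAGES, sleep=time.sleep):
    """Collect posts page by page, pausing rest_time seconds after each request.

    fetch_page is a hypothetical callable that takes a page number and
    returns a list of posts; an empty list stops the loop early.
    """
    posts = []
    for page in range(1, max_pages + 1):
        page_posts = fetch_page(page)
        if not page_posts:
            break
        posts.extend(page_posts)
        sleep(rest_time)
    return posts
```

Passing `sleep` as an argument makes the pacing easy to replace in tests, while the default keeps real delays between requests.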