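Last time I ran into a similar problem on Ubuntu 16.04. The celery connection issue is resolved (restarting celery was enough), but the crawler still fetches no data. I have since reinstalled the system with Ubuntu 18.04; the project is still version 1.7.2. After configuring everything according to the guide:
In terminal A, I ran celery -A tasks.workers -Q login_queue,user_crawler,fans_followers,search_crawler,home_crawler worker -l info -c 1, which printed: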
[2018-07-13 09:48:01,247: INFO/MainProcess] Connected to redis://:**@127.0.0.1:6379/5
[2018-07-13 09:48:01,260: INFO/MainProcess] mingle: searching for neighbors
[2018-07-13 09:48:02,288: INFO/MainProcess] mingle: all alone
When I opened a second terminal, B, and ran "python3 login_first.py", terminal A showed:
[2018-07-21 14:43:23,672: INFO/MainProcess] Received task: tasks.login.login_task[3c1c3e8b-4850-4807-a463-f321c59c216d]
2018-07-21 14:43:30 - other - INFO - Login successful! The login account is [email protected]
[2018-07-21 14:43:30,016: INFO/ForkPoolWorker-1] Login successful! The login account is [email protected]
So I believe login works. Then, in terminal A, I tried another approach:
celery beat -A tasks.workers -l info
Terminal A showed:
celery beat v4.1.1 (latentcall) is starting.
__ - ... __ - _
LocalTime -> 2018-07-21 15:29:08
Configuration ->
. broker -> redis://:**@127.0.0.1:6379/5
. loader -> celery.loaders.app.AppLoader
. scheduler -> celery.beat.PersistentScheduler
. db -> celerybeat-schedule
. logfile -> [stderr]@%INFO
. maxinterval -> 5.00 minutes (300s)
The output above includes logfile -> [stderr]@%INFO. Does that mean the login failed?
Meanwhile, terminal B showed:
[2018-07-21 15:47:45,777: INFO/MainProcess] Received task: tasks.user.excute_user_task[dbc668b5-1669-48e9-b868-95f773168039]
[2018-07-21 15:47:45,828: INFO/MainProcess] Received task: tasks.user.crawl_person_infos[9d5df559-6c79-407a-8abf-beb639ed08df]
2018-07-21 15:47:45 - crawler - INFO - the crawling url is http://weibo.com/p/1005051483330984/info?mod=pedit_more
[2018-07-21 15:47:45,843: INFO/ForkPoolWorker-1] the crawling url is http://weibo.com/p/1005051483330984/info?mod=pedit_more
[2018-07-21 15:47:46,467: WARNING/ForkPoolWorker-1] /home/zcao/.local/lib/python3.6/site-packages/requests/packages/urllib3/connectionpool.py:852: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
[2018-07-21 15:47:47,647: WARNING/ForkPoolWorker-1] /home/zcao/.local/lib/python3.6/site-packages/requests/packages/urllib3/connectionpool.py:852: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
[2018-07-21 15:47:47,906: WARNING/ForkPoolWorker-1] /home/zcao/.local/lib/python3.6/site-packages/requests/packages/urllib3/connectionpool.py:852: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
2018-07-21 15:48:05 - crawler - INFO - the crawling url is http://weibo.com/p/1003061483330984/info?mod=pedit_more
[2018-07-21 15:48:05,475: INFO/ForkPoolWorker-1] the crawling url is http://weibo.com/p/1003061483330984/info?mod=pedit_more
[2018-07-21 15:48:06,076: WARNING/ForkPoolWorker-1] /home/zcao/.local/lib/python3.6/site-packages/requests/packages/urllib3/connectionpool.py:852: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
[2018-07-21 15:48:22,506: ERROR/ForkPoolWorker-1] db operation error,here are details(pymysql.err.DataError) (1406, "Data too long for column 'tags' at row 1") [SQL: 'INSERT INTO wbuser (uid, name, gender, birthday, location, description, register_time, verify_type, verify_info, follows_num, fans_num, wb_num, level, tags, work_info, contact_info, education_info, head_img) VALUES (%(uid)s, %(name)s, %(gender)s, %(birthday)s, %(location)s, %(description)s, %(register_time)s, %(verify_type)s, %(verify_info)s, %(follows_num)s, %(fans_num)s, %(wb_num)s, %(level)s, %(tags)s, %(work_info)s, %(contact_info)s, %(education_info)s, %(head_img)s)'] [parameters: {'uid': '1483330984', 'name': '侯宁', 'gender': 1, 'birthday': '', 'location': '北京', 'description': '人称"空军司令",财富苍生之醉观者。长篇小说《财富苍生-槐花蛇》作者,侯宁微店https://d.weidian.com/single/#/main', 'register_time': ' 2009-08-28 ', 'verify_type': 1, 'verify_info': '独立财经观察家,时评家、社会学者、职业投资人 微博签约自媒体', 'follows_num': 617, 'fans_num': 2843631, 'wb_num': 210982, 'level': '48', 'tags': '槐花蛇 ; 财富苍生 ; ... (240 characters truncated) ... ; 经济学家 ; 投资理财', 'work_info': '中国人民大学 (1991 - 1994) ... (226 characters truncated) ... 职位:社会学研究所 ', 'contact_info': '', 'education_info': '北京理工大学 (1984年) ', 'head_img': 'http://tva2.sinaimg.cn/crop.86.56.768.768.180/5869d5a8gw1f5ycui2b91j20qg0zkgt1.jpg'}]
[2018-07-21 15:48:22,506: WARNING/ForkPoolWorker-1] transaction rollbacks
[2018-07-21 15:48:22,507: INFO/ForkPoolWorker-1] has stored user 1483330984 info successfully
[2018-07-21 15:48:22,517: INFO/MainProcess] Received task: tasks.user.crawl_follower_fans[f609122b-9b92-464d-9de5-cf3794f5f7e2]
2018-07-21 15:48:22 - crawler - INFO - the crawling url is http://weibo.com/p/1005051483330984/follow?relate=fans&page=1#Pl_Official_HisRelation__60
[2018-07-21 15:48:22,522: INFO/ForkPoolWorker-1] the crawling url is http://weibo.com/p/1005051483330984/follow?relate=fans&page=1#Pl_Official_HisRelation__60
[2018-07-21 15:48:24,338: WARNING/ForkPoolWorker-1] /home/zcao/.local/lib/python3.6/site-packages/requests/packages/urllib3/connectionpool.py:852: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
[2018-07-21 15:48:25,520: WARNING/ForkPoolWorker-1] /home/zcao/.local/lib/python3.6/site-packages/requests/packages/urllib3/connectionpool.py:852: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
[2018-07-21 15:48:25,775: WARNING/ForkPoolWorker-1] /home/zcao/.local/lib/python3.6/site-packages/requests/packages/urllib3/connectionpool.py:852: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
2018-07-21 15:48:42 - crawler - INFO - the crawling url is http://weibo.com/p/1005051483330984/follow?page=1#Pl_Official_HisRelation__60
[2018-07-21 15:48:42,292: INFO/ForkPoolWorker-1] the crawling url is http://weibo.com/p/1005051483330984/follow?page=1#Pl_Official_HisRelation__60
[2018-07-21 15:48:42,849: WARNING/ForkPoolWorker-1] /home/zcao/.local/lib/python3.6/site-packages/requests/packages/urllib3/connectionpool.py:852: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
[2018-07-21 15:48:44,019: WARNING/ForkPoolWorker-1] /home/zcao/.local/lib/python3.6/site-packages/requests/packages/urllib3/connectionpool.py:852: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
[2018-07-21 15:48:44,283: WARNING/ForkPoolWorker-1] /home/zcao/.local/lib/python3.6/site-packages/requests/packages/urllib3/connectionpool.py:852: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
So the crawler did reach the page of user 侯宁, and his profile information appears in the log.
But I found no data in the weibo_data table. Thanks.
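For reference, one thing I could try, assuming the 1406 "Data too long for column 'tags'" error and the rollback that follows it are what keep the row out of the table: clip over-long string fields before the insert. This is only a minimal sketch. The helper name clip_to_limits and the column limits are hypothetical and are not taken from the project's schema; widening the column in MySQL would be the alternative.

```python
from typing import Any, Dict

# Hypothetical column limits (in characters); the real values would have to
# be read from the actual wbuser table definition.
ASSUMED_LIMITS = {"tags": 500, "work_info": 500}


def clip_to_limits(record: Dict[str, Any], limits: Dict[str, int]) -> Dict[str, Any]:
    """Return a copy of record with over-long string values cut to the limit."""
    clipped = dict(record)  # shallow copy, so the caller's dict is not modified
    for column, max_len in limits.items():
        value = clipped.get(column)
        # Only string values are clipped; numbers and missing keys are left alone.
        if isinstance(value, str) and len(value) > max_len:
            clipped[column] = value[:max_len]
    return clipped


if __name__ == "__main__":
    row = {"uid": "1483330984", "tags": "x" * 600}
    print(len(clip_to_limits(row, ASSUMED_LIMITS)["tags"]))
```

Clipping loses data, so it would only be a stopgap for checking whether the insert succeeds once the value fits.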
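SQLAlchemy has been changed to version 1.1.15.
MySQL 5.7.22
celery 4.1.1
Last time you suggested capturing packets and inspecting the responses. Could you be a bit more specific? I had a deadline a few days ago and could not follow up on our discussion in time. Sorry about that, and thanks for your help.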
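My current guess at what "inspecting the responses" would look like is to record a few facts about each page the crawler fetches (status code, final URL, body length, and the start of the body) and compare them with what a browser shows. The sketch below is only my assumption; summarize_response is a hypothetical helper and not part of the project.

```python
from typing import Any, Dict


def summarize_response(status: int, final_url: str, body: str,
                       snippet_len: int = 200) -> Dict[str, Any]:
    """Collect the facts about one HTTP response that are worth logging."""
    return {
        "status": status,            # e.g. 200, 302, 403
        "final_url": final_url,      # differs from the request URL after a redirect
        "body_length": len(body),    # a very short body is suspicious
        "snippet": body[:snippet_len].replace("\n", " "),
    }


if __name__ == "__main__":
    info = summarize_response(200, "http://weibo.com/p/1005051483330984/info",
                              "<html>\n<body>example</body></html>")
    for key, value in info.items():
        print(key, "->", value)
```

Please correct me if you meant something different, for example capturing the traffic with a separate tool.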