Hi,
I'm stuck on this problem. I configured an example similar to the startup project, with a detail page that uses 'pre_url': 'http://www.website.com'. I want to scrape the listing every hour (using crontab) and add any new articles.
When I run the command for the first time (Article table empty), it populates the items correctly. However, when I run the command again after new articles have been added (with scrapy crawl article_spider -a id=2 -a do_action=yes), so that the Article table is already populated, it does scrape the page but doesn't add the new articles:
2016-08-27 10:33:45 [scrapy] ERROR: Error downloading <GET doublehttp://www.website.com/politique/318534.html>
Traceback (most recent call last):
File "/home/akram/eb-virt/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1126, in _inlineCallbacks
result = result.throwExceptionIntoGenerator(g)
File "/home/akram/eb-virt/local/lib/python2.7/site-packages/twisted/python/failure.py", line 389, in throwExceptionIntoGenerator
return g.throw(self.type, self.value, self.tb)
File "/home/akram/eb-virt/local/lib/python2.7/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
defer.returnValue((yield download_func(request=request,spider=spider)))
File "/home/akram/eb-virt/local/lib/python2.7/site-packages/scrapy/utils/defer.py", line 45, in mustbe_deferred
result = f(*args, **kw)
File "/home/akram/eb-virt/local/lib/python2.7/site-packages/scrapy/core/downloader/handlers/__init__.py", line 64, in download_request
(scheme, self._notconfigured[scheme]))
NotSupported: Unsupported URL scheme 'doublehttp': no handler available for that scheme
2016-08-27 10:33:45 [scrapy] INFO: Closing spider (finished)
I searched for this "doublehttp" scheme error but couldn't find anything useful.
Versions I have:
Twisted==16.3.2
Scrapy==1.1.2
scrapy-djangoitem==1.1.1
django-dynamic-scraper==0.11.2
URL in DB (for an article):
http://www.website.com/politique/318756.html
Scraped URL without pre_url:
/politique/318756.html
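To make the URL handling concrete, here is a minimal Python sketch of how I assume the detail page URL is assembled, and of how the failing URL from the log parses. It assumes pre_url is simply prepended to the scraped path; that is my guess, not DDS's actual code, and the variable names are only for illustration.

```python
# Illustration only, not DDS code. Assumption: pre_url is simply
# prepended to the scraped relative path.
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7

pre_url = 'http://www.website.com'
scraped_path = '/politique/318756.html'

# Expected detail page URL, as stored in the DB.
full_url = pre_url + scraped_path
print(full_url)

# URL taken from the error log; extract the scheme Scrapy complains about.
failing_url = 'doublehttp://www.website.com/politique/318534.html'
failing_scheme = urlparse(failing_url).scheme
print(failing_scheme)
```

The first URL matches what is stored in the DB. The second shows that the rejected scheme is the literal string 'doublehttp', so something appears to prefix 'double' to an otherwise valid URL before it reaches the downloader.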
Any hints?
Thank you for your consideration and for this great project.