Overview

The open-source web scrapers that feed the Los Angeles Times' California coronavirus tracker.

Processed data, ready for analysis, is available in the datadesk/california-coronavirus-data repository.

Scrapers

The scrapers are written in Python as Jupyter notebooks, scheduled and run via GitHub Actions, and their output is archived using git.
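
This README doesn't show how a notebook actually gets executed in CI, but as a rough sketch, a scheduled job can run a notebook headlessly with nbconvert's Python API. The vaccine-hpi/notebook.ipynb path below is an assumption for illustration, not the repo's confirmed layout.

    # Hedged sketch: execute one scraper notebook headlessly, as a CI job might.
    import nbformat
    from nbconvert.preprocessors import ExecutePreprocessor

    nb = nbformat.read("vaccine-hpi/notebook.ipynb", as_version=4)
    ep = ExecutePreprocessor(timeout=600, kernel_name="python3")
    ep.preprocess(nb, {"metadata": {"path": "vaccine-hpi/"}})  # run cells inside the module directory
    nbformat.write(nb, "vaccine-hpi/notebook.ipynb")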

| module | status | maintainer |
| ------ | ------ | ---------- |
| bed-surges | | Ben Welsh |
| cases-deaths-demographics | | Ben Welsh |
| cases-deaths-tests | | Sean Greene |
| demographics-age | | Sean Greene |
| demographics-race-by-county | | Rahul Mukherjee |
| demographics-race-statewide | | Aida Ylanan |
| federal-prisons | | Iris Lee |
| homeless-impact | | Jennifer Lu |
| hopkins | | Ben Welsh |
| hospital-patients | | Ben Welsh |
| hospital-capacity | | Ben Welsh |
| hospital-locations | | Ben Welsh |
| ice-detainees | | Iris Lee |
| icu-capacity | | Sean Greene |
| local-adult-detention-facilities | | Iris Lee |
| local-juvenile-detention-facilities | | Iris Lee |
| places | | Et al. |
| probable-cases | | Ben Welsh |
| reopening-tiers | Retired | Ben Welsh |
| school-reopenings | Retired | Iris Lee |
| skilled-nursing-facilities | | Ben Welsh |
| skilled-nursing-totals | | Ben Welsh |
| state-prisons | | Iris Lee |
| vaccine-breakthrough-cases | | Sean Greene |
| vaccine-cdc-state-totals | | Ben Welsh |
| vaccine-doses-on-hand | | Sean Greene |
| vaccine-progress | | Sean Greene |
| vaccine-hpi | | Sean Greene |
| vaccine-demographics-by-county | | Sean Greene |
| vaccine-demographics-statewide | | Sean Greene |
| vaccine-shipped-delivered | | Sean Greene |
| variant-proportions-states | | Matt Stiles |
| variant-toplines-ca | | Matt Stiles |
| vaccine-zip-codes | | Sean Greene, Matt Stiles |

Installation

Clone the repository and install the Python dependencies.

pipenv install

Run all of the scraper commands. Either invocation works:

make

make all

Run one of the scraper commands.

make -f vaccine-hpi/Makefile
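
Each scraper module appears to carry its own Makefile, which is why `-f` points make at the module you want. For anyone who prefers to stay in Python, a rough equivalent of running everything is sketched below; the `*/notebook.ipynb` layout is an assumption for illustration.

    # Hedged sketch of `make all`: execute every module's notebook in place.
    import pathlib
    import subprocess

    for nb in sorted(pathlib.Path(".").glob("*/notebook.ipynb")):
        subprocess.run(
            ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", str(nb)],
            check=True,
        )
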
Comments
  • Fix for Butte County places scraper

    The script in butte.ipynb has failed.

    Here is what went wrong:

    An error occurred while executing the following cell:
    ------------------
    cities = data["elements"]["content"]["content"]["entities"][
        "b26b9acd-b036-40bc-bbbe-68667dd338e4"
    ]["props"]["chartData"]["data"][0]
    ------------------
    
    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    /tmp/ipykernel_1778/755814286.py in <module>
    ----> 1 cities = data["elements"]["content"]["content"]["entities"][
          2     "b26b9acd-b036-40bc-bbbe-68667dd338e4"
          3 ]["props"]["chartData"]["data"][0]
    
    KeyError: 'b26b9acd-b036-40bc-bbbe-68667dd338e4'
    
    

    A branch has been created automatically at fix-butte-1650399683.706548. Please push your fixes there and when it's fixed alert the Slack channel.
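
    One hedged fix: the failing key looks like an entity ID that rotates when the county republishes its chart, so look the entity up by its contents instead of hard-coding the UUID. A sketch against the same data object:

        # Find whichever entity actually carries chart data, rather than a fixed
        # UUID that changes when the chart is republished.
        entities = data["elements"]["content"]["content"]["entities"]
        cities = next(
            entity["props"]["chartData"]["data"][0]
            for entity in entities.values()
            if "chartData" in entity.get("props", {})
        )  # raises StopIteration if no entity has chartData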

    opened by github-actions[bot] 2
  • Fix for Sutter County places scraper

    The script in sutter.ipynb has failed.

    Here is what went wrong:

    An error occurred while executing the following cell:
    ------------------
    try:
        assert not len(sutter_clean) < 3
    except AssertionError:
        raise AssertionError("Sutter County's scraper is missing rows")
    ------------------
    
    ---------------------------------------------------------------------------
    AssertionError                            Traceback (most recent call last)
    /tmp/ipykernel_2827/778601857.py in <module>
          1 try:
    ----> 2     assert not len(sutter_clean) < 3
          3 except AssertionError:
    
    AssertionError: 
    
    During handling of the above exception, another exception occurred:
    
    AssertionError                            Traceback (most recent call last)
    /tmp/ipykernel_2827/778601857.py in <module>
          2     assert not len(sutter_clean) < 3
          3 except AssertionError:
    ----> 4     raise AssertionError("Sutter County's scraper is missing rows")
    
    AssertionError: Sutter County's scraper is missing rows
    
    

    A branch has been created automatically at fix-sutter-1650068839.547007. Please push your fixes there and when it's fixed alert the Slack channel.
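
    The try/assert/re-raise pattern is what produces the doubled traceback above; a direct check that reports the actual row count is easier to triage. A sketch using the notebook's own sutter_clean frame:

        # Fail with the observed row count instead of chaining two AssertionErrors.
        if len(sutter_clean) < 3:
            raise AssertionError(
                f"Sutter County's scraper is missing rows: got {len(sutter_clean)}, expected at least 3"
            )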

    opened by github-actions[bot] 2
  • Fix for Placer County places scraper

    The script in placer.ipynb has failed.

    Here is what went wrong:

    An error occurred while executing the following cell:
    ------------------
    page = requests.get(url)
    ------------------
    
    ---------------------------------------------------------------------------
    SSLCertVerificationError                  Traceback (most recent call last)
    ~/.local/share/virtualenvs/california-coronavirus-scrapers-dxuBXRsm/lib/python3.8/site-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
        698             # Make the request on the httplib connection object.
    --> 699             httplib_response = self._make_request(
        700                 conn,
    
    ~/.local/share/virtualenvs/california-coronavirus-scrapers-dxuBXRsm/lib/python3.8/site-packages/urllib3/connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
        381         try:
    --> 382             self._validate_conn(conn)
        383         except (SocketTimeout, BaseSSLError) as e:
    
    ~/.local/share/virtualenvs/california-coronavirus-scrapers-dxuBXRsm/lib/python3.8/site-packages/urllib3/connectionpool.py in _validate_conn(self, conn)
       1009         if not getattr(conn, "sock", None):  # AppEngine might not have  `.sock`
    -> 1010             conn.connect()
       1011 
    
    ~/.local/share/virtualenvs/california-coronavirus-scrapers-dxuBXRsm/lib/python3.8/site-packages/urllib3/connection.py in connect(self)
        468                 )
    --> 469             _match_hostname(cert, self.assert_hostname or server_hostname)
        470 
    
    ~/.local/share/virtualenvs/california-coronavirus-scrapers-dxuBXRsm/lib/python3.8/site-packages/urllib3/connection.py in _match_hostname(cert, asserted_hostname)
        541     try:
    --> 542         match_hostname(cert, asserted_hostname)
        543     except CertificateError as e:
    
    /opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/ssl.py in match_hostname(cert, hostname)
        415     if len(dnsnames) > 1:
    --> 416         raise CertificateError("hostname %r "
        417             "doesn't match either of %s"
    
    SSLCertVerificationError: ("hostname 'itwebservices.placer.ca.gov' doesn't match either of '*.vpn.placer.ca.gov', 'vpn.placer.ca.gov'",)
    
    During handling of the above exception, another exception occurred:
    
    MaxRetryError                             Traceback (most recent call last)
    ~/.local/share/virtualenvs/california-coronavirus-scrapers-dxuBXRsm/lib/python3.8/site-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
        438             if not chunked:
    --> 439                 resp = conn.urlopen(
        440                     method=request.method,
    
    ~/.local/share/virtualenvs/california-coronavirus-scrapers-dxuBXRsm/lib/python3.8/site-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
        754 
    --> 755             retries = retries.increment(
        756                 method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
    
    ~/.local/share/virtualenvs/california-coronavirus-scrapers-dxuBXRsm/lib/python3.8/site-packages/urllib3/util/retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
        573         if new_retry.is_exhausted():
    --> 574             raise MaxRetryError(_pool, url, error or ResponseError(cause))
        575 
    
    MaxRetryError: HTTPSConnectionPool(host='itwebservices.placer.ca.gov', port=443): Max retries exceeded with url: /coviddashboard (Caused by SSLError(SSLCertVerificationError("hostname 'itwebservices.placer.ca.gov' doesn't match either of '*.vpn.placer.ca.gov', 'vpn.placer.ca.gov'")))
    
    During handling of the above exception, another exception occurred:
    
    SSLError                                  Traceback (most recent call last)
    /tmp/ipykernel_3458/699087206.py in <module>
    ----> 1 page = requests.get(url)
    
    ~/.local/share/virtualenvs/california-coronavirus-scrapers-dxuBXRsm/lib/python3.8/site-packages/requests/api.py in get(url, params, **kwargs)
         73     """
         74 
    ---> 75     return request('get', url, params=params, **kwargs)
         76 
         77 
    
    ~/.local/share/virtualenvs/california-coronavirus-scrapers-dxuBXRsm/lib/python3.8/site-packages/requests/api.py in request(method, url, **kwargs)
         59     # cases, and look like a memory leak in others.
         60     with sessions.Session() as session:
    ---> 61         return session.request(method=method, url=url, **kwargs)
         62 
         63 
    
    ~/.local/share/virtualenvs/california-coronavirus-scrapers-dxuBXRsm/lib/python3.8/site-packages/requests/sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
        540         }
        541         send_kwargs.update(settings)
    --> 542         resp = self.send(prep, **send_kwargs)
        543 
        544         return resp
    
    ~/.local/share/virtualenvs/california-coronavirus-scrapers-dxuBXRsm/lib/python3.8/site-packages/requests/sessions.py in send(self, request, **kwargs)
        653 
        654         # Send the request
    --> 655         r = adapter.send(request, **kwargs)
        656 
        657         # Total elapsed time of the request (approximately)
    
    ~/.local/share/virtualenvs/california-coronavirus-scrapers-dxuBXRsm/lib/python3.8/site-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
        512             if isinstance(e.reason, _SSLError):
        513                 # This branch is for urllib3 v1.22 and later.
    --> 514                 raise SSLError(e, request=request)
        515 
        516             raise ConnectionError(e, request=request)
    
    SSLError: HTTPSConnectionPool(host='itwebservices.placer.ca.gov', port=443): Max retries exceeded with url: /coviddashboard (Caused by SSLError(SSLCertVerificationError("hostname 'itwebservices.placer.ca.gov' doesn't match either of '*.vpn.placer.ca.gov', 'vpn.placer.ca.gov'")))
    
    

    A branch has been created automatically at fix-placer-1640649819.363141. Please push your fixes there and when it's fixed alert the Slack channel.
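
    The county is serving a certificate for *.vpn.placer.ca.gov on itwebservices.placer.ca.gov, so the failure is on their end. One stopgap, sketched below, is to skip verification for this one request; that trades away TLS protection and should be reverted once the certificate is fixed.

        import requests
        import urllib3

        # Deliberate, temporary trade-off: the county's certificate doesn't match
        # its hostname, so skip verification for this misconfigured host only.
        urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
        page = requests.get(url, verify=False, timeout=30)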

    opened by github-actions[bot] 2
  • Fix for El Dorado County places scraper

    The script in el-dorado.ipynb has failed.

    Here is what went wrong:

    An error occurred while executing the following cell:
    ------------------
    for item in data["features"]:
        timestamp = item["attributes"]["survey_date"]
        timestamp = datetime.fromtimestamp((timestamp / 1000))
        d = dict(
            county="El Dorado",
            area=item["attributes"]["Report_Are"],
            confirmed_cases=item["attributes"]["Case_Count"],
            county_date=timestamp,
        )
        dict_list.append(d)
    ------------------
    
    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    /tmp/ipykernel_2931/2936043572.py in <module>
    ----> 1 for item in data["features"]:
          2     timestamp = item["attributes"]["survey_date"]
          3     timestamp = datetime.fromtimestamp((timestamp / 1000))
          4     d = dict(
          5         county="El Dorado",
    
    KeyError: 'features'
    
    

    A branch has been created automatically at fix-el-dorado-1639470181.021016. Please push your fixes there and when it's fixed alert the Slack channel.
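
    The endpoint looks like an ArcGIS feature service, and those typically return an {"error": ...} object with no "features" key when a query fails. A guard that surfaces the payload, as a sketch, placed before the existing loop:

        # Fail fast with whatever the server sent back instead of a bare KeyError.
        if "features" not in data:
            raise RuntimeError(f"El Dorado endpoint returned no features: {data}")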

    opened by github-actions[bot] 2
  • lxml error when scraping variants from CDPH

    The "Variant totals in CA" workflow fails when running the variant-topline-ca notebook. It's throwing an error because it can't find the lxml tool, which is needed by Pandas read_html() function. lxml is in the Pipfile and the notebook runs fine locally without errors.

    Any ideas?

    [Screenshot: failing workflow log, captured 2021-04-17 3:50 p.m.]
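
    One guess worth checking: if the Actions job installs dependencies with pipenv but launches the notebook runner outside `pipenv run`, the kernel won't see lxml even though it's in the Pipfile. A quick diagnostic cell:

        # If this fails in CI but passes locally, the job is running outside
        # the pipenv environment (e.g., a missing `pipenv run` prefix).
        import importlib.util

        assert importlib.util.find_spec("lxml") is not None, "lxml is not importable in this kernel"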

    opened by stiles 2
  • Fix for Stanislaus County places scraper

    The script in stanislaus.ipynb has failed.

    Here is what went wrong:

    An error occurred while executing the following cell:
    ------------------
    df = df.filter(["area", "confirmed_cases"], axis=1).sort_values(
        by="area", ascending=True
    )
    ------------------
    
    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    /tmp/ipykernel_2690/1824696883.py in <module>
    ----> 1 df = df.filter(["area", "confirmed_cases"], axis=1).sort_values(
          2     by="area", ascending=True
          3 )
    
    ~/.local/share/virtualenvs/california-coronavirus-scrapers-dxuBXRsm/lib/python3.8/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
        309                     stacklevel=stacklevel,
        310                 )
    --> 311             return func(*args, **kwargs)
        312 
        313         return wrapper
    
    ~/.local/share/virtualenvs/california-coronavirus-scrapers-dxuBXRsm/lib/python3.8/site-packages/pandas/core/frame.py in sort_values(self, by, axis, ascending, inplace, kind, na_position, ignore_index, key)
       6257 
       6258             by = by[0]
    -> 6259             k = self._get_label_or_level_values(by, axis=axis)
       6260 
       6261             # need to rewrap column in Series to apply key function
    
    ~/.local/share/virtualenvs/california-coronavirus-scrapers-dxuBXRsm/lib/python3.8/site-packages/pandas/core/generic.py in _get_label_or_level_values(self, key, axis)
       1777             values = self.axes[axis].get_level_values(key)._values
       1778         else:
    -> 1779             raise KeyError(key)
       1780 
       1781         # Check for duplicates
    
    KeyError: 'area'
    
    

    A branch has been created automatically at fix-stanislaus-1655410913.964428. Please push your fixes there and when it's fixed alert the Slack channel.
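
    df.filter() silently drops columns it can't find, so the missing "area" column only surfaces later in sort_values. A guard that names the vanished columns up front, sketched against the notebook's df:

        # The county's table layout has probably changed; say which columns vanished.
        missing = {"area", "confirmed_cases"} - set(df.columns)
        if missing:
            raise KeyError(f"Stanislaus table is missing expected columns: {sorted(missing)}")
        df = df[["area", "confirmed_cases"]].sort_values(by="area", ascending=True)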

    opened by github-actions[bot] 1
  • Fix for Stanislaus County places scraper

    The script in stanislaus.ipynb has failed.

    Here is what went wrong:

    An error occurred while executing the following cell:
    ------------------
    df = df.filter(["area", "confirmed_cases"], axis=1).sort_values(
        by="area", ascending=True
    )
    ------------------
    
    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    /tmp/ipykernel_2876/1824696883.py in <module>
    ----> 1 df = df.filter(["area", "confirmed_cases"], axis=1).sort_values(
          2     by="area", ascending=True
          3 )
    
    ~/.local/share/virtualenvs/california-coronavirus-scrapers-dxuBXRsm/lib/python3.8/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
        309                     stacklevel=stacklevel,
        310                 )
    --> 311             return func(*args, **kwargs)
        312 
        313         return wrapper
    
    ~/.local/share/virtualenvs/california-coronavirus-scrapers-dxuBXRsm/lib/python3.8/site-packages/pandas/core/frame.py in sort_values(self, by, axis, ascending, inplace, kind, na_position, ignore_index, key)
       6257 
       6258             by = by[0]
    -> 6259             k = self._get_label_or_level_values(by, axis=axis)
       6260 
       6261             # need to rewrap column in Series to apply key function
    
    ~/.local/share/virtualenvs/california-coronavirus-scrapers-dxuBXRsm/lib/python3.8/site-packages/pandas/core/generic.py in _get_label_or_level_values(self, key, axis)
       1777             values = self.axes[axis].get_level_values(key)._values
       1778         else:
    -> 1779             raise KeyError(key)
       1780 
       1781         # Check for duplicates
    
    KeyError: 'area'
    
    

    A branch has been created automatically at fix-stanislaus-1655051190.186447. Please push your fixes there and when it's fixed alert the Slack channel.

    opened by github-actions[bot] 1
  • Fix for Placer County places scraper

    The script in placer.ipynb has failed.

    Here is what went wrong:

    An error occurred while executing the following cell:
    ------------------
    page = requests.get(url)
    ------------------
    
    ---------------------------------------------------------------------------
    SSLCertVerificationError                  Traceback (most recent call last)
    ~/.local/share/virtualenvs/california-coronavirus-scrapers-dxuBXRsm/lib/python3.8/site-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
        698             # Make the request on the httplib connection object.
    --> 699             httplib_response = self._make_request(
        700                 conn,
    
    ~/.local/share/virtualenvs/california-coronavirus-scrapers-dxuBXRsm/lib/python3.8/site-packages/urllib3/connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
        381         try:
    --> 382             self._validate_conn(conn)
        383         except (SocketTimeout, BaseSSLError) as e:
    
    ~/.local/share/virtualenvs/california-coronavirus-scrapers-dxuBXRsm/lib/python3.8/site-packages/urllib3/connectionpool.py in _validate_conn(self, conn)
       1009         if not getattr(conn, "sock", None):  # AppEngine might not have  `.sock`
    -> 1010             conn.connect()
       1011 
    
    ~/.local/share/virtualenvs/california-coronavirus-scrapers-dxuBXRsm/lib/python3.8/site-packages/urllib3/connection.py in connect(self)
        415 
    --> 416         self.sock = ssl_wrap_socket(
        417             sock=conn,
    
    ~/.local/share/virtualenvs/california-coronavirus-scrapers-dxuBXRsm/lib/python3.8/site-packages/urllib3/util/ssl_.py in ssl_wrap_socket(sock, keyfile, certfile, cert_reqs, ca_certs, server_hostname, ssl_version, ciphers, ssl_context, ca_cert_dir, key_password, ca_cert_data, tls_in_tls)
        448     if send_sni:
    --> 449         ssl_sock = _ssl_wrap_socket_impl(
        450             sock, context, tls_in_tls, server_hostname=server_hostname
    
    ~/.local/share/virtualenvs/california-coronavirus-scrapers-dxuBXRsm/lib/python3.8/site-packages/urllib3/util/ssl_.py in _ssl_wrap_socket_impl(sock, ssl_context, tls_in_tls, server_hostname)
        492     if server_hostname:
    --> 493         return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
        494     else:
    
    /opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/ssl.py in wrap_socket(self, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname, session)
        499         # ctx._wrap_socket()
    --> 500         return self.sslsocket_class._create(
        501             sock=sock,
    
    /opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/ssl.py in _create(cls, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname, context, session)
       1039                         raise ValueError("do_handshake_on_connect should not be specified for non-blocking sockets")
    -> 1040                     self.do_handshake()
       1041             except (OSError, ValueError):
    
    /opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/ssl.py in do_handshake(self, block)
       1308                 self.settimeout(None)
    -> 1309             self._sslobj.do_handshake()
       1310         finally:
    
    SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1131)
    
    During handling of the above exception, another exception occurred:
    
    MaxRetryError                             Traceback (most recent call last)
    ~/.local/share/virtualenvs/california-coronavirus-scrapers-dxuBXRsm/lib/python3.8/site-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
        438             if not chunked:
    --> 439                 resp = conn.urlopen(
        440                     method=request.method,
    
    ~/.local/share/virtualenvs/california-coronavirus-scrapers-dxuBXRsm/lib/python3.8/site-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
        754 
    --> 755             retries = retries.increment(
        756                 method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
    
    ~/.local/share/virtualenvs/california-coronavirus-scrapers-dxuBXRsm/lib/python3.8/site-packages/urllib3/util/retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
        573         if new_retry.is_exhausted():
    --> 574             raise MaxRetryError(_pool, url, error or ResponseError(cause))
        575 
    
    MaxRetryError: HTTPSConnectionPool(host='itwebservices.placer.ca.gov', port=443): Max retries exceeded with url: /coviddashboard/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1131)')))
    
    During handling of the above exception, another exception occurred:
    
    SSLError                                  Traceback (most recent call last)
    /tmp/ipykernel_2353/699087206.py in <module>
    ----> 1 page = requests.get(url)
    
    ~/.local/share/virtualenvs/california-coronavirus-scrapers-dxuBXRsm/lib/python3.8/site-packages/requests/api.py in get(url, params, **kwargs)
         73     """
         74 
    ---> 75     return request('get', url, params=params, **kwargs)
         76 
         77 
    
    ~/.local/share/virtualenvs/california-coronavirus-scrapers-dxuBXRsm/lib/python3.8/site-packages/requests/api.py in request(method, url, **kwargs)
         59     # cases, and look like a memory leak in others.
         60     with sessions.Session() as session:
    ---> 61         return session.request(method=method, url=url, **kwargs)
         62 
         63 
    
    ~/.local/share/virtualenvs/california-coronavirus-scrapers-dxuBXRsm/lib/python3.8/site-packages/requests/sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
        540         }
        541         send_kwargs.update(settings)
    --> 542         resp = self.send(prep, **send_kwargs)
        543 
        544         return resp
    
    ~/.local/share/virtualenvs/california-coronavirus-scrapers-dxuBXRsm/lib/python3.8/site-packages/requests/sessions.py in send(self, request, **kwargs)
        653 
        654         # Send the request
    --> 655         r = adapter.send(request, **kwargs)
        656 
        657         # Total elapsed time of the request (approximately)
    
    ~/.local/share/virtualenvs/california-coronavirus-scrapers-dxuBXRsm/lib/python3.8/site-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
        512             if isinstance(e.reason, _SSLError):
        513                 # This branch is for urllib3 v1.22 and later.
    --> 514                 raise SSLError(e, request=request)
        515 
        516             raise ConnectionError(e, request=request)
    
    SSLError: HTTPSConnectionPool(host='itwebservices.placer.ca.gov', port=443): Max retries exceeded with url: /coviddashboard/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1131)')))
    
    

    A branch has been created automatically at fix-placer-1652718374.584501. Please push your fixes there and when it's fixed alert the Slack channel.

    opened by github-actions[bot] 1
  • Fix for Sierra County places scraper

    The script in sierra.ipynb has failed.

    Here is what went wrong:

    An error occurred while executing the following cell:
    ------------------
    for row in row_list:
        cell_list = row.find_all("td")
        d = dict(
            county="Sierra",
            area=safetxt(cell_list[0]),
            confirmed_cases=safenumber(cell_list[1]),
        )
        dict_list.append(d)
    ------------------
    
    ---------------------------------------------------------------------------
    IndexError                                Traceback (most recent call last)
    /tmp/ipykernel_2713/2902473984.py in <module>
          4         county="Sierra",
          5         area=safetxt(cell_list[0]),
    ----> 6         confirmed_cases=safenumber(cell_list[1]),
          7     )
          8     dict_list.append(d)
    
    IndexError: list index out of range
    
    

    A branch has been created automatically at fix-sierra-1650068837.552932. Please push your fixes there and when it's fixed alert the Slack channel.
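
    A hedged fix, assuming the county added a header or spacer row without two <td> cells: skip any short row instead of indexing blindly.

        for row in row_list:
            cell_list = row.find_all("td")
            if len(cell_list) < 2:
                # Skip header/spacer rows that lack the two expected data cells.
                continue
            dict_list.append(
                dict(
                    county="Sierra",
                    area=safetxt(cell_list[0]),
                    confirmed_cases=safenumber(cell_list[1]),
                )
            )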

    opened by github-actions[bot] 1
  • Fix for Santa Cruz County places scraper

    The script in santa-cruz.ipynb has failed.

    Here is what went wrong:

    An error occurred while executing the following cell:
    ------------------
    ds = data["results"][0]["result"]["data"]["dsr"]["DS"][0]
    ------------------
    
    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    /tmp/ipykernel_3676/906845693.py in <module>
    ----> 1 ds = data["results"][0]["result"]["data"]["dsr"]["DS"][0]
    
    KeyError: 'results'
    
    

    A branch has been created automatically at fix-santa-cruz-1645576330.696676. Please push your fixes there and when it's fixed alert the Slack channel.
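
    The dsr/DS path suggests a Power BI query API, which returns an error body with no "results" key when a query fails. A guard that surfaces the server's message, as a sketch:

        if "results" not in data:
            # Power BI-style endpoints return {"error": ...} on failure; show it.
            raise RuntimeError(f"Santa Cruz query failed: {data.get('error', data)}")
        ds = data["results"][0]["result"]["data"]["dsr"]["DS"][0]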

    opened by github-actions[bot] 1
  • Fix for San Diego County places scraper

    The script in san-diego.ipynb has failed.

    Here is what went wrong:

    An error occurred while executing the following cell:
    ------------------
    cols = list(data["features"][0]["attributes"].keys())
    ------------------
    
    ---------------------------------------------------------------------------
    IndexError                                Traceback (most recent call last)
    /tmp/ipykernel_3436/100029346.py in <module>
    ----> 1 cols = list(data["features"][0]["attributes"].keys())
    
    IndexError: list index out of range
    
    

    A branch has been created automatically at fix-san-diego-1644424223.474247. Please push your fixes there and when it's fixed alert the Slack channel.
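
    A guard for the empty-features case, assuming the ArcGIS layer can return zero rows when it is moved or republished:

        # An empty feature list means the layer changed; fail with a clear message.
        features = data.get("features") or []
        if not features:
            raise RuntimeError("San Diego endpoint returned zero features; the layer may have moved")
        cols = list(features[0]["attributes"].keys())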

    opened by github-actions[bot] 1
  • Fix for Los Angeles County places scraper

    The script in los-angeles.ipynb has failed.

    Here is what went wrong:

    An error occurred while executing the following cell:
    ------------------
    try:
        assert not len(df) < 342
    except AssertionError:
        raise AssertionError("L.A. County's scraper is missing rows")
    ------------------
    
    ---------------------------------------------------------------------------
    AssertionError                            Traceback (most recent call last)
    /tmp/ipykernel_2232/2999864733.py in <module>
          1 try:
    ----> 2     assert not len(df) < 342
          3 except AssertionError:
    
    AssertionError: 
    
    During handling of the above exception, another exception occurred:
    
    AssertionError                            Traceback (most recent call last)
    /tmp/ipykernel_2232/2999864733.py in <module>
          2     assert not len(df) < 342
          3 except AssertionError:
    ----> 4     raise AssertionError("L.A. County's scraper is missing rows")
    
    AssertionError: L.A. County's scraper is missing rows
    
    

    A branch has been created automatically at fix-los-angeles-1673238113.705687. Please push your fixes there and when it's fixed alert the Slack channel.
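
    The same doubled-traceback pattern as the other assertion checks; a single check that reports the observed count makes these recurring failures quicker to triage. A sketch against the notebook's df:

        if len(df) < 342:
            raise AssertionError(
                f"L.A. County's scraper is missing rows: got {len(df)}, expected at least 342"
            )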

    opened by github-actions[bot] 0
  • Fix for Los Angeles County places scraper

    The script in los-angeles.ipynb has failed.

    Here is what went wrong:

    An error occurred while executing the following cell:
    ------------------
    try:
        assert not len(df) < 342
    except AssertionError:
        raise AssertionError("L.A. County's scraper is missing rows")
    ------------------
    
    ---------------------------------------------------------------------------
    AssertionError                            Traceback (most recent call last)
    /tmp/ipykernel_2252/2999864733.py in <module>
          1 try:
    ----> 2     assert not len(df) < 342
          3 except AssertionError:
    
    AssertionError: 
    
    During handling of the above exception, another exception occurred:
    
    AssertionError                            Traceback (most recent call last)
    /tmp/ipykernel_2252/2999864733.py in <module>
          2     assert not len(df) < 342
          3 except AssertionError:
    ----> 4     raise AssertionError("L.A. County's scraper is missing rows")
    
    AssertionError: L.A. County's scraper is missing rows
    
    

    A branch has been created automatically at fix-los-angeles-1673224242.884841. Please push your fixes there and when it's fixed alert the Slack channel.

    opened by github-actions[bot] 0
  • Fix for Los Angeles County places scraper

    The script in los-angeles.ipynb has failed.

    Here is what went wrong:

    An error occurred while executing the following cell:
    ------------------
    try:
        assert not len(df) < 342
    except AssertionError:
        raise AssertionError("L.A. County's scraper is missing rows")
    ------------------
    
    ---------------------------------------------------------------------------
    AssertionError                            Traceback (most recent call last)
    /tmp/ipykernel_2177/2999864733.py in <module>
          1 try:
    ----> 2     assert not len(df) < 342
          3 except AssertionError:
    
    AssertionError: 
    
    During handling of the above exception, another exception occurred:
    
    AssertionError                            Traceback (most recent call last)
    /tmp/ipykernel_2177/2999864733.py in <module>
          2     assert not len(df) < 342
          3 except AssertionError:
    ----> 4     raise AssertionError("L.A. County's scraper is missing rows")
    
    AssertionError: L.A. County's scraper is missing rows
    
    

    A branch has been created automatically at fix-los-angeles-1673209203.825798. Please push your fixes there and when it's fixed alert the Slack channel.

    opened by github-actions[bot] 0
  • Fix for Los Angeles County places scraper

    The script in los-angeles.ipynb has failed.

    Here is what went wrong:

    An error occurred while executing the following cell:
    ------------------
    try:
        assert not len(df) < 342
    except AssertionError:
        raise AssertionError("L.A. County's scraper is missing rows")
    ------------------
    
    ---------------------------------------------------------------------------
    AssertionError                            Traceback (most recent call last)
    /tmp/ipykernel_2217/2999864733.py in <module>
          1 try:
    ----> 2     assert not len(df) < 342
          3 except AssertionError:
    
    AssertionError: 
    
    During handling of the above exception, another exception occurred:
    
    AssertionError                            Traceback (most recent call last)
    /tmp/ipykernel_2217/2999864733.py in <module>
          2     assert not len(df) < 342
          3 except AssertionError:
    ----> 4     raise AssertionError("L.A. County's scraper is missing rows")
    
    AssertionError: L.A. County's scraper is missing rows
    
    

    A branch has been created automatically at fix-los-angeles-1673194907.363041. Please push your fixes there and when it's fixed alert the Slack channel.

    opened by github-actions[bot] 0
  • Fix for Los Angeles County places scraper

    The script in los-angeles.ipynb has failed.

    Here is what went wrong:

    An error occurred while executing the following cell:
    ------------------
    try:
        assert not len(df) < 342
    except AssertionError:
        raise AssertionError("L.A. County's scraper is missing rows")
    ------------------
    
    ---------------------------------------------------------------------------
    AssertionError                            Traceback (most recent call last)
    /tmp/ipykernel_2260/2999864733.py in <module>
          1 try:
    ----> 2     assert not len(df) < 342
          3 except AssertionError:
    
    AssertionError: 
    
    During handling of the above exception, another exception occurred:
    
    AssertionError                            Traceback (most recent call last)
    /tmp/ipykernel_2260/2999864733.py in <module>
          2     assert not len(df) < 342
          3 except AssertionError:
    ----> 4     raise AssertionError("L.A. County's scraper is missing rows")
    
    AssertionError: L.A. County's scraper is missing rows
    
    

    A branch has been created automatically at fix-los-angeles-1673180570.916919. Please push your fixes there and when it's fixed alert the Slack channel.

    opened by github-actions[bot] 0
  • Fix for Los Angeles County places scraper

    The script in los-angeles.ipynb has failed.

    Here is what went wrong:

    An error occurred while executing the following cell:
    ------------------
    try:
        assert not len(df) < 342
    except AssertionError:
        raise AssertionError("L.A. County's scraper is missing rows")
    ------------------
    
    ---------------------------------------------------------------------------
    AssertionError                            Traceback (most recent call last)
    /tmp/ipykernel_2248/2999864733.py in <module>
          1 try:
    ----> 2     assert not len(df) < 342
          3 except AssertionError:
    
    AssertionError: 
    
    During handling of the above exception, another exception occurred:
    
    AssertionError                            Traceback (most recent call last)
    /tmp/ipykernel_2248/2999864733.py in <module>
          2     assert not len(df) < 342
          3 except AssertionError:
    ----> 4     raise AssertionError("L.A. County's scraper is missing rows")
    
    AssertionError: L.A. County's scraper is missing rows
    
    

    A branch has been created automatically at fix-los-angeles-1673166129.110842. Please push your fixes there and when it's fixed alert the Slack channel.

    opened by github-actions[bot] 0
Owner
Los Angeles Times Data and Graphics Department
Reporting, editing, computer programming