A free and powerful system for awareness and research of the American judicial system.

Overview

CourtListener

Started in 2009, CourtListener.com is the main initiative of Free Law Project. The goal of CourtListener.com is to provide high-quality legal data and services.

What's Here

This repository is organized in the following way:

  • cl: the Django code for this project. 99% of everything is in this directory.
  • docker: Where to find compose files and docker files for various components.
  • scripts: logrotate, systemd, etc., and init scripts for our various configurations and daemons.

Architecture

Getting Involved

If you want to get involved send us an email with your contact info or take a look through the issues list. There are innumerable things we need help with, but we are especially looking for help with:

  • legal research in order to fix data errors or other problems (check out the data-quality label for some starting points)
  • fixing bugs and building features (most things are written in Python)
  • machine learning or natural language problems.
  • test writing -- we always need more and better tests

In general, we're looking for all kinds of help. Get in touch if you think you have skills we could use or if you have skills you want to learn by improving CourtListener.

Contributing code

See the developer guide.

Copyright

All materials in this repository are copyright Free Law Project under the Affero GPL. See LICENSE.txt for details.

Contact

To contact Free Law Project, see here:

https://free.law/contact/

            .:+oo++//++osso+/. -+++////+++.
         -+ys/-`         ./yy+  `./mmmm/``
       -sys:               `oo     ymmy
      +yyo`                 `+`    ymmy
     +yyy`                         ymms
    -yyy+                          ymms
    +yyy:                          ymms
    +sss:                          ymms
    /sss+                          ydds
    `ssss.                         sdds
     -syyo`                  ``    sdds
      .oyys-                `s/    ydds            `+`
        :shhs:`           `/ys`    yddh`          .hs
          .:oyys+:-....-/oyys.  `./ddddy/:--.---:odd.
              `.-::///::-.`    -///////////////////-
Comments
  • Add NDAs support for recap email

    This adds support to parse metadata and download free documents from NDAs.

    This also solves #2238. Some notifications (NEF or NDA) don't contain an attached document, so we can't get the pacer_case_id and pacer_doc_id. Since the pacer_case_id is mandatory to save a docket and the pacer_doc_id is mandatory to save the RECAP document, when there is no attached document we simply ignore the notification and mark it with the status message: "Not a valid notification email. No message content."

    In order to handle the free-look document, a new field had to be added to the data retrieved by S3NotificationEmail: email_notice_type, which can be either "NEF" or "NDA".

    So the data for an NDA looks like this:

    {
      "case_name": "New York State Telecommunicati v. James",
      "court_id": "ca2",
      "date_filed": "2022-03-23",
      "docket_entries": [
        {
          "date_filed": "2022-03-23",
          "description": "REPLY BRIEF, on behalf of Appellant Letitia A. James, FILED. Service date 03/23/2022 by CM/ECF. [3283515] [21-1975]",
          "document_number": "00208754210",
          "document_url": "https://ecf.ca2.uscourts.gov/docs1/00208754210?uid=59a7f7615f78153b",
          "pacer_case_id": "21-1975",
          "pacer_doc_id": "00208754210",
          "pacer_magic_num": "59a7f7615f78153b",
          "pacer_seq_no": null
        }
      ],
      "docket_number": "21-1975",
      "email_notice_type": "NDA",
      "email_recipients": [
        {
          "email_addresses": [
            "[email protected]"
          ],
          "name": "Recap email recipient, -:"
        }
      ]
    }
    
    

    Since NDAs are notifications from appellate courts, I based the field assignments on the appellate examples in CL.

    The document_number is the same as the pacer_doc_id, and the pacer_case_id is the same as the docket_number.
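
    A tiny illustration of that mapping, using a trimmed-down version of the NDA data above (the variable names are just for illustration):

        # Hypothetical illustration only; `nda` is a trimmed-down version of the
        # parsed NDA dict shown above.
        nda = {
            "docket_number": "21-1975",
            "docket_entries": [{"pacer_doc_id": "00208754210"}],
        }

        pacer_case_id = nda["docket_number"]                         # same as docket_number
        document_number = nda["docket_entries"][0]["pacer_doc_id"]   # same as pacer_doc_id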

    Within the NDA examples added in Juriscraper, there's one that doesn't contain email addresses (see the screenshot in the original issue).

    We normally get the email recipients from this part of the notification in order to send docket alerts for the first time. In this case, we get the metadata and download the document, but we don't send the docket alert. One option is to send the docket alert to the recap.email address that is in the Lambda JSON receipt, but this will only work if the notification involves a single recap.email address. If two or more recap.email users add their recap.email address to the same case and the addresses are not appended within the email body, we won't be able to send the docket alerts to all of the recap.email users, only to the user whose email notification we receive first.

    What do you think about this?

    Once the Juriscraper PR is merged, I can update the Juriscraper version here.

    opened by albertisfu 33
  • Upload RECAP content to Internet Archive

    The end of quarter is fast approaching, and we need to upload everything that's changed this quarter in the RECAP Archive. For a sense of scale, as of now, that's:

    • About 100k dockets:

       Docket.objects.filter(date_modified__gte='2017-09-01', source__in=Docket.RECAP_SOURCES).count()
      
    • About 2.9M docket entries:

       DocketEntry.objects.filter(date_modified__gte='2017-09-01').count()
      
    • About 4.7M document metadata files:

       RECAPDocument.objects.filter(date_modified__gte='2017-09-01').count()
      

      (This number is insanely high this quarter because of issue #774, which scraped millions of IDs we formerly lacked.)

    • An unknown number of actual PDFs, but it's probably something in the hundreds of thousands. Somehow we need to figure out which PDFs were uploaded during this quarter. Oddly, I don't think we store this in the database. We may have to use the creation date of the files themselves.
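
    If we go the file-date route, here is a rough sketch of the idea (the storage path and cutoff date below are placeholders, not our real layout):

        import os
        from datetime import datetime

        # Placeholder path and cutoff: walk the PDF storage tree and keep files
        # whose modification time falls within the quarter.
        QUARTER_START = datetime(2017, 9, 1).timestamp()
        pdf_root = "/storage/recap/"

        changed_pdfs = []
        for dirpath, _, filenames in os.walk(pdf_root):
            for name in filenames:
                if not name.endswith(".pdf"):
                    continue
                path = os.path.join(dirpath, name)
                if os.path.getmtime(path) >= QUARTER_START:
                    changed_pdfs.append(path)

        print(f"{len(changed_pdfs)} PDFs changed this quarter")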

    This is...a lot to upload. We've got a few options:

    1. Try to do it as similarly as possible to the old format: One XML or JSON file per docket, and then a bunch of documents.

      Pros: A reasonable number of files. Familiar format.
      Cons: Need to render entire dockets even if only one item on them changed. Rendering big dockets can take a very long time. Doesn't create versioned snapshots of the data.
      Questions: How will people know what's new? RSS feeds certainly won't work.

    2. Just generate and upload a single tar file per quarter or one per court.

      Pros: Fairly easy to work with for consumers. Allows us to later make a tar of literally everything, if we have a need. Don't have to render complete dockets. Creates versioned snapshots of the data. Only one file to upload.
      Cons: Not the format people are familiar with. One massive file can be hard to work with (you have to download the whole thing to see what's inside, for example). Generating this kind of file takes space locally, and the process can fail.
      Questions: Do people care about getting per-court files? Is it worth making a sample file with, say, 1000 items?

    3. Upload one JSON file per changed object type, and put them all in a directory on IA. For example, upload the docket (which has metadata), the docket entries, and the documents as separate files.

      Pros: Easier to consume. Closer to current format.
      Cons: A LOT of small files each quarter. Might not be feasible to even upload like this in a reasonable time frame. Probably not possible to know what's part of the latest dump. No versioning of files.

    4. Database dump of changed data + tar of PDFs.

      Pros: Fairly easy, I think, to generate, probably faster than generating JSON.
      Cons: Dump might include fields that shouldn't be shared. Not a super useful format.

    Having walked through these, I think I'm leaning towards generating one file per quarter and uploading that. It'll be a big change for consumers, but I think it's a reasonable way forward. It provides the highest fidelity of the data, is a reasonable number of files to upload to IA, will be taxing but not horrid on our CPU (I hope), and will be somewhat easy to work with. It'll also provide a clear, "This is what changed" statement for consumers, which I think is something that's been lacking in the past.
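
    A rough sketch of option 2 with Python's tarfile module (the dockets iterable and how each docket gets serialized are placeholders, not real CourtListener code):

        import io
        import json
        import tarfile

        def make_quarterly_tar(dockets, out_path="recap-2017-q3.tar.gz"):
            """Write one JSON member per changed docket into a single tar file.

            `dockets` is assumed to be an iterable of (docket_id, serialized_dict)
            pairs; the real implementation would pull these from the database.
            """
            with tarfile.open(out_path, "w:gz") as tar:
                for docket_id, data in dockets:
                    payload = json.dumps(data, indent=2).encode("utf-8")
                    info = tarfile.TarInfo(name=f"dockets/{docket_id}.json")
                    info.size = len(payload)
                    tar.addfile(info, io.BytesIO(payload))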

    I'd love more thoughts on this from the community.

    opened by mlissner 31
  • Add links to PacerDash for any RECAP doc that we don't have

    PacerDash came up with a nice idea to make it possible for people to purchase items via their website and then feed them back into ours. Here's an example:

    https://www.pacerdash.com/court-listener-checkouts/181127929787

    All we need to do is link to these from our dockets when we don't have the item. Right now we have simple download buttons that lead to PACER. We can just add a little drop-down to the button that leads to PacerDash, probably with a link that says, "Buy on PacerDash".

    opened by mlissner 30
  • Set up restic for offsite backups

    I've been reviewing backup solutions for a few days now, and restic has really caught my eye. Some of the nice things it does include:

    • Incremental backups
    • Encrypted backups (with sign off from Filippo the encryption guy)
    • Mountable backups via fuse
    • Simple binary. Installation is just a wget.
    • Deduplication across files and servers
    • Ability to receive data from a pipe
    • Works with Backblaze, which appears to be the cheapest option
    • Snapshots are easy to prune once old
    • Active community
    • Good docs

    All in all, it looks pretty great.

    So, setting it up probably involves a few things:

    • [x] Create a new restic user using the instructions in the docs

    • [x] Get the binary for that user and add it to the user's path.

    • [x] Establish an account at backblaze and set it up with a credit card

    • [x] Set up the backups (uses an init command, I think)

    • [x] Set up cronjobs for daily/weekly backups

    • [x] Set up cronjob for database dumps (weekly, probably; these are in the low hundreds of GB in size)

    • [x] Kick it off.

    • [ ] Add a check command from time to time to ensure backup integrity

    • [x] Use nice and ionice (or whatever it's called) to make it low priority

    • [x] Ensure that we don't do the double file scan thing in the script. (There's a flag for this to dig up.)
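
    A rough sketch of the nice/ionice wrapper idea, calling the restic CLI from Python (the repository location and backup path are placeholders):

        import subprocess

        # Hypothetical wrapper: run restic at low CPU and IO priority. The repo
        # and backup path are placeholders; credentials are expected in the
        # environment (RESTIC_PASSWORD, B2 keys, etc.).
        cmd = [
            "nice", "-n", "19",    # lowest CPU priority
            "ionice", "-c", "3",   # idle IO class
            "restic", "-r", "b2:our-bucket:/", "backup", "/var/backups/",
        ]
        subprocess.run(cmd, check=True)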

    opened by mlissner 30
  • Figure out future of bulk data files

    Right now they're generated on disk as millions of JSON files, then zipped up, and finally, the zip file is swapped in for the old one that nginx serves. It'd be nice to serve these from S3 instead, but the "zipped up" part is hard if you do that.

    S3 is pretty easy to copy from using lots of simple tools, and I think it's reasonable to ask people to download millions of JSON files from S3 instead of bundled zip files. At the end of the day, they want JSON files anyway, not zips. It also makes it a LOT easier for us to generate bulk data. Instead of doing things like making zips and swapping old zips for new ones, we just send the JSON to S3 and replace the version that was there before.

    The other thing to do here is to make sure our generation process is resumable. If it crashes, it needs to know where it left off and pick up from that point.
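
    A rough sketch of the resumable idea, checkpointing the last primary key written (the queryset, the as_dict() serializer, and the S3 upload helper are placeholders):

        import json
        import os

        CHECKPOINT = "/tmp/bulk-data-checkpoint.txt"  # placeholder path

        def load_checkpoint():
            if os.path.exists(CHECKPOINT):
                return int(open(CHECKPOINT).read().strip())
            return 0

        def save_checkpoint(pk):
            with open(CHECKPOINT, "w") as f:
                f.write(str(pk))

        def generate_bulk_data(queryset, upload_to_s3):
            """Resume from the last processed primary key after a crash.

            `queryset` and `upload_to_s3` stand in for the real ORM query and
            S3 upload helper; `as_dict()` is a placeholder serializer.
            """
            last_pk = load_checkpoint()
            for obj in queryset.filter(pk__gt=last_pk).order_by("pk").iterator():
                upload_to_s3(f"{obj.pk}.json", json.dumps(obj.as_dict()))
                save_checkpoint(obj.pk)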

    opened by mlissner 28
  • Implement throttling on critical views

    Some ass is crawling the hell out of us right now and we need to get them redirected to the API. It looks like they're only crawling the dockets, so I'm going to start by fixing that by using the library here:

    https://django-ratelimit.readthedocs.io/en/v1.0.0/index.html

    I guess somebody was going to do it. We've got some honeypots in the data that we can use someday to figure out who's the culprit, but in the meantime, this is annoying. We do have an API.
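
    A minimal sketch of what that could look like with django-ratelimit v1.x (the view name and rate are just examples, not the final configuration):

        from django.http import HttpResponse
        from ratelimit.decorators import ratelimit

        # Example only: throttle an expensive page by client IP. With block=True,
        # over-limit requests get a 403 from django-ratelimit.
        @ratelimit(key="ip", rate="10/m", block=True)
        def view_docket(request, docket_id):
            return HttpResponse("docket page goes here")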

    opened by mlissner 28
  • Upgrade postgres to latest feasible version

    We're currently using postgres 9.3, but for a client we want to use AWS's migration/replication feature over at:

    https://aws.amazon.com/dms/

    (They want to have unfettered access to our latest DB.)

    That requires that we have at least version 9.4, so we have to at least upgrade to that.

    The Postgres folks kindly have an apt repository that supports our version of Ubuntu, so this shouldn't actually be too terrible. Last time we did this, we documented our work over here: https://github.com/freelawproject/courtlistener/issues/288

    A couple questions, though:

    1. Is there any reason not to upgrade all the way to the latest version?
    2. Do we need to run pg_upgradecluster for every point version we want to upgrade, or can we just do it once and skip a bunch of steps?
    3. How much downtime can we expect for this?
    opened by mlissner 27
  • RFC: Which cases should be available in public search engines?

    We've been getting a lot of removal requests lately, and it seems worth it to take a slightly less open approach to publishing.

    Current proposals for removal from public search are:

    1. Non-precedential cases. This currently is about 500,000 cases.
    2. Cases from certain jurisdictions or courts that are more likely to have sensitive cases.
    3. Any case mentioning certain words (like "asylum", for example)

    We can make these rules as complex as we can dream up, but the idea here is to strike a better line between "publish everything in the name of openness" and "people have legit privacy rights".

    For example, looking at the above, I don't see a lot of benefit to publishing non-precedential cases — they're likely to be small-time, nobody is likely to look directly for them, and the people that these pages do attract to CourtListener probably didn't want to land there anyway. OTOH, some of the people in these cases are legitimately bad people, like fraudulent accountants and child molesters. OTOH again, some of these people are trying to move on with their lives and have served their time. Are we more moral to hide these or to show them? It's not altogether clear.

    Currently, we only hide cases that:

    1. Have a social security number (which we also X-out), or
    2. Have an alien ID number, or
    3. Were the subject of an email asking us to block the case.

    There are currently about 2000 cases that we've blocked through manual or automated means.

    I'd be very interested in a discussion of how we might improve our approach to this problem.

    opened by mlissner 26
  • Flynt f string fixes

    This PR accomplishes two things:

    1. It moves all of our existing CourtListener code over to f-strings.
    2. It adds enforcement of those f-strings via pre-commit and a Github Action

    No more old-fashioned format strings in the code base, now or going forward.
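
    For reference, the kind of conversion flynt makes:

        # Before: old-style formatting
        name = "docket"
        msg = "Processing %s" % name
        msg2 = "Processing {}".format(name)

        # After: what flynt rewrites them to
        msg = f"Processing {name}"
        msg2 = f"Processing {name}"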

    opened by mlissner 24
  • Track last date updated for alerts

    First stab at https://github.com/freelawproject/courtlistener/issues/1264. What this does: in cl/alerts/tasks.py, when a task checking for new docket entries on an alert doesn't find any, it goes out and scrapes the date_last_filing field from PACER (if it hadn't been updated in the last hour). If that date is more recent than the last date the alert was triggered, then trigger the alert with a generic message.
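
    A rough sketch of that logic with hypothetical helper names (the real code is in cl/alerts/tasks.py):

        from datetime import timedelta

        from django.utils import timezone

        def maybe_send_stale_alert(alert, docket, scrape_date_last_filing, trigger_alert):
            """Hypothetical sketch of the behavior described above; the two
            callables stand in for the real PACER scrape and alert-sending code."""
            # Skip the PACER hit if the docket was refreshed within the last hour
            # (date_modified is used here as a stand-in for that check).
            if docket.date_modified > timezone.now() - timedelta(hours=1):
                return

            date_last_filing = scrape_date_last_filing(docket)
            last_hit = alert.date_last_hit.date() if alert.date_last_hit else None
            if last_hit is None or date_last_filing > last_hit:
                trigger_alert(alert, message="This docket has new activity on PACER.")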

    I could use some help testing this.

    opened by ikeboy 23
  • Sensitive data in opinions

    I ran Google's Data Loss Prevention (DLP) API on the opinions table.

    The results are in a table which Mike has access to.
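
    For reference, a rough sketch of how a single opinion's text can be inspected with the DLP API (this is not the exact pipeline used for the scan; the project ID and info types are placeholders):

        from google.cloud import dlp_v2

        client = dlp_v2.DlpServiceClient()
        parent = "projects/my-gcp-project/locations/global"  # placeholder project

        inspect_config = {
            "info_types": [
                {"name": "US_SOCIAL_SECURITY_NUMBER"},
                {"name": "CREDIT_CARD_NUMBER"},
            ],
            "include_quote": True,
        }

        opinion_text = "..."  # plain_text or html of one opinion row
        response = client.inspect_content(
            request={
                "parent": parent,
                "inspect_config": inspect_config,
                "item": {"value": opinion_text},
            }
        )
        for finding in response.result.findings:
            print(finding.info_type.name, finding.likelihood, finding.quote)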

    Here are the overall stats, sifted down to exclude hits for states and obvious false-positive fields like dates and sha1. "prob" is the DLP API's likelihood rating. "ops" is the number of unique opinions. "n/op" is the number of separate hits per opinion (which can double count, e.g., between the text and the various HTML fields; I've not tried to dedupe that).

    From spot checking, it seems like about half of these hits are legit. It'll need manual verification.

    A lot of this seems to me to be stuff that really ought to be redacted — preferably at the source (i.e. the courts). I'm not sure how to go about getting them to do that. Maybe talk to the Judicial Conference or AOUSC? Or the various circuits' administrative arms?

    Also, the people affected should probably be contacted (and surveyed) before this is publicly disclosed.

    ops	n/op	prob	type
    13	8.31	5	AMERICAN_BANKERS_CUSIP_ID
    6	3.33	4	AMERICAN_BANKERS_CUSIP_ID
    4846	2.4	3	AMERICAN_BANKERS_CUSIP_ID
    405	2.13	2	AMERICAN_BANKERS_CUSIP_ID
    91	1.95	3	AUSTRALIA_MEDICARE_NUMBER
    542	2.09	2	AUSTRALIA_MEDICARE_NUMBER
    253	2.16	1	AUSTRALIA_MEDICARE_NUMBER
    549	3.89	3	AUSTRALIA_TAX_FILE_NUMBER
    3268	5.49	2	AUSTRALIA_TAX_FILE_NUMBER
    703	2.22	1	AUSTRALIA_TAX_FILE_NUMBER
    216	7.4	2	BRAZIL_CPF_NUMBER
    74	2.61	4	CANADA_BC_PHN
    4448	2.81	3	CANADA_BC_PHN
    591	2.6	2	CANADA_BC_PHN
    2	2	3	CANADA_OHIP
    363	2.01	2	CANADA_OHIP
    615	2.75	1	CANADA_OHIP
    1	2	4	CANADA_PASSPORT
    12941	3.42	1	CANADA_PASSPORT
    876	7.94	4	CANADA_QUEBEC_HIN
    1	4	2	CANADA_QUEBEC_HIN
    1936	2.75	3	CANADA_SOCIAL_INSURANCE_NUMBER
    338	2.87	2	CANADA_SOCIAL_INSURANCE_NUMBER
    612	2.69	1	CANADA_SOCIAL_INSURANCE_NUMBER
    1	3	2	CHINA_PASSPORT
    22	2.77	1	CHINA_PASSPORT
    47	4.98	5	CREDIT_CARD_NUMBER
    188	3.35	4	CREDIT_CARD_NUMBER
    443	3.03	3	CREDIT_CARD_NUMBER
    248	1.5	2	CREDIT_CARD_NUMBER
    153	2.8	1	CREDIT_CARD_NUMBER
    14800	2.32	5	EMAIL_ADDRESS
    74	1.77	4	EMAIL_ADDRESS
    162	1.64	3	EMAIL_ADDRESS
    1	4	4	FRANCE_PASSPORT
    6069	2.83	2	FRANCE_PASSPORT
    762	2.41	1	FRANCE_PASSPORT
    1	2	5	IBAN_CODE
    5	2	3	IBAN_CODE
    348	2.35	2	IBAN_CODE
    4	1.5	1	IBAN_CODE
    1	2	5	IMEI_HARDWARE_ID
    63	2.02	4	IMEI_HARDWARE_ID
    737	2.5	3	IMEI_HARDWARE_ID
    117	2.75	2	IMEI_HARDWARE_ID
    15	1.67	4	INDIA_PAN_INDIVIDUAL
    210	4.18	5	IP_ADDRESS
    2441	6.89	4	IP_ADDRESS
    10	4.3	3	IP_ADDRESS
    3	4	2	IP_ADDRESS
    73	4.81	1	JAPAN_INDIVIDUAL_NUMBER
    2	2	4	JAPAN_PASSPORT
    50	2.16	2	JAPAN_PASSPORT
    1089	3.59	1	JAPAN_PASSPORT
    1	2	4	KOREA_PASSPORT
    51	2.16	2	KOREA_PASSPORT
    1116	3.59	1	KOREA_PASSPORT
    1	1	3	KOREA_RRN
    30	2.03	2	KOREA_RRN
    2	2	1	KOREA_RRN
    1	6	5	MAC_ADDRESS
    23	2.43	2	MAC_ADDRESS
    1	2	1	MAC_ADDRESS
    17	2.06	2	MAC_ADDRESS_LOCAL
    21459	7.24	1	MEXICO_PASSPORT
    2	2	5	NETHERLANDS_BSN_NUMBER
    1	2	4	NETHERLANDS_BSN_NUMBER
    2126	2.53	2	NETHERLANDS_BSN_NUMBER
    14300	2.66	1	NETHERLANDS_BSN_NUMBER
    2837	3.38	5	PHONE_NUMBER
    18906	4.59	4	PHONE_NUMBER
    181491	6.01	3	PHONE_NUMBER
    512566	5.29	2	PHONE_NUMBER
    2113	2.87	1	PHONE_NUMBER
    18	2.5	3	SPAIN_NIF_NUMBER
    1	2	3	SPAIN_PASSPORT
    11808	3.49	2	SPAIN_PASSPORT
    12817	3.43	1	SPAIN_PASSPORT
    32	2.06	5	SWIFT_CODE
    4088	2.4	4	SWIFT_CODE
    1	2	4	UK_DRIVERS_LICENSE_NUMBER
    83	4.75	4	UK_NATIONAL_HEALTH_SERVICE_NUMBER
    92	5.08	3	UK_NATIONAL_HEALTH_SERVICE_NUMBER
    39	2.15	2	UK_NATIONAL_HEALTH_SERVICE_NUMBER
    302	2.79	4	UK_NATIONAL_INSURANCE_NUMBER
    83567	2.84	3	UK_NATIONAL_INSURANCE_NUMBER
    1172	3.27	2	UK_NATIONAL_INSURANCE_NUMBER
    4	1.5	1	UK_NATIONAL_INSURANCE_NUMBER
    5	3.6	4	UK_PASSPORT
    5158	2.95	1	UK_PASSPORT
    4861	2.84	1	UK_TAXPAYER_REFERENCE
    25	2.44	5	US_BANK_ROUTING_MICR
    867	2.21	3	US_BANK_ROUTING_MICR
    53	3.68	2	US_BANK_ROUTING_MICR
    7	2.57	5	US_DEA_NUMBER
    23	3.48	4	US_DEA_NUMBER
    499	2.48	5	US_DRIVERS_LICENSE_NUMBER
    8431	1.6	4	US_DRIVERS_LICENSE_NUMBER
    81866	3.85	2	US_DRIVERS_LICENSE_NUMBER
    1224	3.69	1	US_DRIVERS_LICENSE_NUMBER
    853	2.26	4	US_HEALTHCARE_NPI
    294	7.07	4	US_PASSPORT
    5082	2.58	1	US_PASSPORT
    40	2.35	5	US_SOCIAL_SECURITY_NUMBER
    11	2.09	4	US_SOCIAL_SECURITY_NUMBER
    123	2.28	3	US_SOCIAL_SECURITY_NUMBER
    3355	2.53	2	US_SOCIAL_SECURITY_NUMBER
    1110	3.76	1	US_SOCIAL_SECURITY_NUMBER
    235	2.72	5	US_TOLLFREE_PHONE_NUMBER
    2070	3.17	4	US_TOLLFREE_PHONE_NUMBER
    18	2.5	3	US_TOLLFREE_PHONE_NUMBER
    91	3.91	2	US_TOLLFREE_PHONE_NUMBER
    681	3.89	5	US_VEHICLE_IDENTIFICATION_NUMBER
    70	2.57	4	US_VEHICLE_IDENTIFICATION_NUMBER
    
    opened by saizai 22
  • 2382 Add webhook event for expiring docket alerts

    This PR adds a new webhook event for expiring docket alerts.

    The payload has the following structure:

      "payload": {
        "old_alerts": [
          {
            "id": 1,
            "date_created": "2022-09-23T19:53:36.903277-07:00",
            "date_last_hit": null,
            "secret_key": "ehT7V9rmnBNIOV6rTMmMH0x6EvxeA0nYXfpN3Ks3",
            "alert_type": 1,
            "docket": 1
          }
        ],
        "very_old_alerts": [],
        "disabled_alerts": []
      },
    {
      "webhook": {
        "event_type": 4,
        "version": 1,
        "date_created": "2022-12-28T06:20:47.010040+00:00",
        "deprecation_date": null
      }
    }
    

    I tweaked the OldAlertReport class a bit so that it uses the same property names as the webhook payload (email templates updated accordingly), and so that it stores the docket alert objects (needed for the webhook event) to avoid additional DB queries.

    • Added webhook test event using generic data for docket alerts.

    When adding tests I found a conflict: the report deleted docket alerts in order to disable them, and it considered all existing docket alerts enabled instead of using the "Subscription" and "Unsubscription" alert_type values. So I applied the following changes to fix it:

    • Use the SUBSCRIPTION alert_type to filter enabled docket alerts

    • When automatically disabling alerts, instead of deleting the alert we now set its alert_type to UNSUBSCRIPTION

    • After the previous change I noticed a new problem: we can't calculate a docket alert's age anymore, since alerts are no longer deleted when they are disabled. To solve this and keep using the date_created field to calculate the alert age, I added code to the DocketAlert save() method that updates date_created whenever the alert_type is toggled. That way, if a user disables and re-enables a docket alert, date_created is updated to the current date and the alert keeps running for another 180 days (see the sketch after this list).

    • I had to modify the toggle_docket_alert method used in the frontend so it calls save() instead of doing a queryset update (which doesn't trigger the save() method)

    • I also confirmed (and added a test) that an update or PATCH to the Docket Alert API updates date_created if the alert_type is changed.

    • Added documentation for this new webhook event type.
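
    A rough sketch of the date_created reset described above (field names and values follow the description, not necessarily the exact model code):

        from django.db import models
        from django.utils import timezone

        class DocketAlert(models.Model):
            """Sketch only, based on the PR description; not the real model."""
            SUBSCRIPTION = 1
            UNSUBSCRIPTION = 0
            alert_type = models.SmallIntegerField(default=SUBSCRIPTION)
            date_created = models.DateTimeField(default=timezone.now)

            def save(self, *args, **kwargs):
                # Reset date_created whenever alert_type is toggled so a
                # re-enabled alert gets a fresh 180-day clock.
                if self.pk:
                    old = DocketAlert.objects.get(pk=self.pk)
                    if old.alert_type != self.alert_type:
                        self.date_created = timezone.now()
                super().save(*args, **kwargs)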

    Let me know what you think.

    opened by albertisfu 0
  • 2273 Added webhook-sentry proxy to send webhooks securely

    I've added the configuration to use webhook-sentry as a proxy when sending webhook events.

    • Added juggernaut/webhook-sentry:latest image to our docker-compose so it's available in dev and testing.
    • Added some tests to confirm the internal IPs are blocked
    • Increased the request timeout to 3 seconds since the proxy adds some additional delay; with 1 second I was getting some timeout errors.

    I found an incompatibility between the proxy and ngrok: the request is received but arrives without content. I'll dig in a bit more to see if it's a problem with any proxy and ngrok or something specific to webhook-sentry.

    To test that webhooks are sent properly through the proxy, I'm using https://webhook.site/, which generates a URL where you can inspect the webhook request.

    To test protection against SSRF you can use: https://make-127-0-0-1-rr.1u.ms
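
    For reference, a minimal sketch of routing a webhook POST through an egress proxy with requests (the proxy address and target URL are placeholders):

        import requests

        # Illustrative only: the proxy address is a placeholder for wherever
        # the webhook-sentry container is reachable.
        PROXY = "http://webhook-sentry:9090"

        def send_webhook(url, payload):
            return requests.post(
                url,
                json=payload,
                proxies={"http": PROXY, "https": PROXY},
                timeout=3,  # bumped to 3s to absorb the proxy's added latency
            )

        # send_webhook("https://webhook.site/your-uuid", {"ping": True})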

    opened by albertisfu 2
  • Issues parsing 2nd/9th Circuit cases

    Hello, over the past few days I've run into an issue querying for 9th Circuit cases (2nd circuit and 9th BAP as well; other circuits are working just fine). It looks like a new redirection page is not being handled correctly.

    Running this:

        court_id="9ca"
        docket_number="22-650"
        session = PacerSession(username="<username>", password="<password>")
        session.login()
        report = AppellateDocketReport(court_id, session)
    
        report.query(docket_number)
    

    Results in this error:

    Juriscraper will continue to run, and all logs will be sent to stderr.
    2022-12-19 15:50:25,521 - INFO: Attempting PACER API login
    2022-12-19 15:50:26,544 - INFO: New PACER session established.
    2022-12-19 15:50:26,544 - INFO: Querying appellate docket report for docket number '22-650' with params {'servlet': 'CaseSummary.jsp', 'caseNum': '22-650', 'incDktEntries': 'Y', 'incPtyAty': 'Y', 'fullDocketReport': 'Y', 'actionType': 'Run+Docket+Report'}
    2022-12-19 15:50:33,921 - INFO: Invalid/expired PACER session. Establishing new session.
    2022-12-19 15:50:33,922 - INFO: Attempting PACER API login
    2022-12-19 15:50:35,115 - INFO: New PACER session established.
    Traceback (most recent call last):
      File "/Users/andrew/dev/docket/python/pacer/main.py", line 63, in <module>
        query_case(docket_number, court_id, outfile, get_docket_entries, get_parties, get_lower_court, date_start)
      File "/Users/andrew/dev/docket/python/pacer/main.py", line 33, in query_case
        data['metadata'] = report.metadata
      File "/usr/local/lib/python3.9/site-packages/juriscraper/pacer/appellate_docket.py", line 343, in metadata
        "case_name": self._get_case_name(),
      File "/usr/local/lib/python3.9/site-packages/juriscraper/pacer/appellate_docket.py", line 615, in _get_case_name
        case_name = self.tree.xpath(path)[0].text_content()
    IndexError: list index out of range
    

    If you inspect the response from PACER, it looks like this:

    <!DOCTYPE html>
    <html>
        <head>
            <meta charset="utf-8" />
                </head>
        <body onload="document.forms[0].submit()">
            <noscript>
                <p>
                    <strong>Note:</strong> Since your browser does not support JavaScript,
                    you must press the Continue button once to proceed.
                </p>
            </noscript>
            
            <form action="https&#x3a;&#x2f;&#x2f;ca9-showdoc.azurewebsites.us&#x2f;Saml2&#x2f;Acs" method="post">
                <div>
                    <input type="hidden" name="RelayState" value="vBGZJQKeTnZRiK67sW7q5XYU"/>                
                                    
                    <input type="hidden" name="SAMLResponse" value="<REMOVED>"/>                
                </div>
                <noscript>
                    <div>
                        <input type="submit" value="Continue"/>
                    </div>
                </noscript>
            </form>
                </body>
    </html>
    

    I don't have much experience with Juriscraper; I'm wondering if this happens somewhere else in PACER.

    opened by andrewbaker00 1
  • XXX being used as a placeholder page value in Nebraska

    @mlissner @quevon24

    This one is new to me

    https://www.courtlistener.com/c/neb/293/xxx/

    While investigating the Nebraska issue, or more accurately a method to clean it up, I stumbled onto something new. We currently have citations in the system that use placeholder page values provided by the Supreme Court of Nebraska, something I had not seen before.

    This is obviously something we need to clean up, but more research should be done to see if this is an uncommon practice or not.

    opened by flooie 1
  • Add webhook event for expiring docket alerts on terminated cases

    When a case is terminated, we stop checking it for updates after a certain period of time, and we send an email to tell the user that their case is going to stop getting updates eventually.

    Most of that logic is in https://github.com/freelawproject/courtlistener/blob/main/cl/alerts/management/commands/handle_old_docket_alerts.py#L93-L112

    Which has this snippet explaining its purpose:

    class Command(VerboseCommand):
        help = """Check for old docket alerts and email or disable them.
    Alerts are sent weekly, therefore we have to capture things in date ranges.
    This prevents us from sending too many notifications or not enough.
    The schedule is thus:
         Day 0 ─┬─ Item is terminated
                │
                │
       T+90-96 ─┼─ Item terminated about 90 days ago,
                │  did you want to disable it?
                │
     T+180-186 ─┼─ Item terminated about 180 days ago, alerts will be disabled
                │  in one week if you do not disable and re-enable your alert.
                │
       T+187-∞ ─┴─ Item terminated more than 180 days ago and
                   alert is disabled.
    """
    

    We should create a new webhook event for this that can be interpreted as a warning that the termination is coming or that says the termination has happened.

    When we write the documentation for this, we should note that if you don't want to stop getting updates for a case (i.e., you don't want it to expire), you can re-up the subscription by deleting it and recreating it.

    The warning email currently gives this advice:

    Please disable and re-enable these alerts if you are still monitoring these cases. Doing so will keep them running another 180 days.

    opened by mlissner 5