BigScience Data Sourcing Code

This directory gathers the tools developed by the Data Sourcing Working Group

First Sourcing Sprint: October 2021

The code for the input form can be found in sourcing_sprint/streamlit_form.py

The code for the exploration tool can be found in sourcing_sprint/streamlit_explore.py

The resource entries can be found in sourcing_sprint/resources (one folder per language, one .jsonl file per resource)
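The layout described above (one folder per language, one .jsonl file per resource) can be read with a few lines of Python. This is a minimal sketch, not the project's own loader: the function name `load_resources` is hypothetical, and it assumes each non-empty line of a .jsonl file is a standalone JSON object. No field names are assumed.

```python
import json
from pathlib import Path


def load_resources(root):
    """Return {language_folder: {resource_file_stem: [records]}}.

    Hypothetical helper: assumes root contains one sub-folder per language
    and that every non-empty line of a .jsonl file is one JSON object.
    """
    entries = {}
    for jsonl_path in sorted(Path(root).glob("*/*.jsonl")):
        language = jsonl_path.parent.name
        with jsonl_path.open(encoding="utf-8") as f:
            # Skip blank lines so a trailing newline does not break parsing.
            records = [json.loads(line) for line in f if line.strip()]
        entries.setdefault(language, {})[jsonl_path.stem] = records
    return entries
```

Calling `load_resources("sourcing_sprint/resources")` from the repository root would then give a nested dictionary keyed first by language folder and then by resource file name.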

Comments
  • Email stated as optional in the app sidebar but it's required before Save can work

    The app sidebar says the email is optional, but Save does not work without it when the button is clicked. I guess the word "optional" should be removed.

    opened by tosingithub 1
  • [Feature request] Link primary sources to existing partners and organizations

    To reduce confusion and better differentiate between primary sources and organizations, we should provide users with a list of already-submitted organizations to which these primary sources might belong. This would, for instance, simplify the PII and the Representative/Owner/Custodian sections, as we could potentially reuse the information that's already available for an organization in all the primary sources attached to it.

    Examples:

    The Groupe Le Monde organization owns the primary sources: Le Monde Journal, M, le magazine du Monde, L'heure du Monde Podcast, Le Monde YouTube, among others.

    The CCSD organization owns the primary sources: HAL, TEL, MédiHAL, among others.

    The Wikimedia Foundation owns the primary sources: Wikipedia, Wikibooks, Wiktionary, Wikiquote, Wikinews, among others.

    enhancement help wanted 
    opened by pjox 1
  • [Bug] Error Validating when Primary source type is other

    When we try to validate an entry whose primary source type is "other" (for instance, Newspaper), we get the following error:

    ValueError: 'Newspaper' is not in list
    Traceback:
    File "/home/yacine/.local/lib/python3.7/site-packages/streamlit/script_runner.py", line 354, in _run_script
        exec(code, module.__dict__)
    File "/home/yacine/Code/data_sourcing/sourcing_sprint/app.py", line 380, in <module>
        main()
    File "/home/yacine/Code/data_sourcing/sourcing_sprint/app.py", line 99, in main
        pages[app_mode](submission_info_dict)
    File "/home/yacine/Code/data_sourcing/sourcing_sprint/app.py", line 305, in val_page
        form_source_category(entry_dict, app_categories, "val")
    File "/home/yacine/Code/data_sourcing/sourcing_sprint/catalogue/catalogue_forms.py", line 909, in form_source_category
        if mode == "val"
    
    opened by pjox 0
  • [Feature Request] Switching modes with Open Sections

    When you switch modes in the form with sections expanded, the text from the open sections on the previous page is squished to the side of the form. You have to go back and hide the sections, then switch to the new mode, for the text not to show. Is there a way to hide the text automatically when switching modes?

    opened by mcmillanmajora 0
  • [Feature Request] Prompt for Primary Sources of a Processed Dataset when selecting Other

    In the section Primary Sources of the Processed Dataset, the question "What kind of primary sources did the data curators use to make this dataset?" has two possible "other" answers (other and web|other). Both should prompt the submitter to provide an alternate description, but currently don't.

    opened by mcmillanmajora 0
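Regarding the "Error Validating when Primary source type is other" bug listed above: `ValueError: 'Newspaper' is not in list` is the error Python raises when `list.index` is called with a value the list does not contain. The traceback is cut off before the failing expression, so the sketch below is only an assumption about the cause, not the code in catalogue_forms.py. The helper name `index_or_other` and the option list are hypothetical.

```python
def index_or_other(options, value, other_label="other"):
    """Return the position of value in options, falling back to other_label.

    Hypothetical helper: still raises ValueError if other_label itself
    is missing from options.
    """
    try:
        return options.index(value)
    except ValueError:
        # Free-text values such as "Newspaper" are not in the fixed list,
        # so point the widget at the catch-all option instead.
        return options.index(other_label)


options = ["collection", "website", "other"]  # hypothetical option list
assert index_or_other(options, "website") == 1
assert index_or_other(options, "Newspaper") == 2
```

The returned index could be passed as the default position of a selection widget, with the original free-text value shown in a separate text field.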
Owner
BigScience Workshop
Research workshop on large language models - The Summer of Language Models 21