Corset is a web-based data selection portal that helps you getting relevant data from massive amounts of parallel data.

Related tags

corset
Overview

Corset

Logo

License

Corset is a web-based data selection portal that helps you getting relevant data from massive amounts of parallel data. So, if you don't need the whole corpus, but just a suitable subset (indeed, a cor(pus sub)set, this is what Corset will do for you--and the reason of the name of the tool.

Here are some highlights of what you will find in Corset:

Millions of parallel sentences to explore

  • Dive into parallel corpora performing searches at the speed of light.
  • Search in either source or target sides of corpora.
  • Keep track of your preferred searches and their details.

Tailored corpora (corsets) from big corpora

  • Get smaller and custom corpora that fit your sample text.
  • Set up the details of your corset (name, topic, languages, size) and launch your search over millions of parallel sentences.

Monitor and download corsets

  • See the status of your corsets, preview them, download them, remove or share them!
  • Take a look to shared corsets to see if they are already tailored to your needs.

To know more, see:


Connecting Europe Facility

All documents and software contained in this repository reflect only the authors' view. The Innovation and Networks Executive Agency of the European Union is not responsible for any use that may be made of the information it contains.

Issues
  • Create LICENSE

    Create LICENSE

    opened by mbanon 0
  • Improvements in search view, sidebar

    Improvements in search view, sidebar

    • Refreshes preview when switching between base corpora and target language
    • Adds link to User Guide and project repo in the sidebar
    • Minor UI fixes
    opened by sjarmero 0
  • Fix issues

    Fix issues

    • Fix escape issues
    • Fix history delete id cast
    opened by sjarmero 0
  • Do not escape top corsets

    Do not escape top corsets

    opened by sjarmero 0
  • Bump urllib3 from 1.25.8 to 1.26.5 in /front/scripts

    Bump urllib3 from 1.25.8 to 1.26.5 in /front/scripts

    Bumps urllib3 from 1.25.8 to 1.26.5.

    Release notes

    Sourced from urllib3's releases.

    1.26.5

    :warning: IMPORTANT: urllib3 v2.0 will drop support for Python 2: Read more in the v2.0 Roadmap

    • Fixed deprecation warnings emitted in Python 3.10.
    • Updated vendored six library to 1.16.0.
    • Improved performance of URL parser when splitting the authority component.

    If you or your organization rely on urllib3 consider supporting us via GitHub Sponsors

    1.26.4

    :warning: IMPORTANT: urllib3 v2.0 will drop support for Python 2: Read more in the v2.0 Roadmap

    • Changed behavior of the default SSLContext when connecting to HTTPS proxy during HTTPS requests. The default SSLContext now sets check_hostname=True.

    If you or your organization rely on urllib3 consider supporting us via GitHub Sponsors

    1.26.3

    :warning: IMPORTANT: urllib3 v2.0 will drop support for Python 2: Read more in the v2.0 Roadmap

    • Fixed bytes and string comparison issue with headers (Pull #2141)

    • Changed ProxySchemeUnknown error message to be more actionable if the user supplies a proxy URL without a scheme (Pull #2107)

    If you or your organization rely on urllib3 consider supporting us via GitHub Sponsors

    1.26.2

    :warning: IMPORTANT: urllib3 v2.0 will drop support for Python 2: Read more in the v2.0 Roadmap

    • Fixed an issue where wrap_socket and CERT_REQUIRED wouldn't be imported properly on Python 2.7.8 and earlier (Pull #2052)

    1.26.1

    :warning: IMPORTANT: urllib3 v2.0 will drop support for Python 2: Read more in the v2.0 Roadmap

    • Fixed an issue where two User-Agent headers would be sent if a User-Agent header key is passed as bytes (Pull #2047)

    1.26.0

    :warning: IMPORTANT: urllib3 v2.0 will drop support for Python 2: Read more in the v2.0 Roadmap

    • Added support for HTTPS proxies contacting HTTPS servers (Pull #1923, Pull #1806)

    • Deprecated negotiating TLSv1 and TLSv1.1 by default. Users that still wish to use TLS earlier than 1.2 without a deprecation warning should opt-in explicitly by setting ssl_version=ssl.PROTOCOL_TLSv1_1 (Pull #2002) Starting in urllib3 v2.0: Connections that receive a DeprecationWarning will fail

    • Deprecated Retry options Retry.DEFAULT_METHOD_WHITELIST, Retry.DEFAULT_REDIRECT_HEADERS_BLACKLIST and Retry(method_whitelist=...) in favor of Retry.DEFAULT_ALLOWED_METHODS, Retry.DEFAULT_REMOVE_HEADERS_ON_REDIRECT, and Retry(allowed_methods=...) (Pull #2000) Starting in urllib3 v2.0: Deprecated options will be removed

    ... (truncated)

    Changelog

    Sourced from urllib3's changelog.

    1.26.5 (2021-05-26)

    • Fixed deprecation warnings emitted in Python 3.10.
    • Updated vendored six library to 1.16.0.
    • Improved performance of URL parser when splitting the authority component.

    1.26.4 (2021-03-15)

    • Changed behavior of the default SSLContext when connecting to HTTPS proxy during HTTPS requests. The default SSLContext now sets check_hostname=True.

    1.26.3 (2021-01-26)

    • Fixed bytes and string comparison issue with headers (Pull #2141)

    • Changed ProxySchemeUnknown error message to be more actionable if the user supplies a proxy URL without a scheme. (Pull #2107)

    1.26.2 (2020-11-12)

    • Fixed an issue where wrap_socket and CERT_REQUIRED wouldn't be imported properly on Python 2.7.8 and earlier (Pull #2052)

    1.26.1 (2020-11-11)

    • Fixed an issue where two User-Agent headers would be sent if a User-Agent header key is passed as bytes (Pull #2047)

    1.26.0 (2020-11-10)

    • NOTE: urllib3 v2.0 will drop support for Python 2. Read more in the v2.0 Roadmap <https://urllib3.readthedocs.io/en/latest/v2-roadmap.html>_.

    • Added support for HTTPS proxies contacting HTTPS servers (Pull #1923, Pull #1806)

    • Deprecated negotiating TLSv1 and TLSv1.1 by default. Users that still wish to use TLS earlier than 1.2 without a deprecation warning

    ... (truncated)

    Commits
    • d161647 Release 1.26.5
    • 2d4a3fe Improve performance of sub-authority splitting in URL
    • 2698537 Update vendored six to 1.16.0
    • 07bed79 Fix deprecation warnings for Python 3.10 ssl module
    • d725a9b Add Python 3.10 to GitHub Actions
    • 339ad34 Use pytest==6.2.4 on Python 3.10+
    • f271c9c Apply latest Black formatting
    • 1884878 [1.26] Properly proxy EOF on the SSLTransport test suite
    • a891304 Release 1.26.4
    • 8d65ea1 Merge pull request from GHSA-5phf-pp7p-vc2r
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Bump urllib3 from 1.26.4 to 1.26.5 in /back/scripts

    Bump urllib3 from 1.26.4 to 1.26.5 in /back/scripts

    Bumps urllib3 from 1.26.4 to 1.26.5.

    Release notes

    Sourced from urllib3's releases.

    1.26.5

    :warning: IMPORTANT: urllib3 v2.0 will drop support for Python 2: Read more in the v2.0 Roadmap

    • Fixed deprecation warnings emitted in Python 3.10.
    • Updated vendored six library to 1.16.0.
    • Improved performance of URL parser when splitting the authority component.

    If you or your organization rely on urllib3 consider supporting us via GitHub Sponsors

    Changelog

    Sourced from urllib3's changelog.

    1.26.5 (2021-05-26)

    • Fixed deprecation warnings emitted in Python 3.10.
    • Updated vendored six library to 1.16.0.
    • Improved performance of URL parser when splitting the authority component.
    Commits
    • d161647 Release 1.26.5
    • 2d4a3fe Improve performance of sub-authority splitting in URL
    • 2698537 Update vendored six to 1.16.0
    • 07bed79 Fix deprecation warnings for Python 3.10 ssl module
    • d725a9b Add Python 3.10 to GitHub Actions
    • 339ad34 Use pytest==6.2.4 on Python 3.10+
    • f271c9c Apply latest Black formatting
    • 1884878 [1.26] Properly proxy EOF on the SSLTransport test suite
    • See full diff in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Source languages

    Source languages

    • Add custom source languages
    • Display active and inactive languages in admin panel
    • Bug fixes
    opened by sjarmero 0
Protocol Buffers - Google's data interchange format

Protocol Buffers - Google's data interchange format Copyright 2008 Google Inc. https://developers.google.com/protocol-buffers/ Overview Protocol Buffe

Protocol Buffers 50.8k Sep 23, 2021
Python library for serializing any arbitrary object graph into JSON. It can take almost any Python object and turn the object into JSON. Additionally, it can reconstitute the object back into Python.

jsonpickle jsonpickle is a library for the two-way conversion of complex Python objects and JSON. jsonpickle builds upon the existing JSON encoders, s

null 962 Sep 16, 2021
Python wrapper around rapidjson

python-rapidjson Python wrapper around RapidJSON Authors: Ken Robbins <[email protected]> Lele Gaifax <[email protected]> License: MIT License Sta

null 429 Sep 17, 2021
A lightweight library for converting complex objects to and from simple Python datatypes.

marshmallow: simplified object serialization marshmallow is an ORM/ODM/framework-agnostic library for converting complex datatypes, such as objects, t

marshmallow-code 5.7k Sep 24, 2021
MessagePack serializer implementation for Python msgpack.org[Python]

MessagePack for Python What's this MessagePack is an efficient binary serialization format. It lets you exchange data among multiple languages like JS

MessagePack 1.5k Sep 22, 2021
Generic ASN.1 library for Python

ASN.1 library for Python This is a free and open source implementation of ASN.1 types and codecs as a Python package. It has been first written to sup

Ilya Etingof 187 Sep 20, 2021
Fast, correct Python JSON library supporting dataclasses, datetimes, and numpy

orjson orjson is a fast, correct JSON library for Python. It benchmarks as the fastest Python library for JSON and is more correct than the standard j

null 2.4k Sep 22, 2021
Extended pickling support for Python objects

cloudpickle cloudpickle makes it possible to serialize Python constructs not supported by the default pickle module from the Python standard library.

null 1k Sep 23, 2021
FlatBuffers: Memory Efficient Serialization Library

FlatBuffers FlatBuffers is a cross platform serialization library architected for maximum memory efficiency. It allows you to directly access serializ

Google 16.8k Sep 23, 2021
serialize all of python

dill serialize all of python About Dill dill extends python's pickle module for serializing and de-serializing python objects to the majority of the b

The UQ Foundation 1.5k Sep 23, 2021
Ultra fast JSON decoder and encoder written in C with Python bindings

UltraJSON UltraJSON is an ultra fast JSON encoder and decoder written in pure C with bindings for Python 3.6+. Install with pip: $ python -m pip insta

null 3.4k Sep 24, 2021
Crappy tool to convert .scw files to .json and and vice versa.

SCW-JSON-TOOL Crappy tool to convert .scw files to .json and vice versa. How to use Run main.py file with two arguments: python main.py <scw2json or j

Fred31 5 May 14, 2021
simplejson is a simple, fast, extensible JSON encoder/decoder for Python

simplejson simplejson is a simple, fast, complete, correct and extensible JSON <http://json.org> encoder and decoder for Python 3.3+ with legacy suppo

null 1.4k Sep 24, 2021
Python bindings for the simdjson project.

pysimdjson Python bindings for the simdjson project, a SIMD-accelerated JSON parser. If SIMD instructions are unavailable a fallback parser is used, m

Tyler Kennedy 474 Sep 24, 2021