paracrawl/corset

Corset

Corset is a web-based data selection portal that helps you get relevant data from massive amounts of parallel data. If you don't need the whole corpus but just a suitable subset (a cor(pus sub)set, hence the name of the tool), this is what Corset will do for you.

Here are some highlights of what you will find in Corset:

Millions of parallel sentences to explore

  • Dive into parallel corpora with fast searches.
  • Search in either source or target sides of corpora.
  • Keep track of your preferred searches and their details.

Tailored corpora (corsets) from big corpora

  • Get smaller and custom corpora that fit your sample text.
  • Set up the details of your corset (name, topic, languages, size) and launch your search over millions of parallel sentences.
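
The idea of tailoring a corset to a sample text can be illustrated with a toy sketch. This is not Corset's actual selection method, which this README does not describe; it only assumes a simple token-overlap score, and the names `tokenize` and `select_corset` are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def select_corset(pairs, sample_text, size):
    """Return the `size` sentence pairs whose source side best matches the sample.

    Assumption: relevance is the fraction of source tokens that also occur
    in the sample text. A real system would likely use a stronger measure.
    """
    sample_tokens = set(tokenize(sample_text))

    def score(pair):
        source, _target = pair
        tokens = tokenize(source)
        if not tokens:
            return 0.0
        overlap = sum(1 for token in tokens if token in sample_tokens)
        return overlap / len(tokens)

    # sorted() is stable, so ties keep their original corpus order.
    return sorted(pairs, key=score, reverse=True)[:size]


# Hypothetical miniature English-Spanish corpus.
pairs = [
    ("The patient received a dose of the vaccine.",
     "El paciente recibió una dosis de la vacuna."),
    ("The stock market fell sharply today.",
     "La bolsa cayó bruscamente hoy."),
    ("Vaccine trials involve many patients.",
     "Los ensayos de vacunas involucran a muchos pacientes."),
]

corset = select_corset(pairs, "vaccine dose for the patient", size=2)
```

Here the two health-related pairs are kept and the finance sentence is dropped, mirroring how a corset is a small, topic-focused subset of a larger parallel corpus.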

Monitor and download corsets

  • See the status of your corsets, preview them, download them, remove or share them!
  • Take a look at shared corsets to see if they are already tailored to your needs.

To know more, see:


Connecting Europe Facility

All documents and software contained in this repository reflect only the authors' view. The Innovation and Networks Executive Agency of the European Union is not responsible for any use that may be made of the information it contains.
