Like Dirt-Samples, but cleaned up

Overview

Clean-Samples

Like Dirt-Samples, but cleaned up, with clear provenance and license info (generally a permissive creative commons licence but check the metadata for specifics).

The bin/meta.py python script is a reference implementation that can make a '.cleanmeta' metadata file for your own sample pack folder. See below for how to use it and contribute a sample pack of your own.

If you want to use these outside the Tidal/SuperDirt/SuperCollider ecosystem you are very welcome. You're encouraged to join discussion in the github issue tracker so that we can develop a standard way to share and index/signpost these packs.

See tidalcycles/sounds-repetition for an example sample pack which has two sets of samples in it.

How to contribute a sample pack

Please only contribute samples if you are happy to share them under a permissive license such as CC0 or a similar creative commons license.

If you are unfamiliar with the 'git' software, please create an issue here with a short description of your samples and a link to them, and someone should be along to help shortly.

If you are familiar with git and running python scripts (or happy to learn), please follow the instructions below. This is all new - if anything is unclear please create an issue, thanks!

  1. Get your samples together in .wav format, editing them if necessary (see below for advice).

  2. Create a new repository. This isn't essential, but consider putting 'sounds-' in front of its name, e.g. 'sounds-303bass' for your 303 bass samples.

  3. Add your samples to the repository. For an example of how to organise them, see this sample pack: tidalcycles/sounds-repetition, which has two sets of samples, with a subfolder for each.

  4. Create a '.cleanmeta' metadata file for each subfolder. Again, see tidalcycles/sounds-repetition for examples. The python script bin/meta.py can generate the metadata file for you; run it without parameters for help. Here is an example command line that was used to generate repetition.cleanmeta:

    ../Clean-Samples/bin/meta.py --maintainer alex --email [email protected] --copyright "(c) 2021 Alex McLean" --license CC0 --provenance "Various dodgy speech synths" --shortname repetition --sample-subfolder repetition/ --write .
    

    After generating the file, edit it with a text editor to fill in any missing info.

  5. When ready, add the URL of your repository to the Clean-Samples quark file (https://github.com/tidalcycles/Clean-Samples/blob/main/Clean-Samples.quark) in a pull request. You could also add it to the SuperCollider quarks database, or we can do that for you if you prefer, so that we can accept the PR to Clean-Samples once it's accepted as a quark.
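
As a rough illustration of what step 4 produces, here is a minimal Python sketch that collects the .wav files in a folder and writes a metadata file named `<shortname>.cleanmeta`. It is not bin/meta.py: the function name `write_cleanmeta_sketch`, the JSON layout and the key names are assumptions made for illustration (the keys simply mirror the command-line options shown above). Check a real file in tidalcycles/sounds-repetition for the actual format.

```python
import json
from pathlib import Path


def write_cleanmeta_sketch(folder, shortname, maintainer, email,
                           copyright_notice, license_name, provenance):
    """Write <shortname>.cleanmeta into `folder` and return its path.

    Hypothetical layout: the real .cleanmeta format may differ.
    """
    folder = Path(folder)
    # Sort so the sample order is stable between runs.
    samples = sorted(p.name for p in folder.iterdir()
                     if p.is_file() and p.suffix.lower() == ".wav")
    meta = {
        "shortname": shortname,
        "maintainer": maintainer,
        "email": email,
        "copyright": copyright_notice,
        "license": license_name,
        "provenance": provenance,
        "samples": samples,
    }
    out_path = folder / f"{shortname}.cleanmeta"
    out_path.write_text(json.dumps(meta, indent=2) + "\n", encoding="utf-8")
    return out_path
```

Whichever way the file is generated, it is still worth opening it in a text editor afterwards to fill in anything a script cannot know, such as where the sounds came from.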

Advice for preparing samples

You can use free/open source software like audacity for editing samples.

As a minimum, be sure to trim any silence from the beginning and end of each sample, and make sure that each sample starts and ends at zero to avoid clicks (you might need to fade in / fade out by a tiny amount to achieve this).
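
If you would rather script this than do it by hand, the idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the audio is already loaded as a mono list of floats between -1.0 and 1.0, and the function names, the silence threshold and the fade length are all made up for this example rather than taken from any Clean-Samples tool.

```python
def trim_silence(samples, threshold=1e-4):
    """Drop leading and trailing samples whose magnitude is below threshold."""
    start = 0
    end = len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]


def apply_fades(samples, fade_len=64):
    """Apply a short linear fade in and fade out.

    With at least two samples, the first and last values end up at zero.
    """
    n = len(samples)
    # Never let the two fades overlap on very short sounds.
    fade_len = min(fade_len, n // 2)
    out = list(samples)
    for i in range(fade_len):
        gain = i / fade_len  # 0.0 at the edge, rising towards 1.0
        out[i] *= gain
        out[n - 1 - i] *= gain
    return out
```

Trim first and fade afterwards, for example `apply_fades(trim_silence(samples))`, so that the fade is applied to the audible part of the sound rather than to silence that is about to be removed.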

Consider adjusting the volume/loudness too, for example normalising to -1.0 dB, but this is very subjective and will depend on the nature of the samples and the music they're used with. For example, distorted gabba samples are intended to be very loud, and a whisper is intended to sound very quiet. The average non-percussive sample should be around -23 dB RMS. Samples shouldn't exceed 0 dBTP (true peak); EBU recommends -1 dBTP, measured at 4x oversampling. Samples generally shouldn't have DC offset, although some kick drum samples, for example, naturally have a non-zero mean.
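
To make these figures concrete, here is a small Python sketch of how peak level, RMS level and DC offset could be measured, and how a simple peak normalisation works. It assumes a non-empty mono list of floats between -1.0 and 1.0, and the function names are invented for this example. Note that it measures the plain sample peak, not the oversampled true peak that the EBU recommendation refers to, and that the RMS value is relative to full scale (a full-scale square wave reads 0 dB).

```python
import math


def peak_db(samples):
    """Sample peak in dB relative to full scale (not true peak)."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")


def rms_db(samples):
    """RMS level in dB relative to full scale."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square) if mean_square > 0 else float("-inf")


def dc_offset(samples):
    """Mean of the signal; close to zero for most well-behaved samples."""
    return sum(samples) / len(samples)


def normalise_peak(samples, target_db=-1.0):
    """Scale the signal so its sample peak sits at target_db."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # pure silence: nothing to scale
    gain = 10 ** (target_db / 20) / peak
    return [s * gain for s in samples]
```

Treat the numbers as a guide rather than a rule: as noted above, a sample that is meant to be very loud or very quiet should stay that way.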

For more advice, you could join the discussion here.

Thanks!

Comments
  • How to organise sample sets

    Should we use submodules, so people can curate sample sets in their own users/orgs? https://git-scm.com/book/en/v2/Git-Tools-Submodules

    Or shall we make a monolithic set of samples here, with more centralised control to e.g. keep the metadata up-to-date?

    Or something in between - separate submodules, but all forked under the same organisation for maintenance?

    (longer term, the organisation shouldn't be tidalcycles, maybe toplap or some other umbrella)

    opened by yaxu 25
  • Quarks as sample sets

    I'm a bit worried about polluting the supercollider quarks database with a lot of sample sets. What do you think @telephon ? Maybe at least a good idea to encourage people to prefix the name of their quarks with e.g. Samples- or Clean-?

    opened by yaxu 16
  • Reference metadata implementation

    It would be nice to start with a python script that reads a folder of samples and writes a metadata file for them, both as a practical tool and as a reference implementation for people wanting to use clean samples in their cool live coding language.

    opened by yaxu 5
  • Clean-cleary-samples - submodules removed

    Hi @yaxu, I've updated my sample set to

    • not use submodules
    • move the cleanmeta file into the subdir containing the samples

    Could you give it another update/try please?

    Also, I'm still working on the cello sets for the moment, so I've left them out - once I've got their layout confirmed I will move them in -

    [edit] Quick question, is the Clean- prefix still required? I'd rather not maintain a separate repository just for this - ideally I'd just like to add the orig sample repos to the quark deps

    opened by cleary 3
  • add bowed cello samples to quark

    Hi @yaxu

    The cbow set is now complete, I would do a pull request, but it's a fair bit of mucking about for both of us for a one line change ;)

    Could you please add to the quark deps:

    https://github.com/cleary/samples-cello-bowed

    ? Thanks!

    opened by cleary 2
  • add Clean-cleary-samples (fyi includes git submodules)

    @yaxu first attempt - things of note:

    • the subdirs are git-submodules, not sure how superdirt handles that but let's find out
    • I haven't added my set to the quarks database, pending more info in this discussion

    https://github.com/cleary/Clean-cleary-samples

    opened by cleary 1
  • Add ukulele samples

    I added my ukulele samples to the Clean-Samples.quark dependencies.

    Therefore I added a quark and cleanmeta file to my sample repo under https://github.com/thgrund/samples-ukulele.

    opened by thgrund 0
  • add flbass as submodule (example)

    I know the decision hasn't been made yet, but for the sake of testing here's a submodule example

    Adding a new submodule:

    bernie@jobim:~/source/Clean-Samples$ git submodule add https://github.com/cleary/samples-flbass.git
    Cloning into '/home/bernie/source/Clean-Samples/samples-flbass'...
    remote: Enumerating objects: 68, done.
    remote: Counting objects: 100% (68/68), done.
    remote: Compressing objects: 100% (61/61), done.
    remote: Total 68 (delta 19), reused 47 (delta 7), pack-reused 0
    Unpacking objects: 100% (68/68), 10.41 MiB | 3.48 MiB/s, done.
    bernie@jobim:~/source/Clean-Samples$ git status
    On branch main
    Your branch is up to date with 'origin/main'.
    
    Changes to be committed:
      (use "git restore --staged <file>..." to unstage)
    	new file:   .gitmodules
    	new file:   samples-flbass
    
    bernie@jobim:~/source/Clean-Samples$ git commit -am 'add flbass as submodule'
    [main ceb0ed0] add flbass as submodule
     2 files changed, 4 insertions(+)
     create mode 100644 .gitmodules
     create mode 160000 samples-flbass
    bernie@jobim:~/source/Clean-Samples$ git push 
    Username for 'https://github.com': cleary
    Password for 'https://[email protected]': 
    Enumerating objects: 4, done.
    Counting objects: 100% (4/4), done.
    Delta compression using up to 8 threads
    Compressing objects: 100% (3/3), done.
    Writing objects: 100% (3/3), 470 bytes | 470.00 KiB/s, done.
    Total 3 (delta 0), reused 0 (delta 0)
    To https://github.com/cleary/Clean-Samples.git
       d360302..ceb0ed0  main -> main
    bernie@jobim:~/source/Clean-Samples$
    

    The clone command needs to change (and should be updated in the README):

    git clone --recurse-submodules https://github.com/tidalcycles/Clean-Samples.git
    

    To update all submodules in the repo, use:

    git submodule update --remote [--merge]
    

    To update a specific submodule in the repo, it's possible but I haven't done it myself explicitly: https://stackoverflow.com/a/45251405/3164018

    opened by cleary 0
  • How to represent short identifiers for sounds and sound sets

    Hi @yaxu, I was just revisiting the metadata for my various sample repos -

    When generating the cleanmeta file, there doesn't seem to be any reference to the --shortname value in the generated file. It does use this to create the filename I believe ie <shortname>.cleanmeta

    As a personal preference, I'd prefer that the shortname be independent of the cleanmeta filename (and not have any dependence on the filename beyond a .cleanmeta suffix) - so I can keep the quark and cleanmeta files named consistently (ie samples-flbass.[quark|cleanmeta], plus guarantee that I know the shortname will be referenced correctly by superdirt (rather than guessing that the cleanmeta filename prefix will be used)

    Hope this makes sense, had trouble articulating :/

    opened by cleary 4
  • metadata fields

    I have a proof-of-concept node.js script that parses the clean-samples quark and then lets you select which repos you'd like to download. Having the download size of each repo would be nice to help users make informed choices about what they're grabbing.

    Which made me think that perhaps we should be adding more metadata in general, and perhaps most of this could be automatically added by the Python script so that it wouldn't be a burden on users adding sample banks. I would suggest as a possible starting point:

    1. Filesize
    2. Number of channels
    3. Sample Rate
    4. Bit depth
    5. Duration

    This might enable more selective download scripts in the future e.g. "get all 16-bit mono samples that are under .5 seconds in duration from the repos by yaxu ". Is there a reason not to add more metadata?

    opened by charlieroberts 14
  • Metadata location

    .cleanmeta looks like an unused extension: https://github.com/search?q=extension%3Acleanmeta&type=Code

    but is there a better, more self-explanatory name we could use?

    opened by yaxu 8
  • Sample Quality, Normalization and Loudness

    Thanks for the effort so far!

    In ddbc883c324159e2591c4c580d487d68dc1152c1 the README states:

    We recommend normalising them to xxx dB

    I did some experiments with Dirt-Samples in the past and found that normalization is complicated. What comes to my mind is:

    • Some drum machine samples have accent and normal level sounds – don't break the dynamic.
    • If a loop is cut into slices – don't break the dynamic between slices.
    • Short percussive samples and long pad sounds do not sound right together when normalized by peak, RMS or even EBU-R128.

    So my suggestions to rephrase this, are:

    • Sample true-peak MUST NOT exceed 0dBTP. EBU recommends -1dBTP at 4x-oversampling.
    • Default sample loudness (not level) should mix musically well with audio program that is roughly according to EBU-R128. "Musically" means, those gabba samples are intended to be very loud; some whisper is intended to sound silent. The average non-percussive sample SHOULD be around -23dB RMS.

    Probably this could form a new section in the README on "Sample Quality". Also with:

    • Samples SHOULD not have DC-offset. Some kick-sounds naturally have a non-zero mean, though.
    • Samples MAY be ready-to-use bandpass filtered. Consider that playback speed might be altered.

    What do you think?

    opened by jkbd 9
Owner
TidalCycles
Live coding environment for making patterns