Like Dirt-Samples, but cleaned up

TidalCycles

Last update: Nov 30, 2022

Overview

Clean-Samples

Like Dirt-Samples, but cleaned up, with clear provenance and license info (generally a permissive Creative Commons license, but check the metadata for specifics).

The bin/meta.py Python script is a reference implementation that can make a '.cleanmeta' metadata file for your own sample pack folder. See below for how to use it and contribute a sample pack of your own.

If you want to use these outside the Tidal/SuperDirt/SuperCollider ecosystem, you are very welcome to. You're encouraged to join the discussion in the GitHub issue tracker so that we can develop a standard way to share and index/signpost these packs.

See /tidalcycles/sounds-repetition for an example sample pack which has two sets of samples in it.

How to contribute a sample pack

Please only contribute samples if you are happy to share them under a permissive license such as CC0 or a similar Creative Commons license.

If you are unfamiliar with the 'git' software, please create an issue here with a short description of your samples and a link to them, and someone should be along to help shortly.

If you are familiar with git and running Python scripts (or happy to learn), please follow the instructions below. This is all new, so if anything is unclear please create an issue. Thanks!

Get your samples together in .wav format, editing them if necessary (see below for advice).
Create a new repository. This isn't essential, but consider putting 'sounds-' in front of its name, e.g. 'sounds-303bass' for your 303 bass samples.
Add your samples to the repository. For an example of how to organise them, see this sample pack: tidalcycles/sounds-repetition, which has two sets of samples, with a subfolder for each.
Create a '.cleanmeta' metadata file for each subfolder. Again, see tidalcycles/sounds-repetition for examples. There is a Python script, bin/meta.py, which can generate the metadata file for you; run it without parameters for help. Here is an example command line that was used to generate repetition.cleanmeta:
```
../Clean-Samples/bin/meta.py --maintainer alex --email [email protected] --copyright "(c) 2021 Alex McLean" --license CC0 --provenance "Various dodgy speech synths" --shortname repetition --sample-subfolder repetition/ --write .
```
After generating the file, edit it with a text editor to fill in any missing info.
When ready, add the URL of your repository to the Clean-Samples quark file (https://github.com/tidalcycles/Clean-Samples/blob/main/Clean-Samples.quark) in a pull request. You could also add it to the SuperCollider quarks database, or we can do that for you if you prefer, so that we can accept the PR to Clean-Samples once it's accepted as a quark.

Advice for preparing samples

You can use free/open source software like Audacity for editing samples.

As a minimum, be sure to trim any silence from the beginning and end of the samples, and make sure that each sample starts and ends at zero to avoid clicks (you might need to fade in / fade out by a tiny amount to achieve this).
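As a rough illustration of the trimming and fading advice, here is a minimal Python sketch. It assumes the audio has already been decoded into a list of floats in the range -1.0 to 1.0; the function names, the silence threshold and the fade length are all hypothetical choices for this example, not part of Clean-Samples or bin/meta.py.

```python
def trim_silence(samples, threshold=1e-4):
    """Drop leading and trailing samples whose magnitude is below threshold."""
    start = 0
    end = len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]


def apply_fades(samples, fade_len=64):
    """Apply a short linear fade in and fade out.

    The first and last samples end up exactly zero, which avoids clicks.
    """
    out = list(samples)
    n = len(out)
    if n < 2:
        return [0.0] * n
    # Never let the two fades overlap on very short samples.
    fade_len = min(fade_len, n // 2)
    for i in range(fade_len):
        gain = i / fade_len
        out[i] *= gain          # fade in from the start
        out[n - 1 - i] *= gain  # fade out towards the end
    return out
```

A typical use would be `apply_fades(trim_silence(samples))`. In practice an audio editor does the same job interactively, and the right threshold depends on the noise floor of the recording.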

Consider adjusting the volume/loudness too, for example normalising to -1.0 dB, but this is very subjective and will depend on the nature of the samples and the music they're used with. For example, distorted gabba samples are intended to be very loud, and a whisper is intended to sound very quiet. The average non-percussive sample should be around -23 dB RMS. Samples shouldn't exceed 0 dBTP (true peak); EBU recommends -1 dBTP at 4x oversampling. Samples generally shouldn't have DC offset, although e.g. some kick drum samples naturally have a non-zero mean.
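The figures above can be checked with a few lines of Python. This is only a sketch under some assumptions: samples are floats in the range -1.0 to 1.0, levels are measured relative to full scale, and the peak is the plain sample peak rather than the oversampled true peak that the EBU figure refers to, so it can read slightly low. All function names are made up for this example.

```python
import math


def peak_dbfs(samples):
    """Sample peak in dB relative to full scale (not an oversampled true peak)."""
    peak = max((abs(s) for s in samples), default=0.0)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")


def rms_dbfs(samples):
    """RMS level in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def dc_offset(samples):
    """Mean value of the signal; close to zero for most samples."""
    return sum(samples) / len(samples) if samples else 0.0


def normalise_peak(samples, target_db=-1.0):
    """Scale the signal so its sample peak sits at target_db."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0:
        return list(samples)
    gain = 10 ** (target_db / 20) / peak
    return [s * gain for s in samples]
```

These helpers only report or scale levels. Whether a given sample should be normalised at all remains a musical decision, as described above.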

For more advice, you could join the discussion here.

Thanks!

Comments

How to organise sample sets

Should we use submodules, so people can curate sample sets in their own users/orgs? https://git-scm.com/book/en/v2/Git-Tools-Submodules

Or shall we make a monolithic set of samples here, with more centralised control to e.g. keep the metadata up-to-date?

Or something in between - separate submodules, but all forked under the same organisation for maintenance?

(longer term, the organisation shouldn't be tidalcycles, maybe toplap or some other umbrella)

opened by yaxu 25
Quarks as sample sets

I'm a bit worried about polluting the supercollider quarks database with a lot of sample sets. What do you think @telephon ? Maybe at least a good idea to encourage people to prefix the name of their quarks with e.g. Samples- or Clean-?

opened by yaxu 16
Reference metadata implementation

It would be nice to start with a Python script that reads a folder of samples and writes a metadata file for them, both as a practical tool and as a reference implementation for people wanting to use clean samples in their cool live coding language.
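To make the idea concrete, here is a minimal sketch of such a script. It is not bin/meta.py and does not reproduce the real '.cleanmeta' layout: the JSON keys, the function names and the output file name are assumptions made purely for illustration.

```python
import json
from pathlib import Path


def build_metadata(folder, shortname, maintainer, license_name):
    """Collect the .wav file names in a folder into a metadata dictionary.

    The key names here are hypothetical, not the actual .cleanmeta schema.
    """
    folder = Path(folder)
    wavs = sorted(
        p.name
        for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".wav"
    )
    return {
        "shortname": shortname,
        "maintainer": maintainer,
        "license": license_name,
        "samples": wavs,
    }


def write_metadata(meta, out_path):
    """Write the metadata dictionary to out_path as indented JSON."""
    Path(out_path).write_text(json.dumps(meta, indent=2) + "\n", encoding="utf-8")
```

For example, `write_metadata(build_metadata("repetition", "repetition", "alex", "CC0"), "example-meta.json")` would list the samples of a folder called `repetition`, assuming that folder exists.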

opened by yaxu 5
Clean-cleary-samples - submodules removed
Hi @yaxu, I've updated my sample set to

not use submodules

move the cleanmeta file into the subdir containing the samples

Could you give it another update/try please?

Also, I'm still working on the cello sets for the moment, so I've left them out - once I've got their layout confirmed I will move them in -

[edit] Quick question, is the Clean- prefix still required? I'd rather not maintain a separate repository just for this - ideally I'd just like to add the orig sample repos to the quark deps
opened by cleary 3
add bowed cello samples to quark

Hi @yaxu

The cbow set is now complete, I would do a pull request, but it's a fair bit of mucking about for both of us for a one line change ;)

Could you please add to the quark deps:

https://github.com/cleary/samples-cello-bowed

? Thanks!

opened by cleary 2
add Clean-cleary-samples (fyi includes git submodules)
@yaxu first attempt - things of note:

the subdirs are git-submodules, not sure how superdirt handles that but let's find out

I haven't added my set to the quarks database, pending more info in this discussion

https://github.com/cleary/Clean-cleary-samples
opened by cleary 1
Add ukulele samples

I added my ukulele samples to the Clean-Samples.quark dependencies.

For this, I added a quark and a cleanmeta file to my sample repo at https://github.com/thgrund/samples-ukulele.

opened by thgrund 0

add flbass as submodule (example)

I know the decision hasn't been made yet, but for the sake of testing here's a submodule example

Adding a new submodule:

bernie@jobim:~/source/Clean-Samples$ git submodule add https://github.com/cleary/samples-flbass.git
Cloning into '/home/bernie/source/Clean-Samples/samples-flbass'...
remote: Enumerating objects: 68, done.
remote: Counting objects: 100% (68/68), done.
remote: Compressing objects: 100% (61/61), done.
remote: Total 68 (delta 19), reused 47 (delta 7), pack-reused 0
Unpacking objects: 100% (68/68), 10.41 MiB | 3.48 MiB/s, done.
bernie@jobim:~/source/Clean-Samples$ git status
On branch main
Your branch is up to date with 'origin/main'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	new file:   .gitmodules
	new file:   samples-flbass

bernie@jobim:~/source/Clean-Samples$ git commit -am 'add flbass as submodule'
[main ceb0ed0] add flbass as submodule
 2 files changed, 4 insertions(+)
 create mode 100644 .gitmodules
 create mode 160000 samples-flbass
bernie@jobim:~/source/Clean-Samples$ git push 
Username for 'https://github.com': cleary
Password for 'https://[email protected]': 
Enumerating objects: 4, done.
Counting objects: 100% (4/4), done.
Delta compression using up to 8 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 470 bytes | 470.00 KiB/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To https://github.com/cleary/Clean-Samples.git
   d360302..ceb0ed0  main -> main
bernie@jobim:~/source/Clean-Samples$

The clone command needs to change (and should be updated in the README):

git clone --recurse-submodules https://github.com/tidalcycles/Clean-Samples.git

To update all submodules in the repo, use:

git submodule update --remote [--merge]

To update a specific submodule in the repo, it's possible but I haven't done it myself explicitly: https://stackoverflow.com/a/45251405/3164018

opened by cleary 0

How to represent short identifiers for sounds and sound sets

Hi @yaxu, I was just revisiting the metadata for my various sample repos -

When generating the cleanmeta file, there doesn't seem to be any reference to the --shortname value in the generated file. I believe it is only used to create the filename, i.e. <shortname>.cleanmeta

As a personal preference, I'd prefer that the shortname be independent of the cleanmeta filename (and not have any dependence on the filename beyond a .cleanmeta suffix), so I can keep the quark and cleanmeta files named consistently (i.e. samples-flbass.[quark|cleanmeta]), plus guarantee that the shortname will be referenced correctly by SuperDirt (rather than guessing that the cleanmeta filename prefix will be used).
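One way a consumer could honour this request is sketched below. It assumes the metadata has been parsed into a dictionary and that it might carry a `shortname` key; both that key and the function name are hypothetical, and nothing here describes how SuperDirt actually resolves names.

```python
from pathlib import Path


def resolve_shortname(meta, meta_path):
    """Prefer an explicit shortname field; fall back to the file name stem."""
    name = meta.get("shortname")
    if name:
        return name
    # No explicit shortname: guess from the file name, e.g.
    # "samples-flbass.cleanmeta" gives "samples-flbass".
    return Path(meta_path).stem
```

With an explicit field, the metadata file can be named freely while the sound keeps a stable short identifier.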

Hope this makes sense, had trouble articulating :/

opened by cleary 4
metadata fields
I have a proof-of-concept node.js script that parses the clean-samples quark and then lets you select which repos you'd like to download. Having the download size of each repo would be nice to help users make informed choices about what they're grabbing.
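The node.js script itself is not shown here. As a rough Python sketch of the first step, the following pulls repository URLs out of the text of a quark file. It assumes the quark file lists its dependencies as quoted URLs; the example text and the function name are illustrative, not copied from the real Clean-Samples.quark.

```python
import re

# Illustrative quark text; the real file's layout is an assumption here.
QUARK_TEXT = """(
  name: "Clean-Samples",
  dependencies: [
    "https://github.com/tidalcycles/sounds-repetition",
    "https://github.com/cleary/samples-flbass"
  ]
)"""

# Match http(s) URLs up to the next quote, comma, bracket or whitespace.
URL_PATTERN = re.compile(r"https?://[^\s\"',\]\)]+")


def list_repo_urls(quark_text):
    """Return every URL found in the quark text, in order of appearance."""
    return URL_PATTERN.findall(quark_text)
```

A download tool could present the result of `list_repo_urls(QUARK_TEXT)` as a menu and clone only the selected repositories.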

Which made me think that perhaps we should be adding more metadata in general, and perhaps most of this could be automatically added by the Python script so that it wouldn't be a burden on users adding sample banks. I would suggest as a possible starting point:

Filesize

Number of channels

Sample Rate

Bit depth

Duration

This might enable more selective download scripts in the future e.g. "get all 16-bit mono samples that are under .5 seconds in duration from the repos by yaxu ". Is there a reason not to add more metadata?
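The fields listed above can be read with Python's standard `wave` module, as in the sketch below. The dictionary keys and function names are assumptions for illustration rather than an agreed metadata schema, and the `wave` module is designed for uncompressed PCM files, so other WAV encodings may be rejected.

```python
import os
import wave


def describe_wav(path):
    """Return file size, channel count, sample rate, bit depth and duration."""
    with wave.open(str(path), "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {
            "filesize": os.path.getsize(path),   # bytes on disk
            "channels": w.getnchannels(),
            "samplerate": rate,                  # frames per second
            "bitdepth": w.getsampwidth() * 8,    # bytes per sample -> bits
            "duration": frames / rate if rate else 0.0,  # seconds
        }


def is_short_16bit_mono(info, max_seconds=0.5):
    """Example filter: 16-bit mono samples shorter than max_seconds."""
    return (
        info["channels"] == 1
        and info["bitdepth"] == 16
        and info["duration"] < max_seconds
    )
```

A selective download script could apply a filter like `is_short_16bit_mono` to the stored metadata before fetching any audio.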
opened by charlieroberts 14
Metadata location

.cleanmeta looks like an unused extension: https://github.com/search?q=extension%3Acleanmeta&type=Code

but is there a better, more self-explanatory name we could use?

opened by yaxu 8
Sample Quality, Normalization and Loudness
Thanks for the effort so far!

In ddbc883c324159e2591c4c580d487d68dc1152c1 the README states:

We recommend normalising them to xxx dB

I did some experiments with Dirt-Samples in the past and found that normalization is complicated. What comes to my mind is:

Some drum machine samples have accent and normal level sounds – don't break the dynamic.

If a loop is cut into slices – don't break the dynamic between slices.

Short percussive samples and long pad sounds do not sound right together when normalized by peak, RMS or even EBU-R128.

So my suggestions for rephrasing this are:

Sample true-peak MUST NOT exceed 0dBTP. EBU recommends -1dBTP at 4x-oversampling.

Default sample loudness (not level) should mix musically well with audio program that is roughly according to EBU-R128. "Musically" means, those gabba samples are intended to be very loud; some whisper is intended to sound silent. The average non-percussive sample SHOULD be around -23dB RMS.

Probably this could form a new section in the README on "Sample Quality". Also with:

Samples SHOULD NOT have DC offset. Some kick sounds naturally have a non-zero mean, though.

Samples MAY be ready-to-use bandpass filtered. Consider that playback speed might be altered.

What do you think?
opened by jkbd 9

Owner

TidalCycles

Live coding environment for making patterns
