Easy genetic ancestry predictions in Python

Kevin Arvai

Last update: Jan 2, 2023

Related tags

Deep Learning genomics data-visualization ancestry dimensionality-reduction genomics-visualization personal-genomics genotypes streamlit

Overview

ezancestry

Easily visualize your direct-to-consumer genetics next to 2500+ samples from the 1000 genomes project. Evaluate the performance of a custom set of ancestry-informative snps (AISNPs) at classifying the genetic ancestry of the 1000 genomes samples using a machine learning model.

A subset of 1000 Genomes Project samples' single nucleotide polymorphism(s), or, SNP(s) have been parsed from the publicly available .bcf files.
The subset of SNPs, AISNPs (ancestry-informative snps), were chosen from two publications:

Set of 55 AISNPs. Progress toward an efficient panel of SNPs for ancestry inference. Kidd et al. 2014
Set of 128 AISNPs. Ancestry informative marker sets for determining continental origin and admixture proportions in common populations in America.. Kosoy et al. 2009 (Seldin Lab)

ezancestry ships with pretrained k-nearest neighbor models for all combinations of following:

* Kidd (55 AISNPs)
* Seldin (128 AISNPs)

* continental-level population (superpopulation)
* regional population (population)

* principal componentanalysis (PCA)
* neighborhood component analysis (NCA)
* uniform manifold approximation and projection (UMAP)

Installation
Config
Usage
- command line tool
- Python API
Visualization
Contributing

Installation

Install ezancestry with pip:

pip install ezancestry

Or clone the repository and run pip install from the directory:

git clone [email protected]:arvkevi/ezancestry.git
cd ezancestry
pip install .

Config

The first time ezancestry is run it will generate a config.ini file and data/ directory in your home directory under ${HOME}/.ezancestry. You can edit conf.ini to change the default settings, but it is not necessary to use ezancestry. The settings are just a utility for the user so they don't have to be verbose when interacting with the software. The settings are also keyword arguments to each of the commands in the ezancestry API, so you can always override the default settings.

These will be created in your home directory:

${HOME}/.ezancestry/conf.ini
${HOME}/.ezancestry/data/

Explanations of each setting is described in the Options section of the --help of each command, for example:

ezancestry predict --help

Usage: ezancestry predict [OPTIONS] INPUT_DATA

  Predict ancestry from genetic data.

  * Default arguments are from the ~/.ezancestry/conf.ini file. *

Arguments:
  INPUT_DATA  Can be a file path to raw genetic data (23andMe, ancestry.com,
              .vcf) file, a path to a directory containing several raw genetic
              files, or a (tab or comma) delimited file with sample ids as
              rows and snps as columns.  [required]


Options:
  --output-directory TEXT         The directory where to write the prediction
                                  results file

  --write-predictions / --no-write-predictions
                                  If True, write the predictions to a file. If
                                  False, return the predictions as a
                                  dataframe.  [default: True]

  --models-directory TEXT         The path to the directory where the model
                                  files are located.

  --aisnps-directory TEXT         The path to the directory where the AISNPs
                                  files are located.

  --n-components INTEGER          The number of components to use in the PCA
                                  dimensionality reduction.

  --k INTEGER                     The number of nearest neighbors to use in
                                  the KNN model.

  --thousand-genomes-directory TEXT
                                  The path to the 1000 genomes directory.
  --samples-directory TEXT        The path to the directory containing the
                                  samples.

  --algorithm TEXT                The dimensionality reduction algorithm to
                                  use. Choose pca|umap|nca

  --aisnps-set TEXT               The name of the AISNP set to use. To start,
                                  choose either 'Kidd' or 'Seldin'. The
                                  default value in conf.ini is 'Kidd'. *If
                                  using your AISNP set, this value will be the
                                  in the namingc onvention for all the new
                                  model files that are created*

  --help                          Show this message and exit.

Usage

ezancestry can be used as a command-line tool or as a Python library. ezancestry predict comes with pre-trained models when --aisnps-set="Kidd" (default) or --aisnps-set="Seldin".

build-model and generate-dependencies are for advanced users -- they download large amounts of data and build a new model from a custom AISNPs file.

command-line interface

There are four commands available:

predict: predict the genetic ancestry of a sample or cohort of samples using the nearest neighbors model.
plot: plot the genetic ancestry of samples using only the output of predict.
generate-dependencies: generate the dependencies for build-model.
build-model: build a nearest neighbors model from the 1000 genomes data using a custom set of AISNPs. Requires: generate-dependencies to be run first.

Use the commands in the following way:

predict

ezancestry can predict the genetic ancestry of a sample or cohort of samples using the nearest neighbors model. The input_data can be a file path to raw genetic data (23andMe, ancestry.com, .vcf) file, a path to a directory containing several raw genetic files, or a (tab or comma) delimited file with sample ids as rows and snps as columns.

This writes a file, predictions.csv to the output_directory (defaults to current directory). This file contains predicted ancestry for each sample.

Direct-to-consumer genetic data file (23andMe, ancestry.com, etc.):

ezancestry predict mygenome.txt

Directory of direct-to-consumer genetic data files or .vcf files:

ezancestry predict /path/to/genetic_datafiles

comma-separated file with sample ids as rows and snps as columns, filled with genotypes as values

ezancestry predict ${HOME}/.ezancestry/data/aisnps/thousand_genomes.KIDD.dataframe.csv

plot

Visualize the output of predict using the plot command. This will open a 3d scatter plot in a browser.

ezancestry plot predictions.csv

generate-dependencies

This command will download all of the data required to build a new nearest neighbors model for a custom set of AISNPs. This command will attempt to download all the .bcf files from The 1000 Genomes Project. If you want to use existing models, see predict and plot.

Without any arguments this command will download all necessary data to build new models and put it in the ${HOME}/.ezancestry/data/ directory.

ezancestry generate-dependencies

Now you are ready to build a new model with build-model.

build-model

Test the discriminative power of your custom set of AISNPs.

This command will build all the necessary models to visualize and predict the 1000 genomes samples as well as user-uploaded samples. A model performace evaluation report will be generated for a five-fold cross-validation on the training set of the 1000 genomes samples as well as a report for the holdout set.

Create a custom AISNP file here: ~/.ezancestry/data/aisnps/custom.AISNP.txt. The prefix of the filename, custom, can be whatever you want. Note that this value is used as the aisnps-set keyword argument for other ezancestry commands.

The file should look like this:

id      chromosome      position_hg19
rs731257        7       12669251
rs2946788       11      24010530
rs3793451       9       71659280
rs10236187      7       139447377
rs1569175       2       201021954

ezancestry build-model --aisnps-set=custom

Python API

See the notebook

Visualization

http://ezancestry.herokuapp.com/

Open in Streamlit

Contributing

Contributions are welcome! Please feel free to create an issue for discussion or make a pull request.

Comments

Dependency issues installing with pip

There seems to be a dependency issue that occurs during processing after installing ezancestry with pip, related to cyvcf2. See here. After installing ezancestry and downgrading to cyvcf2==0.30.14, the issue is resolved.

opened by apriha 2
Bump nbconvert from 6.1.0 to 6.3.0
Bumps nbconvert from 6.1.0 to 6.3.0.

Commits

cefe0bf Release 6.3.0

a534fb9 Release 6.3.0b0

87920c5 Add changelog for 6.3.0 (#1669)

dd6d9c7 add slide numbering (#1654)

5d2c5e2 Update state filter (#1664)

11ea593 fix: avoid closing the script tag early by escaping a forward slash (#1665)

968c5fb Fix HTML templates mentioned in help docs (#1653)

35c4d07 Add a new output filter that excludes widgets if there is no state (#1643)

c663c75 6.2.0

fd1dd15 6.2.0rc2

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

dependencies
opened by dependabot[bot] 1
Bump notebook from 6.4.3 to 6.4.10
Bumps notebook from 6.4.3 to 6.4.10.

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

dependencies
opened by dependabot[bot] 1
Bump pillow from 8.3.2 to 9.0.0
Bumps pillow from 8.3.2 to 9.0.0.

Release notes

Sourced from pillow's releases.

9.0.0

https://pillow.readthedocs.io/en/stable/releasenotes/9.0.0.html

Changes

Restrict builtins for ImageMath.eval() #5923 [@radarhere]

Ensure JpegImagePlugin stops at the end of a truncated file #5921 [@radarhere]

Fixed ImagePath.Path array handling #5920 [@radarhere]

Remove consecutive duplicate tiles that only differ by their offset #5919 [@radarhere]

Removed redundant part of condition #5915 [@radarhere]

Explicitly enable strip chopping for large uncompressed TIFFs #5517 [@kmilos]

Use the Windows method to get TCL functions on Cygwin #5807 [@DWesl]

Changed error type to allow for incremental WebP parsing #5404 [@radarhere]

Improved I;16 operations on big endian #5901 [@radarhere]

Ensure that BMP pixel data offset does not ignore palette #5899 [@radarhere]

Limit quantized palette to number of colors #5879 [@radarhere]

Use latin1 encoding to decode bytes #5870 [@radarhere]

Fixed palette index for zeroed color in FASTOCTREE quantize #5869 [@radarhere]

When saving RGBA to GIF, make use of first transparent palette entry #5859 [@radarhere]

Pass SAMPLEFORMAT to libtiff #5848 [@radarhere]

Added rounding when converting P and PA #5824 [@radarhere]

Improved putdata() documentation and data handling #5910 [@radarhere]

Exclude carriage return in PDF regex to help prevent ReDoS #5912 [@radarhere]

Image.NONE is only used for resampling and dithers #5908 [@radarhere]

Fixed freeing pointer in ImageDraw.Outline.transform #5909 [@radarhere]

Add Tidelift alignment action and badge #5763 [@aclark4life]

Replaced further direct invocations of setup.py #5906 [@radarhere]

Added ImageShow support for xdg-open #5897 [@m-shinder]

Fixed typo #5902 [@radarhere]

Switched from deprecated "setup.py install" to "pip install ." #5896 [@radarhere]

Support 16-bit grayscale ImageQt conversion #5856 [@cmbruns]

Fixed raising OSError in _safe_read when size is greater than SAFEBLOCK #5872 [@radarhere]

Convert subsequent GIF frames to RGB or RGBA #5857 [@radarhere]

WebP: Fix memory leak during decoding on failure #5798 [@ilai-deutel]

Do not prematurely return in ImageFile when saving to stdout #5665 [@infmagic2047]

Added support for top right and bottom right TGA orientations #5829 [@radarhere]

Corrected ICNS file length in header #5845 [@radarhere]

Block tile TIFF tags when saving #5839 [@radarhere]

Added line width argument to ImageDraw polygon #5694 [@radarhere]

Do not redeclare class each time when converting to NumPy #5844 [@radarhere]

Only prevent repeated polygon pixels when drawing with transparency #5835 [@radarhere]

Fix pushes_fd method signature #5833 [@hoodmane]

Add support for pickling TrueType fonts #5826 [@hugovk]

Only prefer command line tools SDK on macOS over default MacOSX SDK #5828 [@radarhere]

Fix compilation on 64-bit Termux #5793 [@landfillbaby]

Replace 'setup.py sdist' with '-m build --sdist' #5785 [@hugovk]

Use declarative package configuration #5784 [@hugovk]

Use title for display in ImageShow #5788 [@radarhere]

Fix for PyQt6 #5775 [@hugovk]

... (truncated)

Changelog

Sourced from pillow's changelog.

9.0.0 (2022-01-02)

Restrict builtins for ImageMath.eval(). CVE-2022-22817 #5923 [radarhere]

Ensure JpegImagePlugin stops at the end of a truncated file #5921 [radarhere]

Fixed ImagePath.Path array handling. CVE-2022-22815, CVE-2022-22816 #5920 [radarhere]

Remove consecutive duplicate tiles that only differ by their offset #5919 [radarhere]

Improved I;16 operations on big endian #5901 [radarhere]

Limit quantized palette to number of colors #5879 [radarhere]

Fixed palette index for zeroed color in FASTOCTREE quantize #5869 [radarhere]

When saving RGBA to GIF, make use of first transparent palette entry #5859 [radarhere]

Pass SAMPLEFORMAT to libtiff #5848 [radarhere]

Added rounding when converting P and PA #5824 [radarhere]

Improved putdata() documentation and data handling #5910 [radarhere]

Exclude carriage return in PDF regex to help prevent ReDoS #5912 [hugovk]

Fixed freeing pointer in ImageDraw.Outline.transform #5909 [radarhere]

Added ImageShow support for xdg-open #5897 [m-shinder, radarhere]

Support 16-bit grayscale ImageQt conversion #5856 [cmbruns, radarhere]

Convert subsequent GIF frames to RGB or RGBA #5857 [radarhere]

... (truncated)

Commits

82541b6 9.0.0 version bump

cae5ac4 Merge pull request #5924 from radarhere/cves

ed4cf78 CVEs TBD

d7f60d1 Merge pull request #5923 from radarhere/imagemath_eval

8531b01 Restrict builtins for ImageMath.eval

1efb1d9 Merge pull request #5922 from radarhere/releasenotes

f6c7871 Added release notes for #5919, #5920 and #5921

032d2dc Update CHANGES.rst [ci skip]

baae9ec Merge pull request #5921 from radarhere/jpeg_eoi

1059eb5 If appended EOI did not work, do not keep trying

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

dependencies
opened by dependabot[bot] 1
Bump certifi from 2021.5.30 to 2022.12.7
Bumps certifi from 2021.5.30 to 2022.12.7.

Commits

9e9e840 2022.12.07

b81bdb2 2022.09.24

939a28f 2022.09.14

aca828a 2022.06.15.2

de0eae1 Only use importlib.resources's new files() / Traversable API on Python ≥3.11 ...

b8eb5e9 2022.06.15.1

47fb7ab Fix deprecation warning on Python 3.11 (#199)

b0b48e0 fixes #198 -- update link in license

9d514b4 2022.06.15

4151e88 Add py.typed to MANIFEST.in to package in sdist (#196)

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

dependencies
opened by dependabot[bot] 0
Bump protobuf from 3.17.3 to 3.18.3
Bumps protobuf from 3.17.3 to 3.18.3.

Release notes

Sourced from protobuf's releases.

Protocol Buffers v3.18.3

C++

Reduce memory consumption of MessageSet parsing

This release addresses a Security Advisory for C++ and Python users

Protocol Buffers v3.18.2

Java

Improve performance characteristics of UnknownFieldSet parsing (#9371)

Protocol Buffers v3.18.1

Python

Update setup.py to reflect that we now require at least Python 3.5 (#8989)

Performance fix for DynamicMessage: force GetRaw() to be inlined (#9023)

Ruby

Update ruby_generator.cc to allow proto2 imports in proto3 (#9003)

Protocol Buffers v3.18.0

C++

Fix warnings raised by clang 11 (#8664)

Make StringPiece constructible from std::string_view (#8707)

Add missing capability attributes for LLVM 12 (#8714)

Stop using std::iterator (deprecated in C++17). (#8741)

Move field_access_listener from libprotobuf-lite to libprotobuf (#8775)

Fix #7047 Safely handle setlocale (#8735)

Remove deprecated version of SetTotalBytesLimit() (#8794)

Support arena allocation of google::protobuf::AnyMetadata (#8758)

Fix undefined symbol error around SharedCtor() (#8827)

Fix default value of enum(int) in json_util with proto2 (#8835)

Better Smaller ByteSizeLong

Introduce event filters for inject_field_listener_events

Reduce memory usage of DescriptorPool

For lazy fields copy serialized form when allowed.

Re-introduce the InlinedStringField class

v2 access listener

Reduce padding in the proto's ExtensionRegistry map.

GetExtension performance optimizations

Make tracker a static variable rather than call static functions

Support extensions in field access listener

Annotate MergeFrom for field access listener

Fix incomplete types for field access listener

Add map_entry/new_map_entry to SpecificField in MessageDifferencer. They record the map items which are different in MessageDifferencer's reporter.

Reduce binary size due to fieldless proto messages

TextFormat: ParseInfoTree supports getting field end location in addition to start.

Fix repeated enum extension size in field listener

Enable Any Text Expansion for Descriptors::DebugString()

Switch from int{8,16,32,64} to int{8,16,32,64}_t

... (truncated)

Commits

a902b39 No-op whitespace change

ae62acd Updating version.json and repo version numbers to: 18.3

f43ac49 Merge pull request #10542 from deannagarcia/3.18.x

9efdf55 Add missing includes

d1635e1 Apply patch

5b37c91 Update version.json with "lts": true (#10534)

c39d622 Merge pull request #10529 from protocolbuffers/deannagarcia-patch-5

f77d3b6 Update version.json

8178b06 Merge pull request #10503 from deannagarcia/3.18.x

24ca839 Add version file

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

dependencies
opened by dependabot[bot] 0
Bump nbconvert from 6.1.0 to 6.5.1
Bumps nbconvert from 6.1.0 to 6.5.1.

Release notes

Sourced from nbconvert's releases.

Release 6.5.1

No release notes provided.

6.5.0

What's Changed

Drop dependency on testpath. by @anntzer in jupyter/nbconvert#1723

Adopt pre-commit by @blink1073 in jupyter/nbconvert#1744

Add pytest settings and handle warnings by @blink1073 in jupyter/nbconvert#1745

Apply Autoformatters by @blink1073 in jupyter/nbconvert#1746

Add git-blame-ignore-revs by @blink1073 in jupyter/nbconvert#1748

Update flake8 config by @blink1073 in jupyter/nbconvert#1749

support bleach 5, add packaging and tinycss2 dependencies by @bollwyvl in jupyter/nbconvert#1755

[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in jupyter/nbconvert#1752

update cli example by @leahecole in jupyter/nbconvert#1753

Clean up pre-commit by @blink1073 in jupyter/nbconvert#1757

Clean up workflows by @blink1073 in jupyter/nbconvert#1750

New Contributors

@pre-commit-ci made their first contribution in jupyter/nbconvert#1752

Full Changelog: https://github.com/jupyter/nbconvert/compare/6.4.5...6.5

6.4.3

What's Changed

Add section to customizing showing how to use template inheritance by @stefanv in jupyter/nbconvert#1719

Remove ipython genutils by @rgs258 in jupyter/nbconvert#1727

Update changelog for 6.4.3 by @blink1073 in jupyter/nbconvert#1728

New Contributors

@stefanv made their first contribution in jupyter/nbconvert#1719

@rgs258 made their first contribution in jupyter/nbconvert#1727

Full Changelog: https://github.com/jupyter/nbconvert/compare/6.4.2...6.4.3

6.4.0

What's Changed

Optionally speed up validation by @gwincr11 in jupyter/nbconvert#1672

Adding missing div compared to JupyterLab DOM structure by @SylvainCorlay in jupyter/nbconvert#1678

Allow passing extra args to code highlighter by @yuvipanda in jupyter/nbconvert#1683

Prevent page breaks in outputs when printing by @SylvainCorlay in jupyter/nbconvert#1679

Add collapsers to template by @SylvainCorlay in jupyter/nbconvert#1689

Fix recent pandoc latex tables by adding calc and array (#1536, #1566) by @cgevans in jupyter/nbconvert#1686

Add an invalid notebook error by @gwincr11 in jupyter/nbconvert#1675

Fix typos in execute.py by @TylerAnderson22 in jupyter/nbconvert#1692

Modernize latex greek math handling (partially fixes #1673) by @cgevans in jupyter/nbconvert#1687

Fix use of deprecated API and update test matrix by @blink1073 in jupyter/nbconvert#1696

Update nbconvert_library.ipynb by @letterphile in jupyter/nbconvert#1695

Changelog for 6.4 by @blink1073 in jupyter/nbconvert#1697

New Contributors

... (truncated)

Commits

7471b75 Release 6.5.1

c1943e0 Fix pre-commit

8685e93 Fix tests

0abf290 Run black and prettier

418d545 Run test on 6.x branch

bef65d7 Convert input to string prior to escape HTML

0818628 Check input type before escaping

b206470 GHSL-2021-1017, GHSL-2021-1020, GHSL-2021-1021

a03cbb8 GHSL-2021-1026, GHSL-2021-1025

48fe71e GHSL-2021-1024

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

dependencies
opened by dependabot[bot] 0

Unable to load umap model

Hi, I'm unable to load the included umap models. PCA models work.

When running the following code,

# write all the super population dimred models for kidd and Seldin
for aisnps_set, df, df_labels in zip(
    ["kidd", "Seldin"], 
    [df_kidd_encoded, df_seldin_encoded], 
    [df_kidd["superpopulation"], df_seldin["superpopulation"]]
):
    for algorithm, labels in zip(["pca", "umap", "nca"], [None, None, None, df_labels]):
        print(algorithm,aisnps_set,OVERWRITE_MODEL,labels)
        df_reduced = dimensionality_reduction(df, algorithm=algorithm, aisnps_set=aisnps_set, overwrite_model=OVERWRITE_MODEL, labels=labels, population_level="super population")
        knn_model = train(df_reduced, df_labels, algorithm=algorithm, aisnps_set=aisnps_set, k=9, population_level="superpopulation", overwrite_model=OVERWRITE_MODEL)

I get the error below:

2022-08-22 17:16:03.823 | INFO     | ezancestry.dimred:dimensionality_reduction:126 - Successfully loaded a dimensionality reduction model
pca kidd False None
umap kidd False None
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [17], in <cell line: 2>()
      7 for algorithm, labels in zip(["pca", "umap", "nca"], [None, None, None, df_labels]):
      8     print(algorithm,aisnps_set,OVERWRITE_MODEL,labels)
----> 9     df_reduced = dimensionality_reduction(df, algorithm=algorithm, aisnps_set=aisnps_set, overwrite_model=OVERWRITE_MODEL, labels=labels, population_level="super population")
     10     knn_model = train(df_reduced, df_labels, algorithm=algorithm, aisnps_set=aisnps_set, k=9, population_level="superpopulation", overwrite_model=OVERWRITE_MODEL)

File ~/ezancestry/ezancestry/dimred.py:107, in dimensionality_reduction(df, algorithm, aisnps_set, n_components, overwrite_model, labels, population_level, models_directory, random_state)
    105 if algorithm in set(["pca", "umap"]):
    106     try:
--> 107         reducer = joblib.load(
    108             models_directory.joinpath(f"{algorithm}.{aisnps_set}.bin")
    109         )
    110     except FileNotFoundError:
    111         return None

File ~/opt/anaconda3/lib/python3.9/site-packages/joblib/numpy_pickle.py:587, in load(filename, mmap_mode)
    581             if isinstance(fobj, str):
    582                 # if the returned file object is a string, this means we
    583                 # try to load a pickle file generated with an version of
    584                 # Joblib so we load it with joblib compatibility function.
    585                 return load_compatibility(fobj)
--> 587             obj = _unpickle(fobj, filename, mmap_mode)
    588 return obj

File ~/opt/anaconda3/lib/python3.9/site-packages/joblib/numpy_pickle.py:506, in _unpickle(fobj, filename, mmap_mode)
    504 obj = None
    505 try:
--> 506     obj = unpickler.load()
    507     if unpickler.compat_mode:
    508         warnings.warn("The file '%s' has been generated with a "
    509                       "joblib version less than 0.10. "
    510                       "Please regenerate this pickle file."
    511                       % filename,
    512                       DeprecationWarning, stacklevel=3)

File ~/opt/anaconda3/lib/python3.9/pickle.py:1212, in _Unpickler.load(self)
   1210             raise EOFError
   1211         assert isinstance(key, bytes_types)
-> 1212         dispatch[key[0]](self)
   1213 except _Stop as stopinst:
   1214     return stopinst.value

File ~/opt/anaconda3/lib/python3.9/pickle.py:1589, in _Unpickler.load_reduce(self)
   1587 args = stack.pop()
   1588 func = stack[-1]
-> 1589 stack[-1] = func(*args)

File ~/opt/anaconda3/lib/python3.9/site-packages/numba/core/serialize.py:97, in _unpickle__CustomPickled(serialized)
     92 def _unpickle__CustomPickled(serialized):
     93     """standard unpickling for `_CustomPickled`.
     94 
     95     Uses `NumbaPickler` to load.
     96     """
---> 97     ctor, states = loads(serialized)
     98     return _CustomPickled(ctor, states)

AttributeError: Can't get attribute '_rebuild_function' on <module 'numba.core.serialize' from '/Users/jacksonc08/opt/anaconda3/lib/python3.9/site-packages/numba/core/serialize.py'>

I have tested that it is certainly the UMAP model that is causing the issue.

import pandas as pd

import joblib

obj = joblib.load(r"/Users/jacksonc08/ezancestry/data/models/umap.kidd.bin")

This gives the same error.

Looking online, it seems to be an issue with the numba package (a dependency of joblib), which no longer includes the _rebuild_function function. See here.

Do you have any recommendations on how to fix this error? Many thanks.

opened by redjay8 3

Bump streamlit from 0.87.0 to 1.11.1
Bumps streamlit from 0.87.0 to 1.11.1.

Release notes

Sourced from streamlit's releases.

1.11.1

No release notes provided.

1.11.0

No release notes provided.

1.10.0

No release notes provided.

1.9.2

No release notes provided.

1.9.1

No release notes provided.

1.9.0

No release notes provided.

1.8.1

No release notes provided.

1.8.0

No release notes provided.

1.7.0

❄️ Add st.snow()!

1.6.0

🗜 WebSocket compression is now disabled by default, which will improve CPU and latency performance for large dataframes. You can use the server.enableWebsocketCompression configuration option to re-enable it if you find the increased network traffic more impactful.

☑️ 🔘 Radio and checkboxes improve focus on Keyboard navigation (#4308)

1.5.1

No release notes provided.

1.5.0

Release date: Jan 27, 2022

Notable Changes

🌟 Favicon defaults to a PNG to allow for transparency (#4272).

🚦 Select Slider Widget now has the disabled parameter that removes interactivity (completing all of our widgets) (#4314).

Other Changes

🔤 Improvements to our markdown library to provide better support for HTML (specifically nested HTML) (#4221).

📖 Expanders maintain their expanded state better when multiple expanders are present (#4290).

🗳 Improved file uploader and camera input to call its on_change handler only when necessary (#4270).

1.4.0

... (truncated)

Commits

b6429b6 Fix linting

f4b2051 Up version to 1.11.1

80d9979 Ignore component requests outside of the component root

4a04eef Replace legacy app URLs in docs with custom subdomains (#4959)

27c29ac Up version to 1.11.0

03babac Test that GitRepo can handle import failures (#4942)

4c39606 Fix table overflow styling (#4934)

26de600 Fix issue with wrongly applied colors with Pandas styler (#4940)

ad4547f Fix widgets overwrites from short to long-hand props (#4935)

3809637 Add gap param to st.columns (#4887)

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

dependencies
opened by dependabot[bot] 0

Releases(v0.0.7)

v0.0.7(May 13, 2022)

Source code(tar.gz)
Source code(zip)
v0.0.6(Oct 19, 2021)

Mixed case filenames were not compatible on all OS.
Source code(tar.gz)
Source code(zip)
v0.0.5(Oct 4, 2021)

Source code(tar.gz)
Source code(zip)
v0.0.4(Sep 21, 2021)
Include population descriptions in the output

Source code(tar.gz)
Source code(zip)
v0.0.3(Sep 20, 2021)
Fixed bugs when parsing the sample identifier from files

Fixed reading DataFrames as user input

Source code(tar.gz)
Source code(zip)
v0.0.2(Sep 14, 2021)

Include data in the package. Include the package itself.
Source code(tar.gz)
Source code(zip)
v0.0.1(Sep 13, 2021)

Initial release
Source code(tar.gz)
Source code(zip)