🦉Data Version Control | Git for Data & Models

Iterative

Last update: Jan 9, 2023

Related tags

SCM python git data-science machine-learning ai collaboration developer-tools reproducibility data-version-control hacktoberfest

Overview

Website • Docs • Blog • Twitter • Chat (Community & Support) • Tutorial • Mailing List

Data Version Control, or DVC, is an open-source tool for data science and machine learning projects. Key features:

Simple command line Git-like experience. Does not require installing and maintaining any databases. Does not depend on any proprietary online services.
Management and versioning of datasets and machine learning models. Data is saved in S3, Google Cloud, Azure, Alibaba Cloud, SSH server, HDFS, or even local HDD RAID.
Makes projects reproducible and shareable, helping to answer questions about how a model was built.
Helps manage experiments with Git tags/branches and metrics tracking.

DVC aims to replace spreadsheet and document sharing tools (such as Excel or Google Docs), which are frequently used as both knowledge repositories and team ledgers. DVC also replaces ad-hoc scripts to track, move, and deploy different model versions, as well as ad-hoc data file suffixes and prefixes.

Contents

How DVC works
Quick start
Installation
Comparison to related technologies
Contributing
Mailing List
Copyright
Citation

How DVC works

We encourage you to read our Get Started guide to better understand what DVC is and how it can fit your scenarios.

The easiest (but not perfect!) analogy to describe it: DVC is Git (or Git-LFS to be precise) & Makefiles made right and tailored specifically for ML and Data Science scenarios.

Git/Git-LFS part - DVC helps store and share data artifacts and models, connecting them with a Git repository.
Makefiles part - DVC describes how one data or model artifact was built from other data and code.

DVC usually runs along with Git. Git is used as usual to store and version code (including DVC meta-files). DVC helps to store data and model files seamlessly out of Git, while preserving almost the same user experience as if they were stored in Git itself. To store and share the data cache, DVC supports multiple remotes - any cloud (S3, Azure, Google Cloud, etc) or any on-premise network storage (via SSH, for example).

The DVC pipelines (computational graph) feature connects code and data together. It is possible to explicitly specify all steps required to produce a model: input dependencies including data, commands to run, and output information to be saved. See the quick start section below or the Get Started tutorial to learn more.
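As a rough illustration of the computational-graph idea (a sketch, not DVC's actual implementation), the two stages from the quick start can be modeled in Python as a small dependency graph. The dictionary layout and the function name execution_order are assumptions made for this example.

```python
from graphlib import TopologicalSorter

# Hypothetical stage descriptions: each stage lists its dependencies,
# the command to run, and the outputs it produces.
STAGES = {
    "unzip": {"deps": ["images.zip"], "cmd": "unzip -q images.zip", "outs": ["images/"]},
    "train": {"deps": ["images/", "train.py"], "cmd": "python train.py", "outs": ["model.p"]},
}


def execution_order(stages):
    """Return stage names ordered so that producers run before consumers."""
    # Map every output path to the stage that produces it.
    producer = {out: name for name, s in stages.items() for out in s["outs"]}
    # A stage depends on every stage that produces one of its inputs.
    graph = {
        name: {producer[d] for d in s["deps"] if d in producer}
        for name, s in stages.items()
    }
    return list(TopologicalSorter(graph).static_order())


print(execution_order(STAGES))  # ['unzip', 'train']
```

Because train consumes images/, which unzip produces, the unzip stage is always ordered first. A change to train.py alone would only require re-running train.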

Quick start

Please read the Get Started guide for a full version. Common workflow commands include:

Step	Command
Track data	`$ git add train.py` `$ dvc add images.zip`
Connect code and data by commands	`$ dvc run -d images.zip -o images/ unzip -q images.zip` `$ dvc run -d images/ -d train.py -o model.p python train.py`
Make changes and reproduce	`$ vi train.py` `$ dvc repro model.p.dvc`
Share code	`$ git add .` `$ git commit -m 'The baseline model'` `$ git push`
Share data and ML models	`$ dvc remote add myremote -d s3://mybucket/image_cnn` `$ dvc push`

Installation

There are several options to install DVC: Snap, Chocolatey, Homebrew, Conda (Anaconda), pip, or an OS-specific package. Full instructions are available here.

Snap (Snapcraft/Linux)

snap install dvc --classic

This corresponds to the latest tagged release. Add --beta for the latest tagged release candidate, or --edge for the latest master version.

Choco (Chocolatey/Windows)

choco install dvc

Brew (Homebrew/Mac OS)

brew install dvc

Conda (Anaconda)

conda install -c conda-forge dvc

pip (PyPI)

pip install dvc

Depending on the remote storage type you plan to use to keep and share your data, you might need to specify one of the optional dependencies: s3, gs, azure, oss, ssh, or all to include them all. The command should look like this: pip install dvc[s3] (in this case, AWS S3 dependencies such as boto3 will be installed automatically).

To install the development version, run:

pip install git+https://github.com/iterative/dvc

Package

Self-contained packages for Linux, Windows, and Mac are available. The latest version of the packages can be found on the GitHub releases page.

Ubuntu / Debian (deb)

sudo wget https://dvc.org/deb/dvc.list -O /etc/apt/sources.list.d/dvc.list
sudo apt-get update
sudo apt-get install dvc

Fedora / CentOS (rpm)

sudo wget https://dvc.org/rpm/dvc.repo -O /etc/yum.repos.d/dvc.repo
sudo yum update
sudo yum install dvc

Comparison to related technologies

Git-annex - DVC uses the idea of storing the content of large files (which should not be in a Git repository) in a local key-value store, and uses file hardlinks/symlinks instead of copying/duplicating files.
Git-LFS - DVC is compatible with any remote storage (S3, Google Cloud, Azure, SSH, etc). DVC also uses reflinks or hardlinks to avoid copy operations on checkouts, thus handling large data files much more efficiently.
Makefile (and analogues including ad-hoc scripts) - DVC tracks dependencies (in a directed acyclic graph).
Workflow Management Systems - DVC is a workflow management system designed specifically to manage machine learning experiments. DVC is built on top of Git.
DAGsHub - This is a GitHub equivalent for DVC. Pushing Git+DVC based repositories to DAGsHub will produce a high-level project dashboard, including DVC pipelines and metrics visualizations, as well as links to any DVC-managed files present in cloud storage.
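The key-value store idea from the Git-annex comparison above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not DVC's cache code: the function name add_to_cache, the SHA-256 hash, and the two-level directory layout are all choices made for this example.

```python
import hashlib
import os
import shutil
from pathlib import Path


def add_to_cache(path, cache_dir):
    """Store a file under its content hash and link it back into place."""
    path, cache_dir = Path(path), Path(cache_dir)
    # Reading the whole file at once keeps the sketch short; a real tool
    # would hash large files in chunks.
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    # Assumed layout: first two hex characters form a subdirectory.
    cached = cache_dir / digest[:2] / digest[2:]
    cached.parent.mkdir(parents=True, exist_ok=True)
    if not cached.exists():
        shutil.move(str(path), str(cached))
    else:
        path.unlink()  # identical content is already in the cache
    try:
        os.link(cached, path)  # hardlink: no second copy on disk
    except OSError:
        shutil.copy2(cached, path)  # fall back to a plain copy
    return digest
```

Two files with identical content map to the same cache entry, so the data is stored once. The hardlink keeps the file visible in the workspace without duplicating it.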

Contributing

Contributions are welcome! Please see our Contributing Guide for more details.

Mailing List

Want to stay up to date? Want to help improve DVC by participating in our occasional polls? Subscribe to our mailing list. No spam, really low traffic.

Copyright

This project is distributed under the Apache license version 2.0 (see the LICENSE file in the project root).

By submitting a pull request to this project, you agree to license your contribution under the Apache license version 2.0 to this project.

Citation

Iterative, DVC: Data Version Control - Git for Data & Models (2020) DOI:10.5281/zenodo.012345.

Barrak, A., Eghan, E.E. and Adams, B. On the Co-evolution of ML Pipelines and Source Code - Empirical Study of DVC Projects, in Proceedings of the 28th IEEE International Conference on Software Analysis, Evolution, and Reengineering, SANER 2021. Hawaii, USA.

Comments

Reconsider gc implementation

As pointed out in the discussion in #1691, we should reconsider the gc implementation. Currently, if called without any options, dvc collects the checksums of the current branch's dependencies and outputs and removes everything else. This makes it easy to wipe the history of changes with a single command. gc should be safer with default options. A straightforward implementation could collect all outputs for all revisions in the Git repo and remove everything that is not on that list.

As pointed out by @Suor, this approach might be slow for a repository with a long history.
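To make the difference concrete, here is a small Python sketch of the two behaviours. It is only an illustration: the function name unused_checksums, the revision names, and the checksums are hypothetical, and it says nothing about how dvc gc is actually implemented.

```python
def unused_checksums(cache, used_by_revision, revisions=None):
    """Return cached checksums not referenced by the selected revisions.

    revisions=None means "every revision", the safer default proposed above.
    """
    if revisions is None:
        revisions = used_by_revision.keys()
    keep = set()
    for rev in revisions:
        keep |= used_by_revision[rev]
    return set(cache) - keep


# Hypothetical cache contents and per-revision usage.
cache = {"a1", "b2", "c3", "d4"}
used = {"HEAD": {"a1"}, "HEAD~1": {"b2"}, "v1.0": {"c3"}}

print(sorted(unused_checksums(cache, used, ["HEAD"])))  # ['b2', 'c3', 'd4']
print(sorted(unused_checksums(cache, used)))            # ['d4']
```

Looking only at HEAD would delete data that older revisions still need, while scanning all revisions removes just the orphaned entry. The cost is the loop over every revision, which is the slowness concern raised above.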
enhancement p1-important ui research

opened by pared 73
support push/pull/metrics/gc, etc across different commits
Currently dvc metrics show can show metric values across different branches (-a) and different tags (-T). Can you consider supporting showing different metric values across different commits in the same branch?

The background of this is (simplified example): say I'm currently training a model, where I'm changing a certain parameter, param1 (for instance, number of trees in a forest). The way I probably would like to work is to find a first value for param1, commit the current state, continue changing param1 and continue committing the successive states that I consider worth saving. At some point I would like to look back and identify the setup that gave me the best results.

The way DVC currently works forces me to create a new branch/tag for each trial I want to keep track of, and this seems a bit overwhelming.

Depending on how different the experiments I'm running are and their level of granularity I could decide how to keep track of them (new commits VS new branches/tags).

Notes:

The example above is overly simplified and there are better ways of tuning specific models parameters. But this gets more complicated if I'm changing more stuff (model hyperparameters, data processing, features to use, etc).

If dvc were to support what I'm proposing here, an extra argument would probably be required to limit how many commits DVC would look back at. Otherwise it would show all the metric values since the beginning of the repo history, which can be unhelpful and messy.
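A minimal Python sketch of what such a lookup could do, assuming the metric values per commit were already available as a list ordered from newest to oldest. The function name best_commit, the limit argument, and the sample data are hypothetical and do not correspond to an existing dvc option.

```python
def best_commit(history, metric, limit=None, higher_is_better=True):
    """Pick the commit with the best metric among the most recent ones.

    history is a list of (commit, metrics) pairs, newest first.
    limit restricts how many commits to look back at (None = all).
    """
    recent = history[:limit] if limit is not None else history
    pick = max if higher_is_better else min
    return pick(recent, key=lambda entry: entry[1][metric])


# Hypothetical commits and metric values.
history = [
    ("c3", {"auc": 0.81}),
    ("c2", {"auc": 0.86}),
    ("c1", {"auc": 0.79}),
]

print(best_commit(history, "auc"))           # ('c2', {'auc': 0.86})
print(best_commit(history, "auc", limit=1))  # ('c3', {'auc': 0.81})
```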

feature request p1-important research
opened by silverdna 71

Unexpected error - Adding files

Every time I try to add individual files or complete directories, the same unexpected error appears:

> dvc add -v -R model
DEBUG: Trying to spawn '['c:\\users\\luisfelipe_melo_mora\\appdata\\local\\programs\\python\\python37-32\\python.exe', 'C:\\Users\\luisfelipe_melo_mora\\AppData\\Local\\Programs\\Python\\Python37-32\\Scripts\\dvc', 'daemon', '-q', 'updater']'
DEBUG: Spawned '['c:\\users\\luisfelipe_melo_mora\\appdata\\local\\programs\\python\\python37-32\\python.exe', 'C:\\Users\\luisfelipe_melo_mora\\AppData\\Local\\Programs\\Python\\Python37-32\\Scripts\\dvc', 'daemon', '-q', 'updater']'
ERROR: unexpected error - Already unlocked
------------------------------------------------------------
Traceback (most recent call last):
  File "c:\users\luisfelipe_melo_mora\appdata\local\programs\python\python37-32\lib\site-packages\dvc\main.py", line 48, in main
    cmd = args.func(args)
  File "c:\users\luisfelipe_melo_mora\appdata\local\programs\python\python37-32\lib\site-packages\dvc\command\base.py", line 48, in __init__
    updater.check()
  File "c:\users\luisfelipe_melo_mora\appdata\local\programs\python\python37-32\lib\site-packages\dvc\updater.py", line 54, in check
    self._with_lock(self._check, "checking")
  File "c:\users\luisfelipe_melo_mora\appdata\local\programs\python\python37-32\lib\site-packages\dvc\updater.py", line 45, in _with_lock
    func()
  File "c:\users\luisfelipe_melo_mora\appdata\local\programs\python\python37-32\lib\site-packages\flufl\lock\_lockfile.py", line 338, in __exit__
    self.unlock()
  File "c:\users\luisfelipe_melo_mora\appdata\local\programs\python\python37-32\lib\site-packages\flufl\lock\_lockfile.py", line 287, in unlock
    raise NotLockedError('Already unlocked')
flufl.lock._lockfile.NotLockedError: Already unlocked
------------------------------------------------------------


Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!

I have a remote configuration by SSH:

['remote "myssh"']
url = ssh://domain:/path
user = myuser
port = 22
ask_password = true
[core]
remote = myssh

And here is the version of dvc that I'm using:

> dvc version
DVC version: 0.69.0
Python version: 3.7.4
Platform: Windows-10-10.0.17134-SP0
Binary: False
Package: pip
Cache: reflink - False, hardlink - True, symlink - False

Thanks for your help!

bug p0-critical

opened by luchoPipe87 69

ML experiments and hyperparameters tuning
UPDATE: Skip to https://github.com/iterative/dvc/issues/2799#issuecomment-650464000 for a summary and updated requirements, and https://github.com/iterative/dvc/issues/2799#issuecomment-652969635 for the beginning of the implementation discussion.

Problem

There are a lot of discussions on how to manage ML experiments with DVC. Today's DVC design allows ML experiments through Git-based primitives such as commits and branches. This works nicely for large ML experiments when code writing and testing are required. However, this model is too heavy for the hyperparameter tuning stage, when the user makes dozens of small, one-line changes in config or code. Users don't want to have dozens of Git commits or branches.

Requirements

A lightweight abstraction needs to be created in DVC to support hyperparameter-like tiny experiments without Git commits. The hyperparameter tuning stage can be considered a separate user activity outside of the Git workflow. But the result of this activity still needs to be managed by Git, preferably by a single commit.

High-level requirements for the hyperparameter tuning stage:

Run. Run dozens of experiments without committing any results into Git while keeping track of all the experiments. Each of the experiments includes a small config change or code change (usually, 1-2 lines).

Compare. A user should be able to compare two experiments: see diffs for code (and probably metrics)

Visualize. A user should be able to see all the experiments results: metrics that were generated. It might be some table with metrics or a graph. CSV table needs to be supported for custom visualization.

Propagate. Choose "the best" experiment (not necessarily the one with the highest metrics) and propagate it to the workspace (bring all the config and code changes; important: without retraining). Then it can be committed to Git. This is the final result of the current hyperparameter tuning stage. After that, the user can continue to work with the project in a regular Git workflow.

Store. Some (or all) of the experiments might still be useful (in addition to "the best" one). A user should be able to commit them to Git as well, preferably in a single commit to keep the Git history clean.

Clean. Not useful experiments should be removed with all the code and data artifacts that were created. A special subcommand of dvc gc might be needed.

[*] Parallel. In some cases, the experiments can be run in parallel which aligns with DVC parallel execution plans: #2212, #755. This might not be implemented now (in the 1st version of this feature) but it is important to support parallel execution by this new lightweight abstraction.

Group. Iterations of hyperparameters tuning might be not related to each other and need to be managed and visualized separately. Experiments need to be grouped somehow.

What should NOT be covered by this feature?

This feature is NOT about the hyperparameter grid-search. In most cases, hyperparameters tuning is done by users manually using "smart" assumptions and hypotheses about hyperparameter space. Grid-search can be implemented on top of this feature/command using bash for example.

The ability to run the experiments from bash might be also a requirement for this feature request.
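As an illustration of how a grid search could be layered on top of such a feature, here is a short sketch in Python rather than bash. The function name grid and the parameter space are hypothetical. Each generated config would be handed to whatever experiment command this feature ends up providing.

```python
from itertools import product


def grid(space):
    """Yield one dict per combination of the listed parameter values."""
    names = list(space)
    for values in product(*(space[n] for n in names)):
        yield dict(zip(names, values))


# Hypothetical search space.
space = {"learning_rate": [0.1, 0.01], "layers": [32, 64, 128]}

for config in grid(space):
    # A real driver would launch one lightweight experiment per config here.
    print(config)
```

With two learning rates and three layer counts this enumerates six configurations, which shows why committing each one to Git separately quickly becomes impractical.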

Possible implementations

This is an open question but many data scientists create directories for each of the experiments. In some cases, people create directories for a group of experiments and then experiments inside. We can use some of these ideas/practices to better align with users' experience and intuition.

Actions

This is a high-level feature request (epic). The requirements and an initial design need to be discussed and more feature requests need to be created. @iterative/engineering please share your feedback. Is something missing here?

EDITED:

Related issues

#2379 https://github.com/iterative/dvc/issues/2532 #1018 can be relevant (?) Discussion
feature request
opened by dmpetrov 68
Introduce hyper parameters and config
For an ML experiment, it is important to know metrics as well as the parameters that were used in order to get the metrics. Today there is no training/processing parameter concept in DVC which creates a problem when a user needs to visualize an experiment for example in some UI.

A common workaround is to track parameters as metrics. However, the meaning of metrics is different. All the UI tools (including dvc metrics diff) need to show deltas, but deltas do not make sense for some types of params. For example, a delta for learning rate might be OK to see (values are still better), but a delta for a number of layers (32, 64 or 128) does not make sense, and neither does one for non-numeric params like strings.

Also, config/parameters are a pre-requisite for experiment management (#2799 or CI/CD scenarios) when DVC (or other automation tools) need to change training regarding provided parameters.

Another benefit of the "understanding" parameter - DVC can use this information during repro. For example, DVC can realize that a step process which depends on config file config.json should not be run despite the config file change because the metrics it uses were not changed.

We need to introduce the experiment config file/parameters file with a fixed structure that DVC can understand.

Open questions:

Filename. config.json, dvcconfig.json, params.json.

File format: json, text config/ini (name=value), Hydra, ... We can run a survey.

How to track param dependency for stages. We can introduce a new type of dependency: param. If it is given then the stage depends on the file and on particular params values. Like dvc run -p learning_rate -p LL5_levels ....

DVC should probably support groups of params. Param name pattern could be used : dvc run -p 'processing.*' ...
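The param dependency and name-pattern ideas can be sketched in Python as follows. This is an illustration of the proposal only. The helper names flatten, select_params and needs_rerun, the dotted naming of nested params, and the sample values are assumptions, not an existing DVC API.

```python
from fnmatch import fnmatchcase


def flatten(params, prefix=""):
    """Turn nested params into dotted names, e.g. processing.resize."""
    flat = {}
    for key, value in params.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def select_params(params, patterns):
    """Keep only the params whose dotted name matches one of the patterns."""
    flat = flatten(params)
    return {k: v for k, v in flat.items() if any(fnmatchcase(k, p) for p in patterns)}


def needs_rerun(old, new, patterns):
    """A stage is stale only if a param it declared has changed."""
    return select_params(old, patterns) != select_params(new, patterns)


# Hypothetical params file contents before and after an edit.
old = {"learning_rate": 0.01, "layers": 64, "processing": {"resize": 224}}
new = {"learning_rate": 0.01, "layers": 128, "processing": {"resize": 224}}

print(select_params(old, ["processing.*"]))      # {'processing.resize': 224}
print(needs_rerun(old, new, ["processing.*"]))   # False
print(needs_rerun(old, new, ["layers"]))         # True
```

Here the params file changed, but a stage that only declared processing.* would not be re-run, which is the repro benefit described above.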

feature request discussion product
opened by dmpetrov 59
store whole DAG in one DVC-file

I understand the merits of having multiple .dvc files for complex processes, but it would be just great to have the option to store the whole DAG in one Dvcfile!

I feel it might help the overall readability of the structure
feature request p2-medium research product

opened by Casyfill 56
Using dvc only for dataset management (e.g. no dvc run pipeline).
I am dealing with a large hierarchical data set. One where artifacts are pulled from various directories to generate contiguous data sets that are then fed to ML processes downstream. I don't want to use dvc to reproduce the pipeline, at least not yet. My needs are rather to be able to version the overall image dataset hierarchy, for the purpose of manual inspection of the whole hierarchy and moving images into groups or removing them altogether when necessary.

This enables folks with less ML expertise to control the data set they want to build by grouping together the content that they want to pick up when generating the data set. The data set is not a list of images; rather, it is a list of lower dimensional feature vectors extracted from those images.

I'm finding dvc taking a potentially unreasonable amount of time to just add and commit. Perhaps I don't understand what I'm doing or haven't set my expectations correctly.

I wanted to keep these operations small in order to ensure things were working well. I have done the following. I have approximately 300K in total in this set right now.

store 60K images on local file system, under the data/ directory.

dvc add data/

dvc push -r remote. I forgot to commit here since things took so long and I wanted to see if pushing worked.

store 120K additional images to another sub directory under the data/ directory.

dvc add data/ -> goes through all of the files in data/ regardless. I ran -v here and showed the previous files.

dvc push -r remote.

dvc commit. Here dvc is taking up to 99% of system memory (13 GB) and appears to be causing disk thrashing. It has been running for nearly a day so far.

I am just looking for some guidance in managing a dataset of this nature using dvc in a way that will not eat up so much time, disk, compute, etc. If I'm doing something suboptimal, then I want to shine some light on that.
question performance research
opened by JoeyCarson 54
add: --to-remote needed? OR --external needed?
Follow up to https://github.com/iterative/dvc/pull/5198#issuecomment-774299750, #5301, and https://github.com/iterative/dvc.org/pull/2172#discussion_r573963049:

Question

add --to-remote is a bit strange because normally add doesn't move target data, rather tracks it in-place (analog to git add). But --to-remote implies that external data will be moved into the workspace at some point, which we skip for now but "pre-push" (transfer) it to remote storage (for later pull/fetch).

As of now add --to-remote has a similar result to get-url + add + push + remove, gc. So OK, maybe it's nice to have a shortcut to all that, but we already have import-url (--to-remote) to achieve the same.

The only difference vs. importing is that the data source is not recorded as a dependency in the .dvc file. So you can't update it or unfreeze+repro it. However I don't see any use cases where you would want to prevent the .dvc from having this dep, as you can simply never update or unfreeze it.

TLDR: I think import-url --to-remote is enough and what we should recommend for these situations. And add --to-remote breaks the Git analogy. Cc @dberenbaum

Improvement

[x] But if we keep it, an improvement would be to NOT require the --external flag with it (cc @isidentical). This saves the user from typing a flag that is always needed, and it also makes sense since the data is not actually being treated as external, in the sense that it won't be tracked/controlled in its original location (requiring external cache, etc.).

[x] Finish or close iterative/dvc.org/pull/2172 when this is decided.

enhancement discussion product
opened by jorgeorpinel 47
new command to list data artifacts in a DVC project
Especially useful for "browsing" external DVC projects on Git hosting before using dvc get or dvc import. Looking at the Git repo doesn't show the artifacts because they're only referenced in DVC-files (which can be found anywhere), not tracked by Git.

Perhaps dvc list or dvc artifacts? (Or/and both dvc get list and dvc import list)

As mentioned in https://github.com/iterative/dvc.org/pull/611#discussion_r324998285 and other discussions.

UPDATE: Proposed spec (from https://github.com/iterative/dvc/issues/2509#issuecomment-533019513):

usage: dvc list [-h] [-q | -v] [--recursive [LEVEL]] [--rev REV | --versions]
                url [target [target ...]]

positional arguments:
  url         URL of Git repository with DVC project to download from.
  target      Paths to DVC-files or directories within the repository to list outputs for.

UPDATE: Don't forget to update docs AND tab completion scripts when this is implemented.
feature request p1-important c8-full-day
opened by jorgeorpinel 45
Incremental processing or streaming in micro-batches

It seems like it is only possible to replace a dataset entirely and then re-run the analysis. Incremental processing would enable more efficient processing by avoiding recomputation. Here's how Pachyderm does it.
enhancement feature request p2-medium research

opened by kskyten 44
dvc/dagascii: Use pager instead of AsciiCanvas._do_draw
Uses Stdlib's pydoc to draw the output in the interactive mode while doing e.g. dvc pipeline show ...

Fixes #2807

[x] ❗ Have you followed the guidelines in the Contributing to DVC list?

[x] 📖 Check this box if this PR does not require documentation updates, or if it does and you have created a separate PR in dvc.org with such updates (or at least opened an issue about it in that repo). Please link below to your PR (or issue) in the dvc.org repo.

[x] ❌ Have you checked DeepSource, CodeClimate, and other sanity checks below? We consider their findings recommendatory and don't expect everything to be addressed. Please review them carefully and fix those that actually improve code or fix bugs.

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

Related MR: https://github.com/iterative/dvc.org/pull/831
opened by xliiv 43

cloud versioning: fails with cache: false outputs

Bug Report

Description

It looks like cloud versioning is failing to push when stage outputs are marked as cache: false. Might be related to https://github.com/iterative/dvc/issues/4428.

Reproduce

Set up a cloud-versioned remote for https://github.com/iterative/example-get-started and push to it.

Here's the output:

$ dvc push -v
2023-01-04 15:09:02,716 DEBUG: indexing latest worktree for 'dave-sandbox-versioning/example-get-started/remote'
2023-01-04 15:09:03,269 DEBUG: Pushing worktree changes to 'dave-sandbox-versioning/example-get-started/remote'
2023-01-04 15:09:03,658 ERROR: unexpected error - ('eval', 'live', 'plots')
------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/dave/Code/dvc/dvc/cli/__init__.py", line 185, in main
    ret = cmd.do_run()
  File "/Users/dave/Code/dvc/dvc/cli/command.py", line 22, in do_run
    return self.run()
  File "/Users/dave/Code/dvc/dvc/commands/data_sync.py", line 59, in run
    processed_files_count = self.repo.push(
  File "/Users/dave/Code/dvc/dvc/repo/__init__.py", line 48, in wrapper
    return f(repo, *args, **kwargs)
  File "/Users/dave/Code/dvc/dvc/repo/push.py", line 50, in push
    pushed += _push_worktree(
  File "/Users/dave/Code/dvc/dvc/repo/push.py", line 117, in _push_worktree
    return push_worktree(repo, remote, targets=targets, **kwargs)
  File "/Users/dave/Code/dvc/dvc/repo/worktree.py", line 148, in push_worktree
    _update_out_meta(out, repo.index.data[workspace])
  File "/Users/dave/Code/dvc/dvc/repo/worktree.py", line 179, in _update_out_meta
    entry = index[key]
  File "/Users/dave/Code/dvc-data/src/dvc_data/index/index.py", line 179, in __getitem__
    return self._trie[key]
  File "/Users/dave/miniforge3/envs/dvc/lib/python3.10/site-packages/pygtrie.py", line 859, in __getitem__
    node, _ = self._get_node(key_or_slice)
  File "/Users/dave/miniforge3/envs/dvc/lib/python3.10/site-packages/pygtrie.py", line 552, in _get_node
    raise KeyError(key)
KeyError: ('eval', 'live', 'plots')
------------------------------------------------------------
2023-01-04 15:09:04,223 DEBUG: Removing '/private/tmp/.C898YREazrnPXGYsWbFsib.tmp'
2023-01-04 15:09:04,223 DEBUG: Removing '/private/tmp/.C898YREazrnPXGYsWbFsib.tmp'
2023-01-04 15:09:04,223 DEBUG: Removing '/private/tmp/.C898YREazrnPXGYsWbFsib.tmp'
2023-01-04 15:09:04,223 DEBUG: Removing '/private/tmp/example-get-started/.dvc/cache/.DGQMVLCJsyvDKpKuGC3DdL.tmp'
2023-01-04 15:09:04,225 DEBUG: Version info for developers:
DVC version: 2.38.2.dev23+ga24c38967.d20230104
---------------------------------
Platform: Python 3.10.2 on macOS-13.1-arm64-arm-64bit
Subprojects:
        dvc_data = 0.28.5.dev1+ge0d19ab
        dvc_objects = 0.14.0
        dvc_render = 0.0.17
        dvc_task = 0.1.9
        dvclive = 1.3.1
        scmrepo = 0.1.5
Supports:
        azure (adlfs = 2022.9.1, knack = 0.9.0, azure-identity = 1.7.1),
        gdrive (pydrive2 = 1.15.0),
        gs (gcsfs = 2022.11.0),
        hdfs (fsspec = 2022.11.0+18.g0c55724.dirty, pyarrow = 7.0.0),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.8.3),
        oss (ossfs = 2021.8.0),
        s3 (s3fs = 2022.11.0+6.g804057f, boto3 = 1.24.59),
        ssh (sshfs = 2022.6.0),
        webdav (webdav4 = 0.9.4),
        webdavs (webdav4 = 0.9.4),
        webhdfs (fsspec = 2022.11.0+18.g0c55724.dirty)
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s1s1
Caches: local
Remotes: https, s3
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc, git

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2023-01-04 15:09:04,225 DEBUG: Analytics is disabled.

bug p1-important A: run-cache A: pipelines A: data-sync A: cloud-versioning

opened by dberenbaum 0

add --external: fails using Azure remote

Bug Report

Description

I am trying to track existing data from a storage account in Azure following current documentation.

Reproduce

dvc init
dvc remote add azcore azure://core-container
dvc remote add azdata azure://data-container
dvc add --external remote://azdata/existing-data

Expected

I'm not sure what is expected but the output is:

ERROR: unexpected error - : 'azure'

Environment information

Output of dvc doctor:

DVC version: 2.38.1 (pip)
---------------------------------
Platform: Python 3.9.6 on macOS-13.1-x86_64-i386-64bit
Subprojects:
	dvc_data = 0.28.4
	dvc_objects = 0.14.0
	dvc_render = 0.0.15
	dvc_task = 0.1.8
	dvclive = 1.3.1
	scmrepo = 0.1.4
Supports:
	azure (adlfs = 2022.11.2, knack = 0.10.1, azure-identity = 1.12.0),
	http (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
	https (aiohttp = 3.8.3, aiohttp-retry = 2.8.3)
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk1s5s1
Caches: local
Remotes: azure, azure
Workspace directory: apfs on /dev/disk1s5s1
Repo: dvc, git

Additional Information:

2023-01-04 18:58:46,616 ERROR: unexpected error - : 'azure'
------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/odbmgr.py", line 65, in __getattr__
    return self._odb[name]
KeyError: 'azure'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/cli/__init__.py", line 185, in main
    ret = cmd.do_run()
  File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/cli/command.py", line 22, in do_run
    return self.run()
  File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/commands/add.py", line 53, in run
    self.repo.add(
  File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/utils/collections.py", line 164, in inner
    result = func(*ba.args, **ba.kwargs)
  File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/repo/__init__.py", line 48, in wrapper
    return f(repo, *args, **kwargs)
  File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/repo/scm_context.py", line 156, in run
    return method(repo, *args, **kw)
  File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/repo/add.py", line 190, in add
    stage.save(merge_versioned=True)
  File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/stage/__init__.py", line 469, in save
    self.save_outs(
  File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/stage/__init__.py", line 512, in save_outs
    out.save()
  File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/output.py", line 643, in save
    self.odb,
  File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/output.py", line 450, in odb
    odb = getattr(self.repo.odb, odb_name)
  File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/odbmgr.py", line 67, in __getattr__
    raise AttributeError from exc
AttributeError
------------------------------------------------------------
2023-01-04 18:58:46,711 DEBUG: Version info for developers:
DVC version: 2.38.1 (pip)
---------------------------------
Platform: Python 3.9.6 on macOS-13.1-x86_64-i386-64bit
Subprojects:
	dvc_data = 0.28.4
	dvc_objects = 0.14.0
	dvc_render = 0.0.15
	dvc_task = 0.1.8
	dvclive = 1.3.1
	scmrepo = 0.1.4
Supports:
	azure (adlfs = 2022.11.2, knack = 0.10.1, azure-identity = 1.12.0),
	http (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
	https (aiohttp = 3.8.3, aiohttp-retry = 2.8.3)
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: azure, azure
Workspace directory: apfs on /dev/disk1s5s1
Repo: dvc, git

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2023-01-04 18:58:46,714 DEBUG: Analytics is enabled.
2023-01-04 18:58:46,911 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/var/folders/st/05s6bkj55r9cw3hbrrdfvfqh0000gp/T/tmpoxhcmxev']'
2023-01-04 18:58:46,913 DEBUG: Spawned '['daemon', '-q', 'analytics', '/var/folders/st/05s6bkj55r9cw3hbrrdfvfqh0000gp/T/tmpoxhcmxev']'

opened by rmlopes 0

external outputs: broken if pipeline output doesn't exist during stage initialization
Bug Report

Description

S3 external outputs are broken for pipelines since https://github.com/iterative/dvc/commit/7211bd02eda74d5434f1b7996647f7027e6e83b0 because of a bug in s3fs (and probably in other filesystems). They will only break if running a stage for which an output doesn't already exist. When initializing the stage, DVC will try to remove the nonexistent output and raise a FileNotFound error.

Reproduce

dvc repro will break if there is an external output and that output does not exist yet.

In a new repo, using some <s3_path> that doesn't exist yet, do this:

$ echo 'foo' > foo
$ dvc stage add --external -n foo -d foo -O <s3_path> 'aws s3 cp params.yaml <s3_path>'
$ dvc repro -v

Expected

dvc repro shouldn't fail while removing outputs. In this case, it fails because of what seems like a bug, or at least inconsistent behavior, in fsspec. As mentioned in https://github.com/iterative/dvc/issues/5961#issuecomment-1365822275, output.remove for s3fs and other async filesystems calls _expand_path. When the path doesn't exist and recursive=True, _expand_path raises FileNotFoundError. When recursive=False, it returns the path. It also returns the path for the LocalFileSystem regardless of whether recursive=True, so I'm not sure whether it was intended to raise an error only in this specific scenario.
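
A minimal sketch of the inconsistency and of one possible guard, assuming nothing beyond the behavior described in this report. FakeAsyncLikeFS and remove_if_exists are hypothetical names used only for illustration; they are not part of DVC, fsspec, or s3fs, and this is not the fix that was applied upstream.

```python
class FakeAsyncLikeFS:
    """Hypothetical stand-in that mimics the behavior described in the report."""

    def __init__(self, existing=()):
        self.paths = set(existing)

    def _expand_path(self, path, recursive=False):
        # Mimics the reported behavior: raise only when the path is missing
        # AND recursive=True; otherwise return the path unchanged.
        if recursive and path not in self.paths:
            raise FileNotFoundError(path)
        return [path]

    def rm(self, path, recursive=False):
        for p in self._expand_path(path, recursive=recursive):
            self.paths.discard(p)


def remove_if_exists(fs, path):
    """Remove path recursively, treating a missing path as already removed.

    Returns True if the removal call succeeded, False if the path was missing.
    """
    try:
        fs.rm(path, recursive=True)
    except FileNotFoundError:
        return False
    return True
```

With a guard like this, removing an output that does not exist yet becomes a no-op instead of an error, regardless of how the underlying filesystem treats recursive=True.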
regression
opened by dberenbaum 1
update: inconsistency between `--no-download` and `--to-remote`

Seems like the behavior between --no-download and --to-remote is inconsistent. We can fix it in this PR or follow up with another one. For --to-remote, the outs metadata is updated with the new info but the workspace remains untouched, while --no-download drops the outs metadata and deletes anything in the workspace.

Originally posted by @dberenbaum in https://github.com/iterative/dvc/issues/8752#issuecomment-1369173097
p3-nice-to-have A: data-sync

opened by dberenbaum 0
dvc add is stuck on Adding ... for ~20 hours
I'm trying to version-control 210 GB of data, which contains 2.41M files. When I run

dvc -v add data_clean/
Adding...

it has been stuck here for 20 hours. Is this supposed to happen?

My DVC repository is on a GCE instance.

Thanks
awaiting response performance
opened by mehadi92 1

Releases(2.38.1)

2.38.1(Dec 15, 2022)
What's Changed

Other Changes

add index._plot_sources by @skshetry in https://github.com/iterative/dvc/pull/8693

Full Changelog: https://github.com/iterative/dvc/compare/2.38.0...2.38.1
Source code(tar.gz)
Source code(zip)
dvc-2.38.1-1.x86_64.rpm(132.93 MB)
dvc-2.38.1.exe(53.18 MB)
dvc-2.38.1.pkg(103.07 MB)
dvc_2.38.1_amd64.deb(133.89 MB)
2.38.0(Dec 14, 2022)
What's Changed

🚀 New Features and Enhancements

exp: Generate a human-readable name beforehand. by @daavoo in https://github.com/iterative/dvc/pull/8659

🐛 Bug Fixes

Reset all indices on the brancher iteration by @shcheklein in https://github.com/iterative/dvc/pull/8679

🔨 Maintenance

build(deps-dev): Bump filelock from 3.8.0 to 3.8.2 by @dependabot in https://github.com/iterative/dvc/pull/8666

build(deps-dev): Bump pylint from 2.15.7 to 2.15.8 by @dependabot in https://github.com/iterative/dvc/pull/8661

build(deps-dev): Bump dvc-task from 0.1.6 to 0.1.8 by @dependabot in https://github.com/iterative/dvc/pull/8686

Full Changelog: https://github.com/iterative/dvc/compare/2.37.0...2.38.0
Source code(tar.gz)
Source code(zip)
dvc-2.38.0-1.x86_64.rpm(132.94 MB)
dvc-2.38.0.exe(53.18 MB)
dvc-2.38.0.pkg(103.07 MB)
dvc_2.38.0_amd64.deb(133.89 MB)
2.37.0(Dec 9, 2022)
What's Changed

🐛 Bug Fixes

worktree: fix default worktree remote/odb exception by @pmrowla in https://github.com/iterative/dvc/pull/8672

🔨 Maintenance

deps: bump dvc-data to 0.28.4 by @pmrowla in https://github.com/iterative/dvc/pull/8674

Other Changes

dvc update: support worktree update by @pmrowla in https://github.com/iterative/dvc/pull/8649

remote: disable gc/status for versioned remotes by @pmrowla in https://github.com/iterative/dvc/pull/8662

cloud versioning: push/fetch behavior cleanup by @pmrowla in https://github.com/iterative/dvc/pull/8667

push/fetch: cleanup cloud versioning CLI flags behavior by @pmrowla in https://github.com/iterative/dvc/pull/8673

deps: remove 3.11 checks for hydra; has 3.11 support now by @skshetry in https://github.com/iterative/dvc/pull/8677

Full Changelog: https://github.com/iterative/dvc/compare/2.36.0...2.37.0
Source code(tar.gz)
Source code(zip)
dvc-2.37.0-1.x86_64.rpm(132.91 MB)
dvc-2.37.0.exe(53.16 MB)
dvc-2.37.0.pkg(103.04 MB)
dvc_2.37.0_amd64.deb(133.86 MB)
2.36.0(Dec 1, 2022)
What's Changed

🚀 New Features and Enhancements

Solve the locking problem in temp and celery dir executor initialization. by @karajan1001 in https://github.com/iterative/dvc/pull/8623

exp: Expose baseline and name via run_env. by @daavoo in https://github.com/iterative/dvc/pull/8630

exp save: initial implementation by @daavoo in https://github.com/iterative/dvc/pull/8599

feat: top level params and metrics by @skshetry in https://github.com/iterative/dvc/pull/8529

🐛 Bug Fixes

index: skip data index load on empty view by @pmrowla in https://github.com/iterative/dvc/pull/8632

Solve the unexpected error at the end of the queued tasks running by @karajan1001 in https://github.com/iterative/dvc/pull/8640

plots: fix multi-file plots by @dberenbaum in https://github.com/iterative/dvc/pull/8639

stage add: don't fail if unable to create .gitignore by @dberenbaum in https://github.com/iterative/dvc/pull/8644

🔨 Maintenance

deps: add support for hdfs in Python 3.11 by @skshetry in https://github.com/iterative/dvc/pull/8627

exp list: cleanup and move logic inside repo api by @shcheklein in https://github.com/iterative/dvc/pull/8575

deps: bump dvc-data to 0.28.1 by @pmrowla in https://github.com/iterative/dvc/pull/8633

deps: bump dvc-data to 0.28.2 by @pmrowla in https://github.com/iterative/dvc/pull/8641

build(deps-dev): Bump pylint from 2.15.5 to 2.15.7 by @dependabot in https://github.com/iterative/dvc/pull/8643

deps: bump dvc-data to 0.28.3 by @pmrowla in https://github.com/iterative/dvc/pull/8648

Other Changes

remote: separate worktree vs version_aware behavior by @pmrowla in https://github.com/iterative/dvc/pull/8634

Full Changelog: https://github.com/iterative/dvc/compare/2.35.2...2.36.0
Source code(tar.gz)
Source code(zip)
dvc-2.36.0-1.x86_64.rpm(132.87 MB)
dvc-2.36.0.exe(53.14 MB)
dvc-2.36.0.pkg(103.00 MB)
dvc_2.36.0_amd64.deb(133.81 MB)
2.35.2(Nov 24, 2022)
What's Changed

Other Changes

repro dry: show information if the stage is cached by @ykasimov in https://github.com/iterative/dvc/pull/8405

Full Changelog: https://github.com/iterative/dvc/compare/2.35.1...2.35.2
Source code(tar.gz)
Source code(zip)
dvc-2.35.2-1.x86_64.rpm(132.72 MB)
dvc-2.35.2.exe(53.07 MB)
dvc-2.35.2.pkg(102.84 MB)
dvc_2.35.2_amd64.deb(133.66 MB)
2.35.1(Nov 23, 2022)

Full Changelog: https://github.com/iterative/dvc/compare/2.35.0...2.35.1
Source code(tar.gz)
Source code(zip)
dvc-2.35.1-1.x86_64.rpm(132.61 MB)
dvc-2.35.1.exe(53.01 MB)
dvc-2.35.1.pkg(102.73 MB)
dvc_2.35.1_amd64.deb(133.55 MB)
2.35.0(Nov 23, 2022)
What's Changed

🚀 New Features and Enhancements

ui: Fix WSL check in open_browser by @daavoo in https://github.com/iterative/dvc/pull/8604

🔨 Maintenance

build: fpm: don't create .build-id/* files by @efiop in https://github.com/iterative/dvc/pull/8611

Other Changes

worktree push: do not push existing versions by @pmrowla in https://github.com/iterative/dvc/pull/8606

testing: api: test opening a file in subdir by @efiop in https://github.com/iterative/dvc/pull/8610

Full Changelog: https://github.com/iterative/dvc/compare/2.34.3...2.35.0
Source code(tar.gz)
Source code(zip)
2.34.3(Nov 22, 2022)
What's Changed

🐛 Bug Fixes

Fix exp list ref heads handling by @shcheklein in https://github.com/iterative/dvc/pull/8554

parsing: Escape str interpolation in dict unpacking. by @daavoo in https://github.com/iterative/dvc/pull/8204

hydra: Use OmegaConf.to_yaml for dumping .yaml output. by @daavoo in https://github.com/iterative/dvc/pull/8587

queue kill: we can manually mark problematic tasks as failure by @karajan1001 in https://github.com/iterative/dvc/pull/8580

Solve the wrong checkpoint tip info during executor running by @karajan1001 in https://github.com/iterative/dvc/pull/8596

🔨 Maintenance

build(deps-dev): Bump dvc-render from 0.0.12 to 0.0.13 by @dependabot in https://github.com/iterative/dvc/pull/8568

build(deps-dev): Bump dvc-render from 0.0.13 to 0.0.14 by @dependabot in https://github.com/iterative/dvc/pull/8591

deps: bump dvc-data, dvc-azure by @pmrowla in https://github.com/iterative/dvc/pull/8594

deps: bump dvc-data to 0.28.0 by @pmrowla in https://github.com/iterative/dvc/pull/8605

Other Changes

deps: bump dvc-data to 0.26.0 by @efiop in https://github.com/iterative/dvc/pull/8566

import-url: disable push by default for cloud-versioned imports by @pmrowla in https://github.com/iterative/dvc/pull/8578

plots: data conversion: adjust for viewer backend by @pared in https://github.com/iterative/dvc/pull/8421

worktree: support push: false by @pmrowla in https://github.com/iterative/dvc/pull/8581

worktree add: preserve version metadata for unmodified files on dvc add by @pmrowla in https://github.com/iterative/dvc/pull/8595

plots: set default x label by @dberenbaum in https://github.com/iterative/dvc/pull/8589

Full Changelog: https://github.com/iterative/dvc/compare/2.34.2...2.34.3
Source code(tar.gz)
Source code(zip)
dvc-2.34.3-1.x86_64.rpm(131.53 MB)
dvc-2.34.3.exe(52.90 MB)
dvc-2.34.3.pkg(102.05 MB)
dvc_2.34.3_amd64.deb(132.42 MB)
2.34.2(Nov 15, 2022)
What's Changed

🐛 Bug Fixes

hydra: Raise error when name and sweeps. by @daavoo in https://github.com/iterative/dvc/pull/8556

fetch/pull: fix regression when using targeted fetch in repo containing import-url imports by @pmrowla in https://github.com/iterative/dvc/pull/8551

🔨 Maintenance

pyinstaller: use pydrive2 package hooks by @pmrowla in https://github.com/iterative/dvc/pull/8564

Full Changelog: https://github.com/iterative/dvc/compare/2.34.1...2.34.2
Source code(tar.gz)
Source code(zip)
dvc-2.34.2-1.x86_64.rpm(131.43 MB)
dvc-2.34.2.exe(52.89 MB)
dvc-2.34.2.pkg(101.96 MB)
dvc_2.34.2_amd64.deb(132.33 MB)
2.34.1(Nov 11, 2022)
What's Changed

🐛 Bug Fixes

Make exp show handle errors better by @karajan1001 in https://github.com/iterative/dvc/pull/8533

Solve the crash on getting name of applied experiment branch by @karajan1001 in https://github.com/iterative/dvc/pull/8541

Fix some celery queue related ci failure. by @karajan1001 in https://github.com/iterative/dvc/pull/8404

🔨 Maintenance

index: support filtering view by output by @pmrowla in https://github.com/iterative/dvc/pull/8537

dvc exceptions CyclicGraphError: add more clear message for the excep… by @ykasimov in https://github.com/iterative/dvc/pull/8263

build(deps-dev): Bump dvc-task from 0.1.4 to 0.1.5 by @dependabot in https://github.com/iterative/dvc/pull/8539

build(deps-dev): Bump dvc-gs from 2.19.1 to 2.20.0 by @dependabot in https://github.com/iterative/dvc/pull/8548

build(deps-dev): Bump mypy from 0.982 to 0.990 by @dependabot in https://github.com/iterative/dvc/pull/8535

build(deps-dev): Bump iterative-telemetry from 0.0.5 to 0.0.6 by @dependabot in https://github.com/iterative/dvc/pull/8538

Other Changes

plots: support svg by @blakeNaccarato in https://github.com/iterative/dvc/pull/8542

New Contributors

@blakeNaccarato made their first contribution in https://github.com/iterative/dvc/pull/8542

Full Changelog: https://github.com/iterative/dvc/compare/2.34.0...2.34.1
Source code(tar.gz)
Source code(zip)
dvc-2.34.1-1.x86_64.rpm(122.11 MB)
dvc-2.34.1.exe(49.20 MB)
dvc-2.34.1.pkg(92.38 MB)
dvc_2.34.1_amd64.deb(122.76 MB)
2.34.0(Nov 7, 2022)
What's Changed

🔨 Maintenance

hydra: Raise lazy DvcException for Python >= 3.11 by @daavoo in https://github.com/iterative/dvc/pull/8521

build(deps-dev): Bump dvc-s3 from 2.20.1 to 2.21.0 by @dependabot in https://github.com/iterative/dvc/pull/8524

Other Changes

plots: allow top-level strings by @dberenbaum in https://github.com/iterative/dvc/pull/8482

import-url: include files entry for cloud versioned dir dependencies by @pmrowla in https://github.com/iterative/dvc/pull/8528

ci: bench: use 3.11 in benchmarks by @skshetry in https://github.com/iterative/dvc/pull/8525

fix hydra_sweeps referenced before assignment by @dberenbaum in https://github.com/iterative/dvc/pull/8530

DVCLive 1.0 by @daavoo in https://github.com/iterative/dvc/pull/8532

Full Changelog: https://github.com/iterative/dvc/compare/2.33.2...2.34.0
Source code(tar.gz)
Source code(zip)
dvc-2.34.0-1.x86_64.rpm(122.91 MB)
dvc-2.34.0.exe(49.19 MB)
dvc-2.34.0.pkg(92.39 MB)
dvc_2.34.0_amd64.deb(123.57 MB)
2.33.2(Nov 3, 2022)
What's Changed

🐛 Bug Fixes

commit: skip changed_entries check on force commit by @pmrowla in https://github.com/iterative/dvc/pull/8505

exp run: catch hydra import in 3.11 by @pmrowla in https://github.com/iterative/dvc/pull/8519

🔨 Maintenance

build(deps-dev): Bump pylint from 2.15.4 to 2.15.5 by @dependabot in https://github.com/iterative/dvc/pull/8463

build(deps-dev): Bump pytest from 7.1.3 to 7.2.0 by @dependabot in https://github.com/iterative/dvc/pull/8479

build(deps): Bump pyinstaller from 5.0 to 5.6.1 by @dependabot in https://github.com/iterative/dvc/pull/8475

build(deps-dev): Bump pytest-xdist from 2.5.0 to 3.0.2 by @dependabot in https://github.com/iterative/dvc/pull/8474

build(deps): Bump pyinstaller from 5.6.1 to 5.6.2 by @dependabot in https://github.com/iterative/dvc/pull/8499

build: bump pyinstaller packages python version to 3.10 by @skshetry in https://github.com/iterative/dvc/pull/8511

deps: bump scmrepo to 0.1.3 by @pmrowla in https://github.com/iterative/dvc/pull/8520

New Contributors

@step-security-bot made their first contribution in https://github.com/iterative/dvc/pull/8496

Full Changelog: https://github.com/iterative/dvc/compare/2.33.1...2.33.2
Source code(tar.gz)
Source code(zip)
dvc-2.33.2-1.x86_64.rpm(122.92 MB)
dvc-2.33.2.exe(49.15 MB)
dvc-2.33.2.pkg(92.40 MB)
dvc_2.33.2_amd64.deb(123.57 MB)
2.33.1(Oct 31, 2022)
What's Changed

deps: bump dvc-data to 0.25.2 by @efiop in https://github.com/iterative/dvc/pull/8489

Full Changelog: https://github.com/iterative/dvc/compare/2.33.0...2.33.1
Source code(tar.gz)
Source code(zip)
dvc-2.33.1-1.x86_64.rpm(131.08 MB)
dvc-2.33.1.exe(60.35 MB)
dvc-2.33.1.pkg(101.38 MB)
dvc_2.33.1_amd64.deb(131.95 MB)
2.33.0(Oct 30, 2022)
What's Changed

deps: bump dvc-data to 0.25.1 by @efiop in https://github.com/iterative/dvc/pull/8487

Full Changelog: https://github.com/iterative/dvc/compare/2.32.1...2.33.0
Source code(tar.gz)
Source code(zip)
dvc-2.33.0-1.x86_64.rpm(131.08 MB)
dvc-2.33.0.exe(60.35 MB)
dvc-2.33.0.pkg(101.38 MB)
dvc_2.33.0_amd64.deb(131.95 MB)
2.32.1(Oct 29, 2022)
What's Changed

yappi: add --yappi-separate-threads flag by @efiop in https://github.com/iterative/dvc/pull/8485

Full Changelog: https://github.com/iterative/dvc/compare/2.32.0...2.32.1
Source code(tar.gz)
Source code(zip)
dvc-2.32.1-1.x86_64.rpm(131.08 MB)
dvc-2.32.1.exe(60.35 MB)
dvc-2.32.1.pkg(101.38 MB)
dvc_2.32.1_amd64.deb(131.95 MB)
2.32.0(Oct 29, 2022)
What's Changed

Use celery status as the exp show status by @karajan1001 in https://github.com/iterative/dvc/pull/8369

index: support multiple targets within output in IndexView by @efiop in https://github.com/iterative/dvc/pull/8471

auto solve corrupted rwlock info by @karajan1001 in https://github.com/iterative/dvc/pull/8469

Full Changelog: https://github.com/iterative/dvc/compare/2.31.0...2.32.0
Source code(tar.gz)
Source code(zip)
dvc-2.32.0-1.x86_64.rpm(131.08 MB)
dvc-2.32.0.exe(60.35 MB)
dvc-2.32.0.pkg(101.38 MB)
dvc_2.32.0_amd64.deb(131.95 MB)
2.31.0(Oct 21, 2022)
Refer to https://dvc.org/doc/install for installation instructions.

Changes

🔨 Maintenance

build(deps): Bump dvc-data from 0.20.0 to 0.22.0 (#8459) @dependabot

Thanks again to @dependabot, @dependabot[bot] and @efiop for the contributions! 🎉
Source code(tar.gz)
Source code(zip)
dvc-2.31.0-1.x86_64.rpm(126.81 MB)
dvc-2.31.0.exe(60.14 MB)
dvc-2.31.0.pkg(101.28 MB)
dvc_2.31.0_amd64.deb(127.60 MB)
2.30.1(Oct 21, 2022)
Refer to https://dvc.org/doc/install for installation instructions.

Changes

import-url: use dvc-data index.save() for fetching imports (#8249) @pmrowla

[pre-commit.ci] pre-commit autoupdate (#8441) @pre-commit-ci

plots: allow definition of plots section as list (#8412) @dtrifiro

config: ssh: Add passphrase, ask_passphrase (#8143) @daavoo

index: add IndexView, brancher: support index (#8407) @pmrowla

ignore: walk: support detail=True (#8398) @efiop

🚀 New Features and Enhancements

exp show: Preserve full branch and tag names. (#8425) @daavoo

🏇 Optimizations

exp show: Use batch call on scm.describe (#8453) @karajan1001

🐛 Bug Fixes

Give lock acquiring more time in concurrency situation. (#8436) @karajan1001

exp show: Preserve full branch and tag names. (#8425) @daavoo

🔨 Maintenance

build(deps): Bump dvc-task from 0.1.3 to 0.1.4 (#8447) @dependabot

deps: bump dvc-data to 0.20.0 (#8443) @pmrowla

build(deps-dev): Bump pylint from 2.15.2 to 2.15.4 (#8424) @dependabot

build(deps): Bump dvc-data from 0.18.0 to 0.19.0 (#8442) @dependabot

build(deps-dev): Bump pytest-mock from 3.9.0 to 3.10.0 (#8402) @dependabot

deps: bump dvc-data to 0.18.0 (#8432) @pmrowla

[pre-commit.ci] pre-commit autoupdate (#8422) @pre-commit-ci

Thanks again to @daavoo, @dependabot, @dependabot[bot], @dtrifiro, @efiop, @karajan1001, @pmrowla, @pre-commit-ci, @pre-commit-ci[bot] and @skshetry for the contributions! 🎉
Source code(tar.gz)
Source code(zip)
dvc-2.30.1-1.x86_64.rpm(126.80 MB)
dvc-2.30.1.exe(60.14 MB)
dvc-2.30.1.pkg(101.28 MB)
dvc_2.30.1_amd64.deb(127.60 MB)
2.30.0(Oct 10, 2022)
What's Changed

exp show: Add --hide-queued and --hide-failed flags by @karajan1001 in https://github.com/iterative/dvc/pull/8318

build(deps): Bump dvc-render from 0.0.11 to 0.0.12 by @dependabot in https://github.com/iterative/dvc/pull/8401

Refactor dvc get-url by @rlamy in https://github.com/iterative/dvc/pull/8410

deps: bump dvc-data to 0.17.1 by @pmrowla in https://github.com/iterative/dvc/pull/8416

Full Changelog: https://github.com/iterative/dvc/compare/2.29.0...2.30.0
Source code(tar.gz)
Source code(zip)
dvc-2.30.0-1.x86_64.rpm(126.83 MB)
dvc-2.30.0.exe(60.18 MB)
dvc-2.30.0.pkg(101.29 MB)
dvc_2.30.0_amd64.deb(127.66 MB)
2.29.0(Oct 4, 2022)
Refer to https://dvc.org/doc/install for installation instructions.

Changes

hydra: Fix append and remove sweeps. (#8381) @daavoo

Create basic version of dvc ls-url command (#8299) @rlamy

deps: bump dvc-data to 0.14.0 (#8389) @efiop

dvcfs tests: copy pytest param instead of in-place update (#8388) @skshetry

Rename dvc.testing.test_*.py (#8386) @rlamy

cli: remove foreach-group from help text (#8383) @dberenbaum

[pre-commit.ci] pre-commit autoupdate (#8367) @pre-commit-ci

🐛 Bug Fixes

repo: fix crash while collecting stages with symlinks (#8364) @dtrifiro

import: fix rev lock and pull with --no-download (#8341) @dtrifiro

config: wrap UnicodeDecodeErrors on load (#8380) @pmrowla

🔨 Maintenance

logger: init logging config before colorama (#8395) @pmrowla

build(deps-dev): Bump mypy from 0.981 to 0.982 (#8393) @dependabot

build(deps-dev): Bump mypy from 0.971 to 0.981 (#8368) @dependabot

config: wrap UnicodeDecodeErrors on load (#8380) @pmrowla

build(deps-dev): Bump pytest-mock from 3.8.2 to 3.9.0 (#8378) @dependabot

build(deps-dev): Bump pytest-cov from 3.0.0 to 4.0.0 (#8379) @dependabot

build(deps): Bump dvc-task from 0.1.2 to 0.1.3 (#8377) @dependabot

Thanks again to @daavoo, @dberenbaum, @dependabot, @dependabot[bot], @dtrifiro, @efiop, @pmrowla, @pre-commit-ci, @pre-commit-ci[bot], @rlamy and @skshetry for the contributions! 🎉
Source code(tar.gz)
Source code(zip)
dvc-2.29.0-1.x86_64.rpm(126.81 MB)
dvc-2.29.0.exe(60.15 MB)
dvc-2.29.0.pkg(101.28 MB)
dvc_2.29.0_amd64.deb(127.63 MB)
2.28.0(Sep 27, 2022)
Refer to https://dvc.org/doc/install for installation instructions.

Changes

vscode: support flexible plots (#8282) @pared

pull: hide glob option (#8337) @dberenbaum

deps: bump codespell (#8199) @pared

import/import-url: ignore outs when using --no-download (#8343) @dtrifiro

fixed link to "get started: pipelines" docs (#8340) @MartinoMensio

🚀 New Features and Enhancements

exp show: sync state between queue and exp show table (#8158) @karajan1001

merge-driver: support removes and changes (#8360) @dberenbaum

🐛 Bug Fixes

cloud-versioning: better handling for directories (#8362) @efiop

Solve the on_diverged function not executed error. (#8351) @karajan1001

hydra: Fix sweeps on Defaults List. (#8308) @daavoo

🔨 Maintenance

build(deps): Bump dvc-data from 0.10.1 to 0.12.0 (#8346) @dependabot

deps: bump dvc-http to 2.27.2 (#8333) @dtrifiro

deps: bump dvc-data to 0.10.1 (#8330) @pmrowla

Thanks again to @MartinoMensio, @daavoo, @dberenbaum, @dependabot, @dependabot[bot], @dtrifiro, @efiop, @karajan1001, @pared, @pmrowla and @skshetry for the contributions! 🎉
Source code(tar.gz)
Source code(zip)
dvc-2.28.0-1.x86_64.rpm(125.80 MB)
dvc-2.28.0.exe(59.45 MB)
dvc-2.28.0.pkg(100.24 MB)
dvc_2.28.0_amd64.deb(126.62 MB)
2.27.2(Sep 19, 2022)
What's Changed

push: use update_pipeline=False for cloud versioning by @efiop in https://github.com/iterative/dvc/pull/8324

Full Changelog: https://github.com/iterative/dvc/compare/2.27.1...2.27.2
Source code(tar.gz)
Source code(zip)
dvc-2.27.2-1.x86_64.rpm(125.51 MB)
dvc-2.27.2.exe(59.39 MB)
dvc-2.27.2.pkg(99.99 MB)
dvc_2.27.2_amd64.deb(126.32 MB)
2.27.1(Sep 19, 2022)
Refer to https://dvc.org/doc/install for installation instructions.

Changes

deps: bump dvc-s3 to 2.20.1

Source code(tar.gz)
Source code(zip)
dvc-2.27.1-1.x86_64.rpm(125.51 MB)
dvc-2.27.1.exe(59.38 MB)
dvc-2.27.1.pkg(99.99 MB)
dvc_2.27.1_amd64.deb(126.31 MB)
2.27.0(Sep 19, 2022)
Refer to https://dvc.org/doc/install for installation instructions.

Changes

remove mergify (#8319) @skshetry

deps: add testing group for dvc.testing requirements (#8314) @dtrifiro

deps: bump dvc-data to 0.10.0 (#8313) @efiop

dvcfs: rename DvcFileSystem to DVCFileSystem (#8307) @skshetry

dvcfs: prevent opening file object in write mode (#8306) @skshetry

🔨 Maintenance

analytics: use iterative-telemetry for user_id lookup (#8317) @efiop

deps: bump dvc-azure to 2.20.4 (#8305) @pmrowla

build(deps): Bump dvc-render from 0.0.10 to 0.0.11 (#8303) @dependabot

Thanks again to @dependabot, @dependabot[bot], @dtrifiro, @efiop, @pmrowla and @skshetry for the contributions! 🎉
Source code(tar.gz)
Source code(zip)
dvc-2.27.0-1.x86_64.rpm(125.51 MB)
dvc-2.27.0.exe(59.39 MB)
dvc-2.27.0.pkg(99.99 MB)
dvc_2.27.0_amd64.deb(126.31 MB)
2.26.2(Sep 15, 2022)
Refer to https://dvc.org/doc/install for installation instructions.

Changes

loader: don't forget to set meta.isdir (#8302) @efiop

Thanks again to @efiop for the contributions! 🎉
Source code(tar.gz)
Source code(zip)
dvc-2.26.2-1.x86_64.rpm(125.52 MB)
dvc-2.26.2.exe(59.39 MB)
dvc-2.26.2.pkg(100.01 MB)
dvc_2.26.2_amd64.deb(126.31 MB)
2.26.1(Sep 15, 2022)
Refer to https://dvc.org/doc/install for installation instructions.

Changes

index: use tree obj if using inline files (#8301) @efiop

try to split testsuites into two (#8297) @skshetry

🐛 Bug Fixes

dvcfs: add fsspec-compliance tests (#8296) @skshetry

Thanks again to @efiop and @skshetry for the contributions! 🎉
Source code(tar.gz)
Source code(zip)
dvc-2.26.1-1.x86_64.rpm(125.52 MB)
dvc-2.26.1.exe(59.39 MB)
dvc-2.26.1.pkg(100.01 MB)
dvc_2.26.1_amd64.deb(126.32 MB)
2.26.0(Sep 15, 2022)
What's Changed

expose dvcfs in dvc.api and add to fsspec's registry by @skshetry in https://github.com/iterative/dvc/pull/8287

import-url: pass fs_config down from imp_url to get_cloud_fs by @dtrifiro in https://github.com/iterative/dvc/pull/8286

deps: remove unused mock dep by @dtrifiro in https://github.com/iterative/dvc/pull/8290

dvcfs: default open to binary mode by @skshetry in https://github.com/iterative/dvc/pull/8295

worktree push/fetch: support dirs by @pmrowla in https://github.com/iterative/dvc/pull/8273

schema: add strict schema validation for top-level plots by @skshetry in https://github.com/iterative/dvc/pull/8289

Full Changelog: https://github.com/iterative/dvc/compare/2.25.0...2.26.0
Source code(tar.gz)
Source code(zip)
dvc-2.26.0-1.x86_64.rpm(125.52 MB)
dvc-2.26.0.exe(59.39 MB)
dvc-2.26.0.pkg(100.01 MB)
dvc_2.26.0_amd64.deb(126.32 MB)
2.25.0(Sep 13, 2022)
Refer to https://dvc.org/doc/install for installation instructions.

Changes

dvc: cloud versioning POC (#8264) @efiop

typo in setup config causing versioning errors in poetry (#8229) @jlhbaseball15

tests: set celery ping_task_timeout 3x the default (#8221) @skshetry

🚀 New Features and Enhancements

exp run: Support hydra basic sweeper. (#8187) @daavoo

data ls: new command to show metadata with outputs (#8252) @skshetry

dvcfs: remove config (#8276) @skshetry

add metadata support to dvc.yaml (#8251) @skshetry

Add support for custom metadata (#8250) @skshetry

add metadata fields: label, type to data (#8232) @skshetry

add support for foreach target (#8210) @skshetry

output: support version ID (#8223) @pmrowla

add support for git credentials helpers (#6586, scmrepo#138) @dtrifiro

🏇 Optimizations

Optimise dvc ls -R (#8241) @rlamy

🐛 Bug Fixes

dvc.yaml: preserve outputs' desc on rewrites/updates to the stage (#8247) @skshetry

import: fix broken auth https://github.com/iterative/dvc/issues/7898

🔨 Maintenance

build(deps-dev): Bump pylint from 2.15.0 to 2.15.2 (#8268) @dependabot

build(deps): Bump scmrepo from 0.1.0 to 0.1.1 (#8269) @dependabot

build(deps): Bump dvc-render from 0.0.9 to 0.0.10 (#8254) @dependabot

build(deps): Bump dvc-data from 0.4.0 to 0.5.3 (#8237) @dependabot

build(deps-dev): Bump pytest from 7.1.2 to 7.1.3 (#8239) @dependabot

build(deps-dev): Bump dvc-azure from 2.20.0 to 2.20.2 (#8240) @dependabot

deps: bump dvc-data to 0.7.1 (#8266) @efiop

deps: bump dvc-data to 0.6.3 (#8257) @efiop

deps: bump dvc-azure and dvc-s3 to 2.20.0 (#8224) @efiop

Thanks again to @daavoo, @dependabot, @dependabot[bot], @dtrifiro, @efiop, @jlhbaseball15, @pmrowla, @rlamy and @skshetry for the contributions! 🎉
Source code(tar.gz)
Source code(zip)
dvc-2.25.0-1.x86_64.rpm(125.50 MB)
dvc-2.25.0.exe(59.39 MB)
dvc-2.25.0.pkg(99.98 MB)
dvc_2.25.0_amd64.deb(126.31 MB)
2.24.0(Sep 1, 2022)
Refer to https://dvc.org/doc/install for installation instructions.

Changes

deps: bump dvc-data to 0.4.0 by @efiop in https://github.com/iterative/dvc/pull/8219 and https://github.com/iterative/dvc/pull/8213

🚀 New Features and Enhancements

exp run: Support composing and dumping Hydra config. by @daavoo in https://github.com/iterative/dvc/pull/8093

🐛 Bug Fixes

data status: fix path for committed changes in Windows by @skshetry in https://github.com/iterative/dvc/pull/8220

Full Changelog: https://github.com/iterative/dvc/compare/2.23.0...2.24.0
Source code(tar.gz)
Source code(zip)
dvc-2.24.0-1.x86_64.rpm(125.46 MB)
dvc-2.24.0.exe(59.34 MB)
dvc-2.24.0.pkg(99.92 MB)
dvc_2.24.0_amd64.deb(126.26 MB)
2.23.0(Aug 30, 2022)
Refer to https://dvc.org/doc/install for installation instructions.

Changes

output.get_obj: catch ObjectCorruptedError (#8212) @skshetry

data status: fix quoting on command hints for untracked files (#8211) @skshetry

fetch: do not checkout partial imports (#8205) @dtrifiro

data status: update hints to include fetch and checkout (#8209) @dberenbaum

test on 3.11 (#8196) @skshetry

plots: support dirs in top level definitions (#8159) @pared

repo: Handle no commits for exp show and plots diff. (#8177) @daavoo

plots templates: change ui to not dump to file (#8129) @dberenbaum

🚀 New Features and Enhancements

info: Include subprojects. (#8201) @daavoo

data status: remove --withdirs, show unknowns in CLI (#8189) @skshetry

api: Add details for params_show stages syntax. (#8167) @daavoo

Better error message when specifying file as target for remove (#8044) @alexmojaki

Thanks again to @alexmojaki, @daavoo, @dberenbaum, @dtrifiro, @pared, @pre-commit-ci[bot] and @skshetry for the contributions! 🎉
Source code(tar.gz)
Source code(zip)
dvc-2.23.0-1.x86_64.rpm(125.02 MB)
dvc-2.23.0.exe(59.08 MB)
dvc-2.23.0.pkg(99.51 MB)
dvc_2.23.0_amd64.deb(125.81 MB)