Sacred is a tool to help you configure, organize, log and reproduce experiments developed at IDSIA.

IDSIA

Last update: Jan 2, 2023


Overview

Sacred

Every experiment is sacred

Every experiment is great

If an experiment is wasted

God gets quite irate

Sacred is a tool to help you configure, organize, log and reproduce experiments. It is designed to do all the tedious overhead work that you need to do around your actual experiment in order to:

keep track of all the parameters of your experiment
easily run your experiment for different settings
save configurations for individual runs in a database
reproduce your results

Sacred achieves this through the following main mechanisms:

Config Scopes: A very convenient way of using the local variables in a function to define the parameters your experiment uses.
Config Injection: You can access all parameters of your configuration from every function. They are automatically injected by name.
Command-line interface: You get a powerful command-line interface for each experiment that you can use to change parameters and run different variants.
Observers: Sacred provides Observers that log all kinds of information about your experiment, its dependencies, the configuration you used, the machine it is run on, and of course the result. These can be saved to a MongoDB, for easy access later.
Automatic seeding: Helps control the randomness in your experiments, so that the results remain reproducible.
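To make "injected by name" concrete, here is a minimal standard-library sketch of the idea. It is not Sacred's implementation: the `inject` decorator and the `describe` function are hypothetical names invented for this illustration, and the sketch only shows how missing arguments can be filled from a configuration dictionary by matching parameter names.

```python
import functools
import inspect


def inject(config):
    """Return a decorator that fills missing arguments from `config` by name.

    Hypothetical helper for illustration only; Sacred's own mechanism differs.
    """
    def decorator(func):
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Work out which parameters the caller already supplied.
            bound = signature.bind_partial(*args, **kwargs)
            for name in signature.parameters:
                if name not in bound.arguments and name in config:
                    kwargs[name] = config[name]
            return func(*args, **kwargs)

        return wrapper

    return decorator


config = {"C": 1.0, "gamma": 0.7}


@inject(config)
def describe(C, gamma, kernel="rbf"):
    return f"SVC(C={C}, kernel={kernel!r}, gamma={gamma})"


print(describe())        # both C and gamma come from the config
print(describe(C=10.0))  # an explicit argument wins over the config
```

Explicitly passed arguments take precedence here, and parameters that are absent from the configuration fall back to their normal Python defaults.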

Example

Script to train an SVM on the iris dataset

from numpy.random import permutation
from sklearn import svm, datasets

C = 1.0
gamma = 0.7

iris = datasets.load_iris()
perm = permutation(iris.target.size)
iris.data = iris.data[perm]
iris.target = iris.target[perm]
clf = svm.SVC(C=C, kernel='rbf', gamma=gamma)
clf.fit(iris.data[:90],
        iris.target[:90])
print(clf.score(iris.data[90:],
                iris.target[90:]))

The same script as a Sacred experiment

from numpy.random import permutation
from sklearn import svm, datasets
from sacred import Experiment
ex = Experiment('iris_rbf_svm')

@ex.config
def cfg():
  C = 1.0
  gamma = 0.7

@ex.automain
def run(C, gamma):
  iris = datasets.load_iris()
  per = permutation(iris.target.size)
  iris.data = iris.data[per]
  iris.target = iris.target[per]
  clf = svm.SVC(C=C, kernel='rbf', gamma=gamma)
  clf.fit(iris.data[:90],
          iris.target[:90])
  return clf.score(iris.data[90:],
                   iris.target[90:])

Documentation

The documentation is hosted at ReadTheDocs.

Installing

You can directly install it from the Python Package Index with pip:

pip install sacred

Or if you want to do it manually you can checkout the current version from git and install it yourself:

git clone https://github.com/IDSIA/sacred.git

cd sacred

python setup.py install

You might want to also install the numpy and the pymongo packages. They are optional dependencies but they offer some cool features:

pip install numpy pymongo

Tests

The tests for sacred use the pytest package. You can execute them by running pytest in the sacred directory like this:

pytest

There is also a config file for tox so you can automatically run the tests for various python versions like this:

tox

Update pytest version

If you update or change the pytest version, the following files need to be changed:

dev-requirements.txt
tox.ini
test/test_utils.py
setup.py

Contributing

If you find a bug, have a feature request or want to discuss something general, you are welcome to open an issue. If you have a specific question related to the usage of sacred, please ask a question on StackOverflow under the python-sacred tag. We value documentation a lot. If you find something that should be included in the documentation, please document it or let us know what's missing. If you are using Sacred in one of your projects and want to share your code with others, put your repo in the Projects using Sacred list. Pull requests are highly welcome!

Frontends

At this point there are several frontends to the database entries created by sacred (that I'm aware of). They are developed externally as separate projects.

Omniboard

Omniboard is a web dashboard that helps in visualizing the experiments and metrics / logs collected by sacred. Omniboard is written with React, Node.js, Express and Bootstrap.

Incense

Incense is a Python library to retrieve runs stored in a MongoDB and interactively display metrics and artifacts in Jupyter notebooks.

Sacredboard

Sacredboard is a web-based dashboard interface to the sacred runs stored in a MongoDB.

Neptune

Neptune is a web service that lets you visualize, organize and compare your experiment runs. Once things are logged to Neptune you can share them with others, add comments and even access objects via the experiment API.

In order to log your runs to Neptune, all you need to do is add an observer:

from neptunecontrib.monitoring.sacred import NeptuneObserver
ex.observers.append(NeptuneObserver(api_token='YOUR_API_TOKEN',
                                    project_name='USER_NAME/PROJECT_NAME'))

For more info, check the neptune-contrib library.

SacredBrowser

SacredBrowser is a PyQt4 application to browse the MongoDB entries created by sacred experiments. Features include custom queries, sorting of the results, access to the stored source-code, and many more. No installation is required and it can connect to a local database or over the network.

Prophet

Prophet is an early prototype of a web interface to the MongoDB entries created by sacred experiments. It has been discontinued. It requires you to run RestHeart to access the database.

Related Projects

Sumatra

Sumatra is a tool for managing and tracking projects based on numerical simulation and/or analysis, with the aim of supporting reproducible research. It can be thought of as an automated electronic lab notebook for computational projects.

Sumatra takes a different approach by providing commandline tools to initialize a project and then run arbitrary code (not just python). It tracks information about all runs in a SQL database and even provides a nice browser tool. It integrates less tightly with the code to be run, which makes it easily applicable to non-python experiments. But that also means it requires more setup for each experiment and configuration needs to be done using files. Use this project if you need to run non-python experiments, or are ok with the additional setup/configuration overhead.

Future Gadget Laboratory

FGLab is a machine learning dashboard, designed to make prototyping experiments easier. Experiment details and results are sent to a database, which allows analytics to be performed after their completion. The server is FGLab, and the clients are FGMachines.

Similar to Sumatra, FGLab is an external tool that can keep track of runs from any program. Projects are configured via a JSON schema and the program needs to accept these configurations via command-line options. FGLab also takes the role of a basic scheduler by distributing runs over several machines.

CDE

By tracing system calls during program execution CDE creates a snapshot of all used files and libraries to guarantee the ability to reproduce any unix program execution. It only solves reproducibility, but it does so thoroughly.

License

This project is released under the terms of the MIT license.

Citing Sacred

K. Greff, A. Klein, M. Chovanec, F. Hutter, and J. Schmidhuber, ‘The Sacred Infrastructure for Computational Research’, in Proceedings of the 15th Python in Science Conference (SciPy 2017), Austin, Texas, 2017, pp. 49–56.

Comments

too much magic - what would it take to have an object oriented interface?
Hello sacred authors! Great work - I'm just starting to embrace sacred and am feeling very excited about the potential.

So far I've been finding it just a bit too magical for my tastes though. I think that the magic has its place in reducing the boilerplate to almost zero for a quick experiment - but I've been finding it to be somewhat of an activation barrier in getting started for someone not familiar with the library.

The issue is that sacred breaks completely out of the standard python programming paradigm, which makes it difficult to reason about code behavior.

Again, I do think that this has its place for quick scripting - but for production workflows that prioritize extensibility and maintainability over line count, I think an object-oriented interface would be more appropriate.

My reasoning is that:

the object hierarchy (which I feel pretty confused about right now) would be explicitly defined in the code, so it's clear exactly what an experiment is made up of

easier to integrate with existing code. I know how classes behave, how inheritance works, how to work them into module structure. It's unclear to me how to effectively use the magic interface to sacred within a larger project and not as a one-off thing.

much easier to get less engineer-minded lab members to use sacred when they don't have to learn a whole new paradigm. They just have to fill in a template

Any estimates how challenging this would be? Recommendations on where to start?
stale
opened by rueberger 28

Sacred Workflows

In an attempt to structure our discussion I suggest using this issue to collect a wishlist of how we would like to use Sacred from a bird's-eye perspective. I suggest that we edit this issue to reflect the evolving consensus that (hopefully) emerges from the discussion below. To get things started I can think of 3 basic workflows that I would love for sacred to support. Maybe this is also a good place to think about how to integrate stages and superexperiments.

Interactive (Jupyter Notebook)

Manually control the stages of the experiment / run in an interactive environment. Most suitable for exploration and low complexity experiments. Something like:

# -----------------------------------------------------------
# initialization
ex = Experiment('my_jupyter_experiment')
ex.add_observer(FilestorageObserver('tmp'))
# -----------------------------------------------------------
# Config and Functions
cfg = Configuration()
cfg.learn_rate = 0.01
cfg.hidden_sizes = [100, 100]
cfg.batch_size = 32

@ex.capture
def get_dataset(batch_size):
    ....
# -----------------------------------------------------------
# run experiment
ex.start()   # finalizes config, starts observers
data = get_dataset()  # call functions 
for i in range(1000):
    # do something 
    ex.log_metric('loss', loss)  # log metrics, artifacts, etc.

ex.stop(result=final_loss)
# -----------------------------------------------------------

Scripting

Using a main script that contains most of the experiment and is run from the commandline. This is the current main workflow, most suitable for low to medium complexity experiments.

ex = Experiment('my_experiment_script')

@ex.config
def config(cfg):
    cfg.learn_rate = 0.01
    ...

@ex.capture
def get_dataset(batch_size):
    ....

@ex.automain  # define a main function which automatically starts and stops the experiment
def main():
    ....   # do stuff, log metrics, etc.
    return final_loss

Object Oriented

This is a long-standing feature request #193. Define an experiment as a class to improve modularity (and support frameworks like ray.tune). Should cater to medium to high complexity experiments. Very tentative API sketch:

class MyExperiment(Experiment):
    def __init__(self, config=None):   # context-config to deal with updates and nesting
         super().__init__(config)
         self.learn_rate = 0.001   # using self to store config improves IDE support
         ...

    def get_dataset(self):  # no capturing because self gives access to config anyways
        return ...

    @main   # mark main function / commands 
    def main_function(self):
         ...   # do stuff
         return final_loss

ex = MyExperiment(config=get_commandline_updates())
ex.run()

stale API Discussion

opened by Qwlouse 27

WIP: Error messages
This is a WIP pull request that addresses the error messages as described in #239. I finally got time to prepare a pull request for this.

The code is currently very ugly and is better seen as a proof of concept that those things can work.

There are lists of things that are already done and that need to be done below. Any suggestions for further improvement of the error messages (or the code, since some part of it is not well structured and untested) are welcome.

Done

Use iterate_ingredients for gathering commands and named configs

This causes gather_commands and gather_named_configs to raise a CircularDependencyError instead of a RecursionError, which makes it much clearer what is causing the error. In addition, any future gather_something functions that may be implemented only need to override one method: the error handling and the path filtering for experiments are both done in iterate_ingredients.

Track Ingredients that cause circular dependencies

The CircularDependencyError is caught in iterate_ingredients and the current ingredient is added to a list CircularDependencyError.__ingredients__ to keep track of which ingredients caused the circular dependency.

An example error:

Traceback (most recent call last):
  File "error_messages.py", line 24, in <module>
    @ex.automain
  File ".../sacred/experiment.py", line 141, in automain
    self.run_commandline()
  File ".../sacred/experiment.py", line 248, in run_commandline
    short_usage, usage, internal_usage = self.get_usage()
  File ".../sacred/experiment.py", line 173, in get_usage
    commands = OrderedDict(self.gather_commands())
  File ".../sacred/experiment.py", line 394, in _gather
    for ingredient, _ in self.traverse_ingredients():
  File ".../sacred/ingredient.py", line 370, in traverse_ingredients
    raise e
  File ".../sacred/ingredient.py", line 363, in traverse_ingredients
    for ingred, depth in ingredient.traverse_ingredients():
  File ".../sacred/ingredient.py", line 370, in traverse_ingredients
    raise e
  File ".../sacred/ingredient.py", line 363, in traverse_ingredients
    for ingred, depth in ingredient.traverse_ingredients():
  File ".../sacred/ingredient.py", line 370, in traverse_ingredients
    raise e
  File ".../sacred/ingredient.py", line 363, in traverse_ingredients
    for ingred, depth in ingredient.traverse_ingredients():
  File ".../sacred/ingredient.py", line 357, in traverse_ingredients
    raise CircularDependencyError(ingredients=[self])
sacred.exception.CircularDependencyError: ing->ing2->ing
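The idea of reporting the chain of ingredients that form a cycle can be sketched without Sacred. The following is a simplified illustration, not the code from this pull request: it passes the current path down the recursion instead of catching and re-raising on the way up, and the `dependencies` dictionary is a stand-in for real Ingredient objects.

```python
class CircularDependencyError(Exception):
    """Raised when an ingredient (transitively) depends on itself."""

    def __init__(self, ingredients):
        self.ingredients = list(ingredients)
        super().__init__("->".join(self.ingredients))


def traverse_ingredients(name, dependencies, _path=()):
    """Yield `name` and all of its sub-ingredients depth-first.

    `dependencies` maps an ingredient name to the names it depends on.
    """
    if name in _path:
        # Report only the part of the path that forms the cycle.
        cycle = _path[_path.index(name):] + (name,)
        raise CircularDependencyError(cycle)
    yield name
    for child in dependencies.get(name, ()):
        yield from traverse_ingredients(child, dependencies, _path + (name,))


dependencies = {"ing": ["ing2"], "ing2": ["ing"]}
try:
    list(traverse_ingredients("ing", dependencies))
except CircularDependencyError as error:
    message = str(error)
    print(message)
```

Because the path is checked before recursing, the error is raised at the first repeated ingredient rather than after the interpreter hits its recursion limit.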

Track sources of configuration entries

This code is still very ugly, but it allows tracking the sources of configuration values. The achievable resolution depends on the kind of source:

for a ConfigScope, we can find the wrapped function and get the place of definition of this function (file + line of the signature line)

for a configuration file we can find the file that defines the configuration values. It would be very difficult to get the line of the config value inside of the file.

for a dict config, we can use inspect.stack to find the line in which the dict configuration value was added.

for configuration defined in the command line, we can say that it was defined in the command line options

See the InvalidConfigError for examples.

Add a base class SacredError for future Exceptions that is pretty-printed in experiment.run_commandline

The init definition looks like this:

def __init__(self, *args, print_traceback=True, filter_traceback=None, print_usage=False):
    # ...

It provides the following additional arguments (that are handled in experiment.run_commandline):

print_traceback: if True, traceback is printed according to filter_traceback. If False, no traceback is printed (except for the Exception itself)

filter_traceback: If True, the traceback is filtered (WITHOUT sacred internals), if False, it is not filtered and if None, it falls back to the previous behaviour (filter if not raised within sacred)

print_usage: The short usage is printed when this is set to True.

Add an InvalidConfigError that can be raised in user code

Added an InvalidConfigError that prints the conflicting configuration values.

Example:

ex = Experiment()

@ex.config
def config():
    config1 = 123
    config2 = dict(a=234)

@ex.automain
def main(config1, config2):
    if not type(config1) == type(config2['a']):
        raise InvalidConfigError('Must have same type', conflicting_configs=('config1', 'config2.a'))

$ python error_messages.py with config1=abcde
WARNING - root - Changed type of config entry "config1" from int to str
WARNING - error_messages - No observers have been added to this run
INFO - error_messages - Running command 'main'
INFO - error_messages - Started
ERROR - error_messages - Failed after 0:00:00!
Traceback (most recent calls WITHOUT Sacred internals):
  File ".../wrapt/wrappers.py", line 523, in __call__
    args, kwargs)
  File "error_messages.py", line 27, in main
    raise InvalidConfigError('Must have same type', conflicting_configs=('config1', 'config2.a'))
sacred.exception.InvalidConfigError: Must have same type
Conflicting configuration values:
  config1=abcde
    defined in command line config "config1=abcde"
  config2.a=234
    defined in "error_messages.py:20"

MissingConfigError

Prints missing configuration values. Prints the filtered stack trace by default, so that the function call that is missing values can be found. It also prints the name of the ingredient that captured the function and the file in which the captured function is defined.

Example error:

Traceback (most recent calls WITHOUT Sacred internals):
  File ".../wrapt/wrappers.py", line 523, in __call__
    args, kwargs)
sacred.exception.MissingConfigError: main is missing value(s) for ['config3']
Function that caused the exception: <function main at 0x0F7A0780> captured by the experiment "error_messages" at "error_messages.py:24"

NamedConfigNotFoundError

Raise a NamedConfigNotFoundError instead of KeyError, and don't print traceback.

TODO

print list of available named configs

give suggestion based on levenshtein distance

ConfigAddedError

Raise a ConfigAddedError when a config value is added that is not used anywhere. This is a subclass of ConfigError and prints the source where the new configuration value is defined:

Traceback (most recent call last):
sacred.utils.ConfigAddedError: Added new config entry "unused" that is not used anywhere
Conflicting configuration values:
  unused=3
    defined in command line config "unused=3"
Did you mean "config1" instead of "unused"

TODO

print suggestions based on levenshtein distance
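A suggestion mechanism based on Levenshtein distance, as named in the TODO items, could look roughly like the sketch below. This is only an illustration of the technique: the function names, the `max_distance` threshold and the example config keys are assumptions, not part of this pull request.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + (char_a != char_b)  # substitute or match
            ))
        previous = current
    return previous[-1]


def suggest(name, candidates, max_distance=2):
    """Return the closest candidate, or None if nothing is close enough."""
    best = min(candidates, key=lambda c: levenshtein(name, c), default=None)
    if best is not None and levenshtein(name, best) <= max_distance:
        return best
    return None


known_entries = ["learn_rate", "batch_size", "hidden_sizes"]
print(suggest("learn_rat", known_entries))  # a likely typo of learn_rate
print(suggest("unused", known_entries))     # nothing similar enough
```

A distance threshold keeps the message from proposing an unrelated entry when the added name is not a typo at all.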

TODO

print suggestions for ConfigAddedError

(colored exception output?)

make source tracking optional in SETTINGS

improve resolution of source tracking (line of config file, line in a config scope maybe using inspect.stack)

CommandNotFoundError (?)

Error when parameter is not present for config scope

tests

stale
opened by thequilo 23
Reworking the Configuration Process
This is an issue that has been on my mind for a long time. The configuration process in sacred is very powerful, but it also has gotten rather complex. I would like to rewrite it from the ground up and fix a few long-standing issues along the way. Warning: this is going to be a lengthy post with many thoughts and half-baked ideas. But I would really appreciate your input for this.

Current state

Configuration is divided into (by priority)

config updates (highest)

named configurations

default configuration

Usually you would evaluate the lowest priority first, and overwrite values with the higher priorities. But in Sacred, we support the fancy config scopes which allow the default config to interact with the higher priority values. E.g.:

@ex.config
def config():
    a = 10
    b = 'low' if a < 100 else 'high'

To support this we evaluate the highest priority first and consider them fixed during the evaluation of the lower priorities. In the example above you could pass with a=200 from the commandline and get b='high' in the final config.
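The difference between "defaults first, then overwrite" and "fix the high-priority values first" can be shown with plain Python. This is a hand-written sketch of the two strategies for the example above, not Sacred's config scope machinery; `naive_resolve` and `scope_resolve` are hypothetical names.

```python
def naive_resolve(updates):
    """Evaluate the defaults first, then overwrite with the updates."""
    a = 10
    b = "low" if a < 100 else "high"  # computed from the default a
    config = {"a": a, "b": b}
    config.update(updates)
    return config


def scope_resolve(updates):
    """Treat the updates as fixed while the defaults are evaluated."""
    a = updates.get("a", 10)
    b = updates.get("b", "low" if a < 100 else "high")  # sees the updated a
    return {"a": a, "b": b}


print(naive_resolve({"a": 200}))  # b stays 'low' although a is 200
print(scope_resolve({"a": 200}))  # b becomes 'high'
```

An update to `b` itself still wins in both variants; only the dependent default reacts to the higher-priority value in the second one.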

I really enjoy this flexibility and would like to keep it. But the implementation has become rather messy, and hard to maintain. It also still lacks several useful features.

Limitations

No support for non-str keys

The keys of dictionaries must be valid python-identifiers, otherwise they cannot be set from the commandline. This has come up, for example, with sklearn which uses int keys to specify class_weights for classifiers.

Setting list elements

Currently only dictionaries can be modified from the commandline. This can be worked around by storing everything as dictionaries in the config and converting them to lists in the experiment code. But this is clearly ugly.

Meta information

Sacred keeps track of meta information for config entries such as documentation, if the type of entry has changed, and if an entry was added. This process is currently very convoluted and error prone. This is an important feature that can help tremendously with catching errors in configuration of an experiment so it should be rock-solid.

Ingredients

The Ingredient system is meant to support modularization of experiments. The idea was to separate out reusable parts such as dataset loading, that have their own configuration. That way they could be used from multiple experiments. But in practice the current system has fallen short. The main problem is that currently ingredients have to be completely self-contained, and cannot be configured from within the experiment that uses them. Often what is needed is a way to pass parameters to an ingredient from the experiment. They are also implicitly added to the configuration, which is confusing, and does not allow using an ingredient multiple times from the same experiment.

Internal values

Sacred automatically adds a seed to the configuration that can be accessed and used. But that process again is rather opaque and confusing. We should have a well-defined behavior for such internal config values. That would also be good for adding the _id or a default temp-dir for storing artifacts.

Wishlist

More explicit config scopes

This would remove some of the magic and make the process more explicit:

@ex.config
def config(cfg):
    cfg.a = 10  # explicitly use a cfg object to define entries
    cfg.b = 'low' if cfg.a < 100 else 'high'

Dotted access internally

For convenience, the final configuration should allow dotted access

@ex.capture
def foo(network):
    print(network.layers[2].size)

Commandline updates

Updates should mostly follow the python syntax and support the general form <VAR>=<VALUE>, with dotted access for nesting if the keys are python-identifiers:

foo='bar'
optimizer.learn_rate=0.1
network.layer_sizes=[100,200,100]

Square brackets can be used to access any key that is a valid and hashable python literal:

foo[True]=17
layers[2].size=50
people['Jane Doe'].height=176

Advanced nice-to-have features could include:

slicing support: sizes[:3]=[1, 2, 3]

recursive descent: network...act_func='tanh' i.e. all act_func entries that are descendants of network.

wildcard: parent.*=7 matches all children of parent

regex support: net[/.*_layer/].size=100

These might be very tricky to get right, so I would postpone them. But I want to keep these usecases in mind for later.
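As a rough sketch of the basic <VAR>=<VALUE> form with dotted access, the following parser handles only dotted paths and Python literals. It is an illustration under stated assumptions, not a proposal for the final implementation: `parse_update` and `apply_update` are hypothetical names, bracket access and the advanced features are deliberately left out, and falling back to a plain string for non-literal values is my own choice.

```python
import ast


def parse_update(update):
    """Split 'a.b.c=<literal>' into (['a', 'b', 'c'], value)."""
    key, _, raw = update.partition("=")
    try:
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        value = raw  # assumption: treat anything that is not a literal as a string
    return key.split("."), value


def apply_update(config, update):
    """Apply one update string to a nested dict, creating levels as needed."""
    path, value = parse_update(update)
    node = config
    for part in path[:-1]:
        node = node.setdefault(part, {})
    node[path[-1]] = value
    return config


config = {}
for update in ["foo='bar'",
               "optimizer.learn_rate=0.1",
               "network.layer_sizes=[100,200,100]"]:
    apply_update(config, update)
print(config)
```

Using `ast.literal_eval` keeps the value syntax close to Python while refusing to execute arbitrary expressions.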

Ingredients

Ideally Ingredients would be explicitly instantiated in the config. That way they could take parameters, be used multiple times, and even be included conditionally:

@ex.config
def config(cfg):
    cfg.normalize = True
    cfg.test_on_other_dataset = False
    cfg.train_dataset = MyDatasetIngredient(normalize=cfg.normalize)
    if cfg.test_on_other_dataset:
        cfg.test_dataset = MyOtherDatasetIngredient()
    else:
        cfg.test_dataset = MyDatasetIngredient()

Well defined priorities / stages

The internal resolution of priorities should be very clear, transparent and extendable. To keep the support for config scopes, the entries still need to be collected starting from the highest priority. But having a general system for evaluating partial configurations and updating them incrementally would go a long way in simplifying this mess.

Further Steps

I have several ideas on how to implement this and a few prototypes already. But this post is already too long, and getting your feedback on the problems and goals would be valuable before moving on anyways. So what do you think? Do you agree with the raised points? Anything else that comes to mind?
stale Discussion
opened by Qwlouse 20
Notification

Added an implementation of notifications through a notificator.

Ingredient now has a notificator attribute, which is added with the add_notificator method (documented).

This allows pushing a notification (such as a Slack message, a Facebook Messenger message, or something else) when the script fails or when the job is done.

I've worked on the implementation of the notificator with my notif package, but was not sure whether adding dependencies was the appropriate approach. So one can simply use the notif package or any other class which implements the send_notification_error and send_notification methods.
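To illustrate the duck-typed interface described above, here is a small self-contained sketch. Only the two method names come from this PR description; their argument lists, the `ListNotificator` class and the `run_with_notification` helper are assumptions made up for the example, and nothing here calls Sacred.

```python
class ListNotificator:
    """Hypothetical notificator that records messages instead of sending them."""

    def __init__(self):
        self.messages = []

    def send_notification(self, message="Job done"):
        # Assumed signature: a real notificator would post to Slack or similar.
        self.messages.append(("done", message))

    def send_notification_error(self, error):
        # Assumed signature: receives the exception that stopped the run.
        self.messages.append(("error", repr(error)))


def run_with_notification(job, notificator):
    """Run `job` and notify on success or failure (illustrative helper)."""
    try:
        result = job()
    except Exception as error:
        notificator.send_notification_error(error)
        raise
    notificator.send_notification()
    return result


notificator = ListNotificator()
run_with_notification(lambda: 42, notificator)
print(notificator.messages)
```

Any object with these two methods would work in place of `ListNotificator`, which is also what makes the notificator easy to mock in tests.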

The motivation of this PR is to get a notification when a script fails or when the script is done.

Since I was not sure about your commenting approach, I've placed comments where code has been added. Feel free to remove them, since I think the code is pretty clear without the comments.

Also, I added a code example on how to use the notificator. But since it needs a valid webhook URL to fully work, the example does not run as-is (one can't execute the script). Also, I was not sure where (and if) it would be appropriate to mention it in the documentation. Any ideas?

Testing

I have added two tests to verify that the notificators are called when needed and present. Since the notificators send messages, I've mocked the notificator.

opened by davebulaval 18

Sacred is not compatible with TensorFlow 1.14.0

TensorFlow 1.14.0 was released a couple of days ago and unfortunately it isn't compatible with sacred. The tensorflow package in 1.14.0 has its __spec__ property set to None, which results in an exception in pkgutil.find_loader when one imports sacred.

I'm not sure whether this is a bug or a feature in TensorFlow; however, the fact that it breaks sacred is unfortunate.

$ docker run -it tensorflow/tensorflow:1.14.0-py3 bash
root@d70a18c72a0b:/# pip install sacred
Successfully installed colorama-0.4.1 docopt-0.6.2 jsonpickle-0.9.6 munch-2.3.2 packaging-19.0 py-cpuinfo-5.0.0 pyparsing-2.4.0 sacred-0.7.5
root@d70a18c72a0b:/# python3
Python 3.6.8 (default, Jan 14 2019, 11:02:34) 
[GCC 8.0.1 20180414 (experimental) [trunk revision 259383]] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.__version__
'1.14.0'
>>> import sacred
Traceback (most recent call last):
  File "/usr/lib/python3.6/pkgutil.py", line 490, in find_loader
    spec = importlib.util.find_spec(fullname)
  File "/usr/lib/python3.6/importlib/util.py", line 102, in find_spec
    raise ValueError('{}.__spec__ is None'.format(name))
ValueError: tensorflow.__spec__ is None

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/sacred/__init__.py", line 13, in <module>
    from sacred.experiment import Experiment
  File "/usr/local/lib/python3.6/dist-packages/sacred/experiment.py", line 13, in <module>
    from sacred.arg_parser import format_usage, get_config_updates
  File "/usr/local/lib/python3.6/dist-packages/sacred/arg_parser.py", line 16, in <module>
    from sacred.serializer import restore
  File "/usr/local/lib/python3.6/dist-packages/sacred/serializer.py", line 8, in <module>
    from sacred import optional as opt
  File "/usr/local/lib/python3.6/dist-packages/sacred/optional.py", line 40, in <module>
    has_tensorflow = modules_exist("tensorflow")
  File "/usr/local/lib/python3.6/dist-packages/sacred/utils.py", line 656, in modules_exist
    return all(module_exists(m) for m in modnames)
  File "/usr/local/lib/python3.6/dist-packages/sacred/utils.py", line 656, in <genexpr>
    return all(module_exists(m) for m in modnames)
  File "/usr/local/lib/python3.6/dist-packages/sacred/utils.py", line 652, in module_exists
    return pkgutil.find_loader(modname) is not None
  File "/usr/lib/python3.6/pkgutil.py", line 496, in find_loader
    raise ImportError(msg.format(fullname, type(ex), ex)) from ex
ImportError: Error while finding loader for 'tensorflow' (<class 'ValueError'>: tensorflow.__spec__ is None)
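The failure mode can be reproduced without TensorFlow by registering a dummy module whose __spec__ is None. The sketch below also shows one possible defensive variant of an existence check. It is an illustration only and not necessarily the fix Sacred adopted; the module name `fake_tf_for_demo` is made up, and this `module_exists` is a rewrite for the example, not the function from sacred/utils.py.

```python
import importlib.util
import sys
import types


def module_exists(name):
    """Return True if `name` is already imported or can be located."""
    if sys.modules.get(name) is not None:
        return True  # already imported, no need to consult its spec
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Safety net for lookups that fail instead of returning None.
        return False


# A freshly created module object has __spec__ set to None.
fake = types.ModuleType("fake_tf_for_demo")
sys.modules[fake.__name__] = fake
find_spec_error = None
try:
    try:
        importlib.util.find_spec(fake.__name__)
    except ValueError as error:
        find_spec_error = error
    exists_while_loaded = module_exists(fake.__name__)
finally:
    del sys.modules[fake.__name__]

print(find_spec_error)
print(exists_while_loaded)
```

Checking `sys.modules` first sidesteps the spec lookup entirely for modules that are already imported, which is the situation in the traceback above.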

stale

opened by JonasAmrich 18

Add pickle support to ReadOnly{Dict,List}

Closes #499.

Initially I tried to add pickle support using __getstate__ and __setstate__. However, pickle seems to special-case built-in container types: my __setstate__ method never even got called in ReadOnlyDict. Accordingly I decided to switch to a proxy style, where we are wrapping a dict/list object. This is similar to the MappingProxyType, which we could probably replace ReadOnlyDict with once Python 2.7 support is dropped (it's only available in Python 3.3+).
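For reference, this is how the standard library's MappingProxyType behaves with respect to writes and pickling. The snippet uses only the standard library and says nothing about how ReadOnlyDict is implemented in this PR; the config keys are made-up example values.

```python
import pickle
from types import MappingProxyType

config = MappingProxyType({"learn_rate": 0.01, "batch_size": 32})

# The proxy is a read-only view: item assignment is rejected.
try:
    config["learn_rate"] = 0.1
    write_error = None
except TypeError as error:
    write_error = error

# The proxy itself cannot be pickled directly.
try:
    pickle.dumps(config)
    pickle_error = None
except TypeError as error:
    pickle_error = error

# One way around it: pickle a plain dict copy and re-wrap after loading.
restored = MappingProxyType(pickle.loads(pickle.dumps(dict(config))))
print(write_error, pickle_error, dict(restored), sep="\n")
```

So a switch to MappingProxyType would still need some explicit pickle handling, for example converting to a dict when serializing.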
stale

opened by AdamGleave 17
Keep track of imported modules through code
Hi, in my current project I'm importing modules through the function importlib.import_module. Unfortunately, Sacred does not keep track of modules imported this way. Is there a way to add those files to Sacred?

Example:

import importlib

def instantiate_model(path):
    """Instantiates a model given its fullname as a python.package.name"""
    return getattr(
        importlib.import_module("project.models." + ".".join(path.split(".")[:-1])),
        path.split(".")[-1],
    )()
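One possible direction is to look up the source file of each dynamically imported module, so that the path can be registered by hand. The sketch below covers only that standard-library part, using `json.decoder.JSONDecoder` as a stand-in for a project model; `load_attribute` is a hypothetical helper. As far as I know, Sacred's Experiment offers an add_source_file method that accepts such a path, but treat that as an assumption to check against the documentation.

```python
import importlib
import inspect


def load_attribute(fullname):
    """Import 'package.module.Name' and return (attribute, source file path)."""
    module_name, _, attr_name = fullname.rpartition(".")
    module = importlib.import_module(module_name)
    # getsourcefile returns None when no Python source is available.
    return getattr(module, attr_name), inspect.getsourcefile(module)


decoder_cls, source_path = load_attribute("json.decoder.JSONDecoder")
print(decoder_cls.__name__, source_path)
```

Collecting these paths at import time gives a list of files that static inspection of the main script would miss.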
stale
opened by EmanueleGhelfi 17
With command change str to int

This is the command and its output:
root@350ab4117e66:/code# python main.py print_config with eval_end="2018-11-25"
WARNING - root - Changed type of config entry "eval_end" from str to int
INFO - percolata_experiment - Running command 'print_config'
INFO - percolata_experiment - Started
Configuration (modified, added, typechanged, doc):
algo_config_file = None
algo_type = 'prophet'
baseline_expid = None
cus_name = None
data_type = 'fill'
desc = None
emails = '[email protected]'
eval_end = 1982
eval_start = '2018-11-23'
executor = None
exp_identity = 'percolata_experiment'
forecast_type = 'expt'
location_id = 1249
mode = 'pred'
pred_end = '2019-02-26'
pred_start = '2019-01-22'
seed = 191843317
time_res = 'slot'
train_start = None
weight = 'weight.tar.pth'

I don't understand why Sacred always emits the "WARNING - root - Changed type of config entry "eval_end" from str to int" message.
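A likely explanation is that the shell removes the double quotes before Python ever sees the argument, so the value arrives as the bare text 2018-11-25, which reads as a subtraction. That matches eval_end = 1982 in the output above. The snippet below demonstrates the quoting part with shlex, which follows POSIX shell splitting rules. How Sacred then evaluates the value is an assumption on my part and may depend on the Sacred and Python versions.

```python
import shlex

# What the shell hands to Python for the command as typed in the report.
as_typed = 'python main.py print_config with eval_end="2018-11-25"'
argv = shlex.split(as_typed)
value = argv[-1].partition("=")[2]
print(value)  # the quotes are gone

# Read as Python arithmetic, the unquoted value is a subtraction.
as_arithmetic = 2018 - 11 - 25
print(as_arithmetic)

# Wrapping the whole update in single quotes keeps the inner double quotes.
protected = shlex.split("python main.py print_config with 'eval_end=\"2018-11-25\"'")
print(protected[-1])
```

With the inner quotes preserved, the value reaches the program as a quoted string literal rather than as bare text.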

opened by Chenguoqing2008 15
Allow config scopes with type annotations.
Allows the usage of type hints in config scopes. Closes #818.

Notes:

A new test is added for this behaviour.

Instead of fixing the regexp, it uses ast and tokenize modules. With ast we find the first line with code in the function body. As comments are removed in the AST representation, we then generate the tokens of the function until that line is reached, in order to keep the previous lines that only have comments and whitespace.
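The approach described above can be sketched with the standard library alone. This is a simplified illustration rather than the code in this pull request: it finds the first statement of a function body with ast and then collects the comment tokens that precede it with tokenize. The example source and variable names are made up.

```python
import ast
import io
import tokenize

SOURCE = '''def cfg():
    # learning rate for the optimizer
    learn_rate: float = 0.01
    batch_size: int = 32
'''

# The AST drops comments, but it tells us where the first statement starts.
func = ast.parse(SOURCE).body[0]
first_code_line = func.body[0].lineno

# The token stream still contains the comments before that line.
comments = [
    token.string
    for token in tokenize.generate_tokens(io.StringIO(SOURCE).readline)
    if token.type == tokenize.COMMENT and token.start[0] < first_code_line
]

print(first_code_line, comments)
```

Annotated assignments are parsed as ordinary statements (ast.AnnAssign), so type hints no longer need special handling in a regular expression. A docstring would count as the first statement in this sketch.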
opened by vnmabus 14
API change proposal: getting rid of the "create" method for observers

This change is backward compatible (given that a user didn't use the undocumented __init__ method of observers)

I would propose that the recommended way of calling an observer would be by calling the constructor __init__(), not the create() method. We can be backward compatible by forwarding the arguments of the create method to the __init__().

I understand from the codebase that __init__ was used internally, to recreate observers from existing metrics and other existing data (in the middle of the experiment). I propose that we move this functionality to a new method called create_from().

Calling the class constructor will feel natural and avoid documentation lookups. It would be better for the most convenient method (the constructor) to be used externally rather than internally.

We can drop a warning in the create function and keep it around indefinitely.

If all lights are greens from the maintainers I can do the change and update the docs.
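The proposal could look roughly like the sketch below. DemoObserver and its basedir and priority parameters are hypothetical stand-ins chosen for illustration; they are not Sacred's observer classes or signatures.

```python
import warnings


class DemoObserver:
    """Hypothetical observer used only to illustrate the proposal."""

    def __init__(self, basedir, priority=0):
        self.basedir = basedir
        self.priority = priority

    @classmethod
    def create(cls, *args, **kwargs):
        # Kept for backward compatibility: warn, then forward to the constructor.
        warnings.warn(
            "create() is deprecated; call the constructor directly.",
            DeprecationWarning,
            stacklevel=2,
        )
        return cls(*args, **kwargs)

    @classmethod
    def create_from(cls, existing):
        # Internal-style path: rebuild an observer from already existing state.
        return cls(existing.basedir, existing.priority)


new_style = DemoObserver("runs")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = DemoObserver.create("runs")

rebuilt = DemoObserver.create_from(new_style)
print(old_style.basedir, len(caught), rebuilt.basedir)
```

Existing calls to create() keep working and only gain a warning, while new code can call the constructor directly.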

opened by gabrieldemarmiesse 14

AssertionError while running a notebook through ipython

Setting interactive=True doesn't work when the notebook is run as a script through ipython.

$ ipython notebook.ipynb

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[1], line 1
----> 1 ex = Experiment("image_classification", interactive=True)
      2 ex.observers.append(NeptuneObserver(run=neptune_run))

File ~\miniconda3\envs\py38\lib\site-packages\sacred\experiment.py:119, in Experiment.__init__(self, name, ingredients, interactive, base_dir, additional_host_info, additional_cli_options, save_git_info)
    117     elif name.endswith(".pyc"):
    118         name = name[:-4]
--> 119 super().__init__(
    120     path=name,
    121     ingredients=ingredients,
    122     interactive=interactive,
    123     base_dir=base_dir,
    124     _caller_globals=caller_globals,
    125     save_git_info=save_git_info,
    126 )
    127 self.default_command = None
    128 self.command(print_config, unobserved=True)

File ~\miniconda3\envs\py38\lib\site-packages\sacred\ingredient.py:75, in Ingredient.__init__(self, path, ingredients, interactive, _caller_globals, base_dir, save_git_info)
     69 self.save_git_info = save_git_info
     70 self.doc = _caller_globals.get("__doc__", "")
     71 (
     72     self.mainfile,
     73     self.sources,
     74     self.dependencies,
---> 75 ) = gather_sources_and_dependencies(
     76     _caller_globals, save_git_info, self.base_dir
     77 )
     78 if self.mainfile is None and not interactive:
     79     raise RuntimeError(
     80         "Defining an experiment in interactive mode! "
     81         "The sourcecode cannot be stored and the "
     82         "experiment won't be reproducible. If you still"
     83         " want to run it pass interactive=True"
     84     )

File ~\miniconda3\envs\py38\lib\site-packages\sacred\dependencies.py:725, in gather_sources_and_dependencies(globs, save_git_info, base_dir)
    723 def gather_sources_and_dependencies(globs, save_git_info, base_dir=None):
    724     """Scan the given globals for modules and return them as dependencies."""
--> 725     experiment_path, main = get_main_file(globs, save_git_info)
    727     base_dir = base_dir or experiment_path
    729     gather_sources = source_discovery_strategies[SETTINGS["DISCOVER_SOURCES"]]

File ~\miniconda3\envs\py38\lib\site-packages\sacred\dependencies.py:596, in get_main_file(globs, save_git_info)
    594     main = None
    595 else:
--> 596     main = Source.create(globs.get("__file__"), save_git_info)
    461 return Source(main_file, get_digest(main_file), repo, commit, is_dirty)

File ~\miniconda3\envs\py38\lib\site-packages\sacred\dependencies.py:382, in get_py_file_if_possible(pyc_name)
    380 if pyc_name.endswith((".py", ".so", ".pyd")):
    381     return pyc_name
--> 382 assert pyc_name.endswith(".pyc")
    383 non_compiled_file = pyc_name[:-1]
    384 if os.path.exists(non_compiled_file):

Environment details:

Windows 11
Python 3.8.15
sacred==0.8.2

opened by SiddhantSadangi 0

MD5 hash for gridfs is deprecated

https://github.com/IDSIA/sacred/blob/f6191d173ab9e3db2021ae32e5ae9cbd882de75c/sacred/observers/mongo.py#L445

It looks like the md5 hash in gridfs no longer works as it used to, so every file is added without an md5 check: file = self.fs.find_one({"filename": abs_path, "md5": md5}) always returns None. I fixed it by passing the hash explicitly, _id = self.fs.put(f, filename=abs_path, md5=md5), but I am not sure this is the correct fix.
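The intent of the workaround, computing the hash yourself and storing it next to the file so that a later lookup can find it, can be illustrated without a database. In this sketch InMemoryStore is a hypothetical stand-in written for the example; it does not call pymongo or gridfs and makes no claim about how those libraries behave.

```python
import hashlib


class InMemoryStore:
    """Hypothetical stand-in for a file store; not the gridfs API."""

    def __init__(self):
        self._files = {}  # (filename, md5) -> file id
        self._next_id = 0

    def find_one(self, filename, md5):
        return self._files.get((filename, md5))

    def put(self, data, filename, md5):
        # The caller supplies the hash, so it is stored with the file.
        file_id = self._next_id
        self._files[(filename, md5)] = file_id
        self._next_id += 1
        return file_id


def save_once(store, filename, data):
    """Store data only if the same name and content hash are not present yet."""
    md5 = hashlib.md5(data).hexdigest()
    existing = store.find_one(filename, md5)
    if existing is not None:
        return existing
    return store.put(data, filename, md5=md5)


store = InMemoryStore()
first = save_once(store, "/exp/train.py", b"print('v1')")
again = save_once(store, "/exp/train.py", b"print('v1')")
changed = save_once(store, "/exp/train.py", b"print('v2')")
print(first, again, changed)
```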

opened by Kwentar 0
Autorelease workflow draft

Adds a GitHub Actions workflow that triggers automatically on a version change and uploads the new version to PyPI (based on https://github.com/marketplace/actions/pypi-github-auto-release).

Not sure how to best test this.

opened by Qwlouse 7
Error when importing FileStorageObserver
I tried to re-run some code with the newest version of sacred that used to work with a sacred version from late 2019 and now get the following error. The problem already seems to occur when trying to import the FileStorageObserver. What would be the best fix for this issue?

I am using python version 3.6 on macOS Mojave 10.14.6.

Traceback (most recent call last):
  File "/Users/k47h4/anaconda3/envs/topdown_plasticity/lib/python3.6/site-packages/importlib_metadata-4.12.0-py3.6.egg/importlib_metadata/_compat.py", line 9, in <module>
    from typing import Protocol
ImportError: cannot import name 'Protocol'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run_code.py", line 5, in <module>
    from sacred.observers import FileStorageObserver
  File "/Users/k47h4/anaconda3/envs/topdown_plasticity/lib/python3.6/site-packages/sacred/__init__.py", line 11, in <module>
    from sacred.experiment import Experiment
  File "/Users/k47h4/anaconda3/envs/topdown_plasticity/lib/python3.6/site-packages/sacred/experiment.py", line 12, in <module>
    from sacred.arg_parser import format_usage, get_config_updates
  File "/Users/k47h4/anaconda3/envs/topdown_plasticity/lib/python3.6/site-packages/sacred/arg_parser.py", line 14, in <module>
    from sacred.serializer import restore
  File "/Users/k47h4/anaconda3/envs/topdown_plasticity/lib/python3.6/site-packages/sacred/serializer.py", line 1, in <module>
    import jsonpickle
  File "/Users/k47h4/anaconda3/envs/topdown_plasticity/lib/python3.6/site-packages/jsonpickle-2.2.0-py3.6.egg/jsonpickle/__init__.py", line 81, in <module>
    from .version import __version__  # noqa: F401
  File "/Users/k47h4/anaconda3/envs/topdown_plasticity/lib/python3.6/site-packages/jsonpickle-2.2.0-py3.6.egg/jsonpickle/version.py", line 5, in <module>
    import importlib_metadata as metadata
  File "/Users/k47h4/anaconda3/envs/topdown_plasticity/lib/python3.6/site-packages/importlib_metadata-4.12.0-py3.6.egg/importlib_metadata/__init__.py", line 17, in <module>
    from . import _adapters, _meta
  File "/Users/k47h4/anaconda3/envs/topdown_plasticity/lib/python3.6/site-packages/importlib_metadata-4.12.0-py3.6.egg/importlib_metadata/_meta.py", line 1, in <module>
    from ._compat import Protocol
  File "/Users/k47h4/anaconda3/envs/topdown_plasticity/lib/python3.6/site-packages/importlib_metadata-4.12.0-py3.6.egg/importlib_metadata/_compat.py", line 12, in <module>
    from typing_extensions import Protocol  # type: ignore
  File "/Users/k47h4/anaconda3/envs/topdown_plasticity/lib/python3.6/site-packages/typing_extensions-4.3.0-py3.6.egg/typing_extensions.py", line 160, in <module>
    class _FinalForm(typing._SpecialForm, _root=True):
AttributeError: module 'typing' has no attribute '_SpecialForm'

The entire code can be found in this repository: https://github.com/k47h4/interneuron_circuits_plasticity Here is a code snippet from run_code.py:

from sacred.observers import FileStorageObserver

def run_in_thread(values):
    from Spiking_model import ex
    ex.observers.append(FileStorageObserver.create('Spiking_model'))
    ex.run('run_network')

values1 = np.array([0])
n_threads = len(values1)
pool = multiprocessing.Pool(n_threads)
pool.map(run_in_thread, values1)

Thank you!
opened by k47h4 1
Add `pint.Quantity` units support

Adds a units field to linearize_metrics output per discussion in #880.

I took a slightly different approach. Instead of adding units to ScalarMetricLogEntry, I added "units" to the linearized output and filled it in based on whether or not value is of type pint.Quantity. I am not sure whether it would be better to add units=None to log_scalar_metric and do the pint support in the background, or to do it as I've done. On the one hand, that would remove the hard dependency on pint and would be a little easier for users. On the other hand, users wouldn't have full access to pint's features, so they couldn't define their own units.

I went ahead and added in the unit conversion as pint makes it quite easy to do. Units will be converted to the unit of the first log entry. If the units cannot be converted, a custom exception is raised. If you submit some entries with units and some without, it assumes that the entries without units use the same units as the entries with units. Might be better to throw an exception there as a user really shouldn't be doing that.
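The conversion rules described above can be sketched without pint. In this illustration a small hand-written factor table (TO_BASE) replaces pint, and linearize and UnitConversionError are hypothetical names, so this shows the behaviour only and is not the PR's code.

```python
class UnitConversionError(Exception):
    """Hypothetical exception for entries whose units cannot be converted."""


# Hypothetical factors to a base unit per dimension; pint would supply these.
TO_BASE = {"s": ("time", 1.0), "ms": ("time", 0.001), "m": ("length", 1.0)}


def linearize(entries):
    """entries: list of (value, unit or None). Returns (values, units)."""
    # The target is the unit of the first entry that carries one.
    target = next((unit for _, unit in entries if unit is not None), None)
    values = []
    for value, unit in entries:
        if unit is None or unit == target:
            # Entries without units are assumed to already use the target unit.
            values.append(value)
            continue
        dimension, factor = TO_BASE[unit]
        target_dimension, target_factor = TO_BASE[target]
        if dimension != target_dimension:
            raise UnitConversionError(f"cannot convert {unit} to {target}")
        values.append(value * factor / target_factor)
    return values, target


values, units = linearize([(1.0, "s"), (500.0, "ms"), (2.0, None)])
print(values, units)
```

Mixing incompatible dimensions, for example seconds followed by metres, raises UnitConversionError in this sketch, mirroring the custom exception mentioned above.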

While working on this feature, I added some type hints where they were missing. Not my finest type hints, but it's better than nothing!

I have an update adding metric support (with units) to SqlObserver ready, but it relies on this PR.

opened by Gracecr 11

Releases (0.8.3)

0.8.3 (Mar 28, 2022)
A minor release with many small improvements and support for Python 3.10.

Feature: Support for the new numpy random API (np.random.Generator); deprecate old np.random.RandomState for np 1.19+ (#779, thanks @jnphilipp)

Feature: Add py.typed file for typecheckers like mypy (#849, thanks @neophnx)

Feature: Validate sacred settings (#774)

Feature: Update CLI options: Change run ID from command line (#798, thanks @jnphilipp)

Feature: Log named configs and config updates (#823)

Feature: Options to save sources and copy resources in FileStorageObserver (#806, thanks @patrick-kidger)

Feature: Support for NVIDIA Multi-Instance GPU (#865, thanks @j3soon)

Bugfix: Updated testcases to py3.6+; updated dependencies (e.g., tinydb 4+, pytest 6.2.1, pymongo 4.0) (#799, #819, #821, thanks a lot @jnphilipp)

Bugfix: Fixes for handling symlinks (#791, thanks @MaxSchambach)

Bugfix: Fix docker example (#829, thanks @ahallermed)

Doc: Some fixes and update of the documentation (#778, #792, #793, #797, #804, #842, #856, thanks @daliasen @aaronsnoswell @schmitts @Blaizzy)

0.8.2 (Nov 26, 2020)
Minor bugfix release that resolves some bugs for Python 3.8+ and issues with the read-only container types.

Feature: Added support for pickling and YAML serialization to the read-only containers (#775, #737)

Feature: Added git integration to SqlObserver (#741)

Feature: Added support for a collection prefix in MongoObserver (#704)

Bugfix: Fix print_config command for Python 3.8 (#719)

Bugfix: Fix save_config command (#765)

Bugfix: Named config updates are now distributed correctly during the configuration creation process (#769, #777)

Bugfix: Parsing of the nvidia_smi output now also works with non-Unicode (e.g., Chinese) characters in process names (#776)

Bugfix: Fix type annotations of MongoObserver (#762)

Bugfix: Terminate tee on timeout. This is a workaround that prevents program crashes caused by output capturing (#740)

Bugfix: Improve parsing of config scopes (#699, #764)

Bugfix: Fix error tracking of ConfigErrors when raised in a config scope (#733)

Bugfix: Made git import optional (#724)

0.8.0 (Oct 14, 2019)
Major release with several breaking changes.

API change: Dropped support for Python 2

API change: Gathering of git information is now enabled by default #595

API change: Switched constructor from Observer.create(...) to Observer(...) for all observers.

API change: Changed the interface for collecting custom host-information #569

API change: Changed interface for defining CLI options. #572

Feature: Added new S3 file observer #542

Feature: added started_text option to TelegramObserver #494

Feature: added copy/deepcopy support to read-only containers #500

Bugfix: FileStorage Observer is more reliable under parallel execution #503

Bugfix: FileStorageObserver now raises an error when an artifact would overwrite an important file #647

Bugfix: fixed inconsistent config nesting behavior #409 #505

Bugfix: Several fixes for tensorflow integration

Bugfix: Fixed crash due to missing brand-key on some machines #512

Internal: Migrated CI server to Azure

Internal: Added pre-commit hooks for PEP 8 checks and Black for automated code formatting

Internal: Started using pathlib.Path instead of os.path in many places

0.7.5 (Jun 20, 2019)
The last release to support Python 2.7.

Feature: major improvements to error reporting (thanks @thequilo)

Feature: added print_named_configs command

Feature: added option to add metadata to artifacts (thanks @jarnoRFB)

Feature: content type detection for artifacts (thanks @jarnoRFB)

Feature: automatic seeding for pytorch (thanks @srossi93)

Feature: add proxy support to telegram observer (thanks @brickerino)

Feature: made MongoObserver fail dump dir configurable (thanks @jarnoRFB)

Feature: added queue-based observer that better handles unreliable connections (thanks @jarnoRFB)

Bugfix: some fixes to stdout capturing

Bugfix: FileStorageObserver now creates directories only when starting a run (#329; thanks @thomasjpfan)

Bugfix: Fixed config_hooks (#326; thanks @thomasjpfan)

Bugfix: Fixed a crash when overwriting non-dict config entries with dicts (#325; thanks @thomasjpfan)

Bugfix: fixed problem with running in conda environment (#341)

Bugfix: numpy aware config change detection (#344)

Bugfix: allow dependencies to be compiled libraries (thanks @jnphilipp)

Bugfix: output colorization now works on 256 and 16 color terminals (thanks @bosr)

Bugfix: fixed problem with tinydb observer logging (#327; thanks @michalgregor)

Bugfix: ignore folders that have the same name as a named_config (thanks @boeddeker)

Bugfix: setup no longer overwrites pre-configured root logger (thanks @thequilo)

Bugfix: compatibility with tensorflow 2.0 (thanks @tarik, @gabrieldemarmiesse)

Bugfix: fixed exception when no tee is available for stdout capturing (thanks @greg-farquhar)

Bugfix: fixed concurrency issue with FileStorageObserver (thanks @dekuenstle)

0.7.4 (Jun 12, 2018)
Minor bugfix release that solves some issues with the interaction of ingredients and named configs.

Bugfix: fixed problem with postgres backend of SQLObserver (thanks @bensternlieb)

Bugfix: fixed a problem with the interaction of ingredients and named configs

Feature: added metrics logging to the FileStorageObserver (thanks @ummavi)

0.7.3 (May 6, 2018)
Major bugfix release that fixes several critical issues, including experiments that sometimes didn't exit, race conditions in the FileStorageObserver and MongoObserver, and several stdout-capturing problems.

Feature: support custom experiment base directory (thanks @anibali)

Feature: added option to pass existing MongoClient to MongoObserver (thanks @rueberger)

Feature: allow setting the config docstring from named configs

Feature: added py-cpuinfo as fallback for gathering CPU information (thanks @serv-inc)

Feature: added support for _log argument in config function

Bugfix: stacktrace filtering now correctly handles chained exceptions (thanks @kamo-naoyuki)

Bugfix: resolved issue with stdout capturing sometimes losing the last few lines

Bugfix: fixed the overwrite option of MongoObserver

Bugfix: fixed a problem with the heartbeat sometimes not ending

Bugfix: fixed an error with running in interactive mode

Bugfix: added a check for non-unique ingredient paths (thanks @boeddeker)

Bugfix: fixed several problems with UTF-8 decoding (thanks @LukasDrude, @wjp)

Bugfix: fixed nesting structure of _config (thanks @boeddeker)

Bugfix: fixed crash when using git integration with empty repository (thanks @ramon-oliveira)

Bugfix: fixed a crash with first run using sqlite backend

Bugfix: fixed several problems with the tests (thanks @thomasjpfan)

Bugfix: fixed race condition in FileStorageObserver (thanks @boeddeker)

Bugfix: fixed problem with overwriting named configs of ingredients (thanks @pimdh)

Bugfix: removed deprecated call to inspect.getargspec()

Bugfix: fixed problem with empty dictionaries disappearing from config updates and named configs (thanks @TomVeniat)

Bugfix: fixed problem with commandline parsing when program name contained spaces

Bugfix: loglevel option is now taken into account for config related warnings

Bugfix: properly handle numpy types in metrics logging

0.7.2 (May 6, 2018)
Minor features release:

API Change: added host_info to queued_event

Feature: improved and configurable dependency discovery system

Feature: improved and configurable source-file discovery system

Feature: better error messages for missing or misspelled commands

Feature: -m flag now supports passing an id for a run to overwrite

Feature: allow captured functions to be called outside of a run (thanks @berleon)

Bugfix: fixed issue with telegram imports (thanks @millawell)

0.7.1 (May 6, 2018)
Bugfixes and improved Tensorflow support.

Refactor: lazy importing of many optional dependencies

Feature: added metrics API for adding live monitoring information to the MongoDB

Feature: added integration with tensorflow for automatic capturing of LogWriter paths

Feature: set seed of tensorflow if it is imported

Feature: named_configs can now affect the config of ingredients

Bugfix: failed runs now return with exit code 1 by default

Bugfix: fixed a problem with UTF-8 symbols in stdout

Bugfix: fixed a threading issue with the SQLObserver

Bugfix: fixed a problem with consecutive ids in the SQLObserver

Bugfix: heartbeat events now also serialize the intermediate results

Bugfix: repeatedly calling run from Python with an option that adds an observer no longer duplicates observers

Bugfix: fixed a problem where **kwargs of captured functions might be modified

Bugfix: fixed an encoding problem with the FileStorageObserver

Bugfix: fixed an issue where determining the version of some packages would crash

Bugfix: fixed handling of relative filepaths in the SQLObserver and the TinyDBObserver

0.7.0 (May 7, 2017)
Major feature release that breaks backwards compatibility in a few cases.

Feature: host info now contains information about NVIDIA GPUs (if available)

Feature: git integration: sacred now collects info about the git repository of the experiment (if available and if gitpython is installed)

Feature: new --enforce-clean flag that cancels a run if the git repository is dirty

Feature: added new TinyDbObserver and TinyDbReader (thanks to @MrKriss)

Feature: added new SqlObserver

Feature: added new FileStorageObserver

Feature: added new SlackObserver

Feature: added new TelegramObserver (thanks to @black-puppydog)

Feature: added save_config command

Feature: added queue flag to just queue a run instead of executing it

Feature: added TimeoutInterrupt to signal that a run timed out

Feature: experiments can now be run in Jupyter notebook, but will fail with an error by default, which can be deactivated using interactive=True

Feature: allow passing an unparsed command-line string to ex.run_commandline.

Feature: improved stdout/stderr capturing: it now also collects non-python outputs and logging.

Feature: observers now share the id of a run and it is available during runtime as run._id.

Feature: new --print_config flag to always print config first

Feature: added sacred.SETTINGS as a place to configure some of the behaviour

Feature: ConfigScopes now extract docstrings and line comments and display them when calling print_config

Feature: observers are now run in order of priority (settable)

Feature: new --name=NAME option to set the name of experiment for this run

Feature: the heartbeat event now stores an intermediate result (if set).

Feature: ENVIRONMENT variables can be captured as part of host info.

Feature: sped up the apply_backspaces_and_linefeeds stdout filter. (thanks to @remss)

Feature: adding resources by name (thanks to @d4nst)

API Change: all times are now in UTC

API Change: significantly changed the MongoDB layout

API Change: MongoObserver and FileStorageObserver now use consecutive integers as _id

API Change: the name passed to Experiment is now optional and defaults to the name of the file in which it was instantiated. (The name is still required for interactive mode)

API Change: Artifacts can now be named, and are stored by the observers under that name.

API Change: Experiment.run_command is deprecated in favor of run, which now also takes a command_name parameter.

API Change: Experiment.run now takes an options argument to add commandline-options also from python.

API Change: Experiment.get_experiment_info() now returns source-names as relative paths and includes a separate base_dir entry

Dependencies: Migrated from six to future, to avoid conflicts with old preinstalled versions of six.

Bugfix: fixed a problem when trying to set the loglevel to DEBUG

Bugfix: type conversions from None to some other type are now correctly ignored

Bugfix: fixed a problem with stdout capturing breaking tools that access certain attributes of sys.stdout or sys.stderr.

Bugfix: @main, @automain, @command and @capture now support functions with Python3 style annotations.

Bugfix: fixed a problem with config-docs from ingredients not being propagated

Bugfix: fixed setting seed to 0 being ignored

0.6.10 (Aug 8, 2016)
A minor release to incorporate a few bugfixes and minor features before the upcoming big 0.7 release.

Bugfix: fixed a problem when trying to set the loglevel to DEBUG

Bugfix: fixed a random crash of the heartbeat thread (see #101).

Feature: added --force/-f option to disable errors and warnings concerning suspicious changes. (thanks to Yannic Kilcher)

Feature: experiments can now be run in Jupyter notebook, but will fail with an error by default, which can be deactivated using interactive=True

Feature: added support for adding a captured out filter, and a filter that applies backspaces and linefeeds before saving, like a terminal would. (thanks to Kevin McGuinness)

0.6.8 (Jan 13, 2016)
0.6.8 (2016-01-14)

Feature: Added automatic conversion of pandas datastructures in the custom info dict to json-format in the MongoObserver.

Feature: Fail if a new config entry is added but it is not used anywhere

Feature: Added a warning if no observers were added to the experiment. Added also an unobserved keyword to commands and a --unobserved commandline option to silence that warning

Feature: Split the debug flag -d into two flags: -d now only disables stacktrace filtering, while -D adds post-mortem debugging.

API change: renamed named_configs_to_use kwarg in ex.run_command method to named_configs

API change: changed the automatic conversion of numpy arrays in the MongoObserver from pickle to human readable nested lists.

Bugfix: Fixed a problem with debugging experiments.

Bugfix: Fixed a problem with numpy datatypes in the configuration

Bugfix: More helpful error messages when using return or yield in a config scope

Bugfix: Be more helpful when using -m/--mongo_db and pymongo is not installed
