A little logger for machine learning research

Reinforcement Learning Working Group

Last update: Dec 3, 2022

Overview

dowel

dowel is a little logger for machine learning research.

Installation

pip install dowel

Usage

import dowel
from dowel import logger, tabular

logger.add_output(dowel.StdOutput())
logger.add_output(dowel.TensorBoardOutput('tensorboard_logdir'))

logger.log('Starting up...')
for i in range(1000):
    logger.push_prefix('itr {}'.format(i))
    logger.log('Running training step')

    tabular.record('itr', i)
    tabular.record('loss', 100.0 / (2 + i))
    logger.log(tabular)

    logger.pop_prefix()
    logger.dump_all()

logger.remove_all()

Comments

Add custom x axes to TensorBoard
This PR adds support for custom x-axes to the TensorBoard scalar plot. The axis names are specified in the tensorboard output constructor.

class TensorBoardOutput(LogOutput):
    def __init__(self, log_dir, x_axes=None, flush_secs=120, histogram_samples=1e3):

When x_axes is None, it falls back to using iterations as the x-axis. If any of the x_axes is not present in the scalar tabular data, a warning is logged to the console. If none of the x_axes are present, it falls back to using the iteration as the x-axis.
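The fallback rules described above can be sketched in a few lines. This is only an illustration of the described behaviour, not the code from the PR: the helper name select_x_axes and its arguments are hypothetical.

```python
import warnings


def select_x_axes(x_axes, scalar_keys):
    """Return the x-axis keys to plot against, or None to use the iteration.

    Hypothetical helper: `x_axes` mirrors the constructor argument and
    `scalar_keys` stands for the keys found in the tabular data.
    """
    if x_axes is None:
        return None  # no custom axes requested, use the iteration
    present = []
    for axis in x_axes:
        if axis in scalar_keys:
            present.append(axis)
        else:
            # A requested axis is missing: warn, but keep going.
            warnings.warn('x-axis {!r} not found in tabular data'.format(axis))
    # If every requested axis is missing, fall back to the iteration.
    return present or None


axes = select_x_axes(['Epoch', 'TotalEnvSteps'], {'Epoch', 'loss'})
```

Here axes contains only 'Epoch', and a warning is emitted for the missing 'TotalEnvSteps' key.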

Screenshots of an experiment with Epoch and TotalEnvSteps as x-axes.

I will open a separate PR in garage repo to set TotalEnvSteps as the default x-axis.
opened by naeioi 6
Add Python 3.8 build
Issue #45 is caused by keys in TabularInput that are inconsistent with the CsvOutput headers. To fix this:

Read the existing log file

Update the csv DictWriter with the new header, computed as a union

The union of the TabularInput's and CsvOutput's headers is used to make sure all keys from both instances are captured

Write the same log file with new key headers and old data. If the value of new key is missing, the cell is left blank.

The cons of this solution:

The csv log file is read and rewritten to update the inconsistent key headers. Handling a large number of csv log files may be time-consuming because of the heavy I/O. To keep the I/O as efficient as possible, all file reading and writing is done in RAM and the file is closed after use.

A better approach can be:

Expand the header on the fly as new keys are encountered, and write the log file with the new header exactly once.
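A minimal sketch of that alternative, assuming the rows can be held in memory until the log is written. The class name BufferedCsvLog is hypothetical and is not part of dowel.

```python
import csv
import io


class BufferedCsvLog:
    """Hypothetical buffer: grows the header as keys appear, writes once."""

    def __init__(self):
        self.fieldnames = []  # ordered list, so column order is stable
        self.rows = []

    def record(self, row):
        # Expand the header whenever a key is seen for the first time.
        for key in row:
            if key not in self.fieldnames:
                self.fieldnames.append(key)
        self.rows.append(dict(row))

    def write(self, stream):
        # restval='' leaves a blank cell where a row has no value for a key.
        writer = csv.DictWriter(stream, fieldnames=self.fieldnames, restval='')
        writer.writeheader()
        writer.writerows(self.rows)


log = BufferedCsvLog()
log.record({'itr': 0, 'loss': 50.0})
log.record({'itr': 1, 'loss': 33.3, 'task_a/return': 1.5})
buffer = io.StringIO()
log.write(buffer)
csv_lines = buffer.getvalue().splitlines()
```

The trade-off is that nothing reaches disk until write is called, so buffered rows are lost if the process dies first.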

rlworkgroup#45 @avnishn @zequnyu
opened by irisliucy 3
Ignore packages pre-cached by travis

Travis keeps several packages like numpy pre-installed to speed up builds, but this sometimes leads to the wrong version being picked up by pip.

This change would ignore the pre-installed versions

opened by gitanshu 2
Fix tests, add Tensorflow 2 compat

Unfortunately, fixing the tests involves monkey patching unittest, since it has a bug. That bug is fixed in CPython PR #4800, which has gone unmerged for over a year.

opened by krzentner 2
Starter Project for Directed Research in Robotics
This attempts to allow more robust handling of dynamic fields in the dowel module. It is implemented by the following logic:

The CSV headers are now ordered

When a new field is encountered, it is appended to the ordered list of headers

The output file gets flushed and re-read to allow header replacement

The new records with new fields get written

To accomplish this, the following changes were made to internal data structures and logic:

csv._fieldnames becomes list() instead of set()

The output file is now opened in w+ mode to allow both read and write access

Unit tests were added for the following cases:

Adding a new field at the end of the list of fields

Adding a new field in the middle of the list of fields

Sincerely
opened by late-in-autumn 1
Robust handling of inconsistent TabularInput keys
Currently, CsvOutput emits a warning if the keys of a TabularInput change after the first call to logger.log(TabularInput). A new key not seen before is ignored, and an old key that is no longer present is left blank. In other words, CsvOutput handles dynamic fieldnames conservatively.

This behaviour of CsvOutput makes it tricky to log the performance of multi-task and meta-learning algorithms, where there are usually per-task fields but not every task is present in every iteration, so logs for some tasks go missing.

The desired behaviour to handle inconsistent keys should be

When a new key is encountered

Expand header with the new key.

Expand old rows with empty cells for the new key.

If the value of any key is missing, leave the cell blank.
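The desired behaviour can be illustrated with the standard csv module. This is a sketch under simplifying assumptions (the whole file is rewritten on every call), and append_row is a hypothetical helper, not a dowel function.

```python
import csv
import os
import tempfile


def append_row(path, row):
    """Append `row` to the CSV at `path`, expanding the header if needed."""
    fieldnames, old_rows = [], []
    if os.path.exists(path):
        with open(path, newline='') as f:
            reader = csv.DictReader(f)
            fieldnames = list(reader.fieldnames or [])
            old_rows = list(reader)
    # Expand the header with any key not seen before, keeping column order.
    for key in row:
        if key not in fieldnames:
            fieldnames.append(key)
    with open(path, 'w', newline='') as f:
        # restval='' fills missing cells, both in old rows and in the new one.
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval='')
        writer.writeheader()
        writer.writerows(old_rows)
        writer.writerow(row)


path = os.path.join(tempfile.mkdtemp(), 'progress.csv')
append_row(path, {'itr': 0, 'loss': 50.0})
append_row(path, {'itr': 1, 'task_a/return': 1.5})
with open(path, newline='') as f:
    lines = f.read().splitlines()
```

After the second call the old row gains an empty task_a/return cell, and the new row has an empty loss cell.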
opened by naeioi 1
Rewrite automatic versioning

The previous automatic versioning script was flawed. It produced the correct package version for documentation builds and building PyPI distributions, but produced an incorrect version when you run setup.py from the downloaded package. Unfortunately, Python environment managers (e.g. Pipenv, conda) resolve package version by evaluating setup.py, not using the PyPI version.

This PR makes version generation simpler by reading the version string from a simple file. Automatic versioning from tags is handled by clobbering the version file from within the CI, rather than looking for a CI environment variable on every usage.
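As a rough sketch of the idea, the version can live in a plain text file that setup.py reads and that CI overwrites on tagged builds. The file name VERSION, the helper names, and the handling of a leading "v" in the tag are all assumptions made for illustration.

```python
import os
import tempfile


def read_version(path):
    """Return the version string stored in a plain text file."""
    with open(path) as f:
        return f.read().strip()


def write_version(path, tag):
    """Overwrite the version file from a tag (what a CI step might do)."""
    # Assumption: tags look like 'v0.0.3' and the package version drops the 'v'.
    version = tag[1:] if tag.startswith('v') else tag
    with open(path, 'w') as f:
        f.write(version + '\n')


version_file = os.path.join(tempfile.mkdtemp(), 'VERSION')
write_version(version_file, 'v0.0.3')
version = read_version(version_file)
```

Because read_version only reads a file, it gives the same answer whether it runs in CI, in a documentation build, or from a downloaded package.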

opened by ryanjulian 1
Remove Snapshotter

This was accidentally included during the import from rlworkgroup/garage.

The Snapshotter is not part of the Logger API, so it doesn't really belong in this package.

opened by ryanjulian 1
Move unit tests to tests/dowel

This PR moves unit test modules to tests/dowel. This makes the unit test path for a module easy to predict. For instance, the tests for the module dowel.csv_output will now live at tests.dowel.test_csv_output rather than tests.test_csv_output.

opened by ryanjulian 1
Add Python 3.8 build
Inconsistent header keys are handled. When a new key is introduced, the previous data is augmented line by line. In the logger, if data is a TabularInput instance, then it has its values emptied, with its keys remaining so that there is no data bleed when a key is omitted in the future. Tensorboard incompatibility is an issue as tensorboard does not accept the empty character (or string) as a legal numpy scalar.

Nine tests have been provided to cover the various circumstances:

No change in keys

Single increase in keys with consistent future usage

Multiple increase in keys with consistent future usage

Single increase in keys with inconsistent future usage

Multiple increase in keys with inconsistent future usage

Overlapping increase in keys with immediate inconsistency

Static keys - tensorboard incompatibility test

Dynamic keys - tensorboard incompatibility test

Empty tabulation test

Note: I have left the comment structure to be consistent with the preexisting code.
opened by koverman47 0
Add Python 3.8 build

Fix for #45

Used DictReader to read in old file values and created new DictWriter object to rewrite all records. Old records without values for these new keys will be null as required.

Also, apologies for the unnecessary mentions to this issue earlier!

@avnishn @zequnyu

opened by dxlin17 0
Robust handling of inconsistent TabularInput keys
Introduction

Dowel is a tool that the garage team uses for logging results from our various reinforcement learning experiments.

Dowel can be used to log different types of data such as floats or strings. Logs can be written to stdout (the console), CSV files, and TensorBoard.

You can check out an example of how Dowel is used here. In fact, almost all parts of the Dowel API are used in this example.

The problem

After statistics such as loss have been logged, and a call to logger.dump_all() is made for the first time, new tabular data can’t be written to a CSV output. This is because currently data cannot be inconsistently logged to CSV, meaning that on every single call to dump_all, the same logger keys must appear. Data that is inconsistently logged will not appear in the CSV output. This is a design flaw that we have been able to work around but affects our workflows.

Your goal is to solve the problem as well as introduce tests into our testing framework in order to verify your solution.

Some General Instructions

Fork Dowel and install all necessary dependencies.

Take a look at this toy example which when run exposes the bug and the accompanying issue mentioned above.

When you have finished writing your solution and tests, upload a PR onto your fork, not onto the upstream repository.

When you are done email us back with the link to your pull request.

Follow the rules of the contributing.md.

If you have any questions, open an issue in your fork, and tag @avnishn and @haydenshively. Our preferred mode of communication on any questions that you have is through github issues and pull requests, as this is how the Garage team communicates generally. For this reason, we won’t respond to any direct emails with regards to help with your project. We will however respond to any other questions that you have via email (interview scheduling, etc).

Best of luck, and let us know if there are any issues as early on as possible.
opened by avnishn 0

Mention SSH setup in CONTRIBUTING.md

When I tried to run the following commands from the "Git recipes" in CONTRIBUTING.md, I got error messages:

git remote add rlworkgroup git@github.com:rlworkgroup/dowel.git

git reset --hard master rlworkgroup/master

However, the following would work:

git remote add rlworkgroup https://github.com/rlworkgroup/dowel.git

git checkout master
git fetch rlworkgroup
git reset --hard rlworkgroup/master

Should CONTRIBUTING.md be updated?

opened by GuanyangLuo 1

Logging Numpy arrays, Torch Tensors and Tensorflow Tensors

Hi,

Thank you for this nice and simple tool for logging machine learning research. I often encounter situations where I would like to save multi-dimensional Numpy arrays, for example the observation at each time-step in a reinforcement learning experiment.

It would be nice to have an output logger that supports Numpy arrays, Pytorch Tensors and Tensorflow Tensors.

I have written a simple output logger, NpzOutput, that writes Numpy arrays to a .npz file using Numpy's savez function. It is not optimal (no incremental saving), but I thought I would share it in case somebody is interested.

opened by BartKeulen 2
dowel causes the main process to hang forever, if it contains a TensorboardOutput when the process is closing

This is because it attempts to close the underlying TensorboardX writer in TensorboardOutput.__del__. However, global teardown of the python interpreter has already closed the thread used by TensorboardX.
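One general way to avoid depending on __del__ during interpreter teardown is to close the writer explicitly while the interpreter is still fully alive, for example through a context manager. The sketch below only illustrates that pattern and is not necessarily the fix adopted in dowel. ManagedOutput and FakeWriter are hypothetical stand-ins.

```python
class FakeWriter:
    """Stand-in for a TensorBoard writer; only records that close() ran."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class ManagedOutput:
    """Hypothetical output that is closed explicitly, not from __del__."""

    def __init__(self, writer):
        self._writer = writer

    def close(self):
        # Idempotent: a second call does nothing.
        if self._writer is not None:
            self._writer.close()
            self._writer = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not swallow exceptions


writer = FakeWriter()
with ManagedOutput(writer) as output:
    pass  # logging would happen here
output.close()  # safe to call again after the with block
```

The writer is closed when the with block exits, well before global teardown begins.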

opened by krzentner 0
Add dots to alternating tabular lines

Tables with keys of varying lengths are hard to read, since some of the key names end up far from their values. This change adds a sequence of dots on all odd lines, so that lines are easier to match up with their keys.
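The effect can be sketched with a small formatting function. This is an illustration only, not the code from the change: format_table and its width parameter are hypothetical, and "odd lines" is assumed to mean odd zero-based indices, that is, every second line.

```python
def format_table(pairs, width=32):
    """Format key/value pairs, filling every second line with dots."""
    lines = []
    for index, (key, value) in enumerate(pairs):
        text = str(value)
        filler = '.' if index % 2 == 1 else ' '
        # Key on the left, value on the right, filler in between.
        gap = max(width - len(key) - len(text) - 2, 0)
        lines.append('{} {} {}'.format(key, filler * gap, text))
    return lines


table = format_table([
    ('itr', 3),
    ('AverageDiscountedReturn', 12.5),
    ('loss', 20.0),
])
print('\n'.join(table))
```

With short keys next to long ones, the dotted lines give the eye a track to follow from key to value.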

opened by krzentner 5

Owner

Reinforcement Learning Working Group

Coalition of researchers which develop open source reinforcement learning research software
