Debugging, monitoring and visualization for Python Machine Learning and Data Science

Microsoft

Last update: Dec 27, 2022

Overview

Welcome to TensorWatch

TensorWatch is a debugging and visualization tool from Microsoft Research, designed for data science, deep learning and reinforcement learning. It works in Jupyter Notebook to show real-time visualizations of your machine learning training and to perform several other key analysis tasks for your models and data.

TensorWatch is designed to be flexible and extensible, so you can also build your own custom visualizations, UIs, and dashboards. Besides the traditional "what-you-see-is-what-you-log" approach, it also has a unique capability to execute arbitrary queries against your live ML training process, return a stream as the result of the query, and view this stream using a visualizer of your choice (we call this Lazy Logging Mode).

TensorWatch is under heavy development, with the goal of providing a platform for debugging machine learning in one easy-to-use, extensible, and hackable package.

How to Get It

pip install tensorwatch

TensorWatch supports Python 3.x and is tested with PyTorch 0.4-1.x. Most features should also work with TensorFlow eager tensors. TensorWatch uses graphviz to create network diagrams, and depending on your platform you might need to install it manually.
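When network diagrams do not render, it can help to check whether two graphviz components are visible to your Python environment: the graphviz Python package and the Graphviz "dot" program. The sketch below is a generic diagnostic, not part of TensorWatch; it assumes these two components are what your TensorWatch version relies on, which may differ between releases.

```python
import importlib.util
import shutil


def graphviz_status():
    """Report which graphviz pieces are visible to this Python environment."""
    return {
        # Python bindings, installed with "pip install graphviz"
        "python_package": importlib.util.find_spec("graphviz") is not None,
        # Graphviz system binaries; "dot" is the program that lays out and renders graphs
        "dot_executable": shutil.which("dot") is not None,
    }


if __name__ == "__main__":
    print(graphviz_status())
```

If python_package is False, the Python bindings are missing; if dot_executable is False, Graphviz itself still needs to be installed through your platform's package manager and placed on PATH.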

How to Use It

Quick Start

Here's a simple example that logs an integer and its square as a tuple to TensorWatch every second:

import tensorwatch as tw
import time

# streams will be stored in test.log file
w = tw.Watcher(filename='test.log')

# create a stream for logging
s = w.create_stream(name='metric1')

# generate Jupyter Notebook to view real-time streams
w.make_notebook()

for i in range(1000):
    # write x,y pair we want to log
    s.write((i, i*i))

    time.sleep(1)

When you run this code, you will notice that a Jupyter Notebook file, test.ipynb, gets created in your script's folder. From a command prompt, type jupyter notebook and select test.ipynb. Choose Cell > Run All in the menu to see the real-time line graph as values get written by your script.

Here's the output you will see in Jupyter Notebook:

To dive deeper into the various other features, please see Tutorials and notebooks.

How does this work?

When you write to a TensorWatch stream, the values get serialized and sent to a TCP/IP socket as well as the file you specified. From Jupyter Notebook, we load the previously logged values from the file and then listen to that TCP/IP socket for any future values. The visualizer listens to the stream and renders the values as they arrive.
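A toy, single-process model of that flow may make it more concrete. This is a hypothetical sketch, not TensorWatch's implementation: pickle stands in for TensorWatch's serialization format, an in-memory queue.Queue stands in for the TCP/IP socket, and the ToyStream name is invented for illustration.

```python
import os
import pickle
import queue
import tempfile


class ToyStream:
    """Hypothetical stand-in for a TensorWatch stream; not the library's real API."""

    def __init__(self, filename):
        self.filename = filename
        self.listeners = []  # stand-in for clients connected to a TCP/IP socket

    def write(self, value):
        # 1) Persist the value so that a viewer started later can replay history.
        with open(self.filename, "ab") as f:
            pickle.dump(value, f)
        # 2) Push the value to every viewer that is already listening.
        for q in self.listeners:
            q.put(value)

    def subscribe(self):
        """Return (values logged so far, queue that receives future values)."""
        history = []
        if os.path.exists(self.filename):
            with open(self.filename, "rb") as f:
                while True:
                    try:
                        history.append(pickle.load(f))
                    except EOFError:
                        break
        live = queue.Queue()
        self.listeners.append(live)
        return history, live


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        s = ToyStream(os.path.join(d, "test.log"))
        s.write((0, 0))
        s.write((1, 1))
        history, live = s.subscribe()  # a "notebook" attaches mid-run
        s.write((2, 4))
        print(history)            # [(0, 0), (1, 1)]  replayed from the file
        print(live.get_nowait())  # (2, 4)  delivered live
```

Because the toy runs in a single thread, nothing can be written between replaying the file and subscribing; a real cross-process setup has to handle that window.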

Ok, so that's a very simplified description. The TensorWatch architecture is actually much more powerful. Almost everything in TensorWatch is a stream. Files, sockets, consoles and even visualizers are streams themselves. A cool thing about TensorWatch streams is that they can listen to any other streams. This allows TensorWatch to create a data flow graph. This means that a visualizer can listen to many streams simultaneously, each of which could be a file, a socket or some other stream. You can recursively extend this to build arbitrary data flow graphs. TensorWatch decouples streams from how they get stored and how they get visualized.
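The data flow idea can be illustrated with a toy model in which every node, including the visualizer, is a stream that can subscribe to other streams. The class and method names here are hypothetical and are not TensorWatch's API; the sketch only shows chaining (a transforming stream between a source and a visualizer) and fan-in (one visualizer listening to two streams).

```python
class Stream:
    """Toy stream (hypothetical, not TensorWatch's API): receives values and forwards them."""

    def __init__(self, name, transform=None):
        self.name = name
        self.transform = transform  # optional function applied to each value
        self.subscribers = []

    def subscribe(self, other):
        """Make this stream listen to `other`; returns self so calls can be chained."""
        other.subscribers.append(self)
        return self

    def write(self, value, source=None):
        if self.transform is not None:
            value = self.transform(value)
        for sub in self.subscribers:
            sub.write(value, source=self.name)


class RecordingVisualizer(Stream):
    """A 'visualizer' is just another stream; this one records values instead of drawing."""

    def __init__(self, name):
        super().__init__(name)
        self.received = []

    def write(self, value, source=None):
        self.received.append((source, value))
        # Forward as well, so a visualizer can feed further nodes in the graph.
        super().write(value, source)


if __name__ == "__main__":
    train_loss = Stream("train_loss")
    val_loss = Stream("val_loss")
    # Chain: train_loss -> rounded (rounds each value) -> plot
    rounded = Stream("rounded", transform=lambda v: round(v, 1)).subscribe(train_loss)
    # Fan-in: one visualizer listens to two different streams
    plot = RecordingVisualizer("plot").subscribe(rounded).subscribe(val_loss)
    train_loss.write(0.123)
    val_loss.write(0.5)
    print(plot.received)  # [('rounded', 0.1), ('val_loss', 0.5)]
```

Because sources, transforms and visualizers share one interface, the same subscribe call builds arbitrarily deep graphs.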

Visualizations

In the above example, the line graph is used as the default visualization. However, TensorWatch supports many other diagram types including histograms, pie charts, scatter charts, bar charts and 3D versions of many of these plots. You can log your data, specify the chart type you want and let TensorWatch take care of the rest.

One of the significant strengths of TensorWatch is the ability to combine, compose, and create custom visualizations effortlessly. For example, you can choose to visualize an arbitrary number of streams in the same plot. Or you can visualize the same stream in many different plots simultaneously. Or you can place an arbitrary set of visualizations side-by-side. You can even create your own custom visualization widget simply by creating a new Python class, implementing a few methods.

Comparing Results of Multiple Runs

Each TensorWatch stream may contain a metric of your choice. By default, TensorWatch saves all streams in a single file, but you can also choose to save each stream in a separate file or not to save them at all (for example, sending streams over sockets or directly to the console, with zero hits to disk!). Later you can open these streams and direct them to one or more visualizations. This design lets you quickly compare the results of your different experiments in the visualizations of your choice.

Training within Jupyter Notebook

Often you might prefer to do data analysis, ML training, and testing all from within Jupyter Notebook instead of from a separate script. TensorWatch can help you create sophisticated, real-time visualizations effortlessly from code that runs end-to-end within a Jupyter Notebook.

Lazy Logging Mode

A unique feature in TensorWatch is the ability to query the live running process, retrieve the result of this query as a stream and direct this stream to your preferred visualization(s). You don't need to log any data beforehand. We call this new way of debugging and visualization a lazy logging mode.

For example, as seen below, we visualize input and output image pairs, sampled randomly during the training of an autoencoder on a fruits dataset. These images were not logged beforehand in the script. Instead, the user sends a query as a Python lambda expression, which results in a stream of images that gets displayed in the Jupyter Notebook:

See Lazy Logging Tutorial.
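The core of lazy logging can be sketched in a few lines: the training loop only announces an event together with the variables in scope, and queries that a user attaches later, while the process is running, are evaluated against those variables. This is a hypothetical illustration; ToyLazyWatcher, create_query and observe are names belonging to this toy, not a description of TensorWatch's internals, although the TensorWatch usage quoted under Comments has a similar shape (a lambda over an object holding the observed values, plus an event name such as 'batch').

```python
from types import SimpleNamespace


class ToyLazyWatcher:
    """Hypothetical sketch of lazy logging; names do not describe TensorWatch's API."""

    def __init__(self):
        self._queries = {}  # event name -> list of (query function, result list)

    def create_query(self, event_name, expr):
        """Register a query (a lambda) evaluated each time `event_name` fires."""
        results = []
        self._queries.setdefault(event_name, []).append((expr, results))
        return results

    def observe(self, event_name, **variables):
        """Called by the training loop; evaluates nothing unless a query is attached."""
        for expr, results in self._queries.get(event_name, []):
            results.append(expr(SimpleNamespace(**variables)))


if __name__ == "__main__":
    watcher = ToyLazyWatcher()
    query_result = None
    for step in range(4):
        loss = 10 - step
        watcher.observe("batch", step=step, loss=loss)
        if step == 1:
            # A query arrives mid-training; nothing was logged for it beforehand.
            query_result = watcher.create_query("batch", lambda b: (b.step, b.loss * 2))
    print(query_result)  # [(2, 16), (3, 14)]
```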

Pre-Training and Post-Training Tasks

TensorWatch leverages several excellent libraries, including hiddenlayer, torchstat and Visual Attribution, to let you perform the usual debugging and analysis activities in one consistent package and interface.

For example, you can view the model graph with tensor shapes with a one-liner:

You can view statistics for different layers such as flops, number of parameters, etc:

See notebook.

You can view the dataset in a lower dimensional space using techniques such as t-SNE:

See notebook.

Prediction Explanations

We wish to provide various tools for explaining predictions to help debug models. Currently, we offer several explainers for convolutional networks, including LIME. For example, the following highlights the areas that cause the ResNet50 model to make a prediction for class 240 of the ImageNet dataset:

See notebook.

Tutorials

Paper

More technical details are available in the TensorWatch paper (EICS 2019 conference). Please cite it as:

@inproceedings{tensorwatch2019eics,
  author    = {Shital Shah and Roland Fernandez and Steven M. Drucker},
  title     = {A system for real-time interactive analysis of deep learning training},
  booktitle = {Proceedings of the {ACM} {SIGCHI} Symposium on Engineering Interactive
               Computing Systems, {EICS} 2019, Valencia, Spain, June 18-21, 2019},
  pages     = {16:1--16:6},
  year      = {2019},
  crossref  = {DBLP:conf/eics/2019},
  url       = {https://arxiv.org/abs/2001.01215},
  doi       = {10.1145/3319499.3328231},
  timestamp = {Fri, 31 May 2019 08:40:31 +0200},
  biburl    = {https://dblp.org/rec/bib/conf/eics/ShahFD19},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Contribute

We would love your contributions, feedback, questions, and feature requests! Please file a GitHub issue or send us a pull request. Please also review the Microsoft Code of Conduct.

Contact

Join the TensorWatch group on Facebook to stay up to date or ask any questions.

Credits

TensorWatch utilizes several open source libraries for many of its features. These include hiddenlayer, torchstat, Visual-Attribution, pyzmq, receptivefield and nbformat. Please see the install_requires section in setup.py for an up-to-date list.

License

This project is released under the MIT License. Please review the License file for more details.

Comments

Issue with draw model

Hello,

I've just installed tensorwatch and tried to reproduce the example:

import tensorwatch as tw
import torchvision.models

alexnet_model = torchvision.models.alexnet()
tw.draw_model(alexnet_model, [1, 3, 224, 224])

and unfortunately I'm getting the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~/anaconda3/lib/python3.6/site-packages/IPython/core/formatters.py in __call__(self, obj)
    343             method = get_real_method(obj, self.print_method)
    344             if method is not None:
--> 345                 return method()
    346             return None
    347         else:

~/anaconda3/lib/python3.6/site-packages/tensorwatch/model_graph/hiddenlayer/pytorch_draw_model.py in _repr_svg_(self)
     11     def _repr_svg_(self):
     12         """Allows Jupyter notebook to render the graph automatically."""
---> 13         return self.dot._repr_svg_()
     14     def save(self, filename, format="png"):
     15         # self.dot.format = format

AttributeError: 'Dot' object has no attribute '_repr_svg_'

My versions are:

Python 3.6.9
IPython 7.9.0
Pytorch 1, 1.2 and 1.3.1 (I have tried with these three versions but nothing changes.)

I also tried to run the given example on Google Colab, and it raises the same error.

Maybe the error comes from my version of IPython?

opened by mondeg0

Cannot install with pip because depends on package not hosted on PyPi
> pip install tensorwatch
Collecting tensorwatch
  Using cached tensorwatch-0.9.0.tar.gz (187 kB)
ERROR: Packages installed from PyPI cannot depend on packages which are not also hosted on PyPI.
tensorwatch depends on pydot@ git+https://github.com/sytelus/[email protected]#egg=pydot

However, I can install tensorwatch version 0.8.10 without issue. I believe the problem was introduced by https://github.com/microsoft/tensorwatch/commit/353567f2071b4c7a5fae5afebeea787523c59762

I'm running Ubuntu 16 with pip 20.0.2.
opened by jkerfs
pip install issue: "SyntaxError: invalid syntax"

Hello, it looks like a very cool tool!! I tried to install with pip and got the following issue:

Collecting tensorwatch
  Using cached https://files.pythonhosted.org/packages/ce/f2/4885c7f5ddf06224fc1443bb998464755e542c34f9966de4e686b9f1e43e/tensorwatch-0.8.4.tar.gz
Requirement already satisfied: matplotlib in /media/ophir/DATA1/software/anaconda3/envs/pytorch/lib/python3.5/site-packages (from tensorwatch) (2.2.2)
Requirement already satisfied: numpy in /media/ophir/DATA1/software/anaconda3/envs/pytorch/lib/python3.5/site-packages (from tensorwatch) (1.14.2)
Requirement already satisfied: pyzmq in /media/ophir/DATA1/software/anaconda3/envs/pytorch/lib/python3.5/site-packages (from tensorwatch) (17.1.2)
Requirement already satisfied: plotly in /media/ophir/DATA1/software/anaconda3/envs/pytorch/lib/python3.5/site-packages (from tensorwatch) (3.3.0)
Collecting torchstat (from tensorwatch)
  Using cached https://files.pythonhosted.org/packages/bc/fe/f483b907ca80c90f189cd892bb2ce7b2c256010b30314bbec4fc17d1b5f1/torchstat-0.0.7-py3-none-any.whl
Collecting receptivefield (from tensorwatch)
  Using cached https://files.pythonhosted.org/packages/cd/2a/a140221d151e228c5995e34f9c60d1ffd756f8672ccfbce8efe5da780671/receptivefield-0.4.0.tar.gz
ERROR: Complete output from command python setup.py egg_info:
ERROR: Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/media/ophir/DATA1/ilyan/tmp/pip-install-gb2l8gtq/receptivefield/setup.py", line 13
        download_url=f'https://github.com/fornaxai/receptivefield/archive/{VERSION}.tar.gz',
                                                                                          ^
    SyntaxError: invalid syntax
    ----------------------------------------
ERROR: Command "python setup.py egg_info" failed with error code 1 in /media/ophir/DATA1/ilyan/tmp/pip-install-gb2l8gtq/receptivefield/

Does anyone have a solution?

Thanks Ophir
install issue

opened by ophir91
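The SyntaxError above is raised while pip runs receptivefield's setup.py, which uses an f-string (f'...{VERSION}...'). f-string literals were introduced in Python 3.6 (PEP 498), while the paths in this report point to a Python 3.5 environment, so the file cannot even be parsed. A minimal pre-install check, assuming upgrading the interpreter is an acceptable remedy, might look like this:

```python
import sys


def supports_fstrings(version_info=sys.version_info):
    """f-string literals (PEP 498) require Python 3.6 or newer."""
    return tuple(version_info[:2]) >= (3, 6)


if __name__ == "__main__":
    if supports_fstrings():
        print("OK: this interpreter can parse setup.py files that use f-strings")
    else:
        # %-formatting keeps this script itself runnable on Python 3.5
        print("Python %d.%d is too old; use Python 3.6+" % tuple(sys.version_info[:2]))
```

The check deliberately avoids f-strings itself, so it still runs and reports on the old interpreter it is meant to detect.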

pip package missing json file

I tried to execute the following cnn_pred_explain notebook on Colab: https://github.com/microsoft/tensorwatch/blob/master/notebooks/cnn_pred_explain.ipynb

However, it failed with the following error.

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-5-b08090dd95a6> in <module>()
     10 image_utils.show_image(img)
     11 probabilities = imagenet_utils.predict(model=model, images=[img])
---> 12 imagenet_utils.probabilities2classes(probabilities, topk=5)
     13 input_tensor = imagenet_utils.image2batch(img)
     14 prediction_tensor = pytorch_utils.int2tensor(239)

2 frames
/usr/local/lib/python3.6/dist-packages/tensorwatch/imagenet_utils.py in __init__(self, json_path)
     54         json_path = json_path or os.path.join(os.path.dirname(__file__), 'imagenet_class_index.json')
     55 
---> 56         with open(os.path.abspath(json_path), "r") as read_file:
     57             class_json = json.load(read_file)
     58             self._idx2label = [class_json[str(k)][1] for k in range(len(class_json))]

FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.6/dist-packages/tensorwatch/imagenet_class_index.json'

My guess is that the pip package does not include the JSON file.

Reference: A Simple Guide for Python Packaging, https://medium.com/small-things-about-python/lets-talk-about-python-packaging-6d84b81f1bb5

opened by sakaia
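If the cause is as guessed, the usual setuptools remedy is to declare the file as package data. The fragment below is a sketch of the relevant setup.py arguments only, assuming imagenet_class_index.json sits inside the tensorwatch package directory (as the json_path in the traceback suggests); it is not TensorWatch's actual setup.py, and its other arguments are omitted.

```python
# setup.py (sketch): ship non-Python data files that live inside the package directory
from setuptools import setup, find_packages

setup(
    name="tensorwatch",
    packages=find_packages(),
    # Install *.json files found in the tensorwatch package, e.g. imagenet_class_index.json
    package_data={"tensorwatch": ["*.json"]},
)
```

Depending on the setuptools version, a matching MANIFEST.in entry may also be needed for the file to end up in the source distribution.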

Provide script files for notebooks?
Thank you! This project will help a lot of people in their research.

Looking at notebooks/mnist.ipynb, how are the inputs expected to be structured, specifically for the topk visualisations?

I want to understand the block:

rand_pred = train.create_stream(expr="topk_all(l, \
            batch_vals=lambda b: (b.batch.loss_all, (b.batch.input, b.batch.output), b.batch.target), \
            out_f=image_class_outf, order='rnd')", event_name='batch', throttle=2)

The tutorials specify using a lambda expression, where b contains the arguments observed by the watcher. I'm not sure how to use topk_all or where l comes from.

Would it be possible to see the training scripts for the notebooks provided in the repository?
question
opened by awwong1
Draw_model error
Dear Sir

Thanks for the excellent work !

However, when I try this:

from torchvision import models
import tensorwatch as tw

alex_model = models.alexnet()
tw.draw_model(alex_model, [1, 3, 224, 224])

It raises an 'Only output_size=[1, 1] is supported' error.

python 3.7 pytorch 1.01
opened by Stephenfang51
Tutorial notebook: missing "summary.show()" line

In the notebooks/simple_logging.ipynb notebook, the second-to-last code cell creates a Visualizer called "summary", but it is never displayed. The line "summary.show()" should be added to fix this.

opened by rfernand2
pip install tensorwatch: missing ipywidgets and sklearn

After doing "pip install tensorwatch", I tried to run "sum_log.py". It required me to manually pip install "ipywidgets" and "sklearn"; these should be included in tensorwatch's setup.py dependencies.

opened by rfernand2
Fix gradcam.py

This fixes issue #73. It adds new properties (self.handle_forward_hook and self.handle_backward_hook) and removes the hooks registered with register_forward_hook() and register_backward_hook() (in explain).

opened by kikusui6192
Can't generate images

import tensorwatch as tw
import torchvision.models

alexnet_model = torchvision.models.alexnet()
tw.draw_model(alexnet_model, [1, 3, 224, 224])

error:

ModuleNotFoundError                       Traceback (most recent call last)
~/anaconda2/envs/pytorch1.0/lib/python3.6/site-packages/IPython/core/formatters.py in __call__(self, obj)
    343             method = get_real_method(obj, self.print_method)
    344             if method is not None:
--> 345                 return method()
    346             return None
    347         else:

~/anaconda2/envs/pytorch1.0/lib/python3.6/site-packages/tensorwatch/model_graph/hiddenlayer/graph.py in _repr_svg_(self)
    391     def _repr_svg_(self):
    392         """Allows Jupyter notebook to render the graph automatically."""
--> 393         return self.build_dot(self.orientation)._repr_svg_()
    394 
    395     def save(self, path, format="pdf"):

~/anaconda2/envs/pytorch1.0/lib/python3.6/site-packages/tensorwatch/model_graph/hiddenlayer/graph.py in build_dot(self, orientation)
    333         Returns a GraphViz Digraph object.
    334         """
--> 335         from graphviz import Digraph
    336 
    337         # Build GraphViz Digraph

ModuleNotFoundError: No module named 'graphviz'

<tensorwatch.model_graph.hiddenlayer.graph.Graph at 0x7fe7ce046898>

Written as follows, it runs without error, but no image is shown:

import tensorwatch as tw
import torchvision.models

alexnet_model = torchvision.models.alexnet()
dd = tw.draw_model(alexnet_model, [1, 3, 224, 224])
print(dd)

<tensorwatch.model_graph.hiddenlayer.graph.Graph object at 0x7fe7cdfc5c88>
bug

opened by chl916185

raise RuntimeError("ONNX symbolic expected a constant value in the trace")

Hello, I am using EfficientNet with PyTorch 1.0.1, Python 3.6 and CUDA 9.0, but it raises the following error.

model.py

from __future__ import print_function
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms, models
from efficientnet_pytorch import EfficientNet
from efficientnet_pytorch import utils

from torchsummary import summary
from torchstat import stat
from tensorboardX import SummaryWriter
writer = SummaryWriter('log')

import torch.onnx
import tensorwatch as tw


def train(args, model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        output1 = torch.nn.functional.log_softmax(output, dim=1)
        loss = F.nll_loss(output1, target)
        #loss = F.l1_loss(output, target)
        loss.backward()
        optimizer.step()

        #new ynh
        # plot a point on the loss curve every 10 batches
        if batch_idx % 10 == 0:
            niter = epoch * len(train_loader) + batch_idx
            writer.add_scalar('Train/Loss', loss.data, niter)

        if batch_idx % args.log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                       100. * batch_idx / len(train_loader), loss.item()))


def test(args, model, device, test_loader, epoch):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            output1 = torch.nn.functional.log_softmax(output, dim=1)
            test_loss += F.nll_loss(output1, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    # new ynh
    writer.add_scalar('Test/Accu', test_loss, epoch)


    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))


def main():
    # Training settings
    parser = argparse.ArgumentParser(description='PyTorch MNIST Example')
    parser.add_argument('--batch-size', type=int, default=10, metavar='N',
                        help='input batch size for training (default: 10)')
    parser.add_argument('--test-batch-size', type=int, default=10, metavar='N',
                        help='input batch size for testing (default: 10)')
    parser.add_argument('--epochs', type=int, default=10, metavar='N',
                        help='number of epochs to train (default: 10)')
    parser.add_argument('--lr', type=float, default=0.01, metavar='LR',
                        help='learning rate (default: 0.01)')
    parser.add_argument('--momentum', type=float, default=0.5, metavar='M',
                        help='SGD momentum (default: 0.5)')
    parser.add_argument('--no-cuda', action='store_true', default=False,
                        help='disables CUDA training')
    parser.add_argument('--seed', type=int, default=1, metavar='S',
                        help='random seed (default: 1)')
    parser.add_argument('--log-interval', type=int, default=10, metavar='N',
                        help='how many batches to wait before logging training status')

    parser.add_argument('--save-model', action='store_true', default=False,
                        help='For Saving the current Model')
    args = parser.parse_args()
    use_cuda = not args.no_cuda and torch.cuda.is_available()

    torch.manual_seed(args.seed)

    device = torch.device("cuda" if use_cuda else "cpu")

    kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
    train_loader = torch.utils.data.DataLoader(
        datasets.MNIST(root='./mnist', train=True,download=True,
                       transform=transforms.Compose([
                           transforms.Resize((224), interpolation=2),
                           transforms.Grayscale(3),
                           transforms.ToTensor(),
                       ])),
        batch_size=args.batch_size, shuffle=True, **kwargs)
    test_loader = torch.utils.data.DataLoader(
        datasets.MNIST(root='./mnist', train=False, transform=transforms.Compose([
            transforms.Resize((224), interpolation=2),
            transforms.Grayscale(3),
            transforms.ToTensor(),
            transforms.Normalize((0.1307,), (0.3081,))
        ])),
        batch_size=args.test_batch_size, shuffle=True, **kwargs)

    blocks_args, global_params = utils.get_model_params('efficientnet-b0', override_params=None)
    #model = EfficientNet.from_pretrained('efficientnet-b0').to(device)#.cuda()
    model = EfficientNet(blocks_args, global_params)#.to(device)  # .cuda()

    #dummy_input = torch.rand(1, 3, 224, 224)
    #writer.add_graph(model, (dummy_input,))

    #dummy_input = torch.randn(10, 3, 224, 224, device='cuda')
    #model = model.cuda()
    #model1 = models.alexnet(pretrained=True).cuda()
    #torch.onnx.export(model1, dummy_input, "efficientnet.onnx", verbose=True)

    #print(model)
    tw.draw_model(model, [1, 3, 224, 224])

    #stat(model, (3, 224, 224))
    model.to(device)
    #summary(model, (3, 224, 224))

    print("-------------------------------------------")



    optimizer = optim.SGD(model.parameters(), lr=args.lr, momentum=args.momentum)

    for epoch in range(1, args.epochs + 1):
        train(args, model, device, train_loader, optimizer, epoch)
        test(args, model, device, test_loader, epoch)

    if (args.save_model):
        torch.save(model.state_dict(), "mnist_cnn.pt")

    writer.close()


if __name__ == '__main__':
    main()

utils.py

"""
This file contains helper functions for building the model and for loading model parameters.
These helper functions are built to mirror those in the official TensorFlow implementation.
"""

import re
import math
import collections
import torch
from torch import nn
from torch.nn import functional as F
from torch.utils import model_zoo


########################################################################
############### HELPERS FUNCTIONS FOR MODEL ARCHITECTURE ###############
########################################################################


# Parameters for the entire model (stem, all blocks, and head)
GlobalParams = collections.namedtuple('GlobalParams', [
    'batch_norm_momentum', 'batch_norm_epsilon', 'dropout_rate',
    'num_classes', 'width_coefficient', 'depth_coefficient',
    'depth_divisor', 'min_depth', 'drop_connect_rate',])


# Parameters for an individual model block
BlockArgs = collections.namedtuple('BlockArgs', [
    'kernel_size', 'num_repeat', 'input_filters', 'output_filters',
    'expand_ratio', 'id_skip', 'stride', 'se_ratio'])


# Change namedtuple defaults
GlobalParams.__new__.__defaults__ = (None,) * len(GlobalParams._fields)
BlockArgs.__new__.__defaults__ = (None,) * len(BlockArgs._fields)


def relu_fn(x):
    """ Swish activation function """
    return x * torch.sigmoid(x)


def round_filters(filters, global_params):
    """ Calculate and round number of filters based on depth multiplier. """
    multiplier = global_params.width_coefficient
    if not multiplier:
        return filters
    divisor = global_params.depth_divisor
    min_depth = global_params.min_depth
    filters *= multiplier
    min_depth = min_depth or divisor
    new_filters = max(min_depth, int(filters + divisor / 2) // divisor * divisor)
    if new_filters < 0.9 * filters:  # prevent rounding by more than 10%
        new_filters += divisor
    return int(new_filters)


def round_repeats(repeats, global_params):
    """ Round number of filters based on depth multiplier. """
    multiplier = global_params.depth_coefficient
    if not multiplier:
        return repeats
    return int(math.ceil(multiplier * repeats))


def drop_connect(inputs, p, training):
    """ Drop connect. """
    if not training: return inputs
    batch_size = inputs.shape[0]
    keep_prob = 1 - p
    random_tensor = keep_prob
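    # create the mask on the same device as the inputs so this also works on CUDA
    random_tensor += torch.rand([batch_size, 1, 1, 1], dtype=inputs.dtype, device=inputs.device)  # uniform [0,1)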
    binary_tensor = torch.floor(random_tensor)
    output = inputs / keep_prob * binary_tensor
    return output


class Conv2dSamePadding(nn.Conv2d):
    """ 2D Convolutions like TensorFlow """
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, dilation=1, groups=1, bias=True):
        super().__init__(in_channels, out_channels, kernel_size, stride, 0, dilation, groups, bias)
        self.stride = self.stride if len(self.stride) == 2 else [self.stride[0]]*2

    def forward(self, x):
        ih, iw = x.size()[-2:]
        kh, kw = self.weight.size()[-2:]
        sh, sw = self.stride
        oh, ow = math.ceil(ih / sh), math.ceil(iw / sw)
        pad_h = max((oh - 1) * self.stride[0] + (kh - 1) * self.dilation[0] + 1 - ih, 0)
        pad_w = max((ow - 1) * self.stride[1] + (kw - 1) * self.dilation[1] + 1 - iw, 0)
        if pad_h > 0 or pad_w > 0:
            #print("pad_h",x.shape[2],"pad_w",x.shape[3])
            x = F.pad(x, [pad_w//2, pad_w - pad_w//2, pad_h//2, pad_h - pad_h//2])
            #print("pad_h",x.shape[2],"pad_w",x.shape[3])
            #print("===========================")
        return F.conv2d(x, self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups)


########################################################################
############## HELPERS FUNCTIONS FOR LOADING MODEL PARAMS ##############
########################################################################


def efficientnet_params(model_name):
    """ Map EfficientNet model name to parameter coefficients. """
    params_dict = {
        # Coefficients:   width,depth,res,dropout
        'efficientnet-b0': (1.0, 1.0, 224, 0.2),
        'efficientnet-b1': (1.0, 1.1, 240, 0.2),
        'efficientnet-b2': (1.1, 1.2, 260, 0.3),
        'efficientnet-b3': (1.2, 1.4, 300, 0.3),
        'efficientnet-b4': (1.4, 1.8, 380, 0.4),
        'efficientnet-b5': (1.6, 2.2, 456, 0.4),
        'efficientnet-b6': (1.8, 2.6, 528, 0.5),
        'efficientnet-b7': (2.0, 3.1, 600, 0.5),
    }
    return params_dict[model_name]


class BlockDecoder(object):
    """ Block Decoder for readability, straight from the official TensorFlow repository """

    @staticmethod
    def _decode_block_string(block_string):
        """ Gets a block through a string notation of arguments. """
        assert isinstance(block_string, str)

        ops = block_string.split('_')
        options = {}
        for op in ops:
            splits = re.split(r'(\d.*)', op)
            if len(splits) >= 2:
                key, value = splits[:2]
                options[key] = value

        # Check stride
        assert (('s' in options and len(options['s']) == 1) or
                (len(options['s']) == 2 and options['s'][0] == options['s'][1]))

        return BlockArgs(
            kernel_size=int(options['k']),
            num_repeat=int(options['r']),
            input_filters=int(options['i']),
            output_filters=int(options['o']),
            expand_ratio=int(options['e']),
            id_skip=('noskip' not in block_string),
            se_ratio=float(options['se']) if 'se' in options else None,
            stride=[int(options['s'][0])])

    @staticmethod
    def _encode_block_string(block):
        """Encodes a block to a string."""
        args = [
            'r%d' % block.num_repeat,
            'k%d' % block.kernel_size,
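            # BlockArgs stores the stride as a single-element list named `stride`
            's%d%d' % (block.stride[0], block.stride[-1]),
            'e%s' % block.expand_ratio,
            'i%d' % block.input_filters,
            'o%d' % block.output_filters
        ]
        if block.se_ratio is not None and 0 < block.se_ratio <= 1: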
            args.append('se%s' % block.se_ratio)
        if block.id_skip is False:
            args.append('noskip')
        return '_'.join(args)

    @staticmethod
    def decode(string_list):
        """
        Decodes a list of string notations to specify blocks inside the network.

        :param string_list: a list of strings, each string is a notation of block
        :return: a list of BlockArgs namedtuples of block args
        """
        assert isinstance(string_list, list)
        blocks_args = []
        for block_string in string_list:
            blocks_args.append(BlockDecoder._decode_block_string(block_string))
        return blocks_args

    @staticmethod
    def encode(blocks_args):
        """
        Encodes a list of BlockArgs to a list of strings.

        :param blocks_args: a list of BlockArgs namedtuples of block args
        :return: a list of strings, each string is a notation of block
        """
        block_strings = []
        for block in blocks_args:
            block_strings.append(BlockDecoder._encode_block_string(block))
        return block_strings


def efficientnet(width_coefficient=None, depth_coefficient=None,
                 dropout_rate=0.2, drop_connect_rate=0.2):
    """ Creates a efficientnet model. """

    blocks_args = [
        'r1_k3_s11_e1_i32_o16_se0.25', 'r2_k3_s22_e6_i16_o24_se0.25',
        'r2_k5_s22_e6_i24_o40_se0.25', 'r3_k3_s22_e6_i40_o80_se0.25',
        'r3_k5_s11_e6_i80_o112_se0.25', 'r4_k5_s22_e6_i112_o192_se0.25',
        'r1_k3_s11_e6_i192_o320_se0.25',
    ]
    blocks_args = BlockDecoder.decode(blocks_args)

    global_params = GlobalParams(
        batch_norm_momentum=0.99,
        batch_norm_epsilon=1e-3,
        dropout_rate=dropout_rate,
        drop_connect_rate=drop_connect_rate,
        # data_format='channels_last',  # removed, this is always true in PyTorch
        num_classes=10,
        width_coefficient=width_coefficient,
        depth_coefficient=depth_coefficient,
        depth_divisor=8,
        min_depth=None
    )

    return blocks_args, global_params


def get_model_params(model_name, override_params):
    """ Get the block args and global params for a given model """
    if model_name.startswith('efficientnet'):
        w, d, _, p = efficientnet_params(model_name)
        # note: all models have drop connect rate = 0.2
        blocks_args, global_params = efficientnet(width_coefficient=w, depth_coefficient=d, dropout_rate=p)
    else:
        raise NotImplementedError('model name is not pre-defined: %s' % model_name)
    if override_params:
        # ValueError will be raised here if override_params has fields not included in global_params.
        global_params = global_params._replace(**override_params)
    return blocks_args, global_params


url_map = {
    'efficientnet-b0': 'http://storage.googleapis.com/public-models/efficientnet-b0-08094119.pth',
    'efficientnet-b1': 'http://storage.googleapis.com/public-models/efficientnet-b1-dbc7070a.pth',
    'efficientnet-b2': 'http://storage.googleapis.com/public-models/efficientnet-b2-27687264.pth',
    'efficientnet-b3': 'http://storage.googleapis.com/public-models/efficientnet-b3-c8376fa2.pth',
}

def load_pretrained_weights(model, model_name):
    """ Loads pretrained weights, and downloads if loading for the first time. """
    state_dict = model_zoo.load_url(url_map[model_name])

    pretrained_dict = {k: v for k, v in state_dict.items() if k != "_fc.weight" and k != "_fc.bias"}
    model.state_dict().update(pretrained_dict)
    model.load_state_dict(model.state_dict())

    print('Loaded pretrained weights for {}'.format(model_name))
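`model.state_dict()` builds a new dictionary on every call. The merge therefore has to happen on a single dictionary, which is then passed once to `load_state_dict`. The sketch below keeps that filter-and-merge step as a pure function over dictionaries. `merge_pretrained` is a hypothetical helper, and plain floats stand in for tensors so it runs without PyTorch. Unlike the function above, it also drops pretrained keys the model does not have, which is an assumption added for safety with strict loading.

```python
def merge_pretrained(model_state, pretrained_state,
                     skip_keys=('_fc.weight', '_fc.bias')):
    """Return a new state dict with the model's entries overwritten by
    pretrained ones, except for skipped (classifier) keys and keys the
    model does not define."""
    merged = dict(model_state)  # copy; the caller's dict is not modified
    for key, value in pretrained_state.items():
        if key in skip_keys or key not in merged:
            continue
        merged[key] = value
    return merged


model_state = {'conv.weight': 0.0, '_fc.weight': 0.0, '_fc.bias': 0.0}
pretrained = {'conv.weight': 1.5, '_fc.weight': 9.9, '_fc.bias': 9.9}
merged = merge_pretrained(model_state, pretrained)
print(merged)  # {'conv.weight': 1.5, '_fc.weight': 0.0, '_fc.bias': 0.0}
```

With PyTorch, the same helper would be used as `model.load_state_dict(merge_pretrained(model.state_dict(), state_dict))`.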

opened by yangninghua 1

AttributeError: 'torch._C.Node' object has no attribute 'ival'

OS version: Windows 10 64-bit; Python version: 3.9.7; PyTorch version: 1.10.2; tensorwatch version: 0.9.1

tw.draw_model(model, [1, 3, 512, 512], png_filename='unet.png')

  File "C:\Anaconda3\envs\pytorch1d10d2\lib\site-packages\tensorwatch\__init__.py", line 35, in draw_model
    g = pytorch_draw_model.draw_graph(model, input_shape)
  File "C:\Anaconda3\envs\pytorch1d10d2\lib\site-packages\tensorwatch\model_graph\hiddenlayer\pytorch_draw_model.py", line 35, in draw_graph
    dot = draw_img_classifier(model, args)
  File "C:\Anaconda3\envs\pytorch1d10d2\lib\site-packages\tensorwatch\model_graph\hiddenlayer\pytorch_draw_model.py", line 63, in draw_img_classifier
    g = SummaryGraph(non_para_model, dummy_input)
  File "C:\Anaconda3\envs\pytorch1d10d2\lib\site-packages\tensorwatch\model_graph\hiddenlayer\summary_graph.py", line 221, in __init__
    new_op['attrs'] = OrderedDict([(attr_name, node[attr_name]) for attr_name in node.attributeNames()])
  File "C:\Anaconda3\envs\pytorch1d10d2\lib\site-packages\tensorwatch\model_graph\hiddenlayer\summary_graph.py", line 221, in <listcomp>
    new_op['attrs'] = OrderedDict([(attr_name, node[attr_name]) for attr_name in node.attributeNames()])
  File "C:\Anaconda3\envs\pytorch1d10d2\lib\site-packages\torch\onnx\utils.py", line 1232, in _node_getitem
    return getattr(self, sel)(k)
AttributeError: 'torch._C.Node' object has no attribute 'ival'

opened by strongdiamond 0
AttributeError: 'torch._C.Node' object has no attribute 'ival'

from torchvision.models.resnet import resnet50
import tensorwatch as tw

model = resnet50()
tw.draw_model(model, [1, 3, 512, 512])

When I use TensorWatch in Jupyter to draw a PyTorch model with the code above, the following error is reported: module 'torch.onnx' has no attribute 'set_training'

I then changed 'set_training' to 'select_model_mode_for_export' in /anaconda3/lib/python3.9/site-packages/tensorwatch/model_graph/hiddenlayer/summary_graph.py, but another error is reported: 'torch._C.Node' object has no attribute 'ival'

Related versions: PyTorch 1.10.1, tensorwatch 0.9.1

opened by bigcatMT 0
import tensorwatch error

import tensorwatch as tw

Connected to pydev debugger (build 211.7142.13)
Traceback (most recent call last):
  File "", line 971, in _find_and_load
  File "", line 955, in _find_and_load_unlocked
  File "", line 665, in _load_unlocked
  File "", line 678, in exec_module
  File "", line 219, in _call_with_frames_removed
  File "/home/leef_wsl_u18/miniconda3/envs/py36/lib/python3.6/site-packages/tensorwatch/__init__.py", line 10, in <module>
    from .text_vis import TextVis
  File "/home/leef_wsl_u18/miniconda3/envs/py36/lib/python3.6/site-packages/tensorwatch/text_vis.py", line 5, in <module>
    from .vis_base import VisBase
  File "/home/leef_wsl_u18/miniconda3/envs/py36/lib/python3.6/site-packages/tensorwatch/vis_base.py", line 14, in <module>
    class VisBase(Stream, metaclass=ABCMeta):
  File "/home/leef_wsl_u18/miniconda3/envs/py36/lib/python3.6/site-packages/tensorwatch/vis_base.py", line 16, in VisBase
    from IPython import get_ipython, display
ImportError: cannot import name 'get_ipython'
python-BaseException

Process finished with exit code 1

opened by leaf918 0
The code for counting the duration is wrong.
I found two errors in the duration-counting code in analyzer.py:

First, PyTorch executes CUDA operations asynchronously. When the start and end times are recorded with the following code, the measured duration is far too short. The end time is recorded without waiting for the GPU to finish the computation.

https://github.com/microsoft/tensorwatch/blob/142f83a7cb8c54e47e9bab06cb3a1ef8ae225422/tensorwatch/model_graph/torchstat/analyzer.py#L96

https://github.com/microsoft/tensorwatch/blob/142f83a7cb8c54e47e9bab06cb3a1ef8ae225422/tensorwatch/model_graph/torchstat/analyzer.py#L101

Second, if a module in a CNN runs its forward pass multiple times, the following code records only the duration of the last call, not the duration of each call.

https://github.com/microsoft/tensorwatch/blob/142f83a7cb8c54e47e9bab06cb3a1ef8ae225422/tensorwatch/model_graph/torchstat/analyzer.py#L102

Here is my solution:

# tensorwatch\tensorwatch\model_graph\torchstat\analyzer.py
class ModuleStats:
    def __init__(self, name) -> None:
        # self.duration = 0.0
        self.duration = []

def _forward_pre_hook(module_stats:ModuleStats, module:nn.Module, input):
    assert not module_stats.done
    torch.cuda.synchronize()
    module_stats.start_time = time.time()

def _forward_post_hook(module_stats:ModuleStats, module:nn.Module, input, output):
    assert not module_stats.done
    torch.cuda.synchronize()
    module_stats.end_time = time.time()
    # Using a list to store the duration of each forward propagation.
    # module_stats.duration = module_stats.end_time-module_stats.start_time
    module_stats.duration.append(module_stats.end_time - module_stats.start_time)

# other code

# tensorwatch\tensorwatch\model_graph\torchstat\stat_tree.py
class StatNode(object):
    def __init__(self, name=str(), parent=None):
        # self.duration = 0
        self._duration = []

    @property
    def duration(self):
        # total_duration = self._duration
        total_duration = sum(self._duration)
        for child in self.children:
            total_duration += child.duration
        return total_duration
        # or return self._duration

I also made a simple comparison. In the Bottleneck block of the ResNet backbone, the same relu module is called three times, so there should be three corresponding durations. However, the TensorWatch statistics show only one relu record for the Bottleneck.

https://github.com/open-mmlab/mmdetection/blob/f07de13b82b746dde558202f720ec2225f276d73/mmdet/models/backbones/resnet.py#L260-L299

With my modified code, the durations of all three relu calls are recorded.
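The two fixes proposed in this issue can be sketched without PyTorch:

- synchronize before reading the clock;
- keep a list of per-call durations that a parent node sums along with its children.

`CallTimer` and `Node` are hypothetical names, not TensorWatch classes. The `sync` callable is assumed to be `torch.cuda.synchronize` on a GPU and a no-op otherwise. `time.perf_counter` is used instead of `time.time` because it is monotonic and has higher resolution.

```python
import time


class CallTimer:
    """Record one duration per call instead of overwriting a single value.

    `sync` runs before each clock read; on a GPU it would be
    torch.cuda.synchronize (assumption), on CPU a no-op.
    """

    def __init__(self, sync=lambda: None):
        self.sync = sync
        self.durations = []
        self._start = None

    def pre_hook(self, *args):
        self.sync()  # wait for queued work before starting the clock
        self._start = time.perf_counter()

    def post_hook(self, *args):
        self.sync()  # wait for this call's work before stopping the clock
        self.durations.append(time.perf_counter() - self._start)


class Node:
    """Hypothetical stat-tree node: own calls plus children's totals."""

    def __init__(self, timer, children=()):
        self.timer = timer
        self.children = list(children)

    @property
    def duration(self):
        total = sum(self.timer.durations)
        for child in self.children:
            total += child.duration
        return total


relu = CallTimer()
for _ in range(3):        # e.g. the same ReLU reused three times in a Bottleneck
    relu.pre_hook()
    time.sleep(0.001)     # stand-in for the module's forward computation
    relu.post_hook()

block = Node(CallTimer(), children=[Node(relu)])
print(len(relu.durations))  # 3
```

Both hook methods accept and ignore positional arguments. This means they can be passed directly to `nn.Module.register_forward_pre_hook` and `register_forward_hook` when PyTorch is available.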
opened by Mrliduanyang 0
the save operation succeeded but the notebook does not appear to be valid

OS: Windows 7, Python 3.6.8. Running the demo:

%matplotlib notebook
import tensorwatch as tw

client = tw.WatcherClient()
loss_stream = client.create_stream(expr='lambda d:(d.iter, d.loss)')
loss_plot = tw.Visualizer(loss_stream, vis_type='line', xtitle='Epoch', ytitle='Train Loss')
loss_plot.show()

The notebook then reports: the save operation succeeded but the notebook does not appear to be valid

opened by xingha 0