Circuit Training: An open-source framework for generating chip floor plans with distributed deep reinforcement learning

Overview

Circuit Training is an open-source framework for generating chip floor plans with distributed deep reinforcement learning. This framework reproduces the methodology published in the Nature 2021 paper:

A graph placement methodology for fast chip design. Azalia Mirhoseini, Anna Goldie, Mustafa Yazgan, Joe Wenjie Jiang, Ebrahim Songhori, Shen Wang, Young-Joon Lee, Eric Johnson, Omkar Pathak, Azade Nazi, Jiwoo Pak, Andy Tong, Kavya Srinivasa, William Hang, Emre Tuncer, Quoc V. Le, James Laudon, Richard Ho, Roger Carpenter & Jeff Dean, 2021. Nature, 594(7862), pp.207-212. [PDF]

Our hope is that Circuit Training will foster further collaborations between academia and industry, and enable advances in deep reinforcement learning for Electronic Design Automation, as well as general combinatorial and decision-making optimization problems. Capable of optimizing chip blocks with hundreds of macros, Circuit Training automatically generates floor plans in hours, whereas baseline methods often require human experts in the loop and can take months.

Circuit Training is built on top of TF-Agents and TensorFlow 2.x, with support for eager execution, distributed training across multiple GPUs, and distributed data collection scaling to hundreds of actors.
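For context, the sketch below shows the replay-buffer half of that distributed pattern using Reverb directly. It is a minimal illustration only; the table name, capacity, and port are placeholders rather than the values used by circuit_training.learning.ppo_reverb_server.

import reverb

# Stand up a replay-buffer server that collect jobs can write episodes to
# and the train job can sample from. All parameters here are illustrative.
table = reverb.Table(
    name='training_table',
    sampler=reverb.selectors.Fifo(),
    remover=reverb.selectors.Fifo(),
    max_size=10000,
    rate_limiter=reverb.rate_limiters.MinSize(1))
server = reverb.Server(tables=[table], port=8008)
server.wait()  # Serves until interrupted.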

Table of contents

Features
Installation
Quick start
Results
Testing
Releases
How to contribute
AI Principles
Contributors
How to cite
Disclaimer

Features

  • Places netlists with hundreds of macros and millions of stdcells (in clustered format).
  • Computes both macro location and orientation (flipping).
  • Optimizes multiple objectives including wirelength, congestion, and density (see the cost sketch after this list).
  • Supports alignment of blocks to the grid, to model clock strap or macro blockage.
  • Supports macro-to-macro, macro-to-boundary spacing constraints.
  • Allows users to specify their own technology parameters, e.g. routing resources (in routes per micron) and macro routing allocation.
  • Coming soon: Tools for generating a clustered netlist given a netlist in common formats (Bookshelf and LEF/DEF).
  • Coming soon: Generates macro placement Tcl commands compatible with major EDA tools (Innovus, ICC2).
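As referenced in the features list, the multiple objectives are combined into a single proxy cost. The sketch below shows the general shape of that combination; the weights are illustrative placeholders, not the values configured in this repository.

def proxy_cost(wirelength, congestion, density,
               congestion_weight=0.5, density_weight=0.5):
  # Weighted proxy cost; the default weights here are hypothetical.
  return wirelength + congestion_weight * congestion + density_weight * density

# Example with the from-scratch means reported in the Results section:
print(proxy_cost(0.1013, 0.9174, 0.5502))  # -> 0.8351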

Installation

Circuit Training requires:

  • Installing TF-Agents which includes Reverb and TensorFlow.
  • Downloading the placement cost binary into your system path.
  • Downloading the circuit-training code.

Using the code at HEAD with the nightly release of TF-Agents is recommended.

# Installs TF-Agents with nightly versions of Reverb and TensorFlow 2.x
$  pip install tf-agents-nightly[reverb]
# Copies the placement cost binary to /usr/local/bin and makes it executable.
$  sudo curl https://storage.googleapis.com/rl-infra-public/circuit-training/placement_cost/plc_wrapper_main \
     -o  /usr/local/bin/plc_wrapper_main
$  sudo chmod 555 /usr/local/bin/plc_wrapper_main
# Clones the circuit-training repo.
$  git clone https://github.com/google-research/circuit-training.git
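After installing, a quick sanity check (a sketch, not part of the repository) confirms that the core dependencies resolve:

import reverb  # Import check only; Reverb ships with tf-agents[reverb].
import tensorflow as tf
import tf_agents

print('tensorflow:', tf.__version__)
print('tf-agents :', tf_agents.__version__)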

Quick start

This quick start places the Ariane RISC-V CPU macros by training the deep reinforcement learning policy from scratch. The num_episodes_per_iteration and global_batch_size used below were picked to work on a single machine training on CPU. The purpose is to illustrate a running system, not to optimize the result. The result of a few thousand steps is shown in this TensorBoard. The full-scale Ariane RISC-V experiment matching the paper is detailed in Circuit training for Ariane RISC-V.

The following jobs will be created by the steps below:

  • 1 Replay Buffer (Reverb) job
  • 1-3 Collect jobs
  • 1 Train job
  • 1 Eval job

Each job is started in a tmux session. To switch between sessions use ctrl + b followed by s and then select the specified session.

# Sets the environment variables needed by each job. These variables are
# inherited by the tmux sessions created in the next step.
$  export ROOT_DIR=./logs/run_00
$  export REVERB_PORT=8008
$  export REVERB_SERVER="127.0.0.1:${REVERB_PORT}"
$  export NETLIST_FILE=./circuit_training/environment/test_data/ariane/netlist.pb.txt
$  export INIT_PLACEMENT=./circuit_training/environment/test_data/ariane/initial.plc

# Creates all the tmux sessions that will be used.
$  tmux new-session -d -s reverb_server && \
   tmux new-session -d -s collect_job_00 && \
   tmux new-session -d -s collect_job_01 && \
   tmux new-session -d -s collect_job_02 && \
   tmux new-session -d -s train_job && \
   tmux new-session -d -s eval_job && \
   tmux new-session -d -s tb_job

# Starts the Replay Buffer (Reverb) Job
$  tmux attach -t reverb_server
$  python3 -m circuit_training.learning.ppo_reverb_server \
   --root_dir=${ROOT_DIR}  --port=${REVERB_PORT}

# Starts the Training job
# Change to the tmux session `train_job`.
# `ctrl + b` followed by `s`
$  python3 -m circuit_training.learning.train_ppo \
  --root_dir=${ROOT_DIR} \
  --replay_buffer_server_address=${REVERB_SERVER} \
  --variable_container_server_address=${REVERB_SERVER} \
  --num_episodes_per_iteration=16 \
  --global_batch_size=64 \
  --netlist_file=${NETLIST_FILE} \
  --init_placement=${INIT_PLACEMENT}

# Starts the Collect job
# Change to the tmux session `collect_job_00`.
# `ctrl + b` followed by `s`
$  python3 -m circuit_training.learning.ppo_collect \
  --root_dir=${ROOT_DIR} \
  --replay_buffer_server_address=${REVERB_SERVER} \
  --variable_container_server_address=${REVERB_SERVER} \
  --task_id=0 \
  --netlist_file=${NETLIST_FILE} \
  --init_placement=${INIT_PLACEMENT}

# Starts the Eval job
# Change to the tmux session `eval_job`.
# `ctrl + b` followed by `s`
$  python3 -m circuit_training.learning.eval \
  --root_dir=${ROOT_DIR} \
  --variable_container_server_address=${REVERB_SERVER} \
  --netlist_file=${NETLIST_FILE} \
  --init_placement=${INIT_PLACEMENT}

# Starts TensorBoard.
# Change to the tmux session `tb_job`.
# `ctrl + b` followed by `s`
$  tensorboard dev upload --logdir ./logs

# Optional: Starts 2 more collect jobs to speed up training.
# Change to the tmux session `collect_job_01`.
# `ctrl + b` followed by `s`
$  python3 -m circuit_training.learning.ppo_collect \
  --root_dir=${ROOT_DIR} \
  --replay_buffer_server_address=${REVERB_SERVER} \
  --variable_container_server_address=${REVERB_SERVER} \
  --task_id=1 \
  --netlist_file=${NETLIST_FILE} \
  --init_placement=${INIT_PLACEMENT}

# Change to the tmux session `collect_job_02`.
# `ctrl + b` followed by `s`
$  python3 -m circuit_training.learning.ppo_collect \
  --root_dir=${ROOT_DIR} \
  --replay_buffer_server_address=${REVERB_SERVER} \
  --variable_container_server_address=${REVERB_SERVER} \
  --task_id=2 \
  --netlist_file=${NETLIST_FILE} \
  --init_placement=${INIT_PLACEMENT}

Results

The results below are reported for training from scratch, since the pre-trained model cannot be shared at this time.

Ariane RISC-V CPU

View the full details of the Ariane experiment on our details page. With this code we are able to get comparable or better results training from scratch than by fine-tuning a pre-trained model. At the time the paper was published, training from a pre-trained model gave better results than training from scratch for the Ariane RISC-V. Improvements to the code have also cut the GPU resources needed by 50% and delivered a 2x wall-time speedup, even when training from scratch. Below are the mean and standard deviation for 3 different seeds, each run 3 times. This differs slightly from what was used in the paper (8 runs, each with a different seed), but better captures the different sources of variability.

        Proxy Wirelength   Proxy Congestion   Proxy Density
mean    0.1013             0.9174             0.5502
std     0.0036             0.0647             0.0568
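To make the aggregation concrete, the sketch below computes a pooled mean and standard deviation over 3 seeds run 3 times each. The per-run values are placeholders, not real results; the real numbers come from the eval job's logs.

import numpy as np

# Hypothetical proxy-wirelength results: rows are seeds, columns are runs.
wirelength = np.array([
    [0.100, 0.103, 0.099],  # seed 0 (placeholder values)
    [0.105, 0.098, 0.102],  # seed 1 (placeholder values)
    [0.104, 0.101, 0.100],  # seed 2 (placeholder values)
])
print('mean:', wirelength.mean())  # pools all 9 runs
print('std :', wirelength.std())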

The table below summarizes the paper result for fine-tuning from a pre-trained model over 8 runs with each one using a different seed.

        Proxy Wirelength   Proxy Congestion   Proxy Density
mean    0.1198             0.9718             0.5729
std     0.0019             0.0346             0.0086

Testing

# Runs tests with the latest stable TF-Agents.
$  tox -e py37,py38,py39
# Runs tests with nightly TF-Agents.
$  tox -e py37-nightly,py38-nightly,py39-nightly

# Using our Docker for CI.
## Builds the docker image.
$  docker build --tag circuit_training:ci -f tools/docker/ubuntu_ci tools/docker/
## Runs tests with nightly TF-Agents.
$  docker run -it --rm -v $(pwd):/workspace --workdir /workspace circuit_training:ci \
     tox -e py37-nightly,py38-nightly,py39-nightly
## Runs tests with latest stable TF-Agents.
$  docker run -it --rm -v $(pwd):/workspace --workdir /workspace circuit_training:ci \
     tox -e py37,py38,py39

Releases

While we recommend running at HEAD, we have tagged the code base to mark compatibility with stable releases of the underlying libraries.

Release   Branch / Tag   TF-Agents
HEAD      main           tf-agents-nightly
0.0.1     v0.0.1         tf-agents==0.11.0

Follow this pattern to utilize the tagged releases:

$  git clone https://github.com/google-research/circuit-training.git
$  cd circuit-training
# Checks out the tagged version listed in the table in the releases section.
$  git checkout v0.0.1
# Installs the corresponding version of TF-Agents along with Reverb and
# Tensorflow from the table.
$  pip install tf-agents[reverb]==x.x.x
# Copies the placement cost binary to /usr/local/bin and makes it executable.
$  sudo curl https://storage.googleapis.com/rl-infra-public/circuit-training/placement_cost/plc_wrapper_main \
     -o  /usr/local/bin/plc_wrapper_main
$  sudo chmod 555 /usr/local/bin/plc_wrapper_main

How to contribute

We're eager to collaborate with you! See CONTRIBUTING for a guide on how to contribute. This project adheres to TensorFlow's code of conduct. By participating, you are expected to uphold this code of conduct.

AI Principles

This project adheres to Google's AI principles. By participating, using or contributing to this project you are expected to adhere to these principles.

Contributors

We would like to recognize the following individuals for their code contributions, discussions, and other work to make the release of the Circuit Training library possible.

  • Sergio Guadarrama
  • Summer Yue
  • Ebrahim Songhori
  • Joe Jiang
  • Toby Boyd
  • Azalia Mirhoseini
  • Anna Goldie
  • Mustafa Yazgan
  • Shen Wang
  • Terence Tam
  • Young-Joon Lee
  • Roger Carpenter
  • Quoc Le
  • Ed Chi

How to cite

If you use this code, please cite both:

@article{mirhoseini2021graph,
  title={A graph placement methodology for fast chip design},
  author={Mirhoseini, Azalia and Goldie, Anna and Yazgan, Mustafa and Jiang, Joe
  Wenjie and Songhori, Ebrahim and Wang, Shen and Lee, Young-Joon and Johnson,
  Eric and Pathak, Omkar and Nazi, Azade and Pak, Jiwoo and Tong, Andy and
  Srinivasa, Kavya and Hang, William and Tuncer, Emre and V. Le, Quoc and
  Laudon, James and Ho, Richard and Carpenter, Roger and Dean, Jeff},
  journal={Nature},
  volume={594},
  number={7862},
  pages={207--212},
  year={2021},
  publisher={Nature Publishing Group}
}
@misc{CircuitTraining2021,
  title = {{Circuit Training}: An open-source framework for generating chip
  floor plans with distributed deep reinforcement learning.},
  author = {Guadarrama, Sergio and Yue, Summer and Boyd, Toby and Jiang, Joe
  Wenjie and Songhori, Ebrahim and Tam, Terence and Mirhoseini, Azalia},
  howpublished = {\url{https://github.com/google-research/circuit-training}},
  url = "https://github.com/google-research/circuit-training",
  year = 2021,
  note = "[Online; accessed 21-December-2021]"
}

Disclaimer

This is not an official Google product.

Comments
  • AttributeError: 'PPOLossInfo' object has no attribute 'clip_fraction'

    I got an error after building the Docker image. The tests were OK. Next, I ran the example script:

    ... 
    
    python3 -m circuit_training.learning.train_ppo \
      --root_dir=${ROOT_DIR} \
      --replay_buffer_server_address=${REVERB_SERVER} \
      --variable_container_server_address=${REVERB_SERVER} \
      --num_episodes_per_iteration=16 \
      --global_batch_size=64 \
      --netlist_file=${NETLIST_FILE} \
      --init_placement=${INIT_PLACEMENT}
      
      ...
    

    Everything fails at:

    File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/distribute/distribute_lib.py", line 1312, in run
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/distribute/distribute_lib.py", line 2888, in call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/distribute/distribute_lib.py", line 3689, in _call_for_each_replica
        return fn(*args, **kwargs)
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 642, in wrapper
        return func(*args, **kwargs)
      File "/usr/local/lib/python3.8/dist-packages/tf_agents/agents/tf_agent.py", line 336, in train
        loss_info = self._train_fn(
      File "/usr/local/lib/python3.8/dist-packages/tf_agents/utils/common.py", line 188, in with_check_resource_vars
        return fn(*fn_args, **fn_kwargs)
      File "/workspace/circuit_training/learning/agent.py", line 415, in _train
        data=loss_info.extra.clip_fraction,
    AttributeError: 'PPOLossInfo' object has no attribute 'clip_fraction'
      In call to configurable 'Learner' (<class 'tf_agents.train.learner.Learner'>)
    

    Probably we need to pin an exact tf-agents version?

    TF version in docker:

    tensorflow==2.9.1
    tensorflow-estimator==2.9.0
    tensorflow-io-gcs-filesystem==0.26.0
    tensorflow-probability==0.17.0
    tf-agents==0.13.0
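
    Matching the tf-agents version to the Releases table (tf-agents-nightly for code at HEAD) is the likely fix. Alternatively, a hedged local workaround is to guard the attribute access in circuit_training/learning/agent.py, since PPOLossInfo in older tf-agents releases lacks the field; log_clip_fraction below is a hypothetical helper, not repo code.

    import tensorflow as tf

    def log_clip_fraction(loss_info, step):
      # Older PPOLossInfo versions have no clip_fraction, so don't assume it.
      clip_fraction = getattr(loss_info.extra, 'clip_fraction', None)
      if clip_fraction is not None:
        tf.summary.scalar(name='clip_fraction', data=clip_fraction, step=step)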
    
    opened by ZFTurbo 8
  • Could you share the plc_wrapper_main

    Thanks a lot for sharing such a great project! I am digging into the sub-modules; could you share some information about the plc? It seems like an EDA tool that can read DEF/LEF, place cells, handle blockages, and so on. Are there any docs for its API?

    opened by Yangxiaojun1230 4
  • Bug fix for Grouping code.

    Hi, I tried to run the grouping code on the synthesized netlist generated using the flow given in this GitHub repo.

    I noticed grouper.py and hmetis_util.py use the gfile module, to which we do not have access. So I replaced gfile with os.path.

    The ariane_test.pb.txt netlist, available in the grouping/testdata directory, does not contain macro orientation. If the input netlist contains macro orientation, then grouping.py errors out, so I have suggested a one-line change in grouping.py.

    I can run the grouping code on the synthesized netlist with the above suggested changes.

    Sayak

    opened by sakundu 3
  • How to generate .plc file

    Hi,

    I have implemented the LEF/DEF converter, but I have some questions about how to generate the .plc file.

    1. How do we determine the density cost?
    2. How do we set the wirelength and congestion cost?
    3. How do we set up the parameters for routers?
    4. What is the meaning of the smooth factor?

    Thank you in advance!

    opened by Mr-Fang-VLSI 3
  • netlist.pb and plc files' format

    We are trying to write a Tcl script to transform LEF/DEF into pb.netlist, so there are a few questions we want to confirm first.

    1. Are the pb.netlist and its generated plc files all the input needed?
    2. The node rst_ni seems to be defined as a port and connected to several clusters (Grp_1204/Grp_1203...), but why not clk_i? For example:

    name: "clk_i" attr { key: "side" ... node { name: "rst_ni" input: "Grp_1204/Pinput" input: "Grp_1203/Pinput" input: "Grp_791/Pinput"

    opened by wwrrzzttcc 3
  • Potential bug in getting blockages

    Hi,

    I think there is a minor issue with get_blockages_from_comments. The function header allows the filename to be either a string or a list of strings, but the function body treats a string argument as a list nonetheless.

    If you run placement_util_test.py, you will fail to extract the blockage information (see the screenshot attached to the original issue).

    This is potentially an issue for create_placement_cost_using_common_arguments, since it passes a string to retrieve blockages. This function is also used in grouper.py.
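
    A sketch of the suggested normalization (the helper name is hypothetical, not repo code):

    def normalize_filenames(filenames):
      # Accept a single path or a list of paths, as the header of
      # get_blockages_from_comments promises.
      if isinstance(filenames, str):
        return [filenames]
      return list(filenames)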

    Could you please confirm this? Thanks in advance!

    opened by Dinple 2
  • Continuing the same thread as in issue #6

    Hello @tfboyd, thank you so much for the information. Can you also say where you extracted the dataset/macros from? The Ariane RISC-V open-source GitHub link is outdated (according to https://github.com/riscv-software-src/riscv-tools/issues/333).

    Thanks, Mrinal

    opened by Mrinal18 2
  • SHA256-optimized ASIC

    Do you recommend this framework for designing an ASIC that is optimized for SHA256?

    Related question: In the paper accompanying this framework, it's stated that "We group millions of standard cells into a few thousand clusters using hMETIS (Karypis & Kumar, 1998)". I was wondering if this is too coarse of an approximation for optimizing SHA256 specifically.

    Appreciate your help, thank you

    opened by stevedana 2
  • plc_client.py behavior

    Hi,

    It appears to me that plc_client is not fully open-sourced, and I don't fully understand its behaviour. It seems to be establishing a socket connection through a tempfile and retrieving function returns over that connection.

    Is it connected to some Google-end server or some binary file on the host machine?

    opened by Dinple 1
  • What is the 'Routes per micron, Routes used by macros' in initial.plc?

    Hello

    Thanks for sharing your project.

    Your documentation is intuitive, but there are some things I cannot understand.

    In the initial.plc file, you define the terms Routes per micron and Routes used by macros.

    I think these terms describe a kind of routing capacity for each macro and standard cell.

    I would appreciate it if you could explain them in detail.

    Also, I want to extract these values from the LEF/DEF file format.

    It would be very helpful if you could give me some advice on extracting them.

    Thanks

    opened by seungju-mmc 1
  • What is the "weight" attribute in the pb.txt input file?

    As stated in the title, I am not sure what the weight attribute is. I believe it is only mentioned in the Ariane test case. Is it the edge weight between groups, or is it something else?

    opened by Dinple 1
  • How to use this code with fewer computing resources?

    Hi,

    I have a question about circuit training for the Ariane RISC-V, following the guidance you have provided. You used a lot of computing resources, including 20 96-vCPU machines and 8 V100 GPUs, but we do not have that many. Does that mean we cannot get a reasonable training result (episode return increasing over the course of training) like the one you presented in your TensorBoard? If a good result is achievable with our server configuration, could you please tell us how to adjust the hyperparameters in your source code? Thanks!

    opened by KeLiu1998 0
  • CPU RAM Usage

    I'm having issues with the amount of CPU RAM used by train_ppo.py. Specifically, over the course of training the memory usage steadily increases until there is no memory left, causing an error. This seems odd, as I would expect the total memory usage to be roughly constant over the course of training, since the model and dataset are both a fixed size. Does anyone have an idea as to why this is the case? Has anyone else experienced similar issues and been able to alleviate them?
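
    One way to localize the growth is to log resident memory around the training loop. A diagnostic sketch (psutil is an extra dependency, not part of circuit_training):

    import os
    import psutil  # pip install psutil

    def log_rss(tag):
      # Print the resident set size so growth can be tied to a code region.
      rss_mib = psutil.Process(os.getpid()).memory_info().rss / 2**20
      print(f'{tag}: {rss_mib:.0f} MiB')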

    opened by cr145 1
  • Program blocked

    Hi there! I ran into some problems while running the project. I followed the README.md, and when it was executing this line, it blocked and never returned. How could this happen? I have no idea. Could you give me some advice? Thanks a lot!

    # learner.py
    loss_info = self._generic_learner.run(self._steps_per_iter,
                                          self._train_iterator)
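
    Since learner.run() blocks until the replay buffer can serve a batch, one thing to check is whether the collect jobs are actually feeding the Reverb table. A diagnostic sketch (the address assumes the quick start's REVERB_SERVER):

    import reverb

    # Inspect per-table sizes and rate-limiter state on the Reverb server.
    client = reverb.Client('127.0.0.1:8008')
    print(client.server_info())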
    
    opened by Rejuy 1
  • Grouping no longer supports large protobuf netlist

    Hi, after this commit, I see the grouping code no longer supports two smaller split protobuf netlists as input, as mentioned here. So I cannot run the latest grouping code on a large protobuf netlist after splitting it into smaller netlists using split_proto_netlist_main.py.

    opened by sakundu 0
  • Modifying Circuit Training Code

    Hi,

    The MacroPlacement team has an open-sourced version of plc_wrapper_main (under our interpretation). We provide guidance on how to plug it into Circuit Training in our GitHub documentation, but this requires us to modify and publish circuit_training code with ~100 lines of modification.

    We do have a reference/citation and a disclaimer on our GitHub page, but we would like to know whether this is permissible and whether there is any further action we need to take.

    opened by Dinple 0
  • How to load and use a pre-trained model (weights)?

    Hi!

    I've been playing around with circuit_training, and now there's one question that's bothering me a lot.

    Let's say I trained a model from scratch for Ariane, and now I want to use the obtained pre-trained weights for macro_tiles_10x10. What are my next steps?

    I've tried to find any options that allow loading a pre-trained model, but I failed.
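
    For what it's worth, TF-Agents exports policies as SavedModels, so loading one might look like the sketch below. The directory layout is an assumption based on tf_agents.train.Learner defaults; check your ROOT_DIR for the actual paths.

    import tensorflow as tf

    # Assumed path: <ROOT_DIR>/policies/<policy_name> written during training.
    policy = tf.saved_model.load('./logs/run_00/policies/greedy_policy')
    # The loaded policy exposes an `action` function mapping a TimeStep to a
    # PolicyStep.

    Whether the training scripts expose a flag to warm-start from such weights is not covered by this README.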

    opened by RustamC 0