Python scripts to manage Chia plots and drive space, providing full reports. Also monitors the number of chia coins you have.

Chia Plot, Drive Manager & Coin Monitor (V0.5 - April 20th, 2021)

Multi Server Chia Plot and Drive Management Solution

Be sure to ⭐ my repo so you can keep up to date on any updates and progress!

Hopefully if you are reading this you know what Chia is and what a plot is. I am pretty new to Chia myself but I determined that I needed to come up with a way to manage my plots and hard drives that hold thse plots somewhat automatically since my job as a pilot keeps me gone a lot. I am not a programmer, so please, use this code with caution, test it yourself and provide feedback. This is a working version that is currently running and working for my setup which I will explain below. Hopefully it will be of some help to others, it is the least I can do to give back to the Chia community.

Hopefully, this might provide some inspiration for others in regard to their Chia projects. Contributions are always welcome.

Overview | Dependencies | Installation & Configuration | Command Line Options | Command Line Options | Hardware | Changelog | Todo List

Overview & Theory of Operation

This project was designed around my desire to "farm" the Chia crypto currency. In my particular configuration I have a singe plotting server (chisplot01) creating Chia plots. Once the plotting server has completed a plot, this plot then needs to be moved to a storage server (chisnas01) where it resides while being "farmed". The process of creating the plot is pretty straight forward, but the process of managing that plot once completed was a little more interesting. My plotting server has the capacity to plot 40 parallel plots at a time and that is where I started to have issues.

One of those issues that I faced was the relative slow speed at which plots could be moved from the plotting server to the storage server. The long and short of it was simple - I could not continue to plot 40 at a time as I could not move them off the plotting server fast enough not to run out of insanely expensive NVMe drive space. Even with 10Gbe connections directly between the two devices, using mv or cp on an NFS mount was pretty slow, others suggested rsync which was also pretty slow due to encryption. The same held true for scp. I even went so far as scrapping the ssh that came with Ubunti 20.04 and compiling and installing High Performance SSH from the Pittsburgh Supercomputing Center. While it was faster it did not come close to maxing out my dedicated 10Gbe link. What I did know is that the problem was not the network. Each machine was connected to a dedicated 10Gbe port on a Cisco Nexus 9000 on a dedicated data VLAN specifically for moving those plots. Iperf was able to saturate the 10Gbe link with zero problems.

Next I wanted to make sure the problem was not related to the media I was copying the data to across the wire. I installed a pair of Intel P3700 PCIe NVMe drives in a strip RAID configuration and did a bunch of tests to and from that mountpoint locally and across the wire. As suspected, locally the drives performed like magic, across the wire, they performed at exactly the same speed as my spinners. Clearly the drive medium was also not the problem. It had to be the network.

I spent a lot of time playing around with all of the network settings, changing kernel parameters, turning on and off jumbo frames and adjusting irqbalancing. In the end some of the changes gave me marginal increases in performance but nothing really helped a lot. It was then that I stated looking at the mechanism I was using to move the files again. In researching I ran across a couple of people with the same situation, network speed on heavily loaded systems and lightly loaded networks where traditional methods of moving files around did not work for them. So they used Netcat instead.

After reading up on the pros and cons of netcat vs rsync (which most recommended) I decided to give it a test. I tested cp, mv, rsync, scp and HPSSH across NFS, SMB (miserable) and SSH. With the exception of SMB they all pretty much worked the same. I was getting better performance than I had been using the stock Ubuntu ssh after replacing it with the High Perfomance SSH and nuking encryption, but still nothing to write home about. Then I tested Netcat. I was blown away. I went from much less than 1Gbe to peaks of 5Gbe with netcat. 100G files that had been taking 18 minutes to transfer were now transferring in 3 to 4 minutes.

It is also very interesting to note that TrueNAS utilizes netcat for replication when you tell it you want to disable encryption for replication. In researching this further and talking with some on the forums, they noted that they found it always outperformed all other transfer methods. That was enough for me to validate my decision!

As far as plots go, most folks recommend not using some type of RAID array to protect your plots from loss. the thought process is that if you lose a single plotting drive, no big deal, toss it in the trash and put a replacement drive in and fill it back up with plots. Since I really like FreeNAS, I had just planned on dropping in a new FreeNAS server, throwing a bunch of nine drive RAIDZ2 vdevs in place and move forward. But as many pointed out, that was a LOT of wasted space for data pretty easily replaced. And space is the name of the game with Chia. So with that thought in mind, I decided to build out a jbod as suggested by many others. The question was how to manage getting the plots onto the drives and what to do when the drives filled up.

Welcome to my project! I ended up with basically a client/server arrangement. The software on the plotting server would watch for completed plots and then send those plots (using netcat) to the NAS server. The software on the NAS server would automatically monitor all available drives in the system and place the plot where it needed to go, pretty much all on its own. As I said earlier, my job as a pilot keeps me in the air a lot and I really needed a hands off approach to make this work.

On the Plotter side (every 5 minutes):

Monitors my -d directory for completed plots
Determines if the plot is complete based on the size and name of the plot (currently k32 only)
Checks to see if we are already sending a plot to the NAS, if so, stops
When it is clear to send, picks which plot to send and netcats it to the NAS
Utilizing an ssh subprocess, starts a receiving netcat on the NAS
After the transfer is complete, checks the exact file size of the plot on both systems as a basic verification
Once files sizes are verified, deletes the sent plot
Kills any lingering netcat connections on the NAS

On the NAS side (every 1 minute):

Checks for all available plot storage drives on the system in real time (in case new ones were added)
Based on number of plots on drives, selects the next available plot drive to receive plots
Updates the receive script with the currently selected plot drive
Once that drive is full, selects the next available plot drive and updates the receive script
Complete, selectable notifications via EMail, SMS and Pushbullet about the drive change, includes some SMART drive information
Sends a daily email report including # of drives, # of plots currently and # of plots total based on current drive storage
Daily email also includes plot speed, total plots in last 24 hours and TiB plotting in last 24 hours
Since notifications are already built in, will extend it to alert on drive temps and smartctl warnings
Eventually integrate information into Influx with a Grafana dashboard (including power monitoring via UPS)

On the Farmer (Every 1 minute):

Checks the chia logs (~chia/.chia/mainnet/log/debug.log) for activity
Looks specifically for `Adding coin`
Once found updates a local file showing how many Chia you have
Emails, texts, etc when you get a new coin with the current amount you have
Does NOT use the block explorer so verify!

Command Line Options on the NAS

Ability to run quick reports to the screen (basically daily email dumped to the screen
Ability to send a Drive update email
Ability to monitor drive temperatures by device and serial number

Dependencies

This was designed and is running on Ubuntu 20.04 for both the Plotter and the NAS so pretty much any linux flavor should work. Listed below are the things that I am using on both systems. The one thing to watch out for on the NAS is the version of pySMART you are using. I ran into a bug with the version (1.0) on PyPi and ended up using the version from the TrueNAS Folks. DO NOT use the version from PyPi, it won't work until patched.

I am running on Python 3.8.5 and pretty much everything else came installed with the default Ubuntu Server 20.04 installation. Be sure to install smartmontools apt install smartmontools as well as the following:

Sentry-SDK - (Optional, I use it for error tracking)
Pushbullet (0.12.0) - (Optional, used for PB notifications)
Twilio (6.54.0) - (Optional, used for SMS notifications)
JinJa2 (2.11.3) - (Optional, Used for Email notifications)
ConfigParser (5.0.2) - Used for reading and updating config files
py-SMART (0.3) - Used for reading drive information
Natsort (7.1.1) - Used for natural sorting of drive numbers

Installation & Configuration

The installation of the actual scripts are pretty easy. Just clone the repo and drop it on your server. A lot of what happens is very specific to how I have my systems set up so I will explain my layout in detail along with where in the code you may need to look to make changes to fit your needs. The plotter is pretty easy.

Plotter Configuration

Here is the directry structure I use on my plotter for my main plotting and -d drives:

/mnt
├── nvme
│   ├── drive0
│   ├── drive1
│   ├── drive2
│   └── drive3
└── ssdraid
    └── array0

The /mnt/nvme/driveX drives are used as my temp drives for plotting. These are Intel DC P4510 NVMe drives capable of running 12 to 14 plots each (based on k32 plot size). The /mnt/ssdraid/array0 is my -d drive. This is a RAID0 array of HGST 1.6TB Enterprise SSD drives. This is where the completed plots are stored before they are moved by the plot_manager.py script.

All of my plotting is done as the chia user and so all of my plot_manager files are stored in the /home/chia directory. I do have a little bit of testing built into the script and that is what the test_plots directory is used for. I simple dd 10G of zeros into a test plot file and turn on testing in the script to test everything before going live.

/home/chia/plot_manager
├── logs
└── test_plots

Depending on how your system is set up, you need to make changes at the top of the plot_manager.py script:

# Are we testing?
testing = False
if testing:
    plot_dir = '/home/chia/plot_manager/test_plots/'
    plot_size = 10000000
    status_file = '/home/chia/plot_manager/transfer_job_running_testing'
else:
    plot_dir = "/mnt/ssdraid/array0/"
    plot_size = 108644374730  # Based on K32 plot size
    status_file = '/home/chia/plot_manager/transfer_job_running'

remote_checkfile = '/root/plot_manager/remote_transfer_is_active'

Change the plot_dir and plot_size for both testing and not testing to suit your system and needs.

status_file is a local file that is created when we start a plot move and deleted once we have completed that move. We check for that when we first run to make sure we are not attempting to move several plots at the same time.

You will see this in the logs:
2021-03-19 18:50:01,830 - plot_manager:155 - process_control: DEBUG Checkfile Exists, We are currently Running a Transfer, Exiting

As I grow, I plan on adding in a second dedicated 10Gbe link for moving plots and I can expand this out to include the ability to track sessions across each link.

remote_checkfile is used on the NAS system to prevent our NAS drive_manager.py script from altering the destination drive in the middle of a plot move. On the NAS, I run everything as the root user hence the directory path. Alter to meet your needs.
That is pretty much everything on the plotter side. The final part of the puzzle is the send_plot.sh shell script. On line 99 of the plot_manager.py script you will find this line:
subprocess.call(['/home/chia/plot_manager/send_plot.sh', plot_path, plot_to_process]) You need to alter the directory and name of the script to suite you needs. This is the script that is called that actually send the plot to the nas. This is the contents:

#!/bin/bash
#
ssh root@chianas01-internal "nohup /root/plot_manager/receive_plot.sh $2 > foo.out 2> foo.err < /dev/null &"
sudo /usr/bin/pv "$1" | sudo /usr/bin/nc -q 5 chianas01-internal 4040

Here are a few more paths you need to check. I would put line numbers but those could change, just search the code and replace as necessary.

Before starting the script, make sure you have the following paths correctly identified in the script:

Located in the process_plot() function:
['ssh', nas_server, 'grep enclosure /root/plot_manager/plot_manager_config | awk {\'print $3\'}']).decode(('utf-8'))).strip("\n")

This is the location on the NAS side where the script looks for the current drive being utilized. For example if you have drive_manager.py installed in your home directory /home/your_name/plot_manager then this line would look like this:
['ssh', nas_server, 'grep enclosure /home/your_name/plot_manager/plot_manager_config | awk {\'print $3\'}']).decode(('utf-8'))).strip("\n")

Enter the location the script lives here:
sys.path.append('/home/chia/plot_manager')

And also on this line: ['ssh', nas_server, '/root/plot_manager/kill_nc.sh'])

And here is where the script lives on the NAS server:
subprocess.check_output(['ssh', nas_server, 'touch %s' % '/root/plot_manager/new_plot_received'])

Finally, take a look in the system_logging.py script and make sure all of the paths there are correct and do the same in the loggin.yaml file for your log file locations.

Depending on how you have your NAS setup, we may have to change a few more lines of code. I will come back to that after we talk about the NAS.

NAS Configuration

The NAS setup is pretty unique to me but it should be pretty easy to reconfigure the script to meet other needs.
Coming from running three very large Plex systems (PBs worth), I designed my drive layout and naming so that I could at any moment just by looking at a mountpoint go pull a drive if I needed to do so.

My basic layout is a single SuperMicro storage server with motherboard, 256GB ram and 36 drive bays. This I call enclosure0. This is a Supermicro 847, it has 24 drive bays in front and 12 in the rear. This is my directory structure for that system:

/mnt/enclosure0
├── front
│   ├── column0
│   │   ├── drive0
│   │   ├── drive1
│   │   ├── drive2
│   │   ├── drive3
│   │   ├── drive4
│   │   └── drive5
│   ├── column1
│   │   ├── drive6
│   │   ├── drive7
│   │   ├── drive8
│   │   ├── drive9
│   │   ├── drive10
│   │   └── drive11
│   ├── column2
│   │   ├── drive12
│   │   ├── drive13
│   │   ├── drive14
│   │   ├── drive15
│   │   ├── drive16
│   │   └── drive17
│   └── column3
│       ├── drive18
│       ├── drive19
│       ├── drive20
│       ├── drive21
│       ├── drive22
│       └── drive23
└── rear
    ├── column0
    │   ├── drive24
    │   ├── drive25
    │   └── drive26
    ├── column1
    │   ├── drive27
    │   ├── drive28
    │   └── drive29
    ├── column2
    │   ├── drive30
    │   ├── drive31
    │   └── drive32
    └── column3
        ├── drive33
        ├── drive34
        └── drive35

As you can see just by looking at a mount point I can tell exactly where a drive is located in my system. In addition to that, when I add additional external drive arrays, I just do the same thing with enclosure1, enclosure2, etc and my drive_manager.py script will work no problem. I plug in a new drive, format it XFS, mount it, add it to fstab and that's it. The script does everything else.

I am basically using psutil to get drive space information and then use that to determine where to put plots. When I call psutil I just tell it I want it to look at any device that starts with /dev/sd and any mountpoint that includes enclosure and ends with the word drive:

if p.device.startswith('/dev/sd') and p.mountpoint.startswith('/mnt/enclosure') and p.mountpoint.endswith(drive):

In this manner I will never get swap, temp, boot, home, etc. Nothing at all but my plot storage drives. In order for this to work with your setup, you would have to modify all of these lines to match your paticular configuration.

Once you have that figured out, there are just a couple of other little things that need to be set:

At the top of the script you want to set these to meet your needs:

nas_server = 'chianas01'
plot_size_k = 108995911228
plot_size_g = 101.3623551
receive_script = '/root/plot_manager/receive_plot.sh'

The receive_plot.sh script is created dynamically by the script here:

f = open(receive_script, 'w+')
f.write('#! /bin/bash \n')
f.write(f'nc -l -q5 -p 4040 > "{get_plot_drive_to_use()}/$1" < /dev/null')
f.close()

Once the system determines which drive it wants to use to store plots, it stores that information in a configuration file called plot_manager_config and it looks like this:

[plotting_drives]
current_plotting_drive = /mnt/enclosure0/front/column3/drive18

This is important for several reasons. First this is what tells the system our current plotting drive and it is also used by the plotting server to map the correct path for file size verification after a plot is sent. If you change this file or it's location, you need to update lines 167 on the plotter. Notice that I am specifically grepping for the word enclosure, you want to make sure all of this matches up with how you plan on doing things!

remote_mount = str(subprocess.check_output(['ssh', nas_server, 'grep enclosure /root/plot_manager/plot_manager_config | awk {\'print $3\'}']).decode(('utf-8'))).strip("\n")

OK, once you have everything setup, on the plotter you simply run the script and if everything is setup correctly you should see the following:

2021-03-19 19:20:01,543 - plot_manager:92 - process_plot: DEBUG process_plot() Started
2021-03-19 19:20:01,543 - plot_manager:126 - process_control: DEBUG process_control() called with [check_status] and [0]
2021-03-19 19:20:01,543 - plot_manager:158 - process_control: DEBUG Checkfile Does Not Exist
2021-03-19 19:20:01,543 - plot_manager:74 - get_list_of_plots: DEBUG get_list_of_plots() Started
2021-03-19 19:20:01,544 - plot_manager:77 - get_list_of_plots: DEBUG plot-k32-2021-03-18-02-33-xxx.plot
2021-03-19 19:20:01,544 - plot_manager:82 - get_list_of_plots: INFO We will process this plot next: plot-k32-2021-03-18-02-33-xxx.plot
2021-03-19 19:20:01,545 - plot_manager:126 - process_control: DEBUG process_control() called with [set_status] and [start]
2021-03-19 19:20:02,228 - plot_manager:98 - process_plot: INFO Processing Plot: /mnt/ssdraid/array0/plot-k32-2021-03-18-02-33-xxx.plot
2021-03-19 19:25:00,697 - plot_manager:165 - verify_plot_move: DEBUG verify_plot_move() Started
2021-03-19 19:25:01,433 - plot_manager:171 - verify_plot_move: DEBUG Verifing: chianas01-internal: /mnt/enclosure0/front/column3/drive18/plot-k32-2021-03-18-02-33-xxx.plot
2021-03-19 19:25:02,102 - plot_manager:176 - verify_plot_move: DEBUG Remote Plot Size Reported as: 108898704182
2021-03-19 19:25:02,103 - plot_manager:178 - verify_plot_move: DEBUG Local Plot Size Reported as: 108898704182
2021-03-19 19:25:02,103 - plot_manager:101 - process_plot: INFO Plot Sizes Match, we have a good plot move!
2021-03-19 19:25:02,103 - plot_manager:126 - process_control: DEBUG process_control() called with [set_status] and [stop]
2021-03-19 19:25:03,465 - plot_manager:147 - process_control: DEBUG Remote nc kill called!
2021-03-19 19:25:06,741 - plot_manager:109 - process_plot: INFO Removing: /mnt/ssdraid/array0/plot-k32-2021-03-18-02-33-xxx.plot

This tells you that everything has gone smoothly. You should be able to log into your NAS and verify that the plot is where it is supposed to be. Needless to say I would stick with a bunch of test plots until I had it right, I would also comment out this line:

os.remove(plot_path)

Until you are absolutely certain it is running the way that you would like it to run!

Now on the NAS server, I run my script once per minute. This is what you should see if everything is going well:

drive_manager:258 - update_receive_plot: DEBUG update_receive_plot() Started
drive_manager:266 - update_receive_plot: DEBUG Currently Configured Plot Drive: /mnt/enclosure0/front/column3/drive18
drive_manager:267 - update_receive_plot: DEBUG System Selected Plot Drive:      /mnt/enclosure0/front/column3/drive18
drive_manager:268 - update_receive_plot: DEBUG Configured and Selected Drives Match!
drive_manager:269 - update_receive_plot: DEBUG No changes necessary to /root/plot_manager/receive_plot.sh
drive_manager:270 - update_receive_plot: DEBUG Plots left available on configured plotting drive: 58

When the plot drives does change, you get a nicely formatted email:

Server: chianas01
New Plot Drive Selected at 10:17:17
Previous Plotting Drive............................../mnt/enclosure0/front/column2/drive17
# of Plots on Previous Plotting Drive................56
New Plotting Drive (by mountpoint).................../mnt/enclosure0/front/column2/drive17
New Plotting Drive (by device)......................./dev/sdq1
Drive Size...........................................10.9T
# of Plots on we can put on this Drive...............53

Environmental & Health
Drive Serial Number..................................00000000
Current Drive Temperature............................31°C
Last Smart Test Health Assessment....................PASS

Other Information
Total Plots on chianas01.............................1691
Current Total Number of Plot Drives..................24
Number of k32 Plots until full.......................925
Max # of Plots with current # of Drives..............2640
Approx. # of Days to fill all Plot Drives............16

and a text or pushbullet message: Plot Drive Updated: Was: /mnt/enclosure0/front/column3/drive17, Now: /mnt/enclosure0/front/column3/drive18

And if you choose, once a day you can get a daily update email:

NAS Server: chianas01
Daily Update Email - Generated at 10:20:53
Current Plotting Drive (by mountpoint).................../mnt/enclosure0/front/column2/drive15
Current Plotting Drive (by device)......................./dev/sdq1
Drive Size...............................................10.9T

Environmental & Health
Drive Serial Number......................................00000000
Current Drive Temperature................................31°C
Last Smart Test Health Assessment........................PASS

Other Information
Total Plots on chianas01.................................1692
Current Total Number of Plot Drives......................24
Number of k32 Plots until full...........................925
Max # of Plots with current # of Drives..................2640

Plotting Speed
Total Plots Last 24 Hours................................57
Average Plots Per Hour...................................2.4
Average Plotting Speed Last 24 Hours.....................5.757 TiB/Day
Approx. # of Days to fill all Plot Drives................16

Command Line Options

Staring with V0.3 (April 4th, 2021) (and updated again in V0.4) I have started to add in command line options to make getting plot and other information easier and to generate reports on the fly. Currently the command line options that are availare are:

-h or --help
-v or --version
-ct or --check_temps
-pr or --plot_report
-ud or --update_daily
-off or --offline_hdd
-on or --online_hdd

-h --help and -v --version
These options print out the help message or version information and exits.

******** ChiaNAS Drive Manager - 0.4 (2021-04-13) ********
Running drive_manager.py with no arguments causes drive_manager to run in 'normal' mode.
In this mode drive_manager will check the drive utilization and update which drive your
Chia plots will be sent to when they arrive from your plotter. This is generally called
from a cronjob on a regular basis. Please read the full information about how it works
on my github page.

There are several commandline switches you can use to get immediate reports and feedback:

-dr or --drive_report       Runs the Daily ChiaNAS Report (if configured), and emails
                            it to you. This can be called from a crontab job as well.

-ct or --check_temps        This will query all of your hard drives using smartctl and
                            return a list of drive temperatures to you.

-pr or --plot_report        This queries the NAS and returns a report letting you know
                            how many plots are currently on the system and how many more
                            you can add based on the current drive configuration. It also
                            includes plotting speed information for the last 24 hours.

-ud or --update_daily       This updates the total number of plots the system has created
                            over the past 24 hours. Use with CAUTION!. This should be ran
                            from crontab once every 24 hours only! It updates the total
                            from the last time is was run until now, hence why you should
                            only run this once per 24 hours.

-off or --offline_hdd       This takes a drive as it's input (for example  drive6) and
                            "offlines" it so that no more plots will get written to it.
                            You must --on or --online_hdd the drive for it to be used
                            again. Useful if the drive is failing and needs to be replaced.
                            You cannot "offline a drive that is not mounted.

-on or --online_hdd         This takes a drive as it's input (for example  drive6) and
                            "onlines" it so that plots will get written to it. This option
                            will be UNAVAILABLE if there are no drives that have been
                            offlined!

USAGE:

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -dr, --daily_report   Run the ChiaPlot Daily Email Report and exit
  -ct, --check_temps    Return a list of drives and their temperatures and exit
  -pr, --plot_report    Return the total # of plots on the system and total you can add and exit
  -ud, --update_daily   Updates 24 hour plot count. USE WITH CAUTION, USE WITH CRONTAB
  -off OFFLINE_HDD, --offline_hdd OFFLINE_HDD.  Offline a specific drive. Use drive number: drive6

-dr --drive_report
This option outputs the HTML versionof the Daily Drive Report email to the screen and also emails. This only works if configured. If this notification is set to off, this will do nothing.

-ct --check_temps
This options prints the serial number, device name, drive number and temperature of all devices desiginated as plot drives.

#################################################################
################# chianas01 Temperature Report ##################
#################################################################
#    Serial#     #     Device     #     Drive     #    Temp     #
#################################################################
#   00000000     #   /dev/sdb1    #    drive0    #     28°C     #
#   00000000     #   /dev/sdc1    #    drive1    #     29°C     #
#   00000000     #   /dev/sdd1    #    drive2    #     28°C     #
#   00000000     #   /dev/sde1    #    drive3    #     28°C     #
#   00000000     #   /dev/sdf1    #    drive4    #     27°C     #
#   00000000     #   /dev/sdg1    #    drive5    #     24°C     #
#   00000000     #   /dev/sdh1    #    drive6    #     30°C     #
#   00000000     #   /dev/sdi1    #    drive7    #     30°C     #
#   00000000     #   /dev/sdj1    #    drive8    #     29°C     #
#   00000000     #   /dev/sdk1    #    drive9    #     28°C     #
#   00000000     #   /dev/sdl1    #    drive10    #     27°C     #
#   00000000     #   /dev/sdm1    #    drive11    #     24°C     #
#   00000000     #   /dev/sdn1    #    drive12    #     29°C     #
#   00000000     #   /dev/sdo1    #    drive13    #     30°C     #
#   00000000     #   /dev/sdp1    #    drive14    #     30°C     #
#   00000000     #   /dev/sdq1    #    drive15    #     29°C     #
#   00000000     #   /dev/sdr1    #    drive16    #     27°C     #
#   00000000     #   /dev/sds1    #    drive17    #     25°C     #
#   00000000     #   /dev/sdt1    #    drive18    #     28°C     #
#   00000000     #   /dev/sdu1    #    drive19    #     30°C     #
#   00000000     #   /dev/sdv1    #    drive20    #     29°C     #
#   00000000     #   /dev/sdw1    #    drive21    #     29°C     #
#   00000000     #   /dev/sdx1    #    drive22    #     28°C     #
#   00000000     #   /dev/sdy1    #    drive23    #     25°C     #
#   00000000     #   /dev/sdz1    #    drive24    #     32°C     #
#   00000000     #   /dev/sdaa1    #    drive25    #     33°C     #
#   00000000     #   /dev/sdab1    #    drive26    #     32°C     #
#   00000000     #   /dev/sdac1    #    drive27    #     34°C     #
#   00000000     #   /dev/sdad1    #    drive28    #     35°C     #
#   00000000     #   /dev/sdae1    #    drive29    #     33°C     #
#   00000000     #   /dev/sdaf1    #    drive30    #     29°C     #
#   00000000     #   /dev/sdag1    #    drive31    #     30°C     #
#   00000000     #   /dev/sdah1    #    drive32    #     30°C     #
###################################################################

-pr --plot_report
This options prints out a quick version of the daily plot report to the screen and exits.

############################################################
################### chianas01 Plot Report ##################
############################################################
Total Number of Plots on chianas01:                     1690
Total Number of Systemwide Plots Drives:                  24
Total Number of k32 Plots until full:                    926
Maximum # of plots when full:                           2640
Plots completed in the last 24 Hours:                     57
Average Plots per Hours:                                 2.4
Average Plotting Speed Last 24 Hours (TiB/Day):         5.76 
Appx Number of Days to fill all current plot drives:      16  
Current Plot Storage Drive:                        /dev/sdq1
Temperature of Current Plot Drive:                      30°C
Latest Smart Drive Assessment of Plot Drive:            PASS
############################################################

-ud --update_daily
This option is really designed to be called from your crontab riught before you run your daily email:

01 00 * * * /usr/bin/python3 /root/plot_manager/drive_manager.py -ud >/dev/null 2>&1
02 00 * * * /usr/bin/python3 /root/plot_manager/drive_manager.py -dr >/dev/null 2>&1

It is designed to calculate the information necessary for the daily plot report. You can run it anytime you want but it will reset this information to the point in time you called it as oppoed to giving you a 24 hour view of your system.

Coin Monitor

I wanted a simple way to keep track of when I won coins and how many I had so I added coin_monitor.py to the mix. As I said, in my configuration I have three servers, my plotter, my nas and my farmer. Coin Monitor sits on my farmer and watches the log files for new coins. When it sees that there is a new coin, it checks the coin logfile (/root/coin_monitor/logs/new_coins.log) to make sure we have not already counted those coins. If it is a new coin it writes that coin information to the logfile (new_coins.log) and depending on what notifications you have setup you can receive a notification via SMS, Email and PushBullet. If you have per_coin_email set to True/1 in coin_monitoring_config you will also receive a nicely formatted email on top of your other notifications.

This is stand alone and can be used in combination with or without my plot_manager scripts.

Installation is pretty straight forward. Copy it into your preferred directory and then edit coinf_monitor.py and make sure that this line is pointing to your correct log file location:

chia_log = '/home/chia/.chia/mainnet/log/debug.log'

Also make sure the following line is also correct for your system:

new_coin_log = '/root/coin_monitor/logs/new_coins.log'

Lastly, you need to make sure chia is logging in INFO mode and not WARNING otherwise it will not log new coins. Look here to verify:
/home/chia/.chia/mainnet/config/config.yaml

You should see something like this:

logging: &id001
    log_filename: log/debug.log
    log_level: INFO
    log_stdout: false

Just make sure the log_level is set to INFO and you should be good.

If you would like to test the script, simply create a test "log" file and put this line in it:
06:33:02.632 wallet src.wallet.wallet_state_manager: INFO Adding coin: {'amount': '250000000000',

Save the file and edit the coin_monitor.py script to look at that file and run the script. If everything is configured correctly you should get an email (or whatever notification you set up). If you run it again you should see this in the coin_monitor debug log:
Found coins that were already accounted for in the log!: ['13:52:59.147', '250000000000']

Just make sure to point the script back to the correct logfile after testing!

Hardware I used in my setup

I have had a couple of people ask about the hardware that I used in my setup so I thought I would include it here. Almost everything I purchased was used off eBay. There were minor exceptions but 95% was eBay.

Plotting Server

Dell R820
4 x E5-4657L V2 CPUs (48 Cores, 96 Threads)
512GB DDR3 1600Mhz RAM
Dell 4 x NVMe Drive Cage and Controller
4 x Intel P4510 NVMe Drives (temp drives)
LSI 9431-8i 12Gbs SAS Controller
8 x HGST 1.6TB SAS3 12Gbs SSDs (5.8 PBW) (HUSMR1616ASS205)
(2 in RAID1 for OS, 6 x RAID0 as -d drive)
2 x 10Gbe & 2x 1Gbe NICs
iDrac Enterprise
Dual Power Supplies

NAS for Plot Storage

Supermicro 4U 36 Bay SAS3 Chassis w/X10DRi-T4+ Motherboard
2 x E5-2690 v4 CPU
128GB ECC
1 x AOC-S3008L-8LE 12GB/s SAS3 Controller (IT Mode)
2 x LSI9300-16e 12GB/s External (IT Mode)
36 x 12TB Seagate Exos Drives
10Gbe NICs (on motherboard)
Dual Power Supplies

NAS Expansion Chassis

2 x SuperMicro 847 45-Drive Chassis
Dual Power supplies

Chia Famer Node

Ubuntu 20.04 LTS VM running on Proxmox

My extra drive chassis are empty right now, but as you can see from my drive layout and mounting strategy above, it is super easy to add more drives.

Changelog

V0.4 2021-04-13

Added ability to "offline" a particular drive for maintenance. When you offline a drive it prevents drive_manager.py from using that drive to store plots until you "online" it again.
Only drives that are currently mounted and available on the system and listed as plot drives can be offlined.
Only drives that have been offlined can be onlined. If you run ./drive_manager.py -h you will get a list of drives that have been offlined. If no drives have been offlined you will not see the -on or --online_hdd options.
Modified the sorting option for all plots drives using natsort. This sorts now completely on drive number naturally.
-off or --offline_hdd Offlines a particular drive for maintenance, Error to stdout if drive is not mounted or not aplot drive.
-on or --online_hdd Online a drive that has been offlined making it immediately available for plots. Only drives that have been offline will be listed.

V0.3 2021-04-04 Added multiple command line utilities to drive_manager.py including:

-dr or --drive_report Immediately runs the Daily Report and sends email (if configured).
-ct or --check_temps Checks the temperatures of all drives and returns a report to stdout.
-pr or --plot_report Quick plot report like email report but to stdout.
-ud or --update_daily Designed to be called from cron one per 24-hours. Updates the daily plot stats to provide 24 hours plotting stats (total plots, hourly plots, days until full).

Added plot time informtion to the dailey email report including:

Total plots last 24 hours
Average plots per hour (last 24 hours)
Average plotting speed (last 24 hours)

V0.2 2021-03-23

Moved system logging types to plot_manager_config and updated necessary functions.
Added per_plot system notification function (send_new_plot_notification()).
Updated read_config_data() to support ConfigParser boolean returns.
Updated necessary functions for read_config_data() change

In Closing....

I still have a lot that I want to do with these scripts. I need to do a lot more error checking and management, additional notification capabilities and types and eventually add a web driven interface via Flask. I am not a programmer, I do it for fun so I am sure there may be better ways to do some of the stuff I did her, but it works for me, and hopefully may work for someone else as well.

To reproduce:

follow the instructions and install with install.sh
try python3 drive_manager.py

We get this:

Traceback (most recent call last):
  File "drive_manager.py", line 115, in <module>
    chianas = DriveManager.read_configs()
  File "/root/plot_manager/drivemanager_classes.py", line 115, in read_configs
    total_plot_highwater_warning=server['harvester']['total_plot_highwater_warning'],

and if we rectify the problem by adding the missing key, this follows:

Traceback (most recent call last):
  File "drive_manager.py", line 115, in <module>
    chianas = DriveManager.read_configs()
  File "/root/plot_manager/drivemanager_classes.py", line 116, in read_configs
    total_plots_alert_sent=server['harvester']['total_plots_alert_sent'],
KeyError: 'total_plots_alert_sent'

This is because total_plot_highwater_warning and total_plots_alert_sent are directly accessed in drivemanager_classes.py but not present in the default config file.

bug Fixed

Script acknowledge

Would you please expose your chia command script or one single commnad line considering your plotter-harvester async structure? And plotman script based on one local plotter? Thank you~!

opened by iwbrhwfy 6
help needed

Hey there, you did a really good job with the Chia plot manager. I'm looking for some advice to set up my Chia plotting and farming, same kind of system that you have. would you give me a little hand to pick the devices? also I was wondering how many plots you were able to make a day with your system? thanks in advance! best, olivier

opened by olivierLA75 5
Automated new drive formatting, mounting and updating chia/plot_manger configs

Code that will identify when a new drive is added to the system detecting that it is not mounted not has any partitions on it, will partition it, format it xfs, mount it, enter the required entries in /etc/fstab and update plot_manager and chia with the new mount point information.
TODO In Process

opened by rjsears 4
Import Error with psutil._common
First of all, thank you for putting so much time and effort for making this. I was almost going to attempt doing this (probably a much simpler version since I'm pretty newbie with coding).

I believe I've done all the required setting up, I've changed the directories to suit mine etc.

When I was trying to run drive_manager.py, below error came up:

Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/home/txny/chia_plot_manager/chianas/drive_manager.py", line 68, in <module> from psutil._common import bytes2human ImportError: cannot import name 'bytes2human' from 'psutil._common' (/usr/lib/python3/dist-packages/psutil/_common.py)

Any ideas? I do have psutil installed.
help wanted
opened by TnzGit 4

Missing keys in default config file

opened by klemens-st 3

k32 plots have different sizes

Are we testing?

testing = False if testing: plot_dir = '/home/chia/plot_manager/test_plots/' plot_size = 10000000 status_file = '/home/chia/plot_manager/transfer_job_running_testing' else: plot_dir = "/mnt/ssdraid/array0/" plot_size = 108644374730 # Based on K32 plot size status_file = '/home/chia/plot_manager/transfer_job_running'

remote_checkfile = '/root/plot_manager/remote_transfer_is_active'

Just saw that in installation instruction, after checking my files i saw that they have different sizes. looks like according to this code file size have to be exatly 108644374730 or it wont be validated as successful plot. but all plots are valid even they have different sizes...

opened by maxbelz 3
First plot of the job end after 20 minutes

Hi! When i put 3 plot in paralell in the same job, the first end after 20 minutes, that generate one new plot (only 3 in phase 1 whit my 1tb nvme) and all continuos fine. Just the first plot have this issue, the others plots goes to the end. Thx!

opened by bouzaguido 1

Fix list index out of rang error

drive_manager.py throws indexerror when no drives are available to look at. Need to do an exception

root@chianas03:~/plot_manager# ./drive_manager.py 
send_new_plot_notification() Started
update_receive_plot() Started
Total Serverwide Plots: 0
Traceback (most recent call last):
  File "./drive_manager.py", line 747, in <module>
    main()
  File "./drive_manager.py", line 740, in main
    update_receive_plot()
  File "./drive_manager.py", line 500, in update_receive_plot
    if current_plotting_drive == get_plot_drive_to_use():
  File "./drive_manager.py", line 391, in get_plot_drive_to_use
    return (natsorted(available_drives)[0][0])
IndexError: list index out of range

bug

opened by rjsears 1

Update chianas drive_manager.py to include check for active network traffic

In certain circumstances ./drive_manager.py can think there is a remote transfer going when there is not. We need to add in a network check like we do on the sending end so we can somewhat gracefully reset if a remote transfer failed and we still think it is going. This came up when I ran out of drive space!!
TODO

opened by rjsears 1
per_coin = True in coin_monitor_config

Hello, i am not a software deloveper, i was just trying to use your script -> coin_monitor.py

i could not run your script without changing per_coin = True to per_coin_email = True in coin_monitor_config

after chaning that it is working.

maybe this is something to fix. i just wanted to let you know. thanks.

opened by erdincsk 1
Create way to offline drive

Need to create a way to offline a drive for replacement where you can mv or copy off the plots to a new drive and not have drive_manager.py keep putting plots on it.
TODO

opened by rjsears 1
Can we make it work on windows too?

I started all my work on windows. I cannot go to linux now, i have to many plots and many more comming. It would be great to have a windows version too. Thank you :)
question

opened by pajtimrexhepi 1
Question about speed that doesn't make sense

This is more than likely a research assistance request. When I have ran out of space on any plotting director (not temp) or when a plot is transferring from temp to plotting my speed of ncat drops significantly, from 110M/S to sometimes < 10M/S. This seems like an issue that shouldn't exist due to the hardware layout. Here is an example of my NAS and Plotter:

NAS: 24 CPU 100 GB Ram LSI 9205-8e connected to 3x NetApp DS4243 72 Drives

Plotter 24 CPU 100GB Ram Plot Dir (/mnt/hdd/hdd0) 2 TB WD Blue SSD Temp Dir (/mnt/nvme_drive0) Raid 0 2x 1TB Samsung 970

I am using plotman with MadMax, any time a plot is in 4:1 (transferring from temp to plot dir) I see the drop in speed. This is even worse if the temp dir fills up and the 1 or 2 plots sit in 4:1 for an extended period of time.

I am trying to identify why this is happening. This didn't seem like this was an issue before I moved from .92 to .95 but I can't be sure.

I am looking for some help trying to figure this out, I have a plotter with a smallish plot dir and if it happens to fill up then transfers off the plotter start taking 10x as long and thus the entire group of plotters end up waiting in line and getting backed up.
help wanted

opened by thomascooper 1
replace_plot() Script on Failed

I'm getting this error: "replace_plot() Script on Failed" What does it mean? Help pls. ps i have mount points like "/mnt/enclosure0/column0/drive0" so it must work with condition p.device.startswith('/dev/sd') and p.mountpoint.startswith('/mnt/enclosure') and p.mountpoint.endswith(drive)
question

opened by jobsidi 3

install.sh fails using zsh on 0.94

Using zsh as my default shell, on 0.9.4 the install.sh fails with a:

-e -n Should we CONTINUE?
install.sh: 44: read: Illegal option -n
install.sh: 45: install.sh: [[: not found
install.sh: 247: install.sh: Bad substitution
root@harv1:~/plot_manager# bash install.sh nas

Simply re-running it using bash install.sh nas fixes it.

TODO

opened by rglastra 1

Multiple Plotters

Hey! Love the script, thanks for all the work you've put into it. I have 3 separate dedicated plotting machines, the plot_manager.py and kill_nc.py scripts (as well as send/rec plot scripts) are designed for a single plotter. Have any suggestions or thoughts around removing this limitation.

Initially I decided to just append the hostname to the remote_checkfile variable, this way the plotters would not kill each others netcats, but multiple netcats over a gigabit network slows down the the transfer from approx 110MB/s on a single file, to a total of 60-70MB/s across all files. I'd love to avoid that loss of speed if possible, I am already at the cusp of needing 10GBE .
question

opened by thomascooper 34
Fix deps in install.sh

There are some more python and (in my case) apt dependencies than aren't included in install.sh currently. For example, glances-api (does it really depend on Glances, or can this be omitted?), nc,

Probably needs an installation on a fresh install to get the exact list. Offhand I believe I added ~5-6 python packages and ~3 packages via apt.
question

opened by rglastra 1

v0.98(Oct 18, 2021)

Source code(tar.gz)
Source code(zip)
v0.97.2(Sep 18, 2021)

Few minor updates
Source code(tar.gz)
Source code(zip)
v0.96(Sep 6, 2021)

Source code(tar.gz)
Source code(zip)
v0.95(Sep 4, 2021)

Bug fixes and enhancements to 0.94
Source code(tar.gz)
Source code(zip)
V0.94(Aug 9, 2021)

Adds new functionality including pooling support and support for individual old plot replacement as new pool pots are created. Also includes farm-wide reporting updates and a farm-wide drive UUID search engine.
Source code(tar.gz)
Source code(zip)
V0.93(Jul 14, 2021)

Source code(tar.gz)
Source code(zip)
v0.92(Jun 2, 2021)

Source code(tar.gz)
Source code(zip)
v0.91(May 30, 2021)

Source code(tar.gz)
Source code(zip)
v0.8(May 27, 2021)

Source code(tar.gz)
Source code(zip)
V0.7(May 18, 2021)

Various minor bug fixes and enhancements plus the addition of the auto_drive.sh script
Source code(tar.gz)
Source code(zip)
V0.6(Apr 23, 2021)

Updates to reporting including verifying plots and drive space usage against Chia logs. Updates to move_local_plots.py to utilize drive I/O to detect failed transfers.
Source code(tar.gz)
Source code(zip)