# Find your optimal configurations

Getting your Filecoin mining operation up and running is hard. Growing it is even harder. It takes a lot of time to scale up and to keep your setup running without errors.

# Overview

General guidelines to follow when optimizing your sealing pipeline.

  • Pledge 2 to 4 sectors and record the exact time each task (AP, P1, P2, C2) takes to finish

  • Make sure all your boxes have tasks assigned to them all the time

  • Automate your sector pledge command with a script/cron job (see the cron sketch after this list)

  • Use MaxSealingSectors to cap the maximum number of sectors sealing in parallel

  • Every worker can be assigned a subset of tasks (AP, P1, P2, C2) so that it specializes

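Pledging can be automated with a simple cron entry. The sketch below is illustrative only: the pledge subcommand, binary path and interval are assumptions that depend on your deployment.

# Hypothetical crontab entry: attempt to pledge one sector every 30 minutes
# (adjust the binary path and the pledge subcommand to your setup)
*/30 * * * * /usr/local/bin/venus-sealer sectors pledge >> /var/log/sector-pledge.log 2>&1
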
# Record time for each task

Types of tasks that a worker can do:

TTAddPiece   TaskType = "seal/v0/addpiece"
TTPreCommit1 TaskType = "seal/v0/precommit/1"
TTPreCommit2 TaskType = "seal/v0/precommit/2"
TTCommit1    TaskType = "seal/v0/commit/1" // NOTE: We use this to transfer the sector into miner-local storage for now; Don't use on workers!
TTCommit2    TaskType = "seal/v0/commit/2"

TTFinalize   TaskType = "seal/v0/finalize"
TTFetch      TaskType = "seal/v0/fetch"
TTUnseal     TaskType = "seal/v0/unseal"

Each task first shows up in the log with the keyword prepare (start and end entries), then with the keyword work as another pair of log entries (also start and end).

# seal/v0/fetch
2021-08-03T14:00:07.925+0800    INFO    advmgr  sector-storage/sched_worker.go:401  Sector 7 prepare for seal/v0/fetch ...
2021-08-03T14:05:36.772+0800    INFO    advmgr  sector-storage/sched_worker.go:403  Sector 7 prepare for seal/v0/fetch end ...

2021-08-03T14:05:36.772+0800    INFO    advmgr  sector-storage/sched_worker.go:442  Sector 7 work for seal/v0/fetch ...
2021-08-03T14:05:36.774+0800    INFO    advmgr  sector-storage/sched_worker.go:444  Sector 7 work for seal/v0/fetch end ...

# seal/v0/addpiece
2021-08-03T13:38:37.977+0800    INFO    advmgr  sector-storage/sched_worker.go:401  Sector 8 prepare for seal/v0/addpiece ...
2021-08-03T13:38:37.978+0800    INFO    advmgr  sector-storage/sched_worker.go:403  Sector 8 prepare for seal/v0/addpiece end ...

2021-08-03T13:38:37.978+0800    INFO    advmgr  sector-storage/sched_worker.go:442  Sector 8 work for seal/v0/addpiece ...
2021-08-03T13:44:26.295+0800    INFO    advmgr  sector-storage/sched_worker.go:444  Sector 8 work for seal/v0/addpiece end ...

# seal/v0/commit/2
2021-08-03T13:26:02.119+0800    INFO    advmgr  sector-storage/sched_worker.go:401  Sector 7 prepare for seal/v0/commit/2 ...
2021-08-03T13:26:02.119+0800    INFO    advmgr  sector-storage/sched_worker.go:403  Sector 7 prepare for seal/v0/commit/2 end ...

2021-08-03T13:26:02.119+0800    INFO    advmgr  sector-storage/sched_worker.go:442  Sector 7 work for seal/v0/commit/2 ...
2021-08-03T13:49:46.180+0800    INFO    advmgr  sector-storage/sched_worker.go:444  Sector 7 work for seal/v0/commit/2 end ...

# seal/v0/finalize
2021-08-03T13:54:17.414+0800    INFO    advmgr  sector-storage/sched_worker.go:401  Sector 7 prepare for seal/v0/finalize ...
2021-08-03T13:59:30.471+0800    INFO    advmgr  sector-storage/sched_worker.go:403  Sector 7 prepare for seal/v0/finalize end ...

2021-08-03T13:59:30.471+0800    INFO    advmgr  sector-storage/sched_worker.go:442  Sector 7 work for seal/v0/finalize ...
2021-08-03T14:00:07.915+0800    INFO    advmgr  sector-storage/sched_worker.go:444  Sector 7 work for seal/v0/finalize end ...

Some tasks take more time in prepare than in work, and some are the other way around. Generally speaking, a task that requires network transfer/bandwidth consumes more time in prepare, while a task that requires more computation resources consumes more time in work, e.g. AP, P1, P2 and C2.

To record the time of core tasks like AP, P1, P2 and C2, aggregate the time of the fetch that precedes the task and the time of the task itself. For example, recorded time of P1 = time of P1 + time of the fetch before P1.

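As a rough sketch, the task durations can be extracted from the worker log with standard shell tools. The log file path and the use of GNU date are assumptions; the grep patterns follow the log format shown above.

# Hypothetical example: measure the work phase of one task for one sector
SECTOR=7
TASK="seal/v0/commit/2"
LOG=worker.log
# The first matching line without "end" is the start; the first line with "end" is the finish
START=$(grep "Sector $SECTOR work for $TASK" "$LOG" | grep -v " end " | head -1 | awk '{print $1}')
END=$(grep "Sector $SECTOR work for $TASK end" "$LOG" | head -1 | awk '{print $1}')
# Requires GNU date to parse the ISO-8601 timestamps
echo "$(( $(date -d "$END" +%s) - $(date -d "$START" +%s) )) seconds"
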
# Performance factors

There are many factors that contribute to the performance of your sealing pipeline.

# Sealing storage

During the sealing of a sector, cache files are generated by the proof algorithm, which requires high disk IO speed. Low IO speed may result in idling of your computation resources (CPUs/GPUs).

Choose appropriate hardware using the formula below.

file size * number of parallel threads / operation time = average file IO speed
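
For instance, plugging in the P1 figures used later in this document (11 layers of 32G written over roughly 240 minutes) gives a rough per-sector estimate; this is an illustration, not a measured value.

# 11 layers of 32G written over ~240 minutes
11 * 32 * 1024 MB / (240 * 60 s) = ~25 MB/s write per sector
# With 7 sectors sealing in parallel
25 MB/s * 7 = ~175 MB/s sustained write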

To get a more precise estimate, sum up the per-task IO throughput.

AP IO throughput = AP read + AP write
P1 IO throughput = P1 read + P1 write
P2 IO throughput = P2 read + P2 write
C2 IO throughput = C2 read + C2 write

SSDs and NVMe drives are commonly used for sealing storage. To ensure efficient usage of these faster drives, it is recommended to use software RAID on them.

mdadm -C /dev/md1 -l 0 -n 2 /dev/sdb1 /dev/sdc1
mdadm -C /dev/md2 -l 5 -n 6 /dev/sd[b-g]1
# Options
-C, --create
Create a new array.
-l, --level=
Set RAID level. 
-n, --raid-devices=
Specify the number of active devices in the array.
-x, --spare-devices=
Specify the number of spare (eXtra) devices in the initial array.
-A, --assemble
Assemble a pre-existing array.
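
After creating an array, you can confirm its health and sync status before putting sealing load on it.

# Show status and resync progress of all md arrays
cat /proc/mdstat
# Show detailed information about a specific array
mdadm --detail /dev/md1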

For more on mdadm, please refer to its manual page and get the latest version from the upstream project.

# Permanent storage

Challenges to overcome when setting up permanent storage:

  1. When a sector is sealed, it is transferred from the sealing worker to permanent storage, which takes up network bandwidth and disk IO.
  2. During a WindowPoSt, a large number of randomly selected files are read. Slow reads may result in a failed WindowPoSt.
  3. Choose a high RAID level to have redundancy when possible, e.g. RAID5, RAID6 or RAID10.
  4. Monitor the usage of your disk array (see the monitoring sketch after this list).

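A simple way to monitor the array (item 4 above) is to periodically sample disk utilization and remaining capacity. iostat comes from the sysstat package; the mount point below is just an example.

# Extended per-device statistics (throughput, utilization), refreshed every 5 seconds
iostat -xm 5
# Remaining capacity of the permanent storage mount (example path)
df -h /mnt/storage
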
# Network transfer

During sealing, if you specialize each worker in one type of task (to increase the efficiency of your resources), files will be transferred over the network. If files are copied too slowly over the network, they will drag down the speed of your sealing pipeline. Closely monitor your computation resources and see if there is any idling. For example, if PC2 takes 25 minutes, reads ~440G and writes ~100G, then the IO throughput is ~368 MB/s (440 * 1024 / 25 / 60 + 100 * 1024 / 25 / 60).

After sealing, the sealed sector needs to be transferred to permanent storage, which can be bottlenecked by the network bandwidth between your venus-sealer and your HDD disk array.

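To check whether the link itself is the bottleneck, a quick iperf3 test between a worker and the storage host gives a bandwidth baseline; the hostname below is a placeholder.

# On the storage host
iperf3 -s
# On the sealing worker, run 4 parallel streams against the storage host
iperf3 -c storage-host -P 4
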
# Environment variables

The SHA extension makes a huge difference when computing P1 tasks. P1 may take around 250 minutes with the SHA extension enabled and 420+ minutes without it.

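On Linux you can quickly check whether your CPU exposes the SHA extensions (x86) before planning around the faster P1 times.

# Prints "sha_ni" if the CPU supports the SHA extensions
grep -o -m1 sha_ni /proc/cpuinfo
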
When compiling venus-sealer, make sure you have set the RUSTFLAGS="-C target-cpu=native -g" and FFI_BUILD_FROM_SOURCE="1" flags; you should see output similar to the example below.

+ trap '{ rm -f $__build_output_log_tmp; }' EXIT
+ local '__rust_flags=--print native-static-libs -C target-feature=+sse2'
+ RUSTFLAGS='--print native-static-libs -C target-feature=+sse2'
+ cargo +nightly-2021-04-24 build --release --no-default-features --features multicore-sdr --features pairing,gpu
+ tee /tmp/tmp.IYtnd3xka9
   Compiling autocfg v1.0.1
   Compiling libc v0.2.97
   Compiling cfg-if v1.0.0
   Compiling proc-macro2 v1.0.27
   Compiling unicode-xid v0.2.2
   Compiling syn v1.0.73
   Compiling lazy_static v1.4.0
   Compiling cc v1.0.68
   Compiling typenum v1.13.0
   Compiling serde_derive v1.0.126
   Compiling serde v1.0.126

# Core restriction

When running two types of tasks on the same box, you may want to restrict which CPU cores each task can use so that they do not compete for each other's resources.

One way is through taskset. Note that core restrictions set this way cannot be changed dynamically while the program is running.

TRUST_PARAMS=1 nohup taskset -c 0-32 ./venus-worker run
# Non-consecutive core selection 
taskset -c 0-9,19-29,39-49

Or through cgroups, which support changing core restrictions dynamically during program execution.

sudo mkdir -p /sys/fs/cgroup/cpuset/Pre1-worker
# Write through tee so the redirection runs with root privileges
echo 0-31 | sudo tee /sys/fs/cgroup/cpuset/Pre1-worker/cpuset.cpus
# cpuset.mems must also be set before processes can be attached (cgroup v1)
echo 0 | sudo tee /sys/fs/cgroup/cpuset/Pre1-worker/cpuset.mems
echo <PID> | sudo tee /sys/fs/cgroup/cpuset/Pre1-worker/cgroup.procs

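Whichever method you use, you can verify which cores a running worker is actually allowed to use by inspecting its status in /proc (replace <PID> with the worker's process ID).

# Shows the list of CPU cores the process may run on
grep Cpus_allowed_list /proc/<PID>/status
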
# Worker optimization

All numbers below are for 32G sectors. For 64G sectors, double the 32G numbers.

# P1 optimization

Set the following environment variables to speed up P1.

# Store cache files in RAM; for 32G sectors, it will cost 56G RAM
export FIL_PROOFS_MAXIMIZE_CACHING=1  
# Use multiple cores for P1
export FIL_PROOFS_USE_MULTICORE_SDR=1

P1 RAM usage includes the 56G cache file plus 2 layers of the sector (2 × 32G) for each sector sealing in parallel.

# Assume 10 sectors sealing in parallel
56G + 32G * 2 * 10 = 696G

P1 SSD usage includes 11 layers of the sector, the 64G tree-d file and the 32G unsealed sector.

# For 1 sector
11 * 32G + 64G + 32G = 448G

# P2 optimization

Set the following environment variables to speed up P2.

# Use GPU for tree-c
export FIL_PROOFS_USE_GPU_COLUMN_BUILDER=1
# Use GPU for tree-r-last
export FIL_PROOFS_USE_GPU_TREE_BUILDER=1

P2 RAM usage is 96G.

# Assume 10 sectors sealing in parallel
96G * 10 = 960G

P2 SSD usage includes 8 tree-c files of 4.6G, 8 tree-r-last files of 9.2M, a 4K t_aux file, a 4K p_aux file and the 32G unsealed sector file.

4.6G * 8 + 8 * 9.2M + 4K * 2 + 32G = ~70G

# Commit

C1 costs little CPU, but requires the sum of the P1 and P2 SSD usage.

P1 448G + P2 ~70G = ~518G

C2 environment variables:

# Force C2 to run on the CPU (only set this if no GPU is available)
BELLMAN_NO_GPU=1
# Register a custom GPU; example for an RTX 3090 (10496 CUDA cores)
BELLMAN_CUSTOM_GPU="GeForce RTX 3090:10496"

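To confirm that C2 is actually using the GPU rather than falling back to the CPU prover, watch GPU utilization while a commit task is running.

# Refresh GPU utilization every 2 seconds
nvidia-smi -l 2
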
C2 RAM usage.

128G + 64G = 192G

# Optimize sealing pipeline

# Calculate your daily growth

Calculate how many tasks your sealing pipeline can process.

# for each type of task
tasks done / time = production rate
daily production rate * (32G OR 64G) = daily growth in power
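
As a quick sanity check, the arithmetic can be scripted with bc; the numbers below are the P1 figures from the example that follows (240 minutes per task, one task at a time, 32G sectors).

# Daily growth of a single P1 slot: (24h / 4h per task) tasks per day * 32G
echo "scale=2; 24 / (240 / 60) * 32" | bc
# => 192.00 (G of power per day per parallel P1 task)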

For example, if we have one box that can finish P1 in 240 minutes, P2 in 30 minutes and Commit in 35 minutes, we can derive the daily growth from the following table.

| Task   | Minutes | Parallel | Hourly production rate |
| ------ | ------- | -------- | ---------------------- |
| P1     | 240     | 1        | 0.25 = 1 / (240 / 60)  |
| P2     | 30      | 1        | 2 = 1 / (30 / 60)      |
| Commit | 35      | 1        | 1.71 = 1 / (35 / 60)   |

# Finding optimal task configurations

From the table above, we know that daily growth will be bottlenecked by P1. Adjust the number of parallel tasks for each task type to achieve maximum efficiency.

| Task   | Minutes | Parallel | Hourly production     | Daily output | Memory consumption |
| ------ | ------- | -------- | --------------------- | ------------ | ------------------ |
| P1     | 240     | 7        | 1.75 = 7 / (240 / 60) | 1344 G       | 504 G = 7*64+56    |
| P2     | 30      | 1        | 2 = 1 / (30 / 60)     | 1536 G       | 96 G = 1*96        |
| Commit | 35      | 1        | 1.71 = 1 / (35 / 60)  | 1316 G       | 192 G = 1*128+64   |

The goal is to have the output of each task type be as close as possible so that the sealing pipeline runs at its maximum efficiency. Things to watch out for include:

  1. The hourly production for Commit is lower than for P1, which may result in tasks being backlogged in the Commit phase.
  2. When one type of task is much more efficient than the others, resources may become idle.
  3. Micro-management is needed to achieve the highest possible efficiency.

# Finding optimal pledging

For example, if you find that running 7 P1 tasks in parallel is optimal for your system, change the following venus-sealer configuration.

[Sealing]  
  MaxSealingSectors = 7

# Stop-loss

If one of the tasks fails too many times, manual intervention is needed to get the sealing pipeline back to its normal output.

Remove sectors when you encounter the following issues.

  1. Expired ticket
  2. Expired Commit
  3. Corrupted proof params
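
To identify the affected sectors, you can first list sector states. The list subcommand below is an assumption modeled on the remove command shown next and may differ in your build.

# Hypothetical: show all sectors and their current states
venus-sealer sectors list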

To remove incomplete sectors:

venus-sealer sectors remove --really-do-it <sectorNum>