# Preparing large datasets

A typical workflow for preparing large datasets is as follows:

  • Download datasets
  • Process data into piece files
  • Store the piece files in the pieceStore of both venus-cluster and venus-market for sealing

# Download large datasets

Transfer the datasets from your storage client to your own storage system by whatever means you prefer.

# go-graphsplit

Install go-graphsplit for splitting deal data.

```shell
git clone https://github.com/filedrive-team/go-graphsplit.git
cd go-graphsplit

# get submodules
git submodule update --init --recursive

# build filecoin-ffi
make ffi

make
```

# Getting piece files

Use TMPDIR to specify where the cache files for processing piece files should be stored.

TIP

The process is disk I/O intensive. A Bus error may indicate that you need faster disks.

```shell
TMPDIR=/mnt/nvme01 /root/graphsplit chunk \
  --car-dir=/mnt/nas/venus-data/16g-pice-data \
  --slice-size=1073741824 \
  --parallel=1 \
  --graph-name=gs-test \
  --calc-commp \
  --rename \
  --parent-path=/mnt/nas/venus-data/tess/ \
  /mnt/nas/venus-data/tess/ >> /root/nas-nas-para15-30.log 2>&1 &
```

TIP

  • --car-dir: Path where the CAR files are stored.
  • --slice-size: Size of each output piece file, in bytes. For example, 1024 * 1024 * 1024 = 1073741824 produces 1 GiB piece files. 16 GiB (17179869184) or 32 GiB (34359738368) is recommended.
  • --parallel: Maximum number of parallel processes.
  • --calc-commp: Compute the commP value.
  • --rename: Convert CAR files to piece data.
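The byte values for --slice-size can be derived with shell arithmetic rather than typed by hand. The following is a minimal sketch; the helper name gib_to_bytes is hypothetical and not part of go-graphsplit.

```shell
# Hypothetical helper: convert a size in GiB to bytes for --slice-size.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 1    # 1073741824
gib_to_bytes 16   # 17179869184
gib_to_bytes 32   # 34359738368
```

The result could then be passed inline, for example --slice-size="$(gib_to_bytes 16)".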

When processing is done, there will be many piece files and a manifest.csv under --car-dir. Transfer piece files to the path defined by pieceStore for both venus-market and venus-sector-manager.
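The transfer step can be sketched as a small shell function. This is only an illustration under stated assumptions: the helper name copy_pieces and both directory paths in the usage comment are hypothetical, and it assumes that every regular file in the --car-dir directory other than manifest.csv is a piece file. Use whatever transfer tool suits your storage setup.

```shell
# Hypothetical helper: copy every regular file except manifest.csv
# from the --car-dir output directory into a pieceStore directory.
# Files that already exist in the destination are left untouched.
copy_pieces() {
  src=$1
  dst=$2
  mkdir -p "$dst" || return 1
  for f in "$src"/*; do
    [ -f "$f" ] || continue
    name=$(basename "$f")
    [ "$name" = "manifest.csv" ] && continue
    [ -e "$dst/$name" ] && continue
    cp "$f" "$dst/$name" || return 1
  done
  return 0
}

# Example (paths are placeholders, adjust to your own pieceStore):
# copy_pieces /mnt/nas/venus-data/16g-pice-data /mnt/piecestore
```

Run it once per pieceStore path if venus-market and venus-sector-manager do not share the same directory.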

TIP

manifest.csv contains information for proposing storage deals.

TIP

Check the deal's start epoch and make sure the sector is sealed before the deal starts.
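To judge how much time is left before a deal starts, an epoch can be converted to a Unix timestamp. The sketch below assumes Filecoin mainnet, where the genesis timestamp is 1598306400 (2020-08-24 22:00:00 UTC) and each epoch lasts 30 seconds; other networks use a different genesis time. The function names are hypothetical.

```shell
# Assumptions: Filecoin mainnet genesis time and 30-second epochs.
GENESIS_UNIX=1598306400
EPOCH_SECONDS=30

# Convert a chain epoch to a Unix timestamp.
epoch_to_unix() {
  echo $(( GENESIS_UNIX + $1 * EPOCH_SECONDS ))
}

# Convert a Unix timestamp to the chain epoch it falls in.
unix_to_epoch() {
  echo $(( ($1 - GENESIS_UNIX) / EPOCH_SECONDS ))
}

epoch_to_unix 2880        # 1598392800, one day after genesis
unix_to_epoch 1598392800  # 2880
```

Comparing unix_to_epoch "$(date +%s)" with the deal's start epoch gives the number of epochs remaining.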

# Sealing the deal

# venus-market

Check deal status using venus-market.

TIP

If the PieceState of a deal is Undefind, as in the listing below, the deal is waiting for venus-sector-manager to assign it a sector ID.

```shell
venus-market storage-deals list
/root/.venusmarket
ProposalCid  DealId  State            PieceState  Client                                 Provider  Size   Price  Duration
...hbgguc6a  172163  StorageDealWait  Undefind    t1yusfltophrl3z5zgemgr3pwgg3nzdjbjky   t0xxxx    16GiB  0 FIL  1059840
...t2wycjiq  172164  StorageDealWait  Undefind    t1yusfltophrl3z5zgemgr3pwgg3nzdjbjky   t0xxxx    16GiB  0 FIL  1059840
...5tkvirfe  172165  StorageDealWait  Undefind    t1yusfltophrl3z5zgemgr3pwgg3nzdjbjky   t0xxxx    16GiB  0 FIL  1059840
...btsawgt2  172166  StorageDealWait  Undefind    t1yusfltophrl3z5zgemgr3pwgg3nzdjbjky   t0xxxx    16GiB  0 FIL  1059840
...feczgggg  172167  StorageDealWait  Undefind    t1yusfltophrl3z5zgemgr3pwgg3nzdjbjky   t0xxxx    16GiB  0 FIL  1059840
```
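With many deals, it can help to count how many are still in a given piece state. The sketch below assumes the whitespace-separated column layout shown in the listing above, where PieceState is the fourth column; the helper name count_piece_state is hypothetical, and the layout may differ between venus-market versions.

```shell
# Hypothetical helper: count rows on stdin whose fourth column
# (PieceState in the listing above) equals the given state.
count_piece_state() {
  awk -v state="$1" '$4 == state { n++ } END { print n + 0 }'
}

# Example usage against the live CLI:
# venus-market storage-deals list | count_piece_state Undefind
```

The header row and the repository path line are skipped naturally, because their fourth field never matches a state name.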

# venus-sector-manager

Make sure venus-sector-manager is configured to accept storage deals.

TIP

Check that both Enabled and EnableDeals are set to true in .venus-sector-manager/sector-manager.cfg.

```toml
[Miners.Sector]
InitNumber = 1000
MaxNumber = 1000000
Enabled = true
EnableDeals = true
LifetimeDays = 210
```
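The check can be scripted. The following is a rough sketch rather than a real TOML parser: it assumes the keys sit directly under a [Miners.Sector] header written exactly as above, and it succeeds if any such section has both keys set to true. The function name sector_deals_enabled is hypothetical.

```shell
# Hypothetical helper: exit 0 if a [Miners.Sector] section in the given
# file sets both Enabled and EnableDeals to true, non-zero otherwise.
sector_deals_enabled() {
  awk '
    /^\[/ { in_section = ($0 == "[Miners.Sector]") }
    in_section && /^[[:space:]]*Enabled[[:space:]]*=[[:space:]]*true/ { enabled = 1 }
    in_section && /^[[:space:]]*EnableDeals[[:space:]]*=[[:space:]]*true/ { deals = 1 }
    END { exit !(enabled && deals) }
  ' "$1"
}

# Example usage:
# sector_deals_enabled ~/.venus-sector-manager/sector-manager.cfg && echo "deals enabled"
```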

TIP

Make sure the RPC configuration of venus-worker includes the correct token information so that it can fetch piece data from the path defined in venus-sector-manager.

```toml
[sector_manager]
  rpc_client.addr = "/ip4/192.168.100.1/tcp/1789"
  rpc_client.headers = { User-Agent = "jsonrpc-core-client" }
  piece_token = "eyJhbGciOiJIUzxxxxxxxx.eyJuYW1lIjoibGpoOG1xxx.gY3ymGxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

[[sealing_thread]]
  sealing.enable_deals = true
  sealing.max_retries = 5
```