Description
What would you like to be added:
Increase the maximum blockSize to 64MB.
Currently the blockSize is capped at 16MB, as shown below:
2023/02/24 19:20:09.873853 juicefs[92260] <INFO>: Meta address: sqlite3://myjfs.db [interface.go:402]
2023/02/24 19:20:09.874905 juicefs[92260] <WARNING>: block size is too large: 262144, use 16384 instead [format.go:182]
2023/02/24 19:20:09.891811 juicefs[92260] <INFO>: Data use hdfs://example.com:8020/myjfs/ [format.go:429]
2023/02/24 19:20:10.024762 juicefs[92260] <INFO>: Volume is formatted as {
"Name": "myjfs",
"UUID": "632b5b58-e0cd-4331-875a-a6e604044573",
"Storage": "hdfs",
"Bucket": "example.com:8020",
"AccessKey": "user1",
"BlockSize": 16384,
"Compression": "none",
"TrashDays": 1,
"MetaVersion": 1
}
Why is this needed:
A large number of small files is a burden to HDFS:
https://medium.com/arabamlabs/small-files-in-hadoop-88708e2f6a46
If the blockSize of JuiceFS is 16MB, at least 62,500 files would be generated in HDFS to store 1TB of data.
That puts great pressure on our HDFS cluster.
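A quick back-of-the-envelope check of that figure (a sketch assuming decimal units, i.e. 1TB = 10^12 bytes):

package main

import "fmt"

func main() {
	const totalBytes = 1e12 // 1 TB of data, decimal units
	for _, blockSize := range []float64{16e6, 64e6} {
		// Each JuiceFS block is stored as one object, i.e. one HDFS file,
		// so at least totalBytes/blockSize files are created.
		fmt.Printf("blockSize %.0f MB -> at least %.0f HDFS files for 1 TB\n",
			blockSize/1e6, totalBytes/blockSize)
	}
}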
I think it would be hard to go beyond 64MB, since the chunk size is 64MB, but raising the limit to 64MB itself looks possible.
If there's no problem with this, I can work on it.
Thanks.
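The change itself looks mechanical. Sketched below in Go (hypothetical names and bounds, not the actual JuiceFS source; per the warning above, the real clamp lives around format.go:182), the current behavior is roughly:

package main

import "log"

const (
	minBlockSize = 64       // KiB; assumed lower bound, for illustration only
	maxBlockSize = 16 << 10 // KiB; the 16 MiB cap seen in the warning above
)

// fixBlockSize is a hypothetical stand-in for JuiceFS's clamp; the proposal
// here is to raise maxBlockSize to 64 << 10 KiB (64 MiB, the chunk size).
func fixBlockSize(s int) int {
	if s < minBlockSize {
		return minBlockSize
	}
	if s > maxBlockSize {
		log.Printf("block size is too large: %d, use %d instead", s, maxBlockSize)
		return maxBlockSize
	}
	return s
}

func main() {
	log.Printf("requested 262144 KiB -> using %d KiB", fixBlockSize(262144))
}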
SandyXSD commented on Mar 10, 2023
In JuiceFS, each block corresponds to one object in the object storage, and the blockSize is used as a unit for many strategies such as readahead. Time cost, read amplification, and failure rate may all increase when the blockSize is too big. Thus, it's not recommended to increase the blockSize to 64 MiB.
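As a toy illustration of the read-amplification point (not actual JuiceFS code; worst case, e.g. when compression or encryption forces fetching a whole block object to serve a partial read):

package main

import "fmt"

func main() {
	const readSize = 128 << 10 // a hypothetical 128 KiB random read
	for _, blockSize := range []int{4 << 20, 16 << 20, 64 << 20} {
		// Worst case: one full block object is downloaded per small read.
		fmt.Printf("blockSize %2d MiB -> up to %3dx read amplification\n",
			blockSize>>20, blockSize/readSize)
	}
}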