Does a block in Hadoop Distributed File System store multiple small files, or a block stores only 1 file?
Multiple files are not stored in a single block. However, a single file can be stored across multiple blocks. The mapping between a file and its block IDs is persisted in the NameNode. According to Hadoop: The Definitive Guide, HDFS is designed to handle large files. If there are too many small files, the NameNode can become overloaded, since it stores the entire namespace for HDFS in memory. Check this article on how to alleviate the problem of too many small files.
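To see why many small files strain the NameNode, here is a back-of-envelope sketch. It assumes the commonly cited rule of thumb of roughly 150 bytes of NameNode heap per namespace object, and that each small file costs one file object plus one block object; the exact numbers vary by Hadoop version, so treat this as an estimate only:

```shell
# Rough NameNode heap estimate for many small files.
# Assumption: ~150 bytes of heap per namespace object (rule of thumb).
BYTES_PER_OBJECT=150
NUM_FILES=10000000                      # ten million small files
OBJECTS=$(( NUM_FILES * 2 ))            # 1 file object + 1 block object each
HEAP_BYTES=$(( OBJECTS * BYTES_PER_OBJECT ))
HEAP_GB=$(( HEAP_BYTES / 1024 / 1024 / 1024 ))
echo "Approx NameNode heap needed: ${HEAP_GB} GB"
```

Ten million small files already eat on the order of gigabytes of NameNode heap, regardless of how little data they actually hold.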
Well, you could do that using the HAR (Hadoop Archive) filesystem, which packs multiple small files into the HDFS blocks of a special part file managed by the HAR filesystem.
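For reference, a HAR archive is created with the `hadoop archive` tool and then read through the `har://` scheme; the paths below are made-up placeholders:

```shell
# Pack the small files under /user/me/input into one archive
# (paths here are hypothetical examples).
hadoop archive -archiveName files.har -p /user/me input /user/me/archives

# List the archived files through the HAR filesystem
hdfs dfs -ls har:///user/me/archives/files.har
```

The archive stores its contents in large part files, so the NameNode tracks far fewer objects than it would for the individual small files.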
A block will store data from only a single file. If your file is bigger than the block size (64 MB, 128 MB, ...), it will be partitioned into multiple blocks of that size, with the final block holding the remainder.
The main point to understand in HDFS: multiple files are not stored in a single block (unless it is an archive, i.e. a HAR file).