
Does a block in the Hadoop Distributed File System store multiple small files, or does a block store only one file?


4 Answers

Accepted answer (score 7):

Multiple files are not stored in a single block. However, a single file can be stored across multiple blocks. The mapping between a file and its block IDs is persisted in the NameNode.

According to Hadoop: The Definitive Guide:

Unlike a filesystem for a single disk, a file in HDFS that is smaller than a single block does not occupy a full block’s worth of underlying storage.

HDFS is designed to handle large files. If there are too many small files, the NameNode can become overloaded, since it holds the namespace for the entire filesystem. Check this article on how to alleviate the problem with too many small files.
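A quick way to see the file-to-block mapping in practice is fsck. A minimal sketch (the path below is a hypothetical example); this prints the size of the file and the ID of each block that makes it up:

    # List the blocks backing a file (path is a placeholder)
    hadoop fsck /user/alice/bigfile.dat -files -blocks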

Do you know how to find the mappings of blocks to files? hadoop fsck / -files -blocks -locations -racks gives the file-to-block mapping, but doesn't say in which directory on the local filesystem the block is located (i.e. whether it is in subdirectory9 or subdirectory61). –  Eugen Dec 19 '11 at 15:39
    
The dfs.datanode.data.dir property determines where on the local filesystem a DFS datanode should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored. –  Praveen Sripati Dec 19 '11 at 16:33
    
No, I meant that when the physical blocks are stored, they could be placed directly in dfs.datanode.data.dir or in subdirectories under that directory (created by the datanode). Is there a way to find which block is stored where (as a top-level file or inside some subdirectory)? –  Eugen Dec 19 '11 at 22:22
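For reference, a datanode stores each block on the local filesystem as a plain file named blk_<blockId> (with a matching .meta checksum file), possibly nested in subdirectories it creates under the data directory. So one hedged way to answer the question in the comments, assuming a data directory of /data/dfs/dn and a hypothetical block ID taken from the fsck output:

    # Get the block IDs for the file (path is a placeholder)
    hadoop fsck /user/alice/bigfile.dat -files -blocks
    # Then, on the datanode, search the configured data directories for that block
    find /data/dfs/dn -name 'blk_1073741825*'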

You could do that using the HAR (Hadoop Archive) filesystem, which packs multiple small files into the HDFS blocks of special part files managed by the HAR filesystem.
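As a sketch of how that looks in practice (all paths and names below are hypothetical), you create an archive with the hadoop archive tool and then address its contents through the har:// scheme:

    # Pack the directory 'smallfiles' (relative to the -p parent) into an archive
    hadoop archive -archiveName files.har -p /user/alice smallfiles /user/alice/archives
    # List the archived files through the HAR filesystem
    hadoop fs -ls har:///user/alice/archives/files.har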


A block will store (at most part of) a single file. If your file is bigger than the block size (64 MB, 128 MB, ...), it will be partitioned into multiple blocks of that size.
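As a worked example: with a 64 MB block size, a 200 MB file is split into four blocks of 64 + 64 + 64 + 8 MB, and the final block occupies only 8 MB of underlying storage. The block size can also be chosen per file at write time; a minimal sketch using the classic property name (newer releases call it dfs.blocksize) and placeholder paths:

    # Upload a file with a 128 MB block size (134217728 bytes)
    hadoop fs -D dfs.block.size=134217728 -put localfile.dat /user/alice/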


The main point to understand about HDFS is that a file is partitioned into blocks based on its size; it is not that there is a pool of pre-existing blocks in which files are stored (this is a misconception).

Basically, multiple files are not stored in a single block (unless it is an archive, i.e. a HAR file).

