In Linux filesystems, the maximum number of inodes varies depending on the filesystem type and its configuration. Here's a breakdown for Ext4, XFS, and Btrfs:
Ext4
- Maximum Inodes: Ext4 supports a maximum of 4 billion inodes (2^32) due to its use of a 32-bit inode number.
- Configuration: The number of inodes can be specified at the time of filesystem creation, and the default is usually set to 1 inode per 16 KB of disk space.
XFS
- Maximum Inodes: XFS does not have a fixed maximum number of inodes like Ext4. Instead, it allows for a much larger number of objects since it uses a 64-bit inode number. The practical limit on the number of inodes is determined by the available disk space and the filesystem's configuration.
- Configuration: XFS dynamically allocates inodes as needed, so the limit is generally very high, theoretically allowing for billions of inodes as long as there is sufficient storage.
Btrfs
- Maximum Inodes: Btrfs also uses a 64-bit inode number, allowing it to support a very large number of inodes, theoretically in the billions, similar to XFS.
- Configuration: Btrfs dynamically manages inodes and can allocate them based on usage patterns, meaning the practical limit can be very high, akin to XFS.
Summary
- Ext4: Up to 4 billion inodes (32-bit limit).
- XFS: Very high limit, practically determined by disk space (64-bit inode number).
- Btrfs: Very high limit, practically determined by disk space (64-bit inode number).
In summary, both XFS and Btrfs can support a much larger number of inodes compared to the fixed limit of Ext4, making them more scalable for environments with a massive number of files.
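If you want to see or adjust these numbers on ext4 in practice, the knobs are on mkfs.ext4 and df. A minimal sketch, with a placeholder device and mount point:
# mkfs.ext4 -i 8192 /dev/sdX1
# mkfs.ext4 -N 20000000 /dev/sdX1
# df -i /srv/files
The -i option sets the bytes-per-inode ratio (8 KB here instead of the usual 16 KB default, roughly doubling the inode count), -N sets an explicit inode count instead, and df -i shows inodes used and free on a mounted filesystem. XFS and Btrfs need no such tuning because they allocate inodes dynamically.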
ext2/3/4 hold inode numbers in a 32-bit on-disk structure, so yes, the maximum possible number of inodes on an ext2/3/4 filesystem is 2^32, or about 4 billion.
Most modern filesystems like XFS use 64-bit structures for this sort of thing, so XFS can in theory go to 2^64 inodes. But with a minimum inode size of 256 bytes, that would mean 4096 Exabytes of disk space for the inodes alone. In reality, theoretical/design limits such as this are not usually attainable (or advisable) in practice.
Every file system has its own limits. A better question is why you would want that many files on a single file system in the first place. If you do, the files are most probably small, so there is a lot of overhead in storage and processing. I'd consider using a database for such a case.
BTRFS has 2^64 (64-bit) inode numbers. Actually reaching that many files, even on a server, is practically impossible. If you did, you'd be the first person ever to break past the 18,446,744,073,709,551,616-inode mark. Good luck!
ZFS and Btrfs are copy-on-write file systems. Ext4 is an in-place file system.
Both ZFS and Btrfs offer some very advanced file system features on Linux. They have features for creating snapshots, volume management and checksums for data. One essential difference between ZFS and Btrfs is the way they treat data: ZFS maintains data in the form of blocks, while Btrfs maintains data in the form of "data extents". Both file systems maintain a tree structure for their internal metadata. One important advantage of ZFS is its support for RAID-Z for data reliability, which is different from the standard RAID 0/1/10/4/5/6 levels that other file systems provide. Further, for data recovery, it supports scrubbing (zpool scrub), which runs periodically to detect and repair damaged data. A serious flaw with ZFS is that it has no offline file system integrity checker of the kind available in other standard file systems like ext4, Btrfs, XFS and F2FS. Btrfs provides online correction of data and metadata, i.e. if some corrupt data is found while accessing a data or metadata block, btrfs scrub corrects the data on the fly. Btrfs also comes with an integrity checker, btrfsck, which ZFS lacks.
The ext4 file system is the product of two major revisions of the early Linux ext2 file system. Ext4 is built to handle large files with less frequent metadata updates than its predecessor, ext3. In ext3, each indirect block contains pointers to individual data blocks, so a file containing 1000 data blocks needs 1000 pointer entries in its indirect blocks (I am disregarding direct block pointers here). This leads to excessive file system metadata in ext3, which is inefficient. Instead, ext4 maintains a tree of "extents", where it records only ranges of physical blocks: for a 1000-block file, ext4 would store, say, 3 entries covering the physical block ranges 50000-50600, 90100-90400 and 11100-11200 where the file's 1000 data blocks live. This way, only 3 entries need to be maintained in the file system metadata instead of the 1000 entries ext3 would need. Real-world workloads exhibit spatio-temporal locality, so it is highly likely that consecutive blocks are read, written or updated together; an extent-based design therefore works to ext4's advantage. A serious flaw with ext4 is that it does not provide data checksums (which are present in Btrfs and ZFS). This is disastrous for critical data storage: corruption of data is neither detected nor corrected, because checksums are not maintained. For metadata corruption detection and correction, ext4 has a file system checker.
The XFS file system is excellent for large files and claims to have great parallel I/O capability. One drawback of XFS is internal fragmentation: with large block sizes, small files and files that are not block-aligned leave a large amount of space unused at the end of their last blocks. A fix for non-block-aligned files is tail packing, where the last blocks of multiple files are stored together in one block, which reduces internal fragmentation. However, this file system is still not recommended for workloads consisting of small files or for workloads with a bimodal size distribution (a mix of small and large files).
You are correct that the size of metadata in filesystems dictates the various limits imposed on the filesystem. Specifically, the size of the inode number, which is used to uniquely reference inodes, places a hard limit on the number of files on the filesystem.
In most Linux filesystems—including ext3 and ext4—the inode number is represented by a 32-bit unsigned integer. Thus the maximum inode number is 2^32 - 1. Assuming zero is a valid inode number, that gives a total of 2^32 files, which is just over 4 billion.
Incidentally, much of the metadata in ext4 is 48 or even 64 bits wide. For example, the maximum file size is 16 TB and the maximum filesystem size is a whopping 1 EB. The fact that the ext4 developers chose not to widen the inode number beyond 32 bits suggests the 4 billion file limit isn't a bottleneck for most users.
I was kind of forced to move to btrfs, so let me explain how I learned about it and how it helps me deal with our application's issues.
At that time we used only New Relic to monitor our available disk space. One day our application stopped accepting new document uploads, and the errors told us there was no space left on the device. I checked with df and the disk was only about 50% full, or at least that's what it looked like. But trying to create even a new empty file gave me the same error. After a bit of research I learned that there's a limit on the number of inodes available on a given ext2/ext3/ext4 partition, and indeed that limit had been reached in our case. My first attempt was to add a new disk and tune the new ext4 partition to allow more inodes, since we had tons of small files in that partition. We also added a monitoring check of our own for free inodes, and we noticed that my attempts to increase the number of inodes didn't work out that well, so I started looking for alternatives to ext4. That's how I learned about btrfs.
So that was the first issue we fixed by switching to btrfs: there's no inode limit with btrfs, so this part was solved. Another problem we had at that time was that backing up those documents hourly used to take very long, because we had to copy too many files on each hourly backup. So we moved our offsite backup server to btrfs as well, and now we can just take a snapshot of the previous backup and only sync the changes. Taking a snapshot is instantaneous.
We also take advantage of snapshots to set up other environments for our application such as staging, cert, experimental and so on. We create those environments based on a recent snapshot from the production database. Documents don’t change often, so we end up creating a snapshot of the production’s documents subvolume when we want to restore a recent dump to one of those environments. That makes it not only really fast to replicate the production environment to another one but it also saves us a lot of space that would be used otherwise if we had to copy all those documents.
Even though we don't currently use this feature, btrfs also supports incremental backups in a very efficient way (see the btrfs incremental backup documentation).
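For reference, the snapshot-and-sync workflow described above boils down to a few commands; a minimal sketch with made-up paths and a made-up backup host:
# btrfs subvolume snapshot -r /data/documents /data/snapshots/2024-01-01
# btrfs send /data/snapshots/2024-01-01 | ssh backup btrfs receive /backup/snapshots
# btrfs send -p /data/snapshots/2024-01-01 /data/snapshots/2024-01-02 | ssh backup btrfs receive /backup/snapshots
The first send transfers the whole read-only snapshot; the second, with -p, sends only the differences between the two snapshots, which is what makes frequent offsite backups cheap.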
I hope to have illustrated how useful btrfs can be, and how it could be one of the few options you have available when you feel the need for a COW filesystem.
The hierarchy is about this:
It simply defines how directories and files are organized. For example you know that “/” is the root directory, the first, most upper-level directory, basically. Then under “/” you have the “home” directory. And in the home directory you can place personal directories for different users using this system.
So if you have a Linux computer, you will know that in “/home/john” you will find John’s personal files and in “/home/jane” you will find Jane’s files. Two users, two home directories. The core thing being that the hierarchy makes it clear that personal directories for every user should exist within the “/home” directory. Without a hierarchy, everyone would place personal directories wherever they “feel” like it. Then you, as the administrator, would have to look in many different (random) places for these, wasting a lot of time.
Something called the FHS (Filesystem Hierarchy Standard) also tells you that in the “/etc” directory you can find configuration files. Under “/usr/bin” you can find binary (executable) files, basically programs.
And so on, and so on. That’s the hierarchy. Just like in Windows you know that you will find programs installed under the “Program Files” directory.
The filesystem on the other hand is much more… mysterious and advanced. As the name suggests, it’s a system designed to store your files (and directories, data in general).
Why is such a system needed? Because stored data is actually a big mess of ones and zeroes (0,1). If you take a look at a hard disk, from start to finish, it's just a biiiiig loooong enumeration of zeroes and ones.
So if you scan a disk from start to finish all you will see is:
010010010010001001001101101101001010101010111001010101010101010101…
Billions and billions of bits.
And in this long mess of zeroes and ones we have our files, our directories, our pictures, programs, all our data! So there's a need for a system (a filesystem) to make sense of this mess of continuous zeroes and ones. The system also has to know where exactly to place zeroes and ones when we create a file. And it has to know which zeroes and ones belong to a file when we open it.
So there’s a whole complicated filesystem that keeps track of all of this stuff so we, the users, don’t have to. All we do is double click on “smaug-the-dog.jpg” and a picture shows up on our screen. The filesystem goes through quite a list of complicated steps to give us the data belonging to that file.
And why are there so many filesystems? Because each filesystem is good at certain stuff, but not so good at other stuff. There’s always a balance of advantages plus disadvantages. So engineers came up with many different filesystems that organize data in different ways. A filesystem might be better for gaming, another might be better to store databases on it, another could be better for systems with very little processing power (for example, small Internet of Things devices), and so on.
Hope this demystified these concepts a little bit.
While ZFS is definitely the more robust of the two, it and BTRFS have similar functionality. One issue with BTRFS is that its RAID-like functionality is still in the development phase. E.g. mirroring (like RAID 1) and striping (RAID 0) work well, but parity (like ZFS's RAID-Z, or RAID 5/6) was quite buggy until around March this year - they've since fixed a big problem with recovery on these. It's still in the testing phase, so I wouldn't go for it in production.
ZFS is much longer in the tooth. That's not a bad thing in itself; in this case it means far more of its bugs have had time to be fixed. That coin does have a second face, though: it was revolutionary in 2005, but it seems to be stuck in a rut, with little change since then - no updates for new hardware, no adjustments to make it work well on SSDs, etc. However, its Linux implementation is rather recent (around the same age as BTRFS). At least it didn't have to start from scratch, though even just a few years ago it was still very buggy - you were better off using it on a true Unix like BSD instead. Now (as of around two years ago) its Linux implementation is in fact ready for production.
However, for a normal desktop most people would not want to use ZFS, nor "really" BTRFS. They're rather overkill, and they require a lot more resources (especially RAM) to work well, else they can cause drastic slowdowns - especially if you turn on de-duplication. BTRFS's memory requirements are a lot lower for similar functionality. But both are very technical - not exactly automated, nor set-and-forget.
Also just throwing a file system at a server without regard for what the server is doing is not a good idea. E.g. using ZFS (or for that matter BTRFS) on a server running a database is likely not a good choice. One of the problems is their CoW (Copy on Write) functionality. A database is going to write and re-write portions in one or more files constantly - CoW is going to cause performance issues for this, and likely force a lot of fragmentation as well. CoW really only works well if overwriting entire files instead of just portions within them, forget de-duplication for this scenario. ZFS / BTRFS may be a very good idea for a file server, but not the first choice on a DB backend server.
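As an aside (not something the paragraph above mentions): if you do end up hosting a database directory on Btrfs, copy-on-write can be disabled for just that directory with the commonly suggested chattr workaround. A sketch with a made-up path; the attribute only affects files created after it is set:
# mkdir /srv/db
# chattr +C /srv/db
New files under /srv/db are then rewritten in place like on a conventional filesystem, at the cost of losing Btrfs checksumming for that data.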
Note also, there's an "issue" with ZFS for just your average computer. It doesn't work too well without ECC RAM, and nearly all desktops & laptops a normal person is going to buy come with non-ECC RAM, sometimes to the point where they cannot be upgraded to ECC anyway. It's supposed to not be an issue, but my own experience, as well as some complaints in forums, does indicate that it tends to have more problems with "normal" RAM than other file systems do. For discussions and research on this, read the comments below my answer here: Irné Barnard's answer to Why does Windows still use NTFS? Why not ext4, the file system for Linux, since it actively prevents disk fragmentation?
Both these file systems are really intended more as a replacement for the very old and basically defunct RAID technologies from the 80s/90s. Their feature sets (at least most of them) only really make sense on more than one disk; some features may be useful even on just one disk, but then it's not much better than something like EXT4. Most desktops are installed on only a single disk, so I'd avoid them if not running a server. And once you're on a server, you should really be using ECC RAM in any case - which removes the arguably problematic corruptions of ZFS anyway. Not to mention, you tend to run a server with enough hardware to suit its purpose, and you don't want to do anything else on it.
What I can state, though: if a desktop (instead of a server) is going for multiple disks in RAID, it is definitely a better idea to use something like ZFS instead of LVM (or motherboard-based hardware RAID). Even BTRFS is a whole lot better than the more simplistic RAID 1 / 0 setups. But I say this because setting up a RAID (and more importantly maintaining it) requires daily maintenance anyway. It is NOT something you can just leave to work as it is - that's just begging to lose your data. So if anyone's thinking of a RAID, they should already be thinking of technical maintenance, at which stage it makes a lot more sense to use something with more capabilities than a standard RAID setup provides. Else (if they're unwilling to spend the effort on maintenance) they'd be better off using something like a union file system over something like EXT4 instead.
If Ubuntu is really going to drop their current default of EXT4 for ZFS, I fear they're going to lose the desktop market. ZFS is simply not a non-tech-friendly file system. It may be good for servers to default to a more robust and fault-tolerant file system, but desktop users are going to balk at the complexities of ZFS's pools and volumes; the difficulties (or even impossibilities) of adding, removing and expanding are going to make lots of people very confused and/or unhappy. BTRFS isn't a lot easier to use, though it is more flexible in that pools and volumes are allowed to grow after creation instead of needing re-creation from scratch. But for the normal user - the big reason Ubuntu is even in the game - this is stuff they'll never even think about. As long as Ubuntu can use ZFS (or whatever) with some automated setups and auto-adjusting maintenance, it might just work. But as soon as some technical issue pops up, like a scrub failure, most would uninstall Ubuntu.
Thus, for data storage in a file server ZFS makes the most sense. Otherwise it’s not ideal. BTRFS is slightly more in line with desktop and/or modern hardware, but still more a server focused file system. Perhaps some in-between idea might make more sense, e.g. Apple’s APFS is very similar to how ReiserFS (BTRFS’s predecessor) worked, only they’ve designed it more from a desktop client perspective. Not too sure of Windows’s ReFS, they’ve even discontinued it on all licenses except server and workstation.
Anyhow, ZFS I think is probably the best of the lot for large file servers, at least at this point in time, simply because it's the most well-tested alternative offering most of the required features. If BTRFS's tests and bug fixes are ever finally complete, I'll change my stance and say goodbye to ZFS: it's been great, but you're stuck in the mud and not moving with the times.
One thing I'm keeping my eye on is bcachefs. Effectively it's designed to be similar to ZFS and BTRFS but built on a caching idea where SSDs are used to hybridize a bunch of HDDs for performance. It does seem to be robust and fault-tolerant, provides much the same features as ZFS / BTRFS, and allows for even greater flexibility than BTRFS. At least it seems to be a move in the right direction, though I think it's a half-step instead of looking 20 years ahead.
Btrfs and ZFS are so-called copy-on-write file systems with built-in volume management. ZFS is the more mature of the two in general, but many people say they've used Btrfs in production. ZFS is released under the CDDL and the Linux people don't like that (it's considered GPL-incompatible), so there is no official ZFS in the kernel. It is however possible to run OpenZFS on Linux via the ZFS on Linux project. ZFS is native in Solaris and illumos, as well as being the next-gen file system for FreeBSD.
ext4 is a descendant of ext (the Extended File System), which begat ext2, then ext3, and finally ext4, the Fourth Extended File System. It's a modernization of a UFS-style file system and has always been considered the Linux standard. Red Hat recently endorsed SGI's XFS (the old IRIX file system), which supposedly has some benefits over ext4. These two are more traditional file systems, on the surface at least.
I hope this helps somewhat. Check the documentation for additional info. The benefits of CoW are data integrity and overall speed. But the main drawbacks of ZFS and Btrfs are related to how much memory they want: several gigabytes of memory make the system faster, and the more the better.
inode numbers are like pointers within the file system, and are stored within the file system itself. (In particular, a directory entry is the combination of a name and an inode, and you can see this using ls -i.) File descriptor numbers, on the other hand, are not stored anywhere within the file system, and are dynamically generated by the kernel when you call open() (or certain other system calls). They are pointers into the kernel's file descriptor table for a particular process.
An inode number always refers to something on a device somewhere. A file descriptor may also refer to an anonymous pipe, a socket, or some other kind of resource, such as the result of a call to epoll_create() on Linux.
An inode number unambiguously identifies a file or directory on a given device, but two files on different mounts may have the same inode. A file descriptor does not unambiguously identify anything by itself; in combination with a process ID it unambiguously identifies some resource on the system, even if you don't know which device it's on.
Every file or directory on a given device has a unique inode number. (If two files on the same device have the same inode number, then they are really the same file with two different names.) On the other hand, a file or directory may be opened several times by the same process or by different processes, and thus have multiple different file descriptors. Additionally, files or directories that are not currently open by any process do not have any file descriptors referring to them.
A valid file descriptor is associated with file mode flags and offset. That is, it grants the possessing process the right to read or write the file (or both), based on how the file descriptor was obtained; it also remembers some position within the file. However, it does not contain any metadata associated with the file itself, such as timestamps or Unix permission bits. An inode contains timestamps and Unix permission bits, but no file mode flags or offset.
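A quick way to see the difference from a shell (any readable file will do; fd 3 is just an arbitrary descriptor number):
$ ls -i /etc/hostname
$ exec 3< /etc/hostname
$ ls -l /proc/$$/fd/3
$ exec 3<&-
The first command prints the file's inode number, which lives on disk and is the same for every process. The exec line opens the file, so the shell now holds file descriptor 3; the /proc listing shows that descriptor existing only inside this particular process; closing it leaves the on-disk inode untouched.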
About the topic of stability, the BTRFS developers bury their heads in the sand, officially (see the FAQ) giving a non-answer ("stability? what is it? does it really exist?").
Assuming you mean stabler than EXT4, the answer is certainly not. There is no comparison when it comes to user base and code maturity. That, of course, doesn't imply that a home user can't use it in relative safety (I experienced three bugs, two of which made my system unbootable).
The EXT filesystems are very different from BTRFS/ZFS, as the latter have many features that the former lack (a quick Google search will tell you a lot).
If you want a “new-generation” system, go for ZFS. ZFS is stable.
Erez explained what btrfs is but since by coincidence I’ve been using it for a few weeks now I’ll take this opportunity to explain why.
I currently work on Android. Each Android development source tree takes a lot of disk space, something like 110 gigabytes when “clean” and after a build about 150 gigabytes. For efficiency I have these on an SSD (two, actually) so space is at a premium (I have a terabyte of space in total). As it happens I need several work areas (different Android versions and products) so something like ext4 would fill up quickly. It also takes about two hours to do a full build and sometimes this is required.
Btrfs solves these problems: I have a source-only tree that is automatically updated (repo sync) every night. The same script creates a snapshot of this source and does a full build. This way I have a fresh, full build available every morning. If I need the very latest code, or if my current build directory gets messed up and needs a clean/build-all combo to work again, I can instead just make a snapshot of the build directory, which takes a second (two if I need to type my sudo password)! Then I just copy whatever I did in the old directory to the new one, either with Beyond Compare or using Git tools (merge or cherry-pick), and I'm good to go. Creating snapshots is extremely quick; just copying the whole source tree would probably take ten minutes, never mind using the repo tool to recreate it.
So not only do I get to use disk space more efficiently, I'm actually working much more efficiently! Bye-bye to the two-hour compilation breaks because the damn thing won't compile anymore without running a full build. And I also have the very latest version ready every day to be taken into use in a clean environment, rather than the environment I've been smashing away at for the past several weeks.
If you are interested in the nitty-gritty, I explained how to do this here: Using btrfs to speed up Android development. My setup is such that I save the old daily build directories and delete them manually, just in case something urgent pops up from yesterday.
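For the curious, the core of such a nightly setup is a single command, assuming the source tree was created as a btrfs subvolume (paths here are made up):
$ sudo btrfs subvolume snapshot /ssd/aosp-source /ssd/aosp-build-$(date +%F)
The nightly repo sync runs in the source-only tree first, the full build then runs inside the snapshot, and only the blocks the build changes consume additional space.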
The advantage of btrfs over ZFS is that btrfs is a native Linux filesystem merged into the mainline Linux kernel, so it is usually compiled into the distribution's kernel and available out of the box.
In contrast, ZFS isn't in the Linux kernel tree, so you either need to use ZFS via FUSE, which gives lower performance, or use an out-of-tree module built for your kernel version to enable ZFS support.
Btrfs also uses Linux's page cache, while the ZFS kernel module implements its own ARC cache, which again can affect performance.
Over ext3/ext4 advantages of btrfs are:
- The copy-on-write design, which allows two different files to share the same physical storage while acting as separate files. Internally, only the changes to one of the files are saved instead of keeping two distinct copies.
- Snapshot support — copies of subvolumes (specially marked folders) that initially don't consume any disk space, and only future modifications are additionally saved. Snapshot creation is also very fast, usually less than 1 second.
- Disk quotas support — limiting disk space usage per subvolume
- The number of files is not practically limited, unlike ext3/4 filesystems, which have a fixed number of inodes
- Transparent compression — files are compressed on disk, but system views them as uncompressed when reading data from them. This saves disk space.
- Volume management built into the filesystem — one filesystem can utilize multiple physical drives in an arrangement called a RAID array. There are several types of RAID; for details on the status of btrfs RAID, look at the documentation (see the command sketch after the warning below).
WARNING: As of writing this answer, RAID 5/6 aren't stable on btrfs. RAID 0 and 1 are supposed to work with no problems. Look up the current state in the documentation.
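A minimal sketch of the multi-device and compression features mentioned above (device names and mount point are placeholders; zstd compression needs a reasonably recent kernel):
# mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc
# mount -o compress=zstd /dev/sdb /mnt/data
# btrfs subvolume create /mnt/data/projects
# btrfs subvolume snapshot /mnt/data/projects /mnt/data/projects-snap
The first command mirrors both data (-d) and metadata (-m) across the two disks, the mount option enables transparent compression, and the last two commands create a subvolume and an instant snapshot of it.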
FAT32 is quite an old filesystem. All the btrfs advantages over ext3/4 apply over FAT32 too, plus additionally:
- No practical limit on file size. FAT32 supports a maximum file size of 4 GB
- Power loss can't corrupt the filesystem (not the case on FAT32)
- No practical limit on volume size. FAT32 supports only volumes up to 2TB.
- Btrfs is better at handling a large number of files while maintaining performance
Btrfs is the only COW (copy on write) filesystem that you can easily install via the graphical install program of fedora (and probably other distributions). I'll explain the significance of COW below, but for now, the important thing is that if you want COW and you don't want to fiddle around with extra repositories, kernel driver installs and other somewhat complicated installation steps, then btrfs is really your only choice.
Now, why would you want COW?
Copy on write means that instead of modifying blocks in place every time you update your data, your filesystem will transparently create a whole new block and repoint the metadata pointers to the new block, leaving the old block in place. That might seem inefficient, but actually has some pretty big benefits:
- Since you don’t “repoint” until the new block is completely written out, if you lose power or crash in the middle of a write, you end up with either the old block or the new block, but not a half-written corrupted block. So you don’t need to fsck filesystems on startup and you somewhat lower your risk of data corruption.
- You can “snapshot” the filesystem at any point, creating a snapshot entry in the metadata with the current set of pointers. This protects old blocks from being garbage collected later on and allows the filesystem to present a volume as it was during the snapshot. (in other words, you have instant rollback capabilities). You can even clone that volume to make it a writable volume based on the snapshot.
Btrfs also has block checksums and block mirroring capabilities (or even RAID5-like parity redundancy), which means that it can protect you from device failures (like RAID) and also do bitrot protection and self-healing (unlike RAID). If a block is read and its checksum doesn’t match, Btrfs will attempt to read the block from another device and if that works, it will write the block back to the “bad” device, correcting the error in place automatically.
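If you want to exercise the checksum and self-healing machinery yourself, a scrub can be started manually; a small sketch, assuming the filesystem is mounted at /mnt/data:
# btrfs scrub start -B /mnt/data
# btrfs device stats /mnt/data
The -B flag keeps the scrub in the foreground and prints statistics when it finishes; device stats shows per-device counters of corruption and I/O errors that were found (and, where redundant copies exist, repaired).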
I’ve used Btrfs on Fedora for a couple of years, and it seems very stable so far. However, on our servers, we use ZFS on Linux because a few years ago, when Btrfs was much less mature, we had some serious stability issues with it, switched to ZFS on Linux, and never tried Btrfs on the server-side again.
TL;DR
Btrfs is an “easy install”, copy-on-write, snapshot-enabled, self-healing filesystem for Linux that seems pretty stable. Your other choice is ZFS on Linux, which may be more stable, but requires a few more steps to install on typical Linux distributions.
The number of inodes varies from filesystem to filesystem and is set at the time of filesystem creation. You can view the number of inodes (free/used) with the GNU df command.
Example output:
# df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/hda3 18364896 240571 18124325 2% /
/dev/hda1 130560 36 130524 1% /boot
tmpfs 240945 1 240944 1% /dev/shm
For USB flash drives, Ext3 is still the best choice. As much as I hate VFAT, I love my friends more.
*** recommendation below is very specific to flash drives ***
Ext2 - Yes - too old and no journaling support. It will work fine, but there is no compelling reason to stick to it.
Ext3 - No - very stable, journaling support, capacity up to 16TB, supported by all GNU/Linux distros. Ext3's journaling is not flash friendly. It helps with fast recovery though.
Ext4 - No - stable in newer kernel versions, faster file system check, improved large file performance (extent support), journaled checksums. RHEL announced stable Ext4 support only in 5.6 release. e2fsprogs is still limited to 16TB capacity. I use Ext4 with Debian distro for my laptop backup drives. It is my personal #1 choice.
XFS - No - overkill, not much portability, journaling support, up to 100TB, designed for storage servers. Red Hat revived XFS in RHEL 5.2. Most of the ex-SGI XFS developers are now with Red Hat. It is the officially recommended filesystem for servers with > 16TB capacity. Fast mkfs and fsck. Excellent large file support. Linux 2.6.37/39 got some really nice XFS metadata performance improvements.
ReiserFS - No - dead project.
VFAT - Yes - most portable across operating systems, no other good reason.
Btrfs - No - Linux's answer to ZFS. Not ready yet, but most promising.
I've never used or contributed to btrfs, though looking at the literature, ZFS was very WAFL-inspired [2, 3, 4], and btrfs feels very ZFS-inspired [1], so a lot of the benefits of using both filesystems from a user and administrator perspective are very similar.
At the highest level, I'd split the benefits of btrfs versus any other filesystem into 3 categories.
1. Benefits arising out of using a copy on write tree with no in place updates as storage primitive.
2. Benefits arising out of a separation of logical and physical units of data management.
3. Everything else
The first two buckets are the ones to pay attention to. If you take a look at the literature describing btrfs, WAFL and ZFS, the first two are the real game changers for performance, reliability and data management - most of the other features can be found in or ported to other filesystems.
Tree structure based benefits
1. Creating snapshots of your data is extremely efficient
This is the single most important benefit - snapshots are incredibly efficient to create and have very little performance impact. This is because creating a snapshot is as simple as creating a new root pointer to your data set and incrementing reference counts on metadata. You can create snapshots in the midst of other activity on your system without adding any significant load to it.
2. Calculating incremental deltas for backups is extremely efficient.
Calculating deltas between two different versions of trees, be it snapshots or writeable clones is highly efficient, since it relies on a pure metadata comparison of reference counts in different snapshots.
3. Creating writeable clones is extremely efficient
Creating a clone is very similar to creating a snapshot and is thus similarly efficient, as it simply involves creating a new root pointer to a tree and adding bookkeeping for reference counting.
This is a killer feature in environments where you have many copies of almost the same data, such as large test databases or virtual machine images.
4. Rolling back to a given snapshot is super efficient
Rolling back to a given older version of the filesystem is highly efficient as it primarily involves swapping a pointer to an older version of a tree.
5. The filesystem has transactional semantics
Non-transactional semantics for a filesystem are bad. They can expose you to corruption when operations update different blocks in different places non-atomically.
For example, to create a new directory entry, you need to allocate an inode and also make a directory entry point to it. If this is not done atomically, depending on your implementation, you could end up with a leaked inode or a directory entry that points to an unallocated inode.
With btrfs and similar filesystems, you never write in place, and an update to a new version of the filesystem is only complete when you update the root pointer of the new tree. This means that all your filesystem operations move you from one consistent state to another.
If you crash in the middle of an update, since you never wrote in place and the final tree root pointer update never hit the disks, there is nothing required to get your filesystem back to a consistent state.
This class of benefits is the most important - NetApp shipped a product based on this idea in 1994, and it was a huge competitive differentiator with respect to its success. It was also one of the primary similarities cited when NetApp sued Sun for ZFS [7].
Benefits arising from having a separation of volumes and subvolumes
One key fundamental of building a high-performing file system is making full use of the I/O bandwidth of a large collection of disks. For performance reasons, you want to have as many spindles as possible to parallelize I/O.
The initial solution was to simply create one giant filesystem that had a one to one map to a large pool of disks to increase disk bandwidth and also amortize the cost of your RAID parity disk(s).
However, doing so creates a data management nightmare - you can't for example snapshot or backup just your important data at a high frequency without also doing the same for your low priority data.
The solution that addressed both the performance needs and the data management needs was to make the tree structures live inside what was a "traditional" filesystem that mapped directly to physical RAID volumes (known as aggregates, pools and volumes in WAFL, ZFS and btrfs respectively), while user data lives inside logical volumes (known as flexible volumes, volumes and subvolumes in WAFL, ZFS and btrfs respectively).
The benefits of this are as below:
1. High performance IO for even small filesystems by sharing spindles with larger filesystems
The idea here is your volume can be a direct mapping to your RAID layout, but your subvolume can be much smaller and still benefit from the raw IO bandwidth.
2. Instant, easy on the fly partition resizing
Partitions in the scheme of subvolumes are just quotas.
Specifically, different subvolumes actually share free space, so if you find your volumes filling up in a way different from what you planned, you can just adjust the partitions on the fly.
This makes it easier to aggregate your free space, improve storage utilization, and adjust your space allocation on the fly if your partitions are filling up in a manner different to how you had planned (see the sketch after this list).
3. Decoupling the unit of data management from the unit of physical IO bandwidth / redundancy
By having subvolumes that are distinct from physical volumes, you can do things such as set up different snapshot and backup schedules for part of your data without applying them to all of it.
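A minimal sketch of how this looks with btrfs subvolumes and quota groups (paths and sizes are made up for illustration):
# btrfs subvolume create /pool/important
# btrfs subvolume create /pool/scratch
# btrfs quota enable /pool
# btrfs qgroup limit 50G /pool/scratch
# btrfs subvolume snapshot -r /pool/important /pool/important-hourly
The two subvolumes share the pool's free space, the qgroup limit acts as an adjustable "partition size" for the scratch area, and the read-only snapshot covers only the important data, not the scratch data.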
Miscellany:
btrfs also uses extents to prevent fragmentation (unlike ZFS and WAFL, btrfs seems to reference-count extents rather than blocks [6]), and has block checksums, built-in compression and various other features, but these are typically the class of features that can be adapted to other filesystems without changing their fundamental architecture.
[1]: http://lwn.net/Articles/342892/
[2]: http://www.usenix.org/event/usenix08/tech/full_papers/edwards/edwards.pdf
[3]: http://www.cs.fsu.edu/~awang/courses/cop5611_s2007/wafl.pdf
[4]: http://solaris-training.com/classp/200_HTML/docs/zfs_wp.pdf
[5]: http://hub.opensolaris.org/bin/download/Community+Group+zfs/docs/zfslast.pdf
[6]: http://video.linux.com/video/1787
[7]: http://blogs.netapp.com/dave/2007/09/netapp-sues-sun.html
Btrfs is an improved version, and the good thing about it is that it was rebuilt from scratch. It offers greater volumes (up to 16 EB) and improved fault tolerance, with deduplication and mirroring, but it doesn't offer faster speed.
For enterprises who use very large programs with very large databases, having a seemingly continuous file system across multiple hard drives would be very beneficial because it can make consolidation of data much easier. Data deduplication would reduce the amount of actual space data would occupy, and data mirroring would become easier with btrfs as well when there is a single, broad file system that needs to be mirrored.
So the answer to your question is: it's not faster, and it's not yet officially declared stable.
Yes.
I expect most of the answers to this question will read like other great debates (vi vs. emacs).
From my camp, ZFS is a battle-tested file system that has been around for more than 10 years. Compared to most other UNIX file systems, it solves the problem as a whole rather than as a large series of pieces (it unifies the volume manager and the file system). Few file systems implement the level of checksum usage that ZFS does (it checks on all reads as well as writes, and can transparently fix latent bit-rot errors on the fly). It has a background process called "scrub" that reads the data on the drive and fixes any blocks that have "gone bad" [failed the checksum] before the failure is hit by a read request from an application.
It has snapshots, replication and snapshot cloning. Combined with a boot environment manager, this can allow for risk-free updates to production systems (you can have hourly COW snapshots of your running, live production file systems without blowing your disk budget).
Because ZFS unifies the volume manager with the file systems, you can have all the "partitions" you want without having to worry about over- and under-allocating space. Disk space in ZFS is allocated like virtual memory in RAM. This makes life so much nicer. It also has an inline compression algorithm named lz4 that can provide good compression with minimal CPU overhead.
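A minimal sketch of what that looks like in practice (pool name, device names and sizes are made up):
# zpool create tank mirror /dev/sdb /dev/sdc
# zfs create -o compression=lz4 tank/home
# zfs set quota=100G tank/home
# zfs snapshot tank/home@before-upgrade
The pool acts as both volume manager and filesystem; tank/home is a "partition" that draws from the pool's shared free space, gets lz4 compression, an adjustable quota rather than a fixed size, and an instant snapshot.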
Btrfs is still a bit new and not as well tested and battle-hardened. I have the impression that Btrfs was not written with the level of paranoia about data safety/loss prevention that ZFS's authors had. When you read about Btrfs, it is not uncommon for someone to say "it has ZFS features" or "it will have that ZFS feature in the future."
Right now, I'd say ZFS is the gold standard and Btrfs is an up-and-comer that hopes to compete / is competing against ZFS. Actually, for Btrfs the more proper comparison would be with a recent version of NTFS.
Once upon a time, I was a Slackware user.
The thing that Slackware users have is an itch to go beyond what we had. In this case, the stock Slackware 8.1 ext3 filesystem felt terribly slow to me. So I went for a manual kernel compile and started trying other filesystems. At that time I chose XFS, which felt way faster than ext3 on the same hard disk drive.
Fast forward a few years, and ext4 became the new standard. No longer a Slackware user, and not using the same hardware, I could feel the difference between ext4 and ext3. Ext4 is even faster than XFS in my experience, and I have not been pulling my hair out using ext4.
So my respect to ext4 developers.
Neither really.
Ext4 is the 4th generation of Ext and backwards compatible. NTFS doesn't advertise the generation in its name, but it's also at its 4th (maybe 5th, depending on who's counting) generation and backwards compatible. Neither of them is forward compatible from old versions. Both originated in the same time period (the early 1990s) and both have kept adding features. If you compare the two feature by feature, it's pretty much a wash to say one is more advanced. Each has a few unique features, as well as drawbacks.
It's not really a useful comparison either. A Windows system is going to use NTFS. A Linux system is going to run Ext. Interoperability is abstracted, meh.
In my limited experience, no.
I've had no fewer than three power outages completely destroy an entire BTRFS partition, and despite being able to recover most of the data, no repair tools - official or otherwise - were able to fix the filesystem, even when the corruption affected only a handful of files and was restricted to a single subvolume.
Another time I had a problem with a bent SATA cable that caused one drive to randomly stall when anything moved it slightly out of place. One such stall happened right in the middle of backing the drive up while debugging the issue, and the partition could no longer be unmounted. In the end I had no choice but to do a hard reboot, resulting in the same type of filesystem corruption as mentioned before.
Bad cables aside, power outages are not a rare thing, especially on laptops. Having to plug in another storage device of the same or greater capacity, recovering the files by hand, reformatting the damaged filesystem, and copying the data back is, needless to say, extremely inconvenient, possibly expensive, definitely time-consuming, and a major drawback of BTRFS.
If you value your time and data, stay away from BTRFS, at least on production machines or until these issues have been fully solved. It promises lots of features, but they are too often half-baked or not performing well in practice.
XFS is the default filesystem in RHEL-based distros, and Ext4 is the standard filesystem in Debian and Ubuntu distributions.
Maximum supported size for Ext4 is 16TB in both Red Hat Enterprise Linux 5 and Red Hat Enterprise Linux 6 and 50TB in Red Hat Enterprise Linux 7.
Ext4 handles many small files well and ensures metadata is correctly written even if the write cache loses power.
XFS is indeed better than ext4 in many respects. XFS supports file system sizes of up to 1 PB.
Advantages:
XFS scales to exabytes, Red Hat’s maximum supported XFS file system image is 100TB for Red Hat Enterprise Linux 5, 300TB for Red Hat Enterprise Linux 6, and 500TB for Red Hat Enterprise Linux 7. XFS is routinely measured as one of the highest performing file systems on large systems with enterprise workloads. For instance, a large system would be one with a relatively high number of CPUs, multiple HBAs, and connections to external disk arrays. XFS also performs well on smaller systems that have a multi-threaded, parallel I/O workload. The XFS backup and recovery process can be interrupted and resumed without freezing the file system.
Disadvantages:
XFS has relatively poor performance for single-threaded, metadata-intensive workloads, for example a workload that creates or deletes large numbers of small files in a single thread. Finally, you cannot shrink (reduce) XFS file systems in size, so you should take extra care not to over-allocate storage to an existing file system.
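To illustrate that last point: you can grow XFS online, but there is no shrink operation, so the only direction is up. A sketch assuming the filesystem sits on an LVM volume (names are hypothetical):
# lvextend -L +100G /dev/vg0/data
# xfs_growfs /srv/data
xfs_growfs expands the mounted filesystem to fill the enlarged device; there is no corresponding shrink tool.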
If you want to know more, see here
This may come down to semantics to some degree.
When you do a directory listing on ext4, you will get a list of filenames. Each of these filenames (ignoring symlinks for now) is associated with an inode on disk, which is a piece of metadata which describes the attributes of that file - its size, its timestamps, the location of its data blocks, etc.
As an example let’s get a directory listing for a directory with a single file in it, named “a”:
- # strace -v -e getdents ls testdir
- getdents(3, {{d_ino=2756063, d_off=3966251527637366025, d_reclen=24, d_name="."} {d_ino=2755343, d_off=4843559106273266539, d_reclen=24, d_name="a"} {d_ino=2752513, d_off=9223372036854775807, d_reclen=24, d_name=".."}}, 32768) = 72
We can see that the file “a” has inode number 2755343. If we stat that file, we get its inode number and some of its other metadata, all of which is stored in the inode:
- # stat testdir/a
- File: `testdir/a'
- Size: 4 Blocks: 8 IO Block: 4096 regular file
- Device: fd00h/64768d Inode: 2755343 Links: 1
- Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
- Access: 2020-03-10 09:50:44.388028747 -0500
- Modify: 2020-03-10 09:52:59.659817195 -0500
- Change: 2020-03-10 09:52:59.659817195 -0500
If we want to know where that inode’s actual data lives, we can ask for that information, which is also found via the inode’s metadata:
- [root@bp-05 ~]# filefrag -v testdir/a
- Filesystem type is: ef53
- File size of testdir/a is 4 (1 block, blocksize 4096)
- ext logical physical expected length flags
- 0 0 11436385 1 eof
- testdir/a: 1 extent found
In any case, for every file a user sees in a directory listing, there is ultimately one inode on disk which contains all of the information about that file, and pointers to that file’s data on disk. I’ll leave it to you to decide whether or not that means they are “the same”.
Inodes are allocated when the admin/user runs mkfs to format the file system. With ext3, for example, you can specify the inode count using the mkfs.ext3 -N option. You can view the current settings via the tune2fs command.
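As a rough sketch of those commands (the device name and mount point are hypothetical):
- # mkfs.ext3 -N 1000000 /dev/sdX1   # create the filesystem with roughly one million inodes
- # tune2fs -l /dev/sdX1 | grep -i inode   # show the inode count, free inodes, inode size, etc.
- # df -i /mnt/data   # once mounted, report inode usage per filesystem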
Can you please provide a comment or reference as to how they are inaccurate? Please update with a comment on what specific distribution (and version if you know that as well) and I’ll provide a more detailed response.
It has been a while since I set quotas per user, so I did some generic digging and found this in the Red Hat documentation for their quota management:
The hard block limit is the absolute maximum amount of disk space that a user or group can use. Once this limit is reached, no further disk space can be used.
The soft block limit defines the maximum amount of disk space that can be used. However, unlike the hard limit, the soft limit can be exceeded for a certain amount of time. That time is known as the grace period. The grace period can be expressed in seconds, minutes, hours, days, weeks, or months.
Another thing to note is that their quota accounting is done PER BLOCK, not per byte.
So if you are talking about quotas on a Red Hat based system (Fedora, CentOS, and others), this is likely where the confusion comes from. If you are talking about another distro or another filesystem type, again, just provide me some context and I’ll help if I can.
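To make the block-based soft/hard limits concrete, here is a hedged example using the standard Linux quota tools; the user name, limits, and mount point are made up, and the limits are expressed in 1 KiB blocks:
- # setquota -u alice 500000 600000 0 0 /home   # ~500 MB soft and ~600 MB hard block limits, no inode limits
- # edquota -t   # edit the grace period allowed for exceeding a soft limit
- # repquota -s /home   # report current usage and limits in human-readable units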
A file system hierarchy is a set of directories (and perhaps a set of rules for what goes where in it). A file system is how you store it.
Storage devices, like hard drives, are just a long long series of bytes you can read and write; they have absolutely no structure on their own. A file system is a set of rules for where you start reading, how you store “a directory” (byte for byte), how you say that a file is in a directory, how you specify where on the disk the contents of that file are, and how you specify what kind of a file it is. (A directory is typically just a special kind of file, which makes it a bit easier to implement.)
This means that the two are independent: You can implement a linux file system hierarchy on any file system (or even on paper, if you’re just deciding how you lay things out in your new distro) — and you can put whatever directory structure you want on a file system.
This ext4 feature allows the mapping of logical block numbers for a particular inode to physical blocks on the storage device to be stored using an extent tree, which is a more efficient data structure than the traditional indirect block scheme used by the ext2 and ext3 file systems.
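If you want to see whether extents are in play on a given system, a hedged sketch (the device and file names are hypothetical):
- # tune2fs -l /dev/sdX1 | grep -i features   # an "extent" entry in the feature list means the filesystem can use extent trees
- # lsattr somefile   # an 'e' in the attribute flags means this particular file is mapped with extents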
Q: What is the maximum file size in EXT4? Why does it have that limit?
16 Terabytes in a single file, 1 Exabyte in a single volume - ext4 - Wikipedia
Realistically though there are a lot of limitations you’ll hit in the operating system before you can create a file that large. The EXT4 filesystem and its integration into linux in a vanilla installation and kernel aren’t sized and tuned to work with files of this size out of the box. Not saying it can’t be done, but you’ll have to do a lot of tuning before you can actually save a file that large.
As for why it has this limit: because of the way the filesystem is constructed, there is basically a map of which parts of your storage make up the file, and this map has a finite number of entries. The same is true of directories; there is a finite number of entries that you can put in a directory, which translates to a maximum number of files you can put into a directory.
Regarding the system limitations I mentioned, there are a lot of additional components that go into storing your data under a modern Linux-based operating system and filesystem, for example the device mapper, the logical volume manager, the quota system, and so on. These also have to be configured to work properly with large files and large directories.
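As a hedged illustration of the per-file limit (the paths are hypothetical, and this assumes a 4 KiB block size, where ext4 tops out at 16 TiB per file), creating a sparse file past the limit fails immediately:
- # truncate -s 15T /mnt/data/sparse_ok   # within the limit; succeeds without allocating data blocks
- # truncate -s 17T /mnt/data/sparse_too_big   # beyond the per-file limit; fails with "File too large" (EFBIG)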
If you want snapshots, on-the-fly compression, data integrity checks, and subvolumes, then go with btrfs. If you are running a large database or virtual machines you will need to make some settings changes for the directories homing them, or use xfs instead. Ext4 is basically obsolete, replaced by btrfs and xfs in the main Linux distributions as the default file system.
ZFS and BTRFS are both modern copy-on-write file systems. Ext is just an old-style fs that has been superseded by modern alternatives.
Here’s an extremely simplified view of a file system.
So, you have a directory entry that contains a name and an inode number.
That inode number is used as an index into the inode table, to find a particular inode. The inode contains at least a reference count, and some sort of pointer to storage blocks that hold the data for the file.
The reference count in the inode tracks the number of directory entries pointing to that inode. When you unlink a file, that doesn’t directly delete the file. Rather, it decrements the reference count in the inode, and removes the directory entry.
When/if the inode’s reference count reaches zero, that means there’s no longer any directory entry pointing to the inode, so the storage consumed by that file can be released for reuse.
This means you can have a (nearly) arbitrary number of directory entries pointing to the same inode, so the same data can appear under different names, and/or in different directories, at the same time, while consuming essentially no extra storage space (only the space for a directory entry).
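A minimal demonstration of that point (the file names are made up): create a file, hard-link it, and watch the inode number and link count:
- $ echo hello > original
- $ ln original alias   # add a second directory entry pointing at the same inode
- $ ls -li original alias   # both names show the identical inode number and a link count of 2
- $ rm original   # removes one name; the data remains reachable via 'alias' until the count reaches zero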
I’ll repeat one point for emphasis though: this is an extremely simplified view of things.
“a hard drive that stores files only”
As opposed to what other things? Peanuts?
Here’s a tip, all things on a computer, especially anything in a Unix/Linux computer (since you specify EXT4), are “files”. Even on other systems, “files” are what you store on a hard drive. You’d have to go way out of your way to do anything else with a hard drive, and arguably end up with just a different form of “file” (e.g. use a raw disk as a database engine’s storage location).
I’m assuming you mean just data files: no programs, no system, settings, cache, or temporary files, nothing else like that. Just things like pictures, documents, audio, video, etc.
In that case: Could I ask a follow-up question?
How are you using these files?
I mean, are they just sitting there? Perhaps one year later, you may want to open and view one. But, other than that, they never change?
Do you add new files periodically? Do you change existing ones? How often?
What about the organization of the files? All in one folder? Or do you have multiple folders and subfolders to group files into some form of “category” for your own ease of finding them later?
All of that combines to shift the answer anywhere from “roughly 50% of the total drive space needs to stay free, or you start seeing performance degradation” to “fill it to the brim, it makes no difference”, and every level in between.
Per the following description of extents:
“Extents reduce the amount of metadata needed to keep track of the data blocks for large files. Instead of storing a list of every individual block which makes up the file, the idea is to store just the address of the first and last block of each continuous range of blocks. These continuous ranges of data blocks (and the pairs of numbers which represent them) are called extents.”
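Continuing that thought, filefrag reports how many extents a file actually occupies, which for a mostly contiguous file is far fewer than its block count (the file path below is hypothetical):
- # filefrag /mnt/data/bigfile.iso   # prints something like "bigfile.iso: 3 extents found" for a file stored in 3 contiguous ranges
- # filefrag -v /mnt/data/bigfile.iso   # list each extent’s logical and physical block ranges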