Is 8% size inflation cause for concern when copying directories? #156

@chconnor


Hi --

I copied about 89GB from one NTFS partition to another NTFS partition (in Debian bookworm). The source was a Windows C:\ drive; the destination is a Linux-formatted NTFS partition.

The destination is 98GB in size (about 10% larger). I was trying to understand the difference in size. I have not compared file-by-file to ensure an identical copy, but since the size is growing (not shrinking) I assume some filesystem effect explains this.

  • There is no compression on the source drive
  • The block size is the same on both partitions
  • I have tried cp --sparse=always: this makes the inflation only about 7.8%, so some of it was sparse files, but not most of it
  • rsync -rlptgDX yields the same result
  • no "ADS" files (assuming I verified this correctly)
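For anyone wanting to reproduce the comparison, here is a minimal Python sketch that totals both the apparent size and the allocated size of a tree, since the two can diverge with sparse or compressed files. It assumes Linux, where `st_blocks` is counted in 512-byte units, and that the filesystem driver reports allocation accurately. The function names are illustrative and not part of any tool mentioned in this thread.

```python
import os
import stat


def tree_sizes(root):
    """Return (apparent_bytes, allocated_bytes) summed over regular files under root."""
    apparent = 0
    allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, and other non-regular entries
            apparent += st.st_size
            # Assumption: st_blocks is in 512-byte units, as on Linux.
            allocated += st.st_blocks * 512
    return apparent, allocated


def inflation_percent(src_bytes, dst_bytes):
    """Percentage by which dst exceeds src."""
    return (dst_bytes - src_bytes) / src_bytes * 100.0
```

With the figures quoted above, `inflation_percent(89, 98)` comes to roughly 10.1. If the apparent totals of the two trees match while the allocated totals differ, the copy is likely complete and the gap comes from how the data is stored.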

I have read about resident files, and I'm sure there are several other subtle filesystem mechanisms that could explain this. I was just surprised that 7 or 8% inflation was still possible.

Is this in the range of "it's normal -- don't worry about it", or is there something wrong with my partitions, or my mounting?

The destination is mounted like this (fstab):

UUID=02E336DB0BF737BD /dest ntfs-3g defaults,windows_names,umask=006,gid=46,uid=1000 0 2

The source is mounted with default options (mount /dev/sdd1 /dest).

Thanks for any insight!

unsound commented on Jun 27, 2025
Member

There are a couple of possible explanations. If the cluster size is the same (assuming that's what you mean by "block size"), then it could also be that the MFT record size differs and thus how much data can be stored resident in MFT records.
It could also be that some of the source files use Windows System Compression.
I would encourage you to try to find a specific file that is smaller on the source volume than on the destination volume and collect ntfsinfo -v -F <path to file> <path to device> output for both files. That should reveal why there is a difference.
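A small sketch of how the two dumps could be collected for later diffing. It uses only the `ntfsinfo -v -F <path to file> <path to device>` form quoted above. The helper names and the example paths are hypothetical, and whether `ntfsinfo` will read a device that is currently mounted is not addressed here.

```python
import subprocess


def ntfsinfo_command(file_path, device):
    """Build the argv for the ntfsinfo invocation suggested above."""
    return ["ntfsinfo", "-v", "-F", file_path, device]


def collect_ntfsinfo(file_path, device):
    """Run ntfsinfo and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(
        ntfsinfo_command(file_path, device),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `collect_ntfsinfo` once per volume for the same file and saving each result to a text file gives a pair that can be compared side by side.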

chconnor commented on Jun 28, 2025
Author

Thanks!

Yes, cluster size, thanks: both are 4096.

From ntfsinfo, "MFT Record Size" in both cases is 1024.

Re: Windows System Compression -- is that what shows up in system.ntfs_attrib_be as explained in this answer? If so, I searched all the subdirectories (using a find -exec) and did not find any that had anything but 0 in the bit position that overlaps 0x0800.
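The bit test described above can be sketched in Python. This assumes the volume is mounted with ntfs-3g, that `system.ntfs_attrib_be` holds the attribute word as big-endian bytes (as the `_be` suffix suggests), and that 0x0800 is the standard Windows `FILE_ATTRIBUTE_COMPRESSED` bit. The helper names are illustrative. Note that this only tests the NTFS compression flag; whether Windows System Compression sets that bit is not established in this thread.

```python
import os

FILE_ATTRIBUTE_COMPRESSED = 0x0800


def attrib_from_be(raw):
    """Convert the raw big-endian xattr bytes into an integer attribute word."""
    return int.from_bytes(raw, "big")


def is_flagged_compressed(path):
    """True if the NTFS attribute word for path has the COMPRESSED bit set."""
    raw = os.getxattr(path, "system.ntfs_attrib_be")  # Linux-only call
    return bool(attrib_from_be(raw) & FILE_ATTRIBUTE_COMPRESSED)
```

Running `is_flagged_compressed` over a handful of files that are known to differ in size is a quick way to cross-check a `find -exec` search.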

...but I think that's a misunderstanding on my part, because when I search for files with mismatching sizes, as you suggested, I see files that seem to be compressed, if I'm interpreting the ntfsinfo results at all accurately.

I'm using this monstrosity:

find . -type f -exec bash -c 'for f; do src=$(du -s "$f" | cut -f1); dst=$(du -s "/dest/$f" | cut -f1); if [ "$src" -ne "$dst" ]; then echo "$src $dst $f"; fi; done' _ {} +

...to search for files with different on-disk sizes. There are very many.

I notice that almost all the differences are when the source is smaller than the destination, which is the opposite of what I expected, since the destination is the one that is overall 8% larger, but I didn't confirm the breakdown for all 500k files.
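The per-file comparison, including the breakdown by direction that was not confirmed above, could also be done with a short Python sketch. It assumes Linux `st_blocks` in 512-byte units and that the destination tree mirrors the source tree's relative paths. The function names are hypothetical.

```python
import os


def allocated_bytes(path):
    """On-disk allocation of a file, assuming 512-byte st_blocks units (Linux)."""
    return os.lstat(path).st_blocks * 512


def differing_files(src_root, dst_root):
    """Return (src_bytes, dst_bytes, relative_path) for files whose allocation differs."""
    diffs = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            if not os.path.isfile(dst):
                continue  # not present on the destination; nothing to compare
            s, d = allocated_bytes(src), allocated_bytes(dst)
            if s != d:
                diffs.append((s, d, rel))
    return diffs


def count_directions(diffs):
    """Return (grew, shrank): how many files are larger or smaller on the destination."""
    grew = sum(1 for s, d, _ in diffs if d > s)
    shrank = sum(1 for s, d, _ in diffs if d < s)
    return grew, shrank
```

Summing `d - s` over the returned tuples would also show whether a few large files or many small ones account for most of the growth.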

Anyway, when I find files that are larger on the destination, I see that ARCHIVE COMPRESSED NOT_CONTENT_INDEXED (0x00002820) is on the source but not the destination, and that "Resident flags" is 0x00 on the destination and 0x01 on the source.
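To make the flag value easier to read, here is a small decoder for the attribute word. The bit values are the standard Windows file attribute constants; the table is deliberately partial and the function name is illustrative.

```python
# Partial table of standard Windows file attribute bits.
FILE_ATTRIBUTES = {
    0x0001: "READONLY",
    0x0002: "HIDDEN",
    0x0004: "SYSTEM",
    0x0020: "ARCHIVE",
    0x0200: "SPARSE_FILE",
    0x0400: "REPARSE_POINT",
    0x0800: "COMPRESSED",
    0x2000: "NOT_CONTENT_INDEXED",
    0x4000: "ENCRYPTED",
}


def decode_attributes(value):
    """List the names of the known attribute bits set in value, lowest bit first."""
    return [name for bit, name in sorted(FILE_ATTRIBUTES.items()) if value & bit]


print(decode_attributes(0x00002820))
# prints ['ARCHIVE', 'COMPRESSED', 'NOT_CONTENT_INDEXED']
```

The value 0x2820 is 0x0020 + 0x0800 + 0x2000, so the 0x0800 bit is set on the source file.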

Is it safe to assume that's all that is happening? I don't want to drag you through getting to the bottom of every file... if 8% is within reason for a C: drive I'm happy to not worry about it. :-)

Attached is a source/destination pair, but of course such comparisons differ a lot depending on the file. This is one where the destination was larger.

src.txt

dest.txt

unsound commented on Jun 28, 2025
Member

The presence of compression in the source is exactly why there is a size difference, so that explains it. The data attribute is non-resident in both cases but occupies 3 clusters in the source and 8 clusters in the destination. So I will close this issue as I think we've found the explanation. :)
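As a worked example of that difference, using the 4096-byte cluster size reported earlier in the thread (the function name is illustrative):

```python
CLUSTER_SIZE = 4096  # bytes, as reported for both volumes


def on_disk_bytes(clusters, cluster_size=CLUSTER_SIZE):
    """Bytes allocated on disk for a given number of clusters."""
    return clusters * cluster_size


src_bytes = on_disk_bytes(3)  # compressed on the source: 12288 bytes
dst_bytes = on_disk_bytes(8)  # uncompressed on the destination: 32768 bytes
print(dst_bytes - src_bytes)  # prints 20480
```

So this one file takes 20 KiB more on the destination. Repeated across many compressed files, differences of this kind can plausibly add up to a few percent of the volume.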

chconnor commented on Jun 28, 2025
Author

Thanks for your time on it. As long as there is nothing of concern in the inflation percentage, I'm not concerned. Thanks for the software!

Issue #156 · tuxera/ntfs-3g