What are the file limits in Git (number and size)?

Question

Does anyone know what are the Git limits for number of files and size of files?

VonC · Accepted Answer · 2015-04-09 06:35:06Z

This message from Linus himself can help you with some other limits

[...] CVS, ie it really ends up being pretty much oriented to a "one file at a time" model.

Which is nice in that you can have a million files, and then only check out a few of them - you'll never even see the impact of the other 999,995 files.

Git fundamentally never really looks at less than the whole repo. Even if you limit things a bit (ie check out just a portion, or have the history go back just a bit), git ends up still always caring about the whole thing, and carrying the knowledge around.

So git scales really badly if you force it to look at everything as one huge repository. I don't think that part is really fixable, although we can probably improve on it.

And yes, then there's the "big file" issues. I really don't know what to do about huge files. We suck at them, I know.

See more in my other answer: the limit with Git is that each repository must represent a "coherent set of files", the "all system" in itself (you can not tag "part of a repository").
If your system is made of autonomous (but inter-dependent) parts, you must use submodules.

As illustrated by Talljoe's answer, the limit can be a system one (large number of files), but if you do understand the nature of Git (about data coherency represented by its SHA-1 keys), you will realize the true "limit" is a usage one: i.e, you should not try to store everything in a Git repository, unless you are prepared to always get or tag everything back. For some large projects, it would make no sense.

For a more in-depth look at git limits, see "git with large files"
(which mentions git-lfs: a solution to store large files outside the git repo. GitHub, April 2015)

The three issues that limits a git repo:

huge files (the xdelta for packfile is in memory only, which isn't good with large files)
huge number of files, which means, one file per blob, and slow git gc to generate one packfile at a time.
huge packfiles, with a packfile index inefficient to retrieve data from the (huge) packfile.

A more recent thread (Feb. 2015) illustrates the limiting factors for a Git repo:

Will a few simultaneous clones from the central server also slow down other concurrent operations for other users?

There are no locks in server when cloning, so in theory cloning does not affect other operations. Cloning can use lots of memory though (and a lot of cpu unless you turn on reachability bitmap feature, which you should).

Will 'git pull' be slow?

If we exclude the server side, the size of your tree is the main factor, but your 25k files should be fine (linux has 48k files).

'git push'?

This one is not affected by how deep your repo's history is, or how wide your tree is, so should be quick..

Ah the number of refs may affect both git-push and git-pull.
I think Stefan knows better than I in this area.

'git commit'? (It is listed as slow in reference 3.) 'git status'? (Slow again in reference 3 though I don't see it.)
(also git-add)

Again, the size of your tree. At your repo's size, I don't think you need to worry about it.

Some operations might not seem to be day-to-day but if they are called frequently by the web front-end to GitLab/Stash/GitHub etc then they can become bottlenecks. (e.g. 'git branch --contains' seems terribly adversely affected by large numbers of branches.)

git-blame could be slow when a file is modified a lot.

documentation for git submodules = book.git-scm.com/5_submodules.html — Alexander Bird, Feb 5 '10 at 2:21
@Thr4wn: see also stackoverflow.com/questions/1979167/git-submodule-update/… for more on the GitPro submodule page. For a shorter version: stackoverflow.com/questions/2065559/… — VonC, Feb 5 '10 at 4:59
Updated link for git submoules documentation = git-scm.com/book/en/Git-Tools-Submodules — JHowIX, Jan 14 '14 at 20:49
@JHowIX true: I have added that link in the answer for more visibility. — VonC, Jan 14 '14 at 20:51
I really wonder, with so much of sqlite and many database alternatives available on linux, why they couldn't simply use database which is easy to backup, replicate and scale. — Akash Kava, May 13 '14 at 18:43

Brian Carlton · Answer 2 · 2011-03-02 21:05:16Z

up vote 27 down vote

If you add files that are too large (GBs in my case, Cygwin, XP, 3 GB RAM), expect this.

fatal: Out of memory, malloc failed

More details here

Update 3/2/11: Saw similar in Windows 7 x64 with Tortoise Git. Tons of memory used, very very slow system response.

edited Mar 2 '11 at 21:05

answered Feb 4 '10 at 1:58

Brian Carlton

4,13842550

I have the same issue on a 512MB Fedora git server. :S – Cheng Mar 23 '11 at 4:20

add a comment |

Talljoe · Answer 3 · 2009-06-12 02:48:44Z

There is no real limit -- everything is named with a 160-bit name. The size of the file must be representable in a 64 bit number so no real limit there either.

There is a practical limit, though. I have a repository that's ~8GB with >880,000 and git gc takes a while. The working tree is rather large so operations that inspect then entire working directory take quite a while. This repo is only used for data storage, though, so it's just a bunch of automated tools that handle it. Pulling changes from the repo is much, much faster than rsyncing the same data.

%find . -type f | wc -l
791887
%time git add .
git add .  6.48s user 13.53s system 55% cpu 36.121 total
%time git status
# On branch master
nothing to commit (working directory clean)
git status  0.00s user 0.01s system 0% cpu 47.169 total
%du -sh .
29G     .
%cd .git
%du -sh .
7.9G    .

Although there is a "more correct" answer above talking about the theoretical limitations, this answer seems more helpful to me as it allows to compare the own situation with yours. Thanks. — Bananeweizen, May 9 '12 at 12:35

CharlesB · Answer 4 · 2013-10-21 12:39:08Z

Back in Feb 2012, there was a very interesting thread on the Git mailing list from Joshua Redstone, a Facebook software engineer testing Git on a huge test repository:

The test repo has 4 million commits, linear history and about 1.3 million files.

Tests that were run show that for such a repo Git is unusable (cold operation lasting minutes), but this may change in the future. Basically the performance is penalized by the number of stat() calls to the kernel FS module, so it will depend on the number of files in the repo, and the FS caching efficiency. See also this Gist for further discussion.

+1 Interesting. That echoes my own answers about git limits detailing the limitations on huge files/number of files/packfiles. — VonC, Oct 21 '13 at 13:28

Dustin · Answer 5 · 2009-06-12 02:43:15Z

It depends on what your meaning is. There are practical size limits (if you have a lot of big files, it can get boringly slow). If you have a lot of files, scans can also get slow.

There aren't really inherent limits to the model, though. You can certainly use it poorly and be miserable.

Kzqai · Answer 6 · 2010-02-04 21:01:00Z

I think that it's good to try to avoid large file commits as being part of the repository (e.g. a database dump might be better off elsewhere), but if one considers the size of the kernel in its repository, you can probably expect to work comfortably with anything smaller in size and less complex than that.

funwhilelost · Answer 7 · 2012-02-21 02:16:29Z

I have a generous amount of data that's stored in my repo as individual JSON fragments. There's about 75,000 files sitting under a few directories and it's not really detrimental to performance.

Checking them in the first time was, obviously, a little slow.

Kasisnu · Answer 8 · 2015-01-24 17:44:43Z

I found this trying to store a massive number of files(350k+) in a repo. Yes, store. Laughs.

$ time git add . 
git add . 333.67s user 244.26s system 14% cpu 1:06:48.63 total

The following extracts from the Bitbucket documentation are quite interesting.

When you work with a DVCS repository cloning, pushing, you are working with the entire repository and all of its history. In practice, once your repository gets larger than 500MB, you might start seeing issues.

... 94% of Bitbucket customers have repositories that are under 500MB. Both the Linux Kernel and Android are under 900MB.

The recommended solution on that page is to split your project into smaller chunks.

Michael Hu · Answer 9 · 2012-06-15 01:34:05Z

up vote -8 down vote

git has a 4G (32bit) limit for repo.

http://code.google.com/p/support/wiki/GitFAQ

answered Jun 15 '12 at 1:34

Michael Hu

7

18

That looks like google limit, not git limit. – Doncho Gunchev Jun 15 '12 at 2:12

add a comment |

asked	7 years ago
viewed	79015 times
active	1 year ago

What are the file limits in Git (number and size)?

9 Answers 9

Your Answer

Not the answer you're looking for? Browse other questions tagged git or ask your own question.

Linked

Hot Network Questions

What are the file limits in Git (number and size)?

9 Answers 9

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged git or ask your own question.

Linked

Related

Hot Network Questions