It’s been 3 months since I first wrote about our efforts to scale Git to extremely large projects and teams with an effort we called “Git Virtual File System”. As a reminder, GVFS, together with a set of enhancements to Git, enables Git to scale to VERY large repos by virtualizing both the .git folder and the working directory. Rather than download the entire repo and check out all the files, it dynamically downloads only the portions you need based on what you use.
A lot has happened and I wanted to give you an update. Three months ago, GVFS was still a dream. I don’t mean it didn’t exist – we had a concrete implementation – but rather that it was unproven. We had validated it on some big repos but we hadn’t rolled it out to any meaningful number of engineers, so we had only our conviction that it was going to work. Now we have proof.
Today, I want to share our results. In addition, we’re announcing the next steps in our GVFS journey for customers, including expanding the open source project to start taking contributions and to improve how GVFS works for us at Microsoft, as well as for partners and customers.
Windows is live on Git
Over the past 3 months, we have largely completed the rollout of Git/GVFS to the Windows team at Microsoft.
As a refresher, the Windows code base is approximately 3.5M files and, when checked in to a Git repo, results in a repo of about 300GB. Further, the Windows team is about 4,000 engineers and the engineering system produces 1,760 daily “lab builds” across 440 branches, in addition to thousands of pull request validation builds. Each of the 3 dimensions (file count, repo size and activity) independently poses a daunting scaling challenge, and taken together they make it unbelievably challenging to create a great experience. Before the move to Git, in Source Depot, the code was spread across 40+ depots and we had a tool to manage operations that spanned them.
As of my writing 3 months ago, we had all the code in one Git repo, a few hundred engineers using it and a small fraction (<10%) of the daily build load. Since then, we have rolled out in waves across the engineering team.
The first, and largest, jump happened on March 22nd when we rolled out to the Windows OneCore team of about 2,000 engineers. Those 2,000 engineers worked in Source Depot on Friday, went home for the weekend and came back Monday morning to a new experience based on Git. People on my team were holding their breath that whole weekend, praying we weren’t going to be pummeled by a mob of angry engineers who showed up Monday unable to get any work done. In truth, the Windows team had done a great job preparing backup plans in case of mishap and, thankfully, we didn’t have to use any of them.
Much to my surprise, quite honestly, it went very smoothly and engineers were productive from day one. We had some issues, no doubt. For instance, Windows, because of the size of the team and the nature of the work, often has VERY large merges across branches (tens of thousands of changes with thousands of conflicts). We discovered that first week that our UI for pull requests and merge conflict resolution simply didn’t scale to changes that large. We had to scramble to virtualize lists and incrementally fetch data so the UI didn’t just hang. We had it resolved within a couple of days and, overall, sentiment that week was much better than we expected.
One of the ways we measured our success was by doing surveys of the engineering team. The main question we asked was “How satisfied are you?” but, of course, we also mined a lot more detail. Two weeks into the rollout, our first survey resulted in:
I’m not going to jump up and down and celebrate those numbers, but for a team that had just had their whole life changed, had to learn a new way of working and were living through a transition that was very much a work in progress, I felt reasonably good about it. Yes, it’s only 251 survey responses out of 2,000 people but welcome to the world of trying to get people to respond to surveys. 🙂
Another way we measured success was to look at “engineering activity” to see if people were still getting their work done. For instance, we measured the number of “checkins” to official branches. Of course, half the team was still on Source Depot and half had moved to Git so we looked at combined activity over time. In the chart below you can see the big drop in Source Depot checkins and the big jump in Git pull requests, but overall the sum of the two stayed reasonably consistent. We felt that the data showed that the system was working and there were no major blockers.
On April 22nd, we onboarded the next wave of about 1,000 engineers. And then on May 12th we onboarded another 300-400. Each successive wave followed roughly the same pattern and we now have about 3,500 of the roughly 4,000 Windows engineers on Git. The remaining teams are currently working to deadlines and trying to figure out the best time to schedule their move, but I expect we’ll complete the rollout to the full engineering team in the next few months.
The scale the system is operating at is really amazing. Let’s look at some numbers…
- There are over 250,000 reachable Git commits in the history for this repo, over the past 4 months.
- 8,421 pushes per day (on average)
- 2,500 pull requests, with 6,600 reviewers per work day (on average)
- 4,352 active topic branches
- 1,760 official builds per day
As you can see, it’s just a tremendous amount of activity over an immensely large codebase.
GVFS performance at scale
If you look at those satisfaction survey numbers, you’ll see there are people who aren’t happy yet. We have lots of data on why and there are many reasons – from tooling that didn’t support Git yet to frustration at having to learn something new. But, the top issue is performance, and I want to drill into that. We knew when we rolled out Git that lots of our performance work wasn’t done yet and we also learned some new things along the way. We track the performance of some of the key Git operations. Here is data collected by telemetry systems for the ~3,500 engineers using GVFS.
You can see the “goal” (which was designed to be a worst case – the system isn’t usable if it’s slower than this value – not a “this is where we want to be” value). You can also see the 80th percentile result for the past 7 days and the delta from the previous 7 days (you’ll notice everything is getting slower – more on that in a minute).
For context, if we tried this with “vanilla Git”, before we started our work, many of the commands would take anywhere from 30 minutes to hours and a few would never complete. The fact that most of them are less than 20 seconds is a huge step, but it still sucks if you have to wait 10-15 seconds for everything.
When we first rolled it out, the results were much better. That’s been one of our key learnings. If you read my post that introduced GVFS, you’ll see I talked about how we did work in Git and GVFS to change many operations from being proportional to the number of files in the repo to instead be proportional to the number of files “read”. It turns out that, over time, engineers crawl across the code base and touch more and more stuff leading to a problem we call “over hydration”. Basically, you end up with a bunch of files that were touched at some point but aren’t really used any longer and certainly never modified. This leads to a gradual degradation in performance. Individuals can “clean up” their enlistment but that’s a hassle and people don’t, so the system gets slower and slower.
That led us to embark upon another round of performance improvements we call “O(modified)”, which changes the proportionality of many key commands to instead be proportional to the number of files I’ve modified (meaning files I currently have uncommitted edits on). We are rolling these changes out to the org over the next week, so I don’t have broad statistical data on the results yet, but we do have good results from some early pilot users.
I don’t have all the data but I’ve picked a few examples from the table above and copied the performance results into the column called “O(hydrated)”. I’ve added another column called O(modified) with the results for the same commands using the performance enhancements we are rolling out next week. All the numbers are in seconds. As you can see we are getting performance improvements across the board – some are small, some are ~2X and status is almost 5X faster. We’re very optimistic these improvements are going to move the needle on perf perception. I’m still not fully satisfied (I won’t be until Status is under 1 second), but it’s fantastic progress.
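If it helps to make the proportionality idea concrete, here’s a rough analogy in plain Git commands (this is just an illustration of the three scaling classes, not how GVFS implements them):
# O(repo): cost tracks every file in the repo – vanilla Git on ~3.5M files
git ls-files | wc -l                 # ~3,500,000 paths to consider
# O(hydrated): cost tracks every file you’ve ever read – GVFS until now;
# the set of hydrated files only grows as you crawl around the code base
# O(modified): cost tracks only the files you have uncommitted edits in
git status --porcelain | wc -l       # typically a handful of paths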
Another key performance area that I didn’t talk about in my last post is distributed teams. Windows has engineers scattered all over the globe – the US, Europe, the Middle East, India, China, etc. Pulling large amounts of data across very long distances, often over less-than-ideal bandwidth, is a big problem. To tackle this problem, we invested in building a Git proxy solution for GVFS that allows us to cache Git data “at the edge”. We have also used proxies to offload very high volume traffic (like build servers) from the main Visual Studio Team Services service to avoid compromising end users’ experiences during peak loads. Overall, we have 20 Git proxies (which, BTW, we’ve just incorporated into the existing Team Foundation Server Proxy) scattered around the world.
To give you an idea of the effect, let me give an example. The Windows Team Services account is located in an Azure data center on the west coast of the US. Above you saw that the 80th percentile for Clone for a Windows engineer is 127 seconds. Since a high percentage of our Windows engineers are in Redmond, that number is dominated by them. We ran a test from our North Carolina office (which is both further away and has a much lower bandwidth network). A clone from North Carolina with no proxy server took almost 25 minutes. With a proxy configured and up to date, it took 70 seconds (faster than Redmond because the Redmond team doesn’t use a proxy and they have to go hundreds of miles over the internet to the Azure data center). 70 seconds vs almost 25 minutes is an almost 95% improvement. We see similar improvements when GVFS “faults in” files as they are accessed.
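In case you’re wondering what using one of those proxies looks like from the client side, it’s a one-time step when you set up your enlistment. Roughly like the line below – the account, repo and proxy URLs are placeholders and the option name shown is illustrative, so check the published client’s help for the exact syntax:
# Hypothetical sketch – URLs are made up and the option name may differ in the shipping client.
gvfs clone https://fabrikam.visualstudio.com/DefaultCollection/_git/os C:\os --cache-server-url https://gvfs-proxy-europe.fabrikam.com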
Overall Git with GVFS is completely usable at crazy large scale and the results are proving that our engineers are effective. At the same time, we have a lot of work to do to get the performance to the point that our engineers are “happy” with it. The O(modified) work rolling out next week will be a big step but we have months of additional performance work still on the backlog before we can say we’re done.
To learn more about the details of the technical challenges we’ve faced in scaling Git and getting good performance, check out the series of articles that Saeed Noursalehi is writing on scaling Git and GVFS. It’s fascinating to read.
Trying GVFS yourself
GVFS is an open source project and you are welcome to try it out. All you need to do is download and install it, create a Visual Studio Team Services account with a Git repo in it and you are ready to go (there’s a short example of what that looks like at the end of this section). Since we initially published GVFS, we’ve made some good progress. Some of the key changes include:
- We’ve started doing regular updates to the published code base – moving towards “development in the open”. As of now, all our latest changes (including the new O(modified) work) are published to the public repo and we will be updating it regularly.
- When we first published, we were not ready to start taking external contributions. With this milestone today, we are now officially ready to start. We feel like enough of the basic infrastructure is in place that people can start picking it up and moving it forward with us. We welcome anyone who wants to pitch in and help.
- GVFS relies on a Windows filesystem driver we call GVFlt. Until now, the drop of that driver that we made available was unsigned (because it was very much a work in progress). That clearly creates some friction in trying it out. Today, we released a signed version of GVFlt that will eliminate that friction (for instance, you no longer need to disable BitLocker to install it). Although we have a signed GVFlt driver, that’s not the long term delivery method. We expect this functionality to be incorporated into a future shipping version of Windows and we are still working through those details.
- Starting with our talk at Git Merge, we’ve begun engaging with the broader Git community on the problem of scaling Git and GVFS, in particular. We’ve had some great conversations with other large tech companies (like Google and Facebook) who have similar scaling challenges and we are sharing our experiences and approaches. We have also worked with several of the popular Git clients to make sure they work well with GVFS. These include:
- Atlassian SourceTree – SourceTree was the first tool to validate with GVFS and has already released an update with a few changes to make it work well.
- Tower – The Tower Git team is excited to add GVFS support and they are already working on including GVFS in the Windows version of their app. It will be available as a free update in the near future.
- Visual Studio – Of course, it would be good for our own Visual Studio Git integration to work well with GVFS too. We are including GVFS support in VS 2017.3 and the first preview with the necessary support will be available in early June.
- Git for Windows – As part of our effort to scale Git, we have also made a bunch of contributions to Git for Windows (the Git command line) and that includes support for GVFS. Right now, we still have a private fork of Git for Windows but, over time, we are working to get all of those changes contributed back to the mainline.
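To give you a feel for it, getting started looks roughly like this (the account and repo names are placeholders – see the GVFS readme for the exact prerequisites and syntax):
# Prerequisites: the GVFS-enabled Git for Windows build, GVFS itself and the GvFlt driver.
gvfs clone https://fabrikam.visualstudio.com/DefaultCollection/_git/BigRepo C:\BigRepo
# The clone is virtualized – no file content is downloaded until you touch it.
cd C:\BigRepo\src
# From here on it’s just Git.
git checkout -b users/me/my-first-topic
git status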
Summary
We’re continuing to push hard on scaling Git to large teams and code bases at Microsoft. A lot has happened in the 3 months since we first talked about the effort. We’ve…
- Successfully rolled it out to 3,500 Windows engineers
- Made some significant performance improvements and introduced Git proxies
- Updated the open source projects with the latest code and opened it for contributions
- Provided a signed GVFlt driver to make trying it out easier
- Worked with the community to begin to build support into popular tools – like SourceTree, Tower, Visual Studio, etc.
- Published some articles with more insights into the technical approach we are taking to scale Git and GVFS.
This is an exciting transition for Microsoft and a challenging project for my team and the Windows team. I’m elated at the progress we’ve made and humbled by the work that remains. If you too find there are times when you need to work with very large codebases and yet you really want to move to Git, I encourage you to give GVFS a try. For now, Visual Studio Team Services is the only backend implementation that supports the GVFS protocol enhancements. We will add support in a future release of Team Foundation Server if we see enough interest and we have talked to other Git services who have some interest in adding support in the future.
Thanks and enjoy.
Brian
Wow… your farming posts are fun, Brian. But this is where you truly earn your bacon. Guess I should pay attention whether you run a pig farm or not. 😛 In any case, incredible post.
WOW!
JUST WOW!
MIND BLOWN.
Amazing, nice work! But how does current GVFS performance compare to the previous solution (Source Depot)?
Are you guys talking to other Git client developers than the ones you mentioned? Tooling is one of the more important factors in something being picked up by everyone.
I am thinking specifically of GitHub Desktop, SmartGit and TortoiseGit as they are probably the 3 biggest Git clients (other than the ones you already mentioned).
@Bartosz, We have. It depends a great deal on the operation. SourceDepot was much faster at some things – like “sd opened”, the equivalent of “git status”. sd opened was < .5s. git status is at 2.6s now. But SD was much slower at some other things – like branching. Creating a branch in SD would take hours. In Git, it's less than a minute. I saw a mail from one of our engineers at one point saying they'd been putting off doing a big refactoring for 9 months because the branch mechanics in SD would have been so cumbersome and after the switch to Git they were able to get the whole refactoring done in a topic branch in no time.
On an operation-by-operation basis, SD is still much faster than our Git/GVFS solution. We're still working on it to close the gap but I'm not sure it will ever get as fast at everything. The broader question, though, is about overall developer productivity and we think we are on a path to winning that.
Brian
@Sam, yes, we are working with a bunch of Git clients. I focused on the ones that have made good progress. There are others that are very interested but aren’t close to having something and others that are waiting to see how much interest there is. To drive some of this, the developers of the world will have to put in their own vote.
Brian
Really cool work.
Will there ever be a discussion on what the Git branching / release strategy is for Windows? I think a lot of people coming from the monolithic enterprise world (especially those of us dealing with multiple parallel release streams) struggle with how to model such workflows in Git.
@sam, I’ve been doing this for 10 years and have yet to even hear of smartgit and tortoisegit.
@Jeremy, I’d be happy to write something at some point. I’ll give a short summary here. Windows is still in transition from the “old” long term branch proliferation model to a newer solution with many fewer branches and work happening much closer to master. That said, they are likely too big of an org to ever get to one branch. We’ll see. So they still have branches off master for each of the “major” Windows orgs and a schedule for managing code motion between them. The result is about 400 long term branches. That’s a big reduction from where they were at one point but not where they want to be.
We had to decouple the process to evolve branching structure and code flow from the migration to Git so we are using more long term branches than a Git team normally would.
My team (the Visual Studio Team Services/Team Foundation Server team) is much smaller (hundreds of engineers rather than thousands) and works in a model that is closer to what I’d recommend as a north star for most teams. All work happens as close to master as possible and we have no (or very few – count them on 1 hand) long term development branches other than master. People create short lived topic branches, do their work and merge to master quickly. We “branch for release” by creating a new branch for each release we ship and all the servicing work for the release happens in the release branch.
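In plain Git terms (branch names invented for illustration), that flow looks roughly like this:
# Day to day: short lived topic branches off master, merged back via pull request
git checkout -b users/brian/fix-build-warnings master
# ...do the work, push, open a pull request into master...
# When we ship: “branch for release” and do all servicing from that branch
git checkout -b releases/2017.update2 master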
Brian
Any chance I could do a pull request 🙂
# Ahh. This makes sense.
#This. But in reverse – sorry, i’ve been at it too long but thanks!
Hi Brian,
Thanks for sharing this cool experience. I’m also experiencing Git performance issues on the cdnjs* project, which has a repository of about 88GB (3.5GB .git directory), 40k commits, 3.9M files and 0.3M directories. We use large memory, fast processors, SSDs and git sparseCheckout in different scenarios to speed up our Git operations, but it’s still not a happy speed – “git status” will still take 30 seconds (slower the first time if there is no filesystem cache). So it’s very exciting to see the GVFS work from Microsoft, which may be the potential solution to our very same problem. It would be great if you guys would consider supporting GVFS on unix-like systems like FreeBSD or Linux distributions. Thanks again!
https://cdnjs.com
https://github.com/cdnjs/cdnjs
How often do you anticipate having to delete the current git repo and downloading a fresh copy?
In a previous MS article I read, I was under the impression the Windows team was using Team Services (online) + VS2017. Is that the “…UI for pull requests and merge conflict resolution…” you were referring to or did you have to build a custom UI to handle GVFS and the scale of your pulls/merges? I ask because when you consider Git at scale you also need to consider git tools at scale – can they handle them and can the developer effectively work with the data being shown to them as a result of git’s data model.
Hi Brian,
Great write-up — thanks for the insights. Do you have a sense of when GitHub support will land?
Allan
@Peter, we will be working on a Mac/Linux port of our work shortly.
@Jay, No more frequently than you normally would with Git. We don’t think of that as part of the solution.
@Gurinder, They do use Team Services – the same one our external customers use too. It supports the GVFS endpoints. I’m going to write some more about the tooling changes and I’ll talk about the merge conflict experience. We’ve written a new one using our extensibility points. We plan to make it available to external customers in the next couple of months.
@Allan, I don’t. We’ve shared our work with GitHub but they have not told us if they have any plans or what they are.
Brian
all I can say is wow
Welcome to the Light Side! Anyway, two comments:
1. How many changes were there between the last commit in the local checkout and the remote with that git pull? A minute is quite a long time (and just barely shorter than git clone, which seems suspicious). If it was a realistic morning fetch of all commits from the previous night, then it is strange.
2. I know nobody wants to change their workflow because of tools, but my experience is that git really works better with small commits. Is it correct that the long commit and push times are caused by large commits? And on top of that, commit is not the last moment one touches the git repo. With the switch to git one gets the ability to use tools like git bisect, and those are quite useless with too-large commits. I really liked this blog post from one of my colleagues who used to work on the project management of Xorg (quite a large project in the free software world) https://who-t.blogspot.cz/2009/12/on-commit-messages.html (also other blog posts in the workflow tag).
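(If you haven’t tried bisect, a typical session is only a few commands – the good revision below is just an example:)
git bisect start
git bisect bad                 # current HEAD is broken
git bisect good v1.0           # last revision known to be good
# git now checks out revisions in between; mark each one until the culprit is found
git bisect good                # or: git bisect bad
git bisect reset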
Very interesting, thanks.
Love that you guys are releasing your work as open source and sharing with the greater community
It’s awesome Microsoft is using and contributing to open-source solutions. The satisfaction-survey part made me chuckle, but I’d imagine once the roll-out is complete stretch goals and future iterations will be able to address tooling that makes git largely invisible.
Access or it didn’t happen! 😉
@Troy, I’m afraid I don’t understand your issue. Could you share more?
Brian
after reading some of the considerations/goals/etc of GVFS, I’d be curious whether prefetching could be given recommendations from the TFS server, based on ML-identified trends within the repo… perhaps some files are being used more than others (when new features are being added, such as Vista’s UAC, which probably involve a bunch of debug step-into), or by certain users (ideally groups, but unsure how you’d identify that within TFS) focusing on specific areas.
Seems like the ML would be one of those larger investments (like SCVMM), but if it’s effective, would really be useful for the uber-large repos (on-prem if necessary), or potentially just really nice performance benefit for VSO subscribers.
@Scott, yes, I suspect we could do some predictive modeling and optimize the cached state. We don’t yet. At some point, we might look into that.
Brian
Is it better to have a single repo? Devs are not so good at respecting boundaries if everything is logically accessible, and the end result is projects depending on hundreds of projects instead of libraries, which is more efficient. I have experienced this problem with a team of around 150 devs; I guess it’s worse with thousands of devs contributing.
I could see benefits of having a Nano repo, and then layers of repos to get to a Full version.
This is a great step for Microsoft!
GVFS is indeed a requirement for large codebases.
wow….. can’t wait to be a part of this awesome creatives…
@Gustavo, Let me shed a little more light on this. Some explanation gets cut in an attempt to keep the “story” short and digestible. It turns out the Windows and Devices Group (WDG) that is responsible for Windows, Xbox, Phone, HoloLens and all the extended platform pieces around them actually has quite a large number of Git repos. I don’t have an exact count but it’s in the high hundreds to low thousands of repos. The focus of GVFS has been on the “OS” repo, which is the core operating system. We looked very hard at decomposing it and we found that our workflow just was not amenable to that. You might check out the discussion on Hacker News and elsewhere and find that other large engineering companies like Google and Facebook reached similar conclusions about their core platforms and have adopted solutions with the same general aim as ours.
Brian
Are there any plans to host Team Services in other Azure data centers? We are limited to using the on-premise version because of data residency issues. We have 2 data centers in Canada which would be great if we could have Team Services run there.
Thanks for all the awesome work. And keep the Farm stories coming!
Is Bing division also switching from Source Depot to Git?
WOW, nice work folks!
Google has the largest single repository. Check your facts:
Google’s monolithic software repository, which is used by 95% of its software developers worldwide, meets the definition of an ultra-large-scale system, providing evidence the single-source repository model can be scaled successfully.
The Google codebase includes approximately one billion files and has a history of approximately 35 million commits spanning Google’s entire 18-year existence. The repository contains 86TB of data, including approximately two billion lines of code in nine million unique source files.
https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext
With that quantity of developers pushing to the same repository, how is anyone ever “current enough” to successfully push? I have seen this happen even with 10 developers (using simplified branch terminology):
1. Finish up a task branch (all current with Master at that moment)
2. Merge task branch into Master
3. Push Master
4. Whoops – you are no longer current. Someone pushed before you.
5. Pull, merge, get current again.
6. Push Master
7. Whoops – you are no longer current. Someone pushed before you.
8. ……
I can only imagine with 4K devs how challenging that could be.
Greg
First of all – nice work! But of course there are questions 🙂 What is particularly interesting is how you migrated the history. We are currently on TFVC and would like to switch to Git in TFS, but would like to preserve all history. There is the git-tf tool that we will surely try out, but maybe you have a better idea?
Another question – to my understanding that means you are not really working in a distributed VCS, since everything is virtualized and you need a server connection to bring the code down to the dev’s machine, or am I missing something?
@Christopher, I said the largest Git repo. You are correct that Google has long had a very large mono-repo. We’ve talked to them about it a fair bit over the years. But their mono-repo is not in Git. Hence I stand by my claim (at least on that account). They do use Git for Android (and I’m sure elsewhere) but none of those repos approach the size.
Brian
@Matěj, good questions.
1. That pull number shows the 80th percentile time from our telemetry, so it’s not measuring one specific pull. That said, the majority of the pull time is dominated by two things:
– We have added a pre-command step on fetch and pull to download all commits and trees from the server, for all commits since the last fetch/pull. This allows us to have those lightweight objects present locally, but not download the blobs until they are needed. This can take some time, depending on how many new commits there are, and in the Windows repo, the answer is always: a lot of new commits.
– The merge that happens during pull is one of those commands that gets slower and slower the more people’s repos get hydrated (because those telemetry numbers don’t reflect our O(modified) changes, which were just released this week).
You compared the time to clone, but clone is simpler in some ways because it only has to download a packfile of commits and trees (again, no blobs) and doesn’t have to do a merge, since we start out with a pure virtual projection of the repo.
2. Some of the commits can be large (e.g. merges between two teams’ branches), but the vast majority of commits are small. The long git commands aren’t really caused by “big commits”. Most of the cost comes from the fact that, with 3M+ files in the repo, the git index is huge and takes time to process. Plus, with the previously deployed GVFS, the more the repo was hydrated, the more those commands also had to do IO in the working directory. We expect many of those commands to be much faster now with O(modified), but telemetry will tell the real story.
@Greg
Use my script
https://github.com/rofrol/git-helpers/blob/master/git-prmp
1. Update master without checking it out:
git fetch -f origin ${branch_to_merge_on}:${branch_to_merge_on}
2. Then rebase:
git rebase ${branch_to_merge_on}
3. Merge to master without checking it out:
git fetch . HEAD:${branch_to_merge_on}
4. Push:
git push origin ${branch_to_merge_on}
@Philip, Actually, they decided to start “fresh” and did not migrate the history. They moved all the “tip” source code over and will leave Source Depot around indefinitely for people who need to go back and look at older revisions.
Brian
Cloud Team – BrazilHuE
Hi,
Many thanks for the blog post. I appreciate these insights so much!
There is still something not really clear to me: with such a large Git mono-repo, how is build time affected? I would expect very long compile times because of the immense amount of source code. Which strategy is applied in these cases?
@MoiMoi666, Yes, the Windows build is VERY long if you build the whole thing. The team goes to great lengths to ensure that most people don’t have to build the whole thing. We also have sophisticated build technology to parallelize and cache builds. Most developers don’t need to build too much of it too frequently so it’s not crippling but it does create friction. Once we wrap up our Git work, I’ll start to blog about some of the work we are doing in the build space.
Brian
@ex v-miczer, Yes, most of Bing has already moved to Git. They did so without GVFS though they have 1 or 2 repos big enough that they’d like to use GVFS. We’ve just been asking them to wait until we get the Windows team successful. A couple of weeks ago, I asked that we go start engaging with the Bing Team on GVFS because I think we are ready.
Brian
@Marc-Andre Poitras, Yes. We will be expanding into more data centers in the second half of this year and Canada is on our list of top candidates.
Brian
@Greg, you’re absolutely right that with 4000 devs, there’s no way each dev could pull, merge, and push fast enough to ever get their changes in. We even struggled with that on a team of 300. The way we’ve handled that is that no one pushes directly to a shared branch like master. Not only do you have this race to contend with, but it’s also difficult to maintain quality if people can just push code to master at any time. So we require pull requests into all important branches, both to enforce policies, and also to allow the server to handle that pull/merge/push race for you. I’ll talk about how we did this in more detail in a follow up article to this one: https://www.visualstudio.com/learn/technical-scale-challenges/
Hi,
Is the driver downloadable via nuget ( https://www.nuget.org/packages/Microsoft.GVFS.GvFlt/ ) ?
thanks.
Hello Brian, you said in other posts that MS intends to keep both DVCS (Git) and centralized VCS (TFVC) options; however, the roadmap for TFVC is not so clear, and some VS documentation already refers to it as “legacy” (https://www.visualstudio.com/learn/migrate-from-tfvc-to-git/).
I work at a game dev company and I believe most professionals agree that a centralized VCS is the best option for game development, due to the high volume of non-mergeable binary assets that benefit from locking control, ease of use by non-technical guys such as artists, etc. That’s why Perforce is still very popular in the game industry.
So could you please elaborate on your plans for TFVC? Is it still being supported and developed? Are there plans for improvements / new features to TFVC and its tools or is it really a legacy system that will be kept as-is?
Please, do not turn it into a second-class citizen in the VCS world. I know the industry is walking towards DVCS but there are still many non-legacy use cases that make TFVC a better option. Thanks!
In case anyone wants to download the driver, it is really at nuget: https://www.nuget.org/packages/Microsoft.GVFS.GvFlt/
cheers
@RicarDog, You are right that there are scenarios where a centralized VCS is better than a DVCS and that is part of why we support both. We do continue to invest in both (though, certainly in the last year or so, the investment in Git has been much higher). For instance, we recently added TFVC support for Azure CI/CD. The VSCode team is working on adding TFVC support right now too. We’re working on an updated Windows Shell extension for TFVC. And more. I’ll talk to the team about the use of the word “legacy” in the docs. I suspect the word was used lightly. The context is people moving to Git from any other version control system.
Brian
Very cool!
As Engineers we require exact precision for collaboration.
GIT and Visual Studio Team Services also work well for Electrical schematics, not just plain-text!
The Microsoft backed security is priceless.
Very well done Microsoft.
GVFS sounds awesome, but so does the Git LFS project being worked on by GitHub/Atlassian (https://git-lfs.github.com/). I’m stunned that you’ve only given LFS a single passing mention in the two articles that you’ve written about scaling Git so far. LFS has been publicly lauded since 2015, so I don’t know how you can act like Microsoft is the first or only organization to tackle the Git large-file/large-repo problem. You may not have said this explicitly, but totally ignoring the obvious competition is just about as bad. I personally would need an honest comparison of GVFS to LFS, and/or to other large-scale VCSs like Perforce, before I could take GVFS any more seriously. That said, these past two articles have been very clear and informative, so I look forward to any further information from you on this topic.
@Daniel, We are very familiar with LFS. We have worked with GitHub on it and support it both in Team Services and in TFS. We recently added support for the new locking mechanism introduced in LFS.
You’re right that I didn’t talk a lot about it and maybe I should have. I’ll see if Saeed can cover it in more depth in his technical articles.
At a high level, GVFS and LFS have some overlap but not a ton. LFS – Large File Storage – is specifically designed to address problems with Git storing large file content (both current and historical) that can cause repos to bloat and performance to degrade. GVFS is a much more general solution that addresses large numbers of branches, large numbers of files and extended history in addition to large files. I’m not saying it is the answer to everything or that it is for everyone. It’s a solution to a set of problems for which there are currently no other solutions (for Git) that I am aware of.
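To make the difference concrete: with LFS you opt specific file patterns into pointer-based storage up front, something like the sketch below (the pattern is just an example), whereas GVFS virtualizes the whole repo without you having to decide in advance which files are “big”.
git lfs install                 # one-time setup on the machine
git lfs track "*.psd"           # store matching files as pointers, content on the LFS server
git add .gitattributes
git commit -m "Track Photoshop files with LFS"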
Brian
Just wondering what GUI client (if any) is in use by most of the MS developers?
@Geoff S, I haven’t looked to see what the ratios are. In general, engineers at Microsoft are pretty heavily command line based for source control. A lot of people use VS though. All of the major Git GUIs have some adoption – SourceTree, GitKraken, GitHub Desktop, Tower, …
Brian
I’d be super interested in an article describing the decision to use Git. What other VCSs were considered, what improvements those systems would have needed (as Git needed GVFS), etc.
I’m curious. When commits are merged into their target branch, are the commits rebased onto the target branch? Or are they just plain old merged in, with a new merge commit created? At my last place of employment, they used the rebasing strategy, and that cleaned up the branch lines considerably. Otherwise if you use a tool to show commit logs, you can get thousands of vertical lines showing branching activity (which can be a real pain).
@Cj. Git merging strategy is the source of endless debate. There are pros and cons to various approaches but people tend to have a very strong preference for one approach or another. We support them all and even have branch policies that allow you to enforce what the merge strategy is in a given branch. Inside Microsoft, as with the broader world, different teams have chosen different strategies. I haven’t done a survey to assess which strategy is used the most.
Brian
@Andre, I talked about it some in this post: https://blogs.msdn.microsoft.com/bharry/2017/02/03/scaling-git-and-some-back-story/
Brian
Fantastic read! I’m super excited about the future of windows. Thanks for the great article 🙂