Sunday 18 June 2023

Having Git permissions per subfolder

TLDR: I experimented with a way to give GitHub repositories per-subfolder permissions (including read permission) here.

There has been an almost cult-like debate in recent years about monorepos vs. polyrepos.
I have worked with both in various circumstances, and both have their advantages and disadvantages, yet many regard monorepos as "deprecated", "old" and so on, which I find bizarre.

Some of the advantages I see in using monorepos:

  • The ability to make a wide change, say across "backend" and "frontend", with a single commit, PR or branch.
  • With polyrepos, having to pull/commit/merge/push so many repos can become tedious and demoralizing, although tools like Mani/Gita can help with that.
  • Many organizations reach the point where a change/feature involves many repos. Every repo usually needs its own PR, the PRs usually depend on each other, and merging one without the others often causes issues. It becomes hell to manage the state of the PRs: what is merged, what is not, and what exactly the change is that we are trying to merge (as it is spread across so many repos).
  • CI/CD - it's much simpler IMO in monorepos. Say I want to change app + frontend + backend, and I want to test how they behave together. With polyrepos, the CI has to know which branch I am working on in each repo, and somehow know that all those repos/PRs are linked and should be tested together. In a monorepo, you create one PR, and all the tests, including the integration tests, can exercise all the components with their changes out of the box, without custom processes.
  • Speed - tools like yarn workspaces can cache and reuse dependencies between projects, speeding up the build/install process drastically. I guess this can be achieved using submodules or tools like Mani/Gita, which simulate monorepos, but I am not sure.
Yet the main disadvantage I see with monorepos is permissions. Usually exposing the codebase to your employees is not a bad idea: it gives them the full picture and makes them more productive in many cases. But there are cases, usually involving external contractors and sensitive IP, where this is an issue with monorepos, as Git/GitHub/GitLab do not offer per-subfolder permissions. For example, you want an external contractor to work on the frontend without access to the sensitive IP in the backend.

There are tools like Copybara/GitPermit, or even submodules, that create another repo and sync it with the main one. There are many issues with this approach; the main one I see is CI/CD + PRs. Now there are two places where CI/CD and merges have to occur, which complicates things, in some cases drastically.
Keeping repos in sync is also difficult, especially with submodules, where updates usually have to be done manually instead of going through the regular PR/review system.

Being able to restrict access to certain subfolders of the original Git repo can have many advantages. It seems no one has really tried to even start doing it, and many claim it's not possible. The problem really splits into two parts:
  • GitHub web interface
  • GitHub/Git CLI
The GitHub web interface could be handled with some kind of reverse HTTP proxy that filters the APIs/responses, so that only what we want to allow reaches the external user. I used Apache2 + a MITM Python API.
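
To make the proxy idea a bit more concrete, here is a minimal sketch of the kind of decision such a filter could make for each requested URL. It is not the code from my POC: the folder names, the list of URL "views" and the function names are all assumptions for illustration, and it assumes the branch name contains no slashes.

```python
from urllib.parse import urlsplit

# Hypothetical policy: subfolders the external user is allowed to see.
ALLOWED_PREFIXES = ("frontend/", "docs/")

# GitHub web views whose URL embeds a file path:
#   /<owner>/<repo>/<view>/<ref>/<path...>
# (assumed list, and assumes the ref has no "/" in it)
PATH_VIEWS = {"tree", "blob", "raw", "blame", "commits", "edit"}


def repo_path_from_url(url):
    """Return the in-repo path a GitHub web URL points at, or None."""
    parts = urlsplit(url).path.strip("/").split("/")
    if len(parts) >= 5 and parts[2] in PATH_VIEWS:
        return "/".join(parts[4:])
    return None


def is_allowed(url):
    """Deny by default; allow only URLs inside an allowed subfolder."""
    path = repo_path_from_url(url)
    if path is None:
        return False  # anything we cannot classify is blocked
    # Append "/" so "frontend" matches but "frontend-internal" does not.
    return (path + "/").startswith(ALLOWED_PREFIXES)


if __name__ == "__main__":
    print(is_allowed("https://github.com/acme/mono/blob/main/frontend/src/app.ts"))
    print(is_allowed("https://github.com/acme/mono/blob/main/backend/secret.py"))
```

Denying everything that cannot be classified is deliberately strict. A real proxy would also need to handle API responses, search, diffs in PRs and so on, which this sketch ignores.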

The GitHub/Git CLI is more complicated to solve. I researched Git partial clones, Git sparse-checkout and concepts like Git hooks in order to solve this one. Using these concepts, by hooking git pack-objects on the Git server I can filter out objects that are in sensitive subfolders.
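
As a rough illustration of the filtering step only (not the actual hook from the POC), the sketch below takes text in the format printed by `git rev-list --objects`, where commits appear as a bare object ID and trees/blobs as an ID followed by a path, and drops everything under a sensitive subfolder. The folder names and function names are hypothetical.

```python
# Hypothetical policy: subfolders that must never reach the restricted user.
SENSITIVE_PREFIXES = ("backend/",)


def is_sensitive(path):
    """True if the path is a sensitive folder or lives inside one."""
    return (path + "/").startswith(SENSITIVE_PREFIXES)


def filter_object_list(rev_list_output):
    """Return the object IDs that may be packed for a restricted user.

    Input lines look like "<id>" (commits) or "<id> <path>" (trees, blobs),
    which is the format of `git rev-list --objects`.
    """
    kept = []
    for line in rev_list_output.splitlines():
        if not line.strip():
            continue
        object_id, _, path = line.partition(" ")
        if path and is_sensitive(path):
            continue  # skip trees and blobs under a sensitive folder
        kept.append(object_id)
    return kept


if __name__ == "__main__":
    sample = "\n".join([
        "1111111111111111111111111111111111111111",
        "2222222222222222222222222222222222222222 frontend/app.ts",
        "3333333333333333333333333333333333333333 backend",
        "4444444444444444444444444444444444444444 backend/secret.py",
    ])
    print(filter_object_list(sample))
```

Filtering purely by path is a simplification: the same blob can appear under several paths, and the client ends up with missing objects, which is why partial clone and sparse-checkout matter on the client side.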

By using the GitHub API + Fetch, I can also leverage the GitHub "external"/"guest" credentials, so external users can keep using their own credentials instead of some unique ones created just for the project.
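
One way this could look, as a sketch under assumptions rather than what the POC does: the proxy takes the token the user already sends, asks the GitHub REST API who it belongs to (the `GET /user` endpoint), and maps that login to the folders the user may access. The policy table and function names are hypothetical.

```python
import json
import urllib.request

GITHUB_USER_ENDPOINT = "https://api.github.com/user"

# Hypothetical policy table: GitHub login -> folders that user may access.
FOLDERS_BY_LOGIN = {"contractor-alice": ["frontend/"]}


def build_user_request(token):
    """Build the request that asks GitHub which user owns this token."""
    return urllib.request.Request(
        GITHUB_USER_ENDPOINT,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )


def allowed_folders(user_json):
    """Map the JSON body returned by GET /user to the allowed folders."""
    login = json.loads(user_json).get("login")
    return FOLDERS_BY_LOGIN.get(login, [])  # unknown users get nothing


def folders_for_token(token):
    """Network call: resolve a token to its allowed folders."""
    with urllib.request.urlopen(build_user_request(token)) as response:
        return allowed_folders(response.read().decode("utf-8"))
```

The user never needs a project-specific account: the same GitHub token identifies them, and the proxy decides what that identity may see.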

As mentioned in the TLDR, the POC is here:
