Page MenuHomePhorge

Landbot discussion(s), and generally ensuring that what is landed is what was reviewed
Open, NormalPublic

Description

This is a pot-pourri of copy-pastes from chats and links to the phab discourse regarding the subject of:

How do I ensure that what someone lands is what was reviewed in the corresponding diff while also letting them fix typos or implement minor comments?

Here is an older discussion with some background, as well as some input from Evan.

Options some orgs have considered (non-exhaustive)

Landbot

arc land does not push anything: it triggers a bot somewhere to run arc patch and as many other checks one would like before handling the push.

This has some similarities with gitlab's release trains.

Pre-receive commit hook

Here, a pre-receive commit hook does the following:

  • finds out the diff belonging to the commit,
  • arc patches the diff to a separate, clean working copy,
  • compares the diff of the patch with the diff of the commit,
  • rejects the push if they differ.

This forces the dev to update his diff when what he would push differs from it. Then we have some rules around what areas of the repo have a sticky accept or not.

It does not solve for "diff breaks master after rebase" but at least you work around the problem of malicious or unintentionally borked pushes.

Here is a gist for a partial implementation (it still attempts to compare patch files vs applying the changesets and comparing the output of git diff.

Diff Similarity settings

Landbod and pre-receive hook give us a place where to put some logic, but this still leaves the question as to how one decides where the limit is between a change that warrants a new review and one that does not.

The way we currently handle this is that:

  • some owner packages have stickiness disabled
  • if a diff suddenly involves a new package owner, it will need to be reviewed

And we manage this via a custom webhook that phabricator calls upon Diff update.

Onwards

It would probably be worth thinking a bit about what uses cases we want to cover, but at least we can quickly have some hooks that are easy to install for users that know what they are doing.

Event Timeline

jupe triaged this task as Normal priority.Jun 26 2021, 17:17

The plan upstream was to (eventually) have arc land trigger T182, and do the whole thing server-side.

Some limitations on this right now is that Staging Area only works for git, and that the Revision page doesn't use data from the Staging area (but the T182 Landing code does). This could be "fixed" by having the land-bot read changes from the Revision (and maybe dropping the Staging Area).

From product POV, users were traditionally allowed to land something that differs from the actual revision, to allow the use case of "accept diff, but please make this small adjustment before landing". I think it's a good use-case, but I think it's also fine if the user runs arc diff again and then arc land, if it simplifies other workflows.

There's also the point of users being used to arc land pushing code from their machine, so switching its behavior to delivering different code could have adverse UX.


For the "diff breaks master after rebase" situation - the only solution for this is to have the CI run a rebase before running the tests, or have the CI part of the Landing flow, allowing it to block the push. This does slow the Landing situation, and makes it basically impossible to parallelize, but it's probably worth it if commits to a specific repository aren't very frequent and the CI is fast.

diff breaks master after rebase

Is there anyway at all to determine which commit a revision diff was compared to, and "if this is not HEAD" don't allow the land? (forcing users to re-base and resubmit their diff?)

I am not sure how bad this would gum up everything and/or in high-volume environments...nothing ever gets landed...

There's also the point of users being used to arc land pushing code from their machine, so switching its behavior to delivering different code could have adverse UX.

Yes - this would be a bit strange - After landing, your code would not be in your local master... So your workflow would have to account for this - You'd need to keep your revision branch around for a bit if you were writing code against it...

In T15024#716, @avivey wrote:

The plan upstream was to (eventually) have arc land trigger T182, and do the whole thing server-side.

We do this at $employer. There is even a button in the web UI to land. We use the staging area revision + drydock + harbormaster to good effect.

For the "diff breaks master after rebase" situation - the only solution for this is to have the CI run a rebase before running the tests, or have the CI part of the Landing flow, allowing it to block the push. This does slow the Landing situation, and makes it basically impossible to parallelize, but it's probably worth it if commits to a specific repository aren't very frequent and the CI is fast.

We do this too. The land workflow runs all tests and only if they all pass do we do the final push. for "master is broken" scenario there is limited possibility to trusted members to bypass the submit queue (test) and even push queue.

In T15024#716, @avivey wrote:

For the "diff breaks master after rebase" situation - the only solution for this is to have the CI run a rebase before running the tests, or have the CI part of the Landing flow, allowing it to block the push. This does slow the Landing situation, and makes it basically impossible to parallelize, but it's probably worth it if commits to a specific repository aren't very frequent and the CI is fast.

You can have the bot grab patch by batches even for very repo where you commit very frequently. In practice, this point is much further back than most people expect.

diff breaks master after rebase

Is there anyway at all to determine which commit a revision diff was compared to, and "if this is not HEAD" don't allow the land? (forcing users to re-base and resubmit their diff?)

I am not sure how bad this would gum up everything and/or in high-volume environments...nothing ever gets landed...

That being said, to answer your question, yes, the base on top of which a diff is submitted is known to phabricator. Obviously, in the case of stacked diffs, that may turn out to be completely useless.

That being said, forcing people to do stuff is generally exactly the opposite of what you'd want to do. We have computer to do work for us, not for us to do work for them. 99% of the rebase can be done automatically and don't break anything.

Our job here is to remove roadblock which are in people's way when they build software, not to add them.

In T15024#720, @eax wrote:
In T15024#716, @avivey wrote:

The plan upstream was to (eventually) have arc land trigger T182, and do the whole thing server-side.

We do this at $employer. There is even a button in the web UI to land. We use the staging area revision + drydock + harbormaster to good effect.

This work great and I've use similar setup in various orgs. However, this does not work for open source projects, because you generally can't trust that what's in the staging area matches what in the diff UI on phabricator. In this case, the bot needs to arc patch the changes from phabricator rather than pulling from the staging repo.

This is an area I'm interested in with regards to Harbormaster's future, though I don't have any clear designs on anything. The concept of a merge queue is interesting and something we've started looking into at my company. Here are some resources we've looked at:

So, just to clarify, as I realize that my original problem statement is a bit too restrictive: I meant How do I ensure that what someone lands is what was reviewed in the corresponding diff in a rather flexible way:

We very much still want to allow for someone to fix a typo or implement some minor comments without needing another review.

However, some additional requirements to make everyone on the compliance side happy were:

  • some areas of the repo are really critical (eg, infrastructure as code and such things), and for these we want any change to be reviewed.
  • if an update after review changes the scope of the diff significantly (eg, it suddenly touches files belonging to another team), we want to have another review

We managed to set this up by:

  • making sure a landed diff has been updated
  • we have a webhook triggered on diff update that checks the conditions above, by essentially checking:
    • has a new owner been added as a reviewer by this update?
    • is any "non-sticky" owner involved in this diff?
  • and if so, it requests a new review(*)

(*) Evan was suggesting the inverse approach: make accepts non-sticky by default and make a list of exceptions where the web-hook re-accepts the diff. Ultimately I'd like to go for that option, but the sticky-accept setting is an instance-wide one and it would be disruptive to existing repos.

In T15024#725, @speck wrote:

This is an area I'm interested in with regards to Harbormaster's future, though I don't have any clear designs on anything. The concept of a merge queue is interesting and something we've started looking into at my company. Here are some resources we've looked at:

Excellent resources! A couple of them i have not read before.

One small thing that hasn't been made explicit. it's not ground breaking, but it is always better to be clear about requirements: you need an escape hatch. Sometime, something is broken, and the bot will stop merging anything. In this case, people will need to be able to push themselves, at least the trusted people. In the case of a company, this can be all employee, really, as if you don't trust your employee, you probably are better off not hiring them in the first place, but in the case of an open source project, this fall back on a set of trusted committers.

This can be achieved easily in practice by creating an extension for arc that hijack the behavior of arc land to go hit the right endpoint to trigger the bot (and you can also have a button in the UI to do directly from there), but also have a flag that gets back to the old workflow. Making that flag scary looking and anoying to type is usually enough for people to just go along with the bot workflow, for as long as it helps them and doesn't get in the way. Something like arc land --bypass-all-sanity-checks-because-i-am-a-madman works wonders. Human are a funny bunch.

In T15024#726, @jupe wrote:

So, just to clarify, as I realize that my original problem statement is a bit too restrictive: I meant How do I ensure that what someone lands is what was reviewed in the corresponding diff in a rather flexible way:

We very much still want to allow for someone to fix a typo or implement some minor comments without needing another review.

This is indeed something that you want from a trusted set of contributors. That becomes hard to enforce in a sensible way in the context of open source.

This can be achieved easily in practice by creating an extension for arc

We solved that issue by having a special group that, if it has accepted the diff, disables all checks -> no need to entrust every developper with bypassing power, but all they need is convince a colleague from that group to sign off on the change.

Came across someone else's notes about submit/merge queues - https://epage.github.io/dev/submit-queue/