Page MenuHomePhorge

Generate [From, Subject, Date] fields in download raw .diff URL to support "git am" and friends
Open, NormalPublic

Description

Hello folks,

As directed, re-submitting https://we.phorge.it/Q46 here:

FreeBSD project gets contributions from multiple sources. Aside from our Phabricator instance we get a good volume of contributions from github.com.
Many FreeBSD maintainers are using "download raw diff" functionality both on github and on Phab to download the patch and then apply it using git am.
Very often the person committing the change is not the same person that authored the change.
Unfortunately Phabricator/Phorge does not include the ownership/attribution information of the patch. Consider these 2 diffs:

As you can see github generates lines From, Date, Subject, emulating email message. This makes it really handy to feed this into git-am
Without this the information about the Author of the change is lost.

man git-am :

The commit author name is taken from the "From: " line of the message,
     and commit author date is taken from the "Date: " line of the message.
     The "Subject: " line is used as the title of the commit, after
     stripping common prefix "[PATCH <anything>]". The "Subject: " line is
     supposed to concisely describe what the commit is about in one line of
     text.

Attribution information is necessary to generate correct commit messages.
Example change that I authored, but did not commit

Today FreeBSD maintainers need to add attribution manually when downloading diffs from Phabricator.
Is it possible to add this feature to Phorge? It feels like it shouldn't be too hard todo, considering that Phorge knows author's email, name and date of the submission of the Diff


Hours invested7

(Feel free to increase ↑)

Event Timeline

I don't mind if someone points me at the right place in code so I can try my luck with writing PHP.

ton renamed this task from Genrate `From:`, `Subject:` and `Date` fields in download raw .diff URL to Generate [From, Subject, Date] fields in download raw .diff URL.Apr 13 2023, 23:32
In T15249#6218, @ton wrote:

I don't mind if someone points me at the right place in code so I can try my luck with writing PHP.

The flow starts around https://we.phorge.it/source/phorge/browse/master/src/applications/differential/controller/DifferentialRevisionViewController.php$115-121, and is implemented starting line 1047.

This code gets the change content, uses ArcanistBundle::newFromChanges() to create the text content, and saves it to a new File object.
It then responds with a 302 redirect to download the new File content.

Your actual changes might all go in https://we.phorge.it/source/arcanist/browse/master/src/parser/ArcanistBundle.php, which is good news because it's easier to iterate on: the flow arc export actually calls the same code and outputs to a local file.

Ok, after spending a weekend looking at PHP I can say that I overestimated my level of enthusiasm. This is the kind of work that I'd like to get paid for and not spend my free time on.
Sorry @avivey, but you can cross me off the list of potential code contributors.

I admit that I would be TOTALLY impressed if ton would be able to do something like THIS as first contribution. Fortunally this fail just means you are a human being.

So, don't worry. This was just really the core component of Phorge, so indeed it's quite complex and extended and surely could deserve an hack meeting.

Ihih maybe we can have a "number of hours invested in this" as warning. Ton: how much?

It is hard to say objectively, as I am new to PHP and to this codebase in particular. In total I spent probably 4-6 hours over the weekend, but someone with more experience could've done a lot more in that time. I was reading about PHP, setting up my dev environment, reading the code, navigating class hierarchy, trying to understand PHP idioms, etc. In the end of the day I realized that learning yet another language (and TBH not the most sexy one) in my free time to hack on a gigantic legacy codebase is not something that I want to do.

TLDR: hours metric is very subjective

Phabricator's codebase is mostly high quality and fairly easy to jump into but I probably wouldn't recommend it for someone new to PHP :D

I might be willing to take a stab at this but not sure when I'll have the time.

In T15249#6391, @ton wrote:

Ok, after spending a weekend looking at PHP I can say that I overestimated my level of enthusiasm.

Haha, no worries. But maybe try to convince your employer to pay for it 😜

Getting set up with a development environment (including database~) along with configuring the local instance and populating it with sample data is extremely daunting and is frankly why I haven’t done much contributing. I lost my dev environment a few years ago and have tried only once to get set back up, spent a few hours and never came back to it. It doesn’t help that arcanist’s self linter isn’t usable on windows systems.

I suspect the barrier is more the dev environment and less PHP or the codebase.

In T15249#6474, @speck wrote:

Getting set up with a development environment (including database~) along with configuring the local instance and populating it with sample data is extremely daunting and is frankly why I haven’t done much contributing. I lost my dev environment a few years ago and have tried only once to get set back up, spent a few hours and never came back to it. It doesn’t help that arcanist’s self linter isn’t usable on windows systems.

I suspect the barrier is more the dev environment and less PHP or the codebase.

Maybe we could create a testing Phorge instance on:

staging1.we.phorge.it

To be given to developers who prefer a ready-made environment kindly crafted for them. I would sincerely be willing to pull up this resource. Even creating an Ansible script that takes care of this doesn't have to be difficult, so you quickly pull up a new one (if there isn't one Ansible script already).

Just say "Uhm, interesting" and we create a Task for that.

I wanted to set up some docker/compose file or possibly a VM but didn’t get far since neither of those are great solutions on windows, primarily for getting local code into the container/vm. My goal would be to get a quick setup time with a rapid dev/test cycle. For me to make progress I’ll likely have to switch to mac or Linux.

@ton: Why are "they" not using arc patch to download the patch?

@avivey many developers do not use arcanist at all for reasons outlined in https://we.phorge.it/T15096#6224 - many do not wish to install php, some find arc interface confusing...

Some folks have their own scripts that rely on curl/wget to download the diff and feed it into git am

This comment was removed by ton.
In T15249#8232, @ton wrote:

@avivey today I tried arc patch to download a bunch of Diffs.

Some diffs check out OK. Lots of Diffs fail to be fetched, I get a very useless error message:

Exception
strpos(): Passing null to parameter #1 ($haystack) of type string is deprecated
(Run with `--trace` for a full exception trace.)

That just "arcanist doesn't support PHP 8 yet" issue. It doesn't explain why other people weren't using arc patch before PHP 8 became common...

@ton - this is what I have so far:

From: PHID-USER-zkq3ufv3nolm5adpwbjy <tbd@tbd.tbd>
Date: 1690030377
Subject: This is an actual change

diff --git a/foo b/foo
--- a/foo
+++ b/foo
@@ -1,2 +1,3 @@
 this is a file with some text
-noend
\ No newline at end of file
+
+noend

Would that be good (once I fix the date formatting and figure out the name/email)?

I was able to execute this using patch, which I think is the "existing behavior".

(There's a chance that "email address" is not available, I think we discussed this a long time ago. We'll make something up if that's the case).

@avivey Yes I think this is what we need.
There is a caveat though:

  1. Often login email == git commit email, but it is not a given and depends on the user's setup
  2. There are cases when no git commit email is available - when a diff was manually uploaded via web interface. (arc has no such problem, it seems that it takes email from git config).

How are you planning to cover this case? I see several possible approaches:

  1. Do nothing, but that does not solve the issue completely.
  2. When git email is missing - use login email. This is fine for FreeBSD project, since we already use vodoo magic to deduce login email from user's name (usernames on freebsd phab are derived from email, so we just reverse that). This may be seen as a private info leak as not every project shares the same view.
  3. Perform input validation on the UI form for diff submission and reject patches that have no email

FreeBSD project prefers to have identities of contributors and lots of real emails are visible in our git history, so solution 2 and 3 are fine for us, while 1 is not.

My plan is:

  • If there's a commit information in the revision, use the author name and email from there. Not that this might not be the same as the revision's Author
  • If there's no git commit - use author name from Author field, and make up email address (annonymos@example.com).

I don't want to take login email, because it's private (it's not exposed anywhere).

Option 3 is far out of scope for this, and I don't think it makes sense in Phorge in general.

We can probably add an extension point to allow the install to customize this somewhat - then FreeBSD can write an extension to expose the author's login email, because FreeBSD doesn't consider it private.

Ok, I got bored.

With D25358 and D25359, the Raw Diff will always contain Subject (revision title) and date (revision creation date).
If there's a commit, the author name and email will be picked from it; If there's no commit, the From field is omitted.

Uh thanks! Feel free to increase the "Hours invested" counter in the Task description ihih

valerio.bozzolan renamed this task from Generate [From, Subject, Date] fields in download raw .diff URL to Generate [From, Subject, Date] fields in download raw .diff URL to support "git am" and friends.Aug 6 2023, 19:59
valerio.bozzolan triaged this task as Normal priority.