Page MenuHomePhorge

arc browse: add support to Subversion repos
Needs ReviewPublic

Authored by valerio.bozzolan on Sep 10 2024, 11:26.

Details

Summary

The command arc browse FILE now support Subversion repositories.

It's not very clear if this ever worked for a period before 2020:

https://we.phorge.it/source/arcanist/browse/master/src/ref/ArcanistRepositoryRef.php;80f5166b701d788fb598c98d60fd707ea85fff6b$126-127

Anyway, now it works.

Closes T15541

Test Plan

In a random Phorge, create a new Subversion repository in Diffusion
with some contents, like these:

README.md
trunk/lol.txt
/tags/17_04_2019/lol.txt

Then clone that repository with your usual URL, that is:

svn checkout svn+ssh://phab@phorge.localhost/source/example-svn-repository example-svn-repository

All of these work:


In the very same repository, manually go to the Diffusion > Subversion menu and set "trunk/"
as sub-directory - so the web interface only shows the trunk directory.

All of these still work:

  • arc browse trunk/lol.txt
  • cd trunk; arc browse lol.txt; cd -

Inside the directory of arcanist, all of these work as expected, opening the browser
web on the specified file and on the specified branch:

  • arc browse README.md
  • arc browse bin/arc
  • arc browse README.md --branch stable
  • arc browse README.md --branch master
  • arc browse bin/arc --branch stable
  • arc browse bin/arc --branch master
  • cd bin; arc browse arc; cd -
  • arc browse bin/arc

The browser web carefully opens as expected.
If it worked before, it works now, but also on Subversion repos.

Diff Detail

Repository
rARC Arcanist
Branch
arcanist-browse-works-with-subversion
Lint
Lint Passed
Unit
Tests Passed
Build Status
Buildable 1589
Build 1589: arc lint + arc unit

Event Timeline

Note that I introduced the method isBranchJustFilesystemPath() that returns something different for Subversion, just to recognize Subversion,

So in the future we can replace some occurrences in this way that is less hacky and more object-oriented:

- if ($api instanceof ArcanistSubversionAPI) {
+ if ($api->isBranchJustFilesystemPath()) {

I have not a strong opinion against the name of that method. 🤔 I've invested all my creativity there, so, suggestions are very welcome.

a. I'm not sure I like isBranchJustFilesystemPath() to check Is this SVN - if we're trying to be generic, then there may be a different vcs that does this particular trick, but has different logic; And if we're not trying to be generic, then isSVN() is probably a good-ish name? or isVCS('SVN').

b. If the user explicitly specifies a branch, shouldn't that take precedent over whatever is calculated from the $api?

c. don't we have a matching logic for "get branch" for other VCSes?

src/browse/query/ArcanistBrowsePathURIHardpointQuery.php
61

if you default to $ref->getBranch() here, you won't need the else.

66

Did you actually test it on a branch that contains / ?

src/ref/ArcanistRepositoryRef.php
91–94

won't this convert both '' and '/' to empty string? vs the comment saying we want '/'?

valerio.bozzolan marked 2 inline comments as done.

integrate tips

wow, it also makes sense to add --branch support to SVN! added

a. I'm not sure I like isBranchJustFilesystemPath() to check Is this SVN - if we're trying to be generic, then there may be a different vcs that does this particular trick, but has different logic; And if we're not trying to be generic, then isSVN() is probably a good-ish name? or isVCS('SVN').

Some thoughts:

  • it we try to add a method ArcanistSubversionAPI#isSVN() that returns true, we should also add ArcanistRepositoryAPI#isSVN() that returns false; and it may create some desire to have also isGit() etc. but that may become a bit weird to implement them all. Also, if we decide to do not add them all (skipping isGit()) it still will be a bit not-intuitive. At this point, it's probably better to just run if ($api instanceof ArcanistSubversionAPI).
  • we may try the "smarter way" to add ArcanistSubversionAPI#isSVN() implemented like return $this instanceof ArcanistSubversionAPI but that would introduce a circular dependency, so still probably not a desiderata
  • it we try to add a method ArcanistRepositoryAPI#isVCS('SVN') we should also probably add a data-member "SVN" / "GIT" / "MERCURIAL" in all classes, but that it's probably overkill, and maybe if ($api instanceof ArcanistSubversionAPI) would be better. Moreover, that will need some extra backward-compatibility extras to support eventual unknown downstream additional esoteric APIs. Not really something critical, but something still somehow overkill, compared with if ($api instanceof ArcanistSubversionAPI).
  • by the way, the semantic of isBranchJustFilesystemPath() somehow makes sense! (surprised face) I'm really surprised that it's also somehow readable, if we look at ArcanistRepositoryRef and its logics to escape a filesystem path (not needed if the branch is already a filesystem path). So we tried to describe the specific nature of the branch. In any case, even if the name is a bit "unusual", avoiding instanceof is probably still something interesting, for the fun of object-oriented fans.

b. If the user explicitly specifies a branch, shouldn't that take precedent over whatever is calculated from the $api?

  • premising that in git this hasn't changed, so, before yes; after yes
  • for SVN, premising this never worked: hey, you are 100% right, it may make sense to implement too! I've added support right now and added in test plan (arc browse --branch /tags/17_04_2019/ lol.txt - wow) 👍

c. don't we have a matching logic for "get branch" for other VCSes?

  • OK I don't get this question. I think the answer is that the default behavior of arc browse was to operate on ArcanistRepositoryRef#getDefaultBranch()
  • but allowing to to specify a arc browse --branch FOO and that is $ref->getBranch()

Maybe I should just try to improve the ArcanistRepositoryAPI#getDefaultBranch() 🤔 since the root problem is there, and the current solution just involves more code to avoid to fallback on that "master" default

mm, isBranchJustFilesystemPath() makes more sense now, as in "the branch name is part of the URI path"...

Also, I think I've literally never used SVN with Arc/Phabricator.

if "branch" is just part of the path, wouldn't the full file-path be the same in the local disk and in the server repository?

mm, isBranchJustFilesystemPath() makes more sense now, as in "the branch name is part of the URI path"...

Nice 👌

Also, I think I've literally never used SVN with Arc/Phabricator.

That's an interesting wild experience. In the company I work for, they have kind of 100/200 SVN repositories, and Phorge saves their life.

if "branch" is just part of the path, wouldn't the full file-path be the same in the local disk and in the server repository?

I've wrote this comment for 20 minutes to answer correctly. The short answer is yes, you are probably very right.

So probably, even if arc browse is useful, arc browse --branch in Subversion still has probably completely no sense in 99.99999% of cases...

Let's pretend to have this remote repository structure:

/README.md
/trunk/lol.txt
/tags/17_04_2019/lol.txt
/tags/17_04_2020/lol.txt
/tags/17_04_2021/lol.txt

The above directory structure is surprisingly common. Note that the "trunk" directory is just a directory (that is the "master") and the "tags" directory is just a directory (with copy-pasted old versions of trunk in it - let's say - lol).

Anything really is just a directory: yes, worldwide Subversion users just somehow communicated to each-other that the "trunk" directory contains the last version; and the "tags" directory contain old versions. That's it. Don't ask me why. LOL

Subversion users can download everything with:

svn checkout svn+ssh://phab@phorge.example.com/source/myrepo myRepoComplete

And they obtain on your computer the very exact remote complete structure. At this point you can run arc browse correctly. Some working examples:

$ arc browse README.md
http://phorge.localhost/source/myrepo/browse/README.md

$ arc browse trunk/lol.txt
http://phorge.localhost/source/myrepo/browse/trunk/lol.txt

$ arc browse tags/17_04_2021/lol.txt
http://phorge.localhost/source/myrepo/browse/tags/17_04_2021/lol.txt

And it works.

At this point you may also try some "intuitive" things, like, opening lol.txt on a different "branch" but... it's interestingly nonsense.

$ arc browse --branch tags/17_04_2021   trunk/lol.txt
→ 404 page: http://phorge.localhost/source/myrepo/browse/tags/17_04_2021/trunk/lol.txt
→ expected: http://phorge.localhost/source/myrepo/browse/tags/17_04_2021/lol.txt
→   but we really cannot imagine that... how we should strip "trunk/"?

$ # Maybe this?
$ arc browse --branch tags/17_04_2021   lol.txt
→ Unable to resolve argument "lol.txt"

Or, what about opening tags/17_04_2021/lol.txt but on the trunk?

$ arc browse --branch trunk   tags/17_04_2021/lol.txt
→ 404 page: http://phorge.localhost/source/myrepo/browse/trunk/tags/17_04_2021/lol.txt
→ expected: http://phorge.localhost/source/myrepo/browse/trunk/lol.txt
→   but we really cannot imagine that... how we should strip "tags/17_04_2021/"?

In short, this seems just nonsense.

Also... Subversion users can also download just part of the remote tree. That's interesting.

For example, you can download only the trunk and only work on that:

svn checkout svn+ssh://phab@phorge.example.com/source/myrepo/trunk myRepoOnlyTrunk

The above command in fact only downloads the trunk/ and it only generates this filesystem structure under myRepoOnlyTrunk:

/.svn
/lol.txt

So, even in the above situation, now arc browse lol.txt surprisingly works and gets the expected remote URL on the trunk.

But what about our esoteric intention to open the file lol.txt on the branch /tags/17_04_2019?

$ arc browse --branch tags/17_04_2019 lol.txt
→ 404 page: https://sviluppo.erinformatica.it/source/fe_data_imprt/browse/tags/17_04_2019/trunk/lol.txt

In Short

Premising that arc browse would be still super-useful also in Subversion,

We we should probably block the feature arc browse --branch for Subversion users, since this only leads to a path full of tears and madness.

Ok, so --branch probably doesn't make much sense in that case, right.

What if the user used the svn checkout svn+ssh://phab@phorge.example.com/source/myrepo/trunk myRepoOnlyTrunk form - are we still able to find the right target when arc browse lol.txt? Is it a common use-case?

I will add a warning about --branch and better management

What if the user used the svn checkout svn+ssh://phab@phorge.example.com/source/myrepo/trunk myRepoOnlyTrunk form - are we still able to find the right target when arc browse lol.txt? Is it a common use-case?

Yes and yes!

The potential complication is that if the .arcconfig sits at the root of the repo only, then checking out a slice below that in the directory tree won't have the .arcconfig. But it's kindof implied by the note at https://we.phorge.it/book/phorge/article/arcanist_new_project/#arcconfig-basics that one should commit .arcconfig files alongside your other code and follow normal branching workflow (aka whatever perverse bespoke branching workflow this particular group of SVN users has chosen) with it, and you just generally won't have a proper config otherwise, so that's not at all specific to arc browse. And I've tested and the fallback plan of specifying the values on the commandline, aka arc --config repository.callsign="MYREPO" --conduit-uri http://phabricator.gmcl.internal browse lol.txt (though I actually just used a checkout of a real repo at my work), does work fine.


Our somewhat pathological use of SVN at my work shows how complicated getting --browse to work properly for SVN could be, because our structure is essentially:

  • Our Big Repo Containing Most Projects
    • Project1
      • trunk
      • branches
        • release11_4
        • release11_5
    • Project2
      • trunk
      • branches
        • release11_4
        • release11_5
    • working
      • trunk
        • external pointing to Project1/trunk
        • external pointing to Project2/trunk
      • release11_4
        • external pointing to Project1/branches/release11_4
        • etc etc etc, you get the idea.

(If unfamiliar with SVN you can think of externals roughly the same as Git submodules, or more accurately like symlinks.)

So what does a branch even, like, mean, man? It's clear enough to a human considering context and nomenclature, but from arc's perspective it's unclear as hell. The plan to warn SVN users about --branch is probably the best possible approach, barring rather complex logic that humans would still find some way to break the assumptions of ;)

(One approach we used to try and square Phabricator's concepts of what a repo is with our own multi-project-within-one-repo usage was to create additional repositories in Phabricator that each used the special Import Only option to specify the project in question's path, and from there then one would see the trunk, branches, and tags folders for each project in its standalone "repository" in Diffusion and etc. The benefits of this though ended up being meager, and the proliferation of "repositories" that are all covered under the real repository was a bit ugly and junked up the listings as we started using some Git repos in addition to the two SVN repos our main development efforts live in, so we abandoned that.)

(The Arguments have a supports feature for these cases - like https://we.phorge.it/source/arcanist/browse/master/src/workflow/ArcanistLintWorkflow.php$67 . Not sure what it actually does, but 🤷🏻‍♂️)