Page MenuHomePhorge

Avoid to generate a permalink for every clicked line number: migrate to web fragments
AcceptedPublic

Authored by valerio.bozzolan on Apr 4 2024, 21:20.

Details

Summary

Historically, when somebody visited a Paste or Diffusion file and clicked on a line number, we always constructed URIs in the format file$45 as the link target for line number 45 in the line number column of document file (resp. file$1-10 for lines 1-10), providing the ability to highlight one or several lines. Accessing this URI highlights the corresponding lines and automatically scrolls to the highlighted line(s) with a 60px top margin.

It is important to note that when $45 is present in a URI, it is sent to the server (making a "permalink"), while #45 would be not. So, a downside of using this permalink-based approach instead of standard # HTML anchor fragments ("web fragments") is that every URI obtained and shared around while clicking from the line number column in Paste and Diffusion file pages, that permalink file$123 is technically a separate URI to be visited and downloaded compared to file. When you click on a line number and you share that permalink on a public comment, that permalink attracts crawlers, and they download that page again, even if crawlers already accessed that content. In an age of pirate AI trainers that scrape any permalink every second, it's better to adopt web fragments instead, so that they download your file once (or at least less often....), whatever the amount of comments you shared talking about lines.

As a solution:

  • When clicking a single line, such as 123, generate a web fragment like #L123 for the link target. This preserves line highlighting but vastly reduces the number of links a crawler could follow when it reaches such URIs, e.g., when a user mentions that line #L123 from a comment.
  • When clicking on multiple lines 123-124, also use web fragments like #L123-124.
  • When receiving legacy visits to $123 or $L123-124, it still works; we have not introduced link rot. However, if you visit these and if you click again on a different line, you get the new fragment-based URI.

Don't panic. Note that the previous situation was not extremely bad anyway, as the Paste or Diffusion did not really allow a permalink to be found for each line anyway. A click was still needed to generate such permalink and a human being was still needed to share such permalink somewhere (e.g. a comment).

Note that this modification cannot replace all comments previously shared by human beings, and they probably shouldn't.

Note that Phorge cannot travel back in time, and we cannot obliviate crawlers to do not try to visit anymore old $123 URIs. At least the promise is that your coworkers will not generate any more of these additional permalinks, and no link is broken.

See T15670
Closes T16100

Test Plan

About the new feature:

About possible regressions:

  • Access URL http://phorge.localhost/P1$9 and see that highlighting and position scrolling by the server-side JS still works as expected
  • Access URL http://phorge.localhost/P1$9-16 and see that highlighting and position scrolling by the server-side JS still works as expected
  • Access URL http://phorge.localhost/P1$16-9, same as above
  • Access the above legacy URIs and click again on another line number, e.g. 2, and you get the new URIs with fragments (no silly things like P1$9#2, but just P1#L2)

For both, perform the same tests in a source code file rendered in Diffusion. Same results.


As additional test, you can define the new JavaScript functions in your browser console. E.g. copy-pasting this:

var _parseLineNumber = function(vRaw) { ....
var parseMinMaxSelectedLineFromFragment = function(input) { ...

Then, test if JavaScript can recognize the fragment interval:

$ parseMinMaxSelectedLineFromFragment("L123")
Array [ 123, 123 ]
$ parseMinMaxSelectedLineFromFragment("L2-50")
Array [ 2, 50 ]
$ parseMinMaxSelectedLineFromFragment("L50-2")
Array [ 2, 50 ]

Also test some nonsense inputs, and they successfully crash:

$ // Test nonsense line.
$ parseMinMaxSelectedLineFromFragment("LMIAO")
Uncaught Input fragment parts must be positive integer. Got: MIAO debugger eval code:11:70
    _parseLineNumber debugger eval code:11
    parseMinMaxSelectedLineFromFragment debugger eval code:42
$ // Test nonsense zero line.
$ parseMinMaxSelectedLineFromFragment("L123-0")
Uncaught Input fragment parts must be positive integer. Got: 0 debugger eval code:11:70
    _parseLineNumber debugger eval code:11
    parseMinMaxSelectedLineFromFragment debugger eval code:43
$ // Test nonsense negative line.
$ parseMinMaxSelectedLineFromFragment("L-123")
Uncaught Input fragment parts must be positive integer. Got: debugger eval code:11:70
    _parseLineNumber debugger eval code:11
    parseMinMaxSelectedLineFromFragment debugger eval code:42

Note: these crashes are not shown to end-users while using Phorge. These exceptions are only for test plan lovers, and for developers.

So we recognize correct web fragments, and discard nonsense ones. Giving backward compatibility.

Diff Detail

Repository
rP Phorge
Branch
arcpatch-D25569_1
Lint
Lint Passed
Unit
Test Failures
Build Status
Buildable 2080
Build 2080: arc lint + arc unit

Unit TestsFailed

TimeTest
62 msPhabricatorCelerityTestCase::testCelerityMaps
Assertion failed, expected 'true' (at PhabricatorCelerityTestCase.php:27): When this test fails, it means the Celerity resource map is out of date. Run `bin/celerity map` to rebuild it. ACTUAL VALUE
0 msPHUIListViewTestCase::testAppend
1 assertion passed.
0 msPHUIListViewTestCase::testAppendAfter
2 assertions passed.
1 msPHUIListViewTestCase::testAppendBefore
2 assertions passed.
0 msPHUIListViewTestCase::testAppendLabel
2 assertions passed.
View Full Test Results (1 Failed · 15 Passed)

Event Timeline

aklapper requested review of this revision.Apr 4 2024, 21:20

Bonus point: maybe instead of "#123" we can use "#L123" since both GitLab and GitHub use that (and that's not a stupid specification). Just because we can.

https://gitlab.com/ItalianLinuxSociety/ilsmanager/-/blob/master/README.md?ref_type=heads&plain=1#L9

https://github.com/phorgeit/phorge/blob/master/README.md?plain=1#L9

My last comment is also because the W3C does not allow to start an identifier with a number:

https://www.w3.org/TR/CSS21/syndata.html#value-def-identifier

they cannot start with a digit, two hyphens, or a hyphen followed by a digit.

That is probably the strong reason behind the wide adoption of #L123 - outside Phorge

valerio.bozzolan added inline comments.
src/view/layout/PhabricatorSourceCodeView.php
135–147

Maybe better to have L123 ids as already mentioned, to strictly follow W3C specifications.

So, a commodity variable $line_id may be useful

webroot/rsrc/js/core/behavior-line-linker.js
224–228
251–279

Something like this to highlight at startup

This revision now requires changes to proceed.Jul 15 2024, 06:00
webroot/rsrc/js/core/behavior-line-linker.js
224–228

At this point this line should also be removed:

path = path + '#' + o;

Can I help to adopt #L123?

Sure. I lost track on this issue anyway. :(

Uh, I've found some time to have #L123-124 working too! 🎉 wait for it

  • implement 'L' prefixed lines to make W3C happy
  • implement support for 'L1-999' intervals so we don't need '$' there
  • implement startup support, so at startup the '#L123' works

It works. Waiting for more +1 since I should not self-approve lol.

Unclear if I should do a commander revision. I sincerely just introduced an "L", plus, added robust JS parser "API" for andre but I would never have arrived here without aklapper.

This revision is now accepted and ready to land.Mon, Jun 9, 20:23

Please just @aklapper click on all comments of mine and say "done" since I can't lol (T15332)

git rebase master to get the new mailkey fancy thing

valerio.bozzolan edited reviewers, added: aklapper; removed: valerio.bozzolan.

Commanding. So others must approve.

This revision now requires review to proceed.Mon, Jun 9, 20:39

improve JavaScript DOC to mention '@throws' and when

valerio.bozzolan retitled this revision from Avoid separate per-line URIs in line number column to Avoid to generate a permalink for every clicked line number: migrate to web fragments.Mon, Jun 9, 21:16
valerio.bozzolan edited the summary of this revision. (Show Details)
valerio.bozzolan edited the test plan for this revision. (Show Details)
valerio.bozzolan edited the summary of this revision. (Show Details)

OK added last Test Plan about nonsense stuff like http://phorge.localhost/P1#LMIAAAO - I've concluded the test plan ideas. Seems to work.

mainframe98 subscribed.

Is throwing an error in the JavaScript really necessary? Regardless, this works as advertised.

This revision is now accepted and ready to land.Tue, Jun 10, 14:57

Is throwing an error in the JavaScript really necessary? Regardless, this works as advertised.

Love this question. In general I don't like functions that return something nonsense. Exceptions are generally our friends to be able to only return something meaningful and guarantee a function contract: given a valid input domain, we only returns valid result.

This allows to add a waterfall validation and terminate a function as soon as possible if there is something bad in their input domain, without the need to add a forest of return false; or return null; or return undefined; to be keep consistent, and without the need to add "if(result) then" here and there. The ability to stop violently is unrelevant for parseMinMaxSelectedLineFromFragment() since it could just return, but it's very relevant for its inner function _parseLineNumber() that cannot just return, or you must add extra code to identify that return value etc.

Advantages:

  • easier troubleshooting, just put console.log(ex) and no need to have a JavaScript debugger
  • shorter code, avoided at least 2-4 if-then
  • possibility to introduce future unit tests to catch exact crash

Disadvantages:

  • JavaScript is really dumb and does not easily allow to catch only our specific validation exceptions. So we just "catch everything" in these lines and I don't like that we may suppress more than it should.
    • Anyway it's a don't care problem, since Phorge should continue to work if an unexpected exception would happen while auto-scrolling
  • We don't really value the text of these exceptions since these are suppressed in the caller.
    • Probably also a don't care problem since being able to manage these exceptions is still VEEERY useful during development / debugging / expanding. The stack trace is invaluable.
    • e.g. I can run in my console parseMinMaxSelectedLineFromFragment("stuff") and have very clear crash evidence, without any need to debug a highway of breadcrumbs. Love it.

So I would say, ways more small advantages, but I could re-introduce a forest of if(something) { return undefined;} or whatever if needed (but longer code and more difficult to debug)

Ah, that makes perfect sense. Thank you.

valerio.bozzolan edited the test plan for this revision. (Show Details)