Page MenuHomePhorge

Remarkup: make less internal links open in new tabs
ClosedPublic

Authored by valerio.bozzolan on Apr 7 2023, 21:14.

Details

Summary

This is an attempt to improve the default behavior in Remarkup about
links. It does not change any behaviors manually specified in the engine
and it does not change any behaviors related to external domains.

As default, now these kind of links will open in the same tab:

  • anchors
  • relative URLs
  • absolute URLs pointing to the base-URI domain

All the other cases are kept as before - so they open in another tab.

In short, assuming you are we.phorge.it, here the changes:

https://gnu.orgChange Log#anchorhttps://we.phorge.it//config/
Beforeexternalinternalinternalexternalexternal
Afterexternalinternalinternalinternalinternal

This situation can further improve but it already covers most of the
cases where most users do not expect to break their navigation into
several tabs. Moreover, if an user wants to open a link in another
window, no one prevents from using the middle mouse button,
or CTRL+click or any other nice really basic feature from their browser.

Also, this change introduces a new CSS class, allowing web designers
to style these external resources.

Example CSS rule to try:

.remarkup-link-ext::before {
    content: "[external] ";
}

Closes T15161
Closes T15182

Test Plan
  • Copy the example text from this Task: https://we.phorge.it/T15161
  • Verify that "internal resources" are internal links as default now
  • Verify that "external resources" are still external links as before

Diff Detail

Repository
rP Phorge
Branch
arcpatch-D25118_2
Lint
Lint Passed
Unit
Tests Passed
Build Status
Buildable 1113
Build 1113: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
src/infrastructure/markup/markuprule/PhutilRemarkupDocumentLinkRule.php
79

Here I just changed the false to $is_internal

src/infrastructure/markup/markuprule/PhutilRemarkupHyperlinkRule.php
119

Here I just changed the false to $is_internal

valerio.bozzolan retitled this revision from Remarkup: reduce number of internal resources opening as external links to Remarkup: reduce internal resources opening as external links.Jun 29 2023, 13:19

Tip for my reviewer: please use the "makeitso" meme

src/infrastructure/markup/markuprule/PhutilRemarkupHyperlinkRule.php
128–133

If you know a better way to concatenate two CSS classes, feel free to propose.

avivey retitled this revision from Remarkup: reduce internal resources opening as external links to Remarkup: make less internal links open in new tabs.Jul 17 2023, 06:26
avivey requested changes to this revision.Jul 17 2023, 06:45
avivey subscribed.
avivey added inline comments.
src/infrastructure/markup/markuprule/PhutilRemarkupDocumentLinkRule.php
79–86

This comment is redundant (and long). Remove it please.

src/infrastructure/markup/markuprule/PhutilRemarkupHyperlinkRule.php
128–133

This is the right way, but you don't need so many variables. Feel free to reuse $classes.

src/infrastructure/markup/markuprule/PhutilRemarkupRule.php
109–124

Lets move all of these new methods to a new class (and then make them non-static).

Also maybe make this a function of PhutilURI.

150

yeah, ./ and ../ should also be considered internal (or possibly not allowed, for security reasons).

184

Either use this method in all new cases where it applies (all new methods), or remove it.

This revision now requires changes to proceed.Jul 17 2023, 06:45
valerio.bozzolan marked 4 inline comments as done.
  • use a single variable $classes
  • remove unuseful comment
  • recognize ./ and ../ as internal
  • remove isURIStartingWith()
src/infrastructure/markup/markuprule/PhutilRemarkupRule.php
109–124

I understand. Two things:

I think somewhere we already had this discussion and you suggested that not all URIs have an "is internal" concept.

This may suggest to create a PhutilURIWeb class with these additional methods.

In any case: touching PhutilURI or creating a PhutilURIWeb may be overkill changes since they are changes from another repo.

Or, we can create this (or similar) new class in /phabricator to be less impactful, premising that a path and a name suggestion would be awesome.

150

Added. But the parser modification to deny these deserves a dedicated change.

remove unuseful newline - sorry

I think we should avoid to extend PhutilURI for some interesting reasons:

  1. Arcanist should not try to access PhabricatorEnv to look for things like phabricator.base-uri

Other side reasons:

  1. PhutilURI parses the URL. This seems great, but, we don't need that extra full-logic here that could have additional performance impact on the Remarkup parser. It parses the protocol, the user, the password, if it's git, ... maybe too overkill here.
  2. example in PhutilRemarkupDocumentLinkRule it just needs a small function to check if it starts with # etc. and it really does not need to parse a full URL with a potential fragment

Having said that,

If you would like to avoid static method, I can propose to make these methods as non-static. This would make the Remarkup classes without static calls, but will make the unit test a bit more weird, since we need to instantiate a PhutilRemarkupRule() to just call methods that can be static.

I've done my best to implement all proposed improvements

Maybe we can start removing the red hard block

speck requested changes to this revision.Oct 25 2023, 21:16
speck added inline comments.
src/infrastructure/markup/markuprule/PhutilRemarkupRule.php
150
  1. I think it would be better to not include ./ and ../ here. Including these here reads as though we should anticipate or expect these values and I think they should not be anticipated. If we come across a path with these then having the behavior of opening in a new tab vs. same tab is fine.
  2. I think determining a relative path is much more difficult, as both /relativepath and relativepath are considered relative. See this StackOverflow post though I do not think we should just use whatever regular expression accepted as answer there. Refer to PhutilURI.php in arcanist repository -- I think whatever logic is added/updated here should align with that class' logic. We might instead want to check that the URI doesn't have a protocol/scheme a la whatever:// and assume it's relative if omitted -- and when rendering the URI in HTML ensure we add the base URI.
This revision now requires changes to proceed.Oct 25 2023, 21:16
src/infrastructure/markup/markuprule/PhutilRemarkupRule.php
155–167

Something to make note of, if base-uri is https://we.phorge.it then someone making a link http://we.phorge.it/... would be considered external. I think that's fine (and considered "correct"), though we may want to document here that we're aware of it (and make note in the release notes).

There isn't really a way to know that an install would have http routed to https even though that's likely the case in most setups (without attempting to resolve/contact the URI which we absolutely do not want to do).

I don't have much of an attention span these days, sorry.

Integrate some tips. Clarify that this is just a compromise between performance and a good default, only to be used for target="_blank" purposes, to don't slow down Remarkup.

Create a dedicated method getRemarkupLinkClass() to write that once

This looks good and I really like the idea of being able to customize the style of external links. Just one tweak to the logic I think we should add before landing.

src/infrastructure/markup/markuprule/__tests__/PhutilRemarkupRuleTestCase.php
9 ↗(On Diff #1433)

I think we allow relative urls in remarkup that don’t start with a slash, and we should also consider as internal. Both /foo/ and foo/ are allowed and will resolve to internal urls in remarkup - I remember because they also happen to be handled slightly differently with other remarkup formatting, or once upon a time they were.

valerio.bozzolan added inline comments.
src/infrastructure/markup/markuprule/__tests__/PhutilRemarkupRuleTestCase.php
9 ↗(On Diff #1433)

OK I think that this is the kind of proposal that would indirectly suggest to refactor this a bit and introduce a PhutilURIGoodie or something similar that extends PhutilURI. Any tip about the name and the place to store that class?

  • add dedicated class to handle things
  • add interesting test cases
  • ready for dinner

fix PHPDoc (last version v2.4_final3_really51.zip)

I forgot I had this requesting changes. Any idea of the performance hit in markup rendering now that PhutilURI is constructed for every link? I think its constructor does a fair amount.

Also instead of “Goodie” maybe “Helper” or something. Maybe there’s a similar naming elsewhere in the code base

Some possibilities:

  1. PhutilURILocal (more natural)
  2. PhutilLocalURI (like PhutilGitURI)
  3. PhorgeURI (why not, it's about an URI with Phorge context after all)
valerio.bozzolan added inline comments.
src/infrastructure/parser/PhutilURIGoodie.php
83 ↗(On Diff #1589)

Maybe rename from isInternal to isSelfURI and just return PhabricatorEnv::isSelfURI($uri_string)

adopt PhabricatorEnv::isSelfURI()

minor optimization and unit tests are still happy

In D25118#14541, @speck wrote:

I forgot I had this requesting changes. Any idea of the performance hit in markup rendering now that PhutilURI is constructed for every link? I think its constructor does a fair amount.

I've done some micro-optimizations and done some A/B performance tests on my old toaster with Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz, and plotted with LibreOffice:

D25118-boz-performance-results.png (300×1 px, 52 KB)

In master the min(single-parse time) seems 0.00058781 seconds.

In this patch D25118 the min(single-parse time) seems 0.00065650 seconds.

So it seems, for each parse out of cache, this introduces +0.00006869 seconds slowdown. That's probably under control and reasonable for such additional feature.

I would say that this change is completely irrelevant with normal use cases, since a Phorge page never has more than 1000 Remarkup fields in a single page rendering, I think.

Here scripts and raw results:

<?php

require 'scripts/init/init-script.php';

function buildNewTestEngine() {
    $engine = new PhutilRemarkupEngine();

    $engine->setConfig(
      'uri.allowed-protocols',
      array(
        'http' => true,
        'mailto' => true,
        'tel' => true,
      ));

    $rules = array();
    $rules[] = new PhutilRemarkupEscapeRemarkupRule();
    $rules[] = new PhutilRemarkupMonospaceRule();
    $rules[] = new PhutilRemarkupDocumentLinkRule();
    $rules[] = new PhutilRemarkupHyperlinkRule();
    $rules[] = new PhutilRemarkupBoldRule();
    $rules[] = new PhutilRemarkupItalicRule();
    $rules[] = new PhutilRemarkupDelRule();
    $rules[] = new PhutilRemarkupUnderlineRule();
    $rules[] = new PhutilRemarkupHighlightRule();

    $blocks = array();
    $blocks[] = new PhutilRemarkupQuotesBlockRule();
    $blocks[] = new PhutilRemarkupReplyBlockRule();
    $blocks[] = new PhutilRemarkupHeaderBlockRule();
    $blocks[] = new PhutilRemarkupHorizontalRuleBlockRule();
    $blocks[] = new PhutilRemarkupCodeBlockRule();
    $blocks[] = new PhutilRemarkupLiteralBlockRule();
    $blocks[] = new PhutilRemarkupNoteBlockRule();
    $blocks[] = new PhutilRemarkupTableBlockRule();
    $blocks[] = new PhutilRemarkupSimpleTableBlockRule();
    $blocks[] = new PhutilRemarkupDefaultBlockRule();
    $blocks[] = new PhutilRemarkupListBlockRule();
    $blocks[] = new PhutilRemarkupInterpreterBlockRule();

    foreach ($blocks as $block) {
      if (!($block instanceof PhutilRemarkupCodeBlockRule)) {
        $block->setMarkupRules($rules);
      }
    }

    $engine->setBlockRules($blocks);

    return $engine;
}

$input_remarkup = <<<EOF
== Links that should NOT open in an external tab as default ==

http://phorge.localhost/phriction/edit/3/

<http://phorge.localhost/#asd>

<http://phorge.localhost/phriction/edit/3/>

[[ http://phorge.localhost/phriction/edit/3/ | http://phorge.localhost/phriction/edit/3/ ]]

[[ /phriction/edit/3/ | /phriction/edit/3/ ]]

== Still internal links as default ==

[[ / ]]

[[ #asd | #asd ]]

== Still External Links as default ==

https://google.com/

https://google.com/#asd

<https://google.com/>

<https://google.com/#asd>

[[ https://google.com/| https://google.com/ ]]

[[ https://google.com/#asd| https://google.com/#asd ]]
EOF;

for($step = 0; $step < 150; $step++) {
	$n = 100 * ( $step + 1 );
	$start = microtime(true);
	for($i = 0; $i < $n; $i++) {
		$engine = buildNewTestEngine();
		$text = (string)$engine->markupText($input_remarkup);
	}
	$stop = microtime(true);
	sleep(1);
	$diff = $stop - $start;
	echo "$n,$diff\n";
}

Usage:

git checkout master
php ./test.php > master.csv
git checkout arcpatch-D25118
php ./test.php > D25118.csv

Minor notes:

The current implementation does not even need to build any PhutilURI if we receive /stuff or #stuff since they are short-circuits now.

I also think this is even a bit more readable than the original source code.

Thanks again @speck. This is probably useful to answer your performance question:

https://we.phorge.it/D25118#15960

Flagging all comments as done (but feel free to share other things)

This revision is now accepted and ready to land.Jul 27 2024, 15:31

git rebase master (just because we can)