
Disallow webcrawlers from following Paste line number anchor links
Closed, Public

Authored by aklapper on Nov 10 2023, 11:57.

Details

Summary

Paste provides line anchor links in every single line of a paste.
If webcrawlers follow these links, they index the very same Paste again.
Thus disallow these links in robots.txt to reduce unneeded traffic and indexing time.

Closes T15662

Test Plan

Go to /robots.txt in the web browser.
Cross fingers that more webcrawlers abide by RFC 9309.

Diff Detail

Repository
rP Phorge
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

I will keep this change running on my production instance for a while:

https://gitpull.it/robots.txt

https://gitpull.it/P22$1

Feel free to test.

Just a note: most online robots.txt testing tools do not work for this. For instance:

🔶 https://technicalseo.com/tools/robots-txt/

I see this change as safe since:

  • In the best case, URLs like /P123$123123 are finally ignored and /P123 is still indexed.
  • In the worst case, the page /P123%24 is not indexed, but that URL is nonsense anyway and this should not negatively affect /P123 in any way.
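
To make the two cases above concrete, here is a minimal Python sketch of RFC 9309-style path matching. The rule string `/P*%24*` is an assumption inferred from this discussion (the `%24` encoding and the trailing `*`), not quoted from the diff. `rule_to_regex` and `is_disallowed` are hypothetical helper names, not Phorge code and not any crawler's actual implementation.

```python
import re
from urllib.parse import unquote


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex (simplified sketch)."""
    # An unencoded trailing "$" marks the end of the match pattern.
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # "*" matches any run of characters; everything else is literal.
    # Percent-escapes are decoded first, so "%24" in a rule means a literal "$".
    parts = [re.escape(unquote(chunk)) for chunk in body.split("*")]
    return re.compile(".*".join(parts) + ("$" if anchored else ""))


def is_disallowed(rule: str, path: str) -> bool:
    # re.match anchors at the start of the path, which gives prefix matching.
    return rule_to_regex(rule).match(unquote(path)) is not None


RULE = "/P*%24*"  # assumed rule, not copied from the actual change

print(is_disallowed(RULE, "/P123$123123"))  # line anchor link: True
print(is_disallowed(RULE, "/P123"))         # the paste itself: False
print(is_disallowed(RULE, "/P123%24"))      # the "worst case" URL: True
```

Under these assumptions, only URLs that contain a `$` somewhere after `/P` are excluded, so the paste page itself stays crawlable.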

Feel free to follow the tip. Please wait at least 10 seconds before landing, so that we can maybe collect more feedback.

src/applications/system/controller/robots/PhabricatorRobotsPlatformController.php, line 27

✅ I verified that %24 is the URL encoding of the dollar sign ($).
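
For anyone who wants to repeat that check, a small Python sketch using only the standard library (an illustration, not part of the change itself):

```python
from urllib.parse import quote, unquote

# "$" is ASCII code point 0x24, so its percent-encoding is "%24".
encoded = quote("$", safe="")  # safe="" forces every reserved character to be encoded
decoded = unquote("%24")

print(hex(ord("$")))  # 0x24
print(encoded)        # %24
print(decoded)        # $
```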

As a side note, in theory, the last * is probably not necessary.
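
The reasoning is that RFC 9309 rules match as path prefixes unless they end with an unencoded `$`, so a trailing `*` should not change which paths match. A rough Python sketch of that comparison follows. Both rule strings are assumed for illustration rather than taken from the diff, and `matches` is a hypothetical helper with no end-of-pattern support.

```python
import re
from urllib.parse import unquote


def matches(rule: str, path: str) -> bool:
    # Hypothetical helper: "*" is a wildcard, the rule matches as a prefix,
    # and "%24" in the rule stands for a literal "$".
    pattern = ".*".join(re.escape(unquote(part)) for part in rule.split("*"))
    return re.match(pattern, unquote(path)) is not None


samples = ["/P123", "/P123$1", "/P123$123123", "/P123%24", "/paste/", "/"]

with_star = [matches("/P*%24*", p) for p in samples]     # assumed rule with trailing *
without_star = [matches("/P*%24", p) for p in samples]   # same rule without it

print(with_star == without_star)  # True for these sample paths
```

This only shows equivalence for a handful of sample paths under a simplified matcher; individual crawlers may still behave differently.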

This revision is now accepted and ready to land. Nov 11 2023, 21:45

Thanks for landing!

(As a side note, in theory, the last * was very probably not necessary)