
Disallow webcrawlers to index Diffusion commits
ClosedPublic

Authored by aklapper on Nov 16 2023, 10:11.
Tags
None
Referenced Files
F2177771: D25474.diff
Sat, May 4, 12:32
Tokens
"Evil Spooky Haunted Tree" token, awarded by valerio.bozzolan.

Details

Summary

Phorge already sets Disallow: /diffusion/ and Disallow: /source/.
For consistency, also disallow accessing specific commits via /r*.
See https://secure.phabricator.com/T4610 for previous discussions.

Closes T15670

Test Plan

Go to /robots.txt in the web browser.
Cross fingers that more webcrawlers abide by RFC 9309.
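
The manual check above could also be scripted. The following is a minimal sketch, not part of this revision: it assumes the served robots.txt contains the three Disallow paths named in the summary, and the commit path /rP0123abcd is a made-up example. It handles only the RFC 9309 wildcards ("*" and a trailing "$"), and ignores Allow rules, longest-match precedence, and percent-encoding.

```python
import re


def rule_to_regex(rule_path):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any sequence of characters; a trailing '$' anchors the
    pattern at the end of the path. Everything else is matched literally.
    """
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    body = ".*".join(re.escape(part) for part in rule_path.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path, disallow_rules):
    """Return True if any non-empty Disallow pattern matches the path prefix."""
    return any(
        rule_to_regex(rule).match(path) is not None
        for rule in disallow_rules
        if rule
    )


# Assumed rule set, taken from the summary of this revision.
RULES = ["/diffusion/", "/source/", "/r*"]

if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for path in ["/rP0123abcd", "/source/phorge/", "/T15670"]:
        print(path, is_disallowed(path, RULES))
```

Because Disallow patterns are prefix matches, /r* in this sketch matches every path that begins with /r, not only commit URLs.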

Diff Detail

Repository
rP Phorge
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

aklapper retitled this revision from "Disallow webcrawlers to follow Paste line number anchor links" to "Disallow webcrawlers to index Diffusion commits". Nov 16 2023, 10:12
This revision is now accepted and ready to land. Nov 16 2023, 23:27
src/applications/system/controller/robots/PhabricatorRobotsPlatformController.php
14–16

(Note this comment)

src/applications/system/controller/robots/PhabricatorRobotsPlatformController.php
14–16

Good catch, I missed this existing comment in my review. It might be worth digging into the history here to see if there’s more detail on why preventing them from being indexed is less useful or more difficult.

If someone strongly feels that I should revert, please say so - thanks! :)

I'm not an important stakeholder, but for my installation, https://gitpull.it, I would like commits to be indexed by default, as they were before this change and as they are on GitHub and GitLab. So I'm now sincerely trying to understand how to restore the old behavior without having to keep my own fork of Phorge.

I don’t think a revert is needed, but the comment should probably be removed or updated. I’d like to understand why this was deemed hard to do, since the solution here doesn’t seem that hard. Maybe it’s more difficult than it appears, or the robots.txt standard was later updated in a way that makes this easier, or Phab URLs changed in a way that made this easier and the comment was never updated.