Page MenuHomePhorge

Catch RuntimeException: mb_convert_encoding(): Illegal character encoding specified at PhabricatorTextDocumentEngine.php:73
ClosedPublic

Authored by aklapper on Aug 22 2023, 17:24.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Jul 9, 11:30
Unknown Object (File)
Tue, Jul 9, 09:39
Unknown Object (File)
Tue, Jul 9, 09:38
Unknown Object (File)
Tue, Jul 9, 07:00
Unknown Object (File)
Mon, Jul 8, 21:57
Unknown Object (File)
Mon, Jul 8, 21:55
Unknown Object (File)
Mon, Jul 8, 17:19
Unknown Object (File)
Fri, Jul 5, 08:44

Details

Summary

When given $encoding is invalid, catch the exception to show a proper error message and make the server logs provide more hints.

EXCEPTION: (RuntimeException) mb_convert_encoding(): Illegal character encoding specified at [<arcanist>/src/error/PhutilErrorHandler.php:261]
#0 <#2> PhutilErrorHandler::handleError(integer, string, string, integer, array) called at [<arcanist>/src/error/PhutilErrorHandler.php:261]
#1 <#2> mb_convert_encoding(string, string, string) called at [<phabricator>/src/applications/files/document/PhabricatorTextDocumentEngine.php:73]

Closes T15624

Test Plan

Open a URL which passes a bogus encoding value as parameter, like /source/somerepository/browse/master/README.md?as=source&encode=TROLOLOL

Diff Detail

Repository
rP Phorge
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Please do note that this patch is untested as I am unaware of reproduction steps.

Sten added inline comments.
src/applications/files/document/PhabricatorTextDocumentEngine.php
79

We get $message, but don't make use of it. Shouldn't we include $message in the log rather than. assuming it's an Illegal character encoding?

Also, should we set $this->encodingMessage to the error message?

(I share the same question from Sten)

Good points. This patch puts also the exception message into phlog.

OK the situation is a bit more clear now.

We can maybe limit the Test Plan to people requesting nonsense encoding. Example:

/source/myrepo/browse/master/README.md?as=source&encode=TROLOLOL

And, I totally agree that the try ... catch would be useful, but we can introduce something like this:

$this->encodingMessage = pht(
  'Unable to convert from the requested encoding to UTF8.');

(Note that this message does not mention the weird input parameter by design, so we can avoid to generate silly pages if the user visits the page with TROLOLOL as encoding etc.)

And, at this point, since we probably triaged the root problem, and since the error is already supposed to be reported to the corresponding user, and since system administrators already have the access logs to see weird URLs like these, feel free to evaluate whenever you want to suppress the new phlog() call to just keep the user message. Probably in your use case you want less logs to be inspected, so feel free to omit, but it's up to you.

(I will immediately hard-approve this lovely change as soon as we share whatever ACK like "Yeah let's log the planet" or "Yeah let's just show a user message")

Thanks for the additional debugging and finding steps to reproduce! In this case, I'd prefer not to log the issue and to only show a message to the user explaining what's the issue, including their requested bogus encoding.

Side note, we are introducing the possibility to share this kind of very creative and confusing URLs (that are safe from the point of view of XSS but) that could be an attracting point for lamers to generate confusing user messages inside Phorge itself, like:

/source/reponame/browse/master/README.md?as=source&encode=AHAHAH%20YOU%20ARE%20A%20LUSER%20AHAHAHAH%20%3Cscript%3Ealert(1)%3C/script%3E%20AHAHAHHAHAHA

image.png (60×594 px, 8 KB)

I'm totally not aware of potential consequences, I'm totally not aware of similar cases. I'm just somehow sure that this is somehow safe, so, I soft-approve since this works. Thanks!

Let's wait for additional eyes for extra feedback. I will hard-approve this in a week approx if we receive just silence.

(I can also surely hard-approve if we remove that user-generated output since that indeed would be 100000% blindly super-safe).

Interestingly, Gerrit does similar things, and does not mention the problematic URI value in case it's somehow already clear:

https://gerrit.wikimedia.org/g/phabricator/translations?format=TEXT

The specified format type is not supported

Nobody seems interested in my concern, so, hard approve :) Thanks

This revision is now accepted and ready to land.Nov 23 2023, 14:47