Replace function utf8_decode() - deprecated since PHP 8.2

Authored by valerio.bozzolan on Mar 24 2023, 14:31.



The function utf8_decode() was a shortcut to convert strings
from UTF-8 to ISO-8859-1 ("Latin 1").

This function has been deprecated since PHP 8.2 and will be removed
in PHP 9.

As mentioned in the RFC, if $string is a valid UTF-8 string,
this could be used to count its number of code points:

strlen(utf8_decode($string))

It works because any unmappable code point is replaced with the
single byte '?' in the output, so every code point yields exactly
one byte. However, the correct native approach is this one:

mb_strlen($string, 'UTF-8');

Another good approach is this one:

iconv_strlen($string, 'UTF-8');

Note that mb_strlen() was introduced in PHP 4, so there
are no compatibility issues in using it.

Note also that the mbstring extension is already required by the
installation documentation, so this should not change anything for anyone.
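
A minimal sketch of how the replacement could be wrapped, assuming a hypothetical helper named count_code_points() (it is not part of Arcanist and not what this patch adds). It prefers mb_strlen(), falls back to iconv_strlen(), and as a last resort counts every byte that is not a UTF-8 continuation byte. Like the old strlen(utf8_decode()) trick, that last fallback is only meaningful for valid UTF-8 input.

```php
<?php

/**
 * Count the code points of a UTF-8 string without utf8_decode().
 *
 * NOTE: count_code_points() is a hypothetical name used for illustration.
 */
function count_code_points($string) {
  // Preferred: mbstring counts UTF-8 code points natively.
  if (function_exists('mb_strlen')) {
    return mb_strlen($string, 'UTF-8');
  }

  // Second choice: iconv returns false on invalid input.
  if (function_exists('iconv_strlen')) {
    $n = iconv_strlen($string, 'UTF-8');
    if ($n !== false) {
      return $n;
    }
  }

  // Last resort: every code point has exactly one byte that is not a
  // continuation byte (10xxxxxx, that is 0x80-0xBF), so count those bytes.
  return (int)preg_match_all('/[^\x80-\xBF]/', $string);
}

// "café €" written as raw UTF-8 bytes: 9 bytes, 6 code points.
$example = "caf\xC3\xA9 \xE2\x82\xAC";
echo strlen($example) . "\n";            // 9
echo count_code_points($example) . "\n"; // 6
```

Whether such a fallback chain is worth having depends on the installation: since mbstring is already a documented requirement, calling mb_strlen() directly is likely enough.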

Closes T15188

Test Plan
  • I was able to execute "arc lint" from PHP 8.2
  • I was able to execute this "arc diff" from PHP 8.2
  • With this patch you can still run "arc lint" with your local version

Diff Detail

rARC Arcanist
Lint Passed
Tests Passed
Build Status
Buildable 171
Build 171: arc lint + arc unit

Event Timeline

Adopt mb_strlen(), which is optimized to do exactly this when you tell it that the string is UTF-8.

valerio.bozzolan edited the test plan for this revision.

Interestingly, this change also brings a performance improvement when calculating the length of many strings.

Example test:


<?php

// the Arabic (Hello) string below is: 59 bytes and 32 characters
$string = "السلام علیکم ورحمة الله وبرکاته!";

// number of iterations per measurement
$t = 100000;

// new way: mb_strlen() counts the UTF-8 code points directly
$start_time = microtime(true);
for ($i = 0; $i < $t; $i++) {
  $n = mb_strlen($string, 'UTF-8');
}
$end_time = microtime(true);
echo ($end_time - $start_time) . "\n";

// old way: decode to ISO-8859-1, then count the resulting bytes
$start_time = microtime(true);
for ($i = 0; $i < $t; $i++) {
  $n = strlen(utf8_decode($string));
}
$end_time = microtime(true);
echo ($end_time - $start_time) . "\n";

On my computer I get:

# new way (shorter is better):

# old way:

This means fewer CPU cycles and, for environment lovers, it also means less waste of resources in the long term. Anyway, a single burp from me is probably enough to cancel this environmental benefit. So I will try to restrain myself.

This revision is now accepted and ready to land. Mar 25 2023, 09:47
remote: This push was rejected by Herald push rule H8.
remote:     Change: commit/
remote:       Rule: Guard Arcanist Repo with Blessed Committers
remote:     Reason: Commit is not approved by Blessed Committers
remote: Transcript:

There are too many negatives in that condition. I'm going to need some paper...