php-fpm memory leak
Event Timeline
Are you importing some huge git repo or something? I haven't seen a lot of OOM with php-fpm on phorge/phabricator. It's generally not been an issue in the past so this is either a new bug or something specific to the environment. Then again, I've almost always ran Phorge with much more than 8GB of ram available.
Appears to be internal monitoring software causing this (not phorge), thanks for the replies.
OK, not the monitoring issue.
Looks to be the git repository updates running regularly. php-fpm will go from 400MB to 7GB in a few minutes - and then eventually release it's memory and act normally. There are usually two php-fpm processes using excessive memory.
Possibly there's a rather large git repository with many files triggering this but I don't see any errors in the logs. We have about 100+ repositories but only 40 repos active every day.
I can't see a way to reduce the repository update frequency, appears to be 15 seconds?
Current memory usage when memory issue is active:
total used free shared buff/cache available Mem: 15Gi 15Gi 135Mi 38Mi 204Mi 53Mi Swap: 8.0Gi 6.6Gi 1.4Gi
processes
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2432682 apache 20 0 8317756 6.1g 17144 D 0.0 39.4 0:22.46 php-fpm 2433124 apache 20 0 4399712 3.5g 17208 D 0.3 22.6 0:09.76 php-fpm 2432876 apache 20 0 8198256 3.3g 17040 D 0.0 21.3 0:23.17 php-fpm 986 mysql 20 0 5007228 946164 0 S 0.3 5.9 128:10.96 mysqld 2440767 phab-vcs 20 0 467244 34752 16604 D 0.0 0.2 0:00.13 php 2440743 phab-vcs 20 0 467244 34456 16304 D 0.0 0.2 0:00.14 php 2440755 phab-vcs 20 0 465160 33872 16420 D 0.0 0.2 0:00.12 php 2440678 phab-da+ 20 0 465108 33848 16448 D 0.0 0.2 0:00.13 php 2440768 phab-vcs 20 0 465108 33472 16292 D 0.0 0.2 0:00.13 php 2440676 phab-da+ 20 0 465108 32956 16448 D 0.0 0.2 0:00.13 php 2440750 phab-vcs 20 0 463064 32924 16676 D 0.0 0.2 0:00.11 php 2440764 phab-vcs 20 0 465108 32712 16360 D 0.0 0.2 0:00.13 php 2440751 phab-vcs 20 0 462916 31968 16340 D 0.0 0.2 0:00.12 php 2440744 phab-vcs 20 0 460868 29960 16416 D 0.3 0.2 0:00.11 php 2440659 phab-da+ 20 0 471456 28760 8280 S 0.0 0.2 0:00.35 php 2432558 apache 20 0 648272 28548 19212 S 0.0 0.2 0:01.13 php-fpm 2432781 apache 20 0 648380 27816 18900 S 0.0 0.2 0:01.18 php-fpm 2440628 phab-da+ 20 0 471456 26064 3788 S 0.0 0.2 0:00.27 php 2440741 phab-vcs 20 0 456772 25220 14804 S 0.0 0.2 0:00.07 php 2440758 phab-vcs 20 0 456772 24968 16600 S 0.3 0.2 0:00.08 php 2440766 phab-vcs 20 0 456772 24872 16504 S 0.0 0.2 0:00.08 php 2440753 phab-vcs 20 0 456772 24868 16504 S 0.0 0.2 0:00.07 php 2440759 phab-vcs 20 0 456772 24780 16416 S 0.0 0.2 0:00.06 php 2440763 phab-vcs 20 0 456772 24744 16376 S 0.0 0.2 0:00.08 php
When the memory is normal it is as follows, example:
Mem: 15Gi 1.3Gi 13Gi 39Mi 406Mi 13Gi Swap: 8.0Gi 2.1Gi 5.9Gi PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 986 mysql 20 0 5007228 868488 0 S 0.7 5.4 128:31.36 mysqld 2432558 apache 20 0 648488 36988 27260 S 0.0 0.2 0:02.15 php-fpm 2432781 apache 20 0 648572 36888 27040 S 0.0 0.2 0:01.80 php-fpm 2441815 apache 20 0 647708 28160 18240 S 0.0 0.2 0:00.13 php-fpm 2443599 apache 20 0 647708 28152 18240 S 0.0 0.2 0:00.08 php-fpm 2441476 apache 20 0 647708 27932 18220 S 0.0 0.2 0:00.17 php-fpm 549 root 20 0 299028 17464 14932 S 0.0 0.1 1:09.37 systemd-journal 2214372 phab-ap+ 20 0 612524 15116 9540 S 0.0 0.1 0:33.03 node 2232840 phab-da+ 20 0 469352 13348 3780 S 0.3 0.1 3:00.29 php 1330 root 10 -10 57484 12260 10680 S 0.0 0.1 0:00.00 iscsid 1472613 apache 20 0 2718556 8148 4300 S 0.0 0.1 3:05.98 httpd 2214389 phab-da+ 20 0 469408 8072 4108 S 0.0 0.1 0:32.91 php 1476317 apache 20 0 2587396 7680 516 S 0.0 0.0 3:37.27 httpd 1353 root 20 0 268432 6808 6156 S 0.0 0.0 1:00.17 httpd 1332 root 20 0 398000 6344 5812 S 0.0 0.0 1:31.08 rsyslogd 1472614 apache 20 0 2521860 6116 4528 S 0.0 0.0 2:47.10 httpd 2429315 root 20 0 626732 6056 3620 S 0.0 0.0 0:00.24 php-fpm
Repository update is based on how recent the last commit is, but that's only for collecting the list of commits to analyze.
But that should be done in the Daemons, which should at least recover nicely and not break the web side of the application. I don't know if they should be running as php or php-fpm.
bin/phd status should at least show the PID and the name of the daemon, so that would be a pointer. Most of these operations can also be limited or done with a TTY, for debugging.
What is the normal memory usage for the update daemon during the repository update? It seems to be peaking around 8GB per process for our repositories. Maybe I need to add more RAM, i.e. 32 GB to satisfy this, but that's not easy on AWS.
- Can you add a swap file to allow memory usage above 8gb? This will have poor performance but it may allow repo processing to continue.
- Can you tell if there is a repo which is still importing? Normal day to day repo tracking shouldn’t require tons of ram so the only situation that I can imagine needing this much ram is importing a large repository.
If it's the daemons that are having problems, you can probably see in the /daemons/ dashboard which tasks are causing problems, and narrow it down from there.
Think I may have found what is causing this. We had a few repositories that auto download/sync from github. The user's key who connects to github was missing so the downloads were giving errors. I think this triggers a memory leak for the php process. Now that I've fixed the key issue no errors now appear and phorge looks to be happier, i.e. memory usage looks minimal all the time (i.e. php-fpm processes using ~100MBytes).
I'm going to keep monitoring for a few weeks.
Here's an example from the log:
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100] [2024-10-19 04:01:27] EXCEPTION: (PhutilProxyException) Error while updating the "XXXX" repository. {>} (CommandException) Command failed with error #255!
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100] COMMAND
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100] /var/www/phab/phabricator/bin/repository update -- rXXXX
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100]
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100] STDOUT
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100] (empty)
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100]
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100] STDERR
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100] [2024-10-19 04:01:27] EXCEPTION: (CommandException) Command failed with error #128!
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100] COMMAND
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100] git ls-remote '****' 'refs/*'
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100]
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100] STDOUT
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100] (empty)
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100]
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100] STDERR
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100] Warning: Permanently added 'github.com,20.26.156.215' (ECDSA) to the list of known hosts.
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100] XXXX@github.com: Permission denied (publickey).
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100] fatal: Could not read from remote repository.
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100]
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100] Please make sure you have the correct access rights
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100] and the repository exists.
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100] at [<arcanist>/src/future/exec/ExecFuture.php:436]
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100] arcanist(head=master, ref.master=84210cedc6ce), phorge(head=master, ref.master=349f006904fa)
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100] #0 ExecFuture::raiseResultError(array) called at [<arcanist>/src/future/exec/ExecFuture.php:340]
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100] #1 ExecFuture::resolvex() called at [<phorge>/src/applications/repository/storage/PhabricatorRepository.php:495]
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100] #2 PhabricatorRepository::execxRemoteCommand(string, PhutilOpaqueEnvelope, array) called at [<phorge>/src/applications/repository/engine/PhabricatorRepositoryPullEngine.php:575]
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100] #3 PhabricatorRepositoryPullEngine... (1,044 more bytes) ... at [<arcanist>/src/future/exec/ExecFuture.php:436]
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100] arcanist(head=master, ref.master=84210cedc6ce), phorge(head=master, ref.master=349f006904fa)
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100] #0 <#3> ExecFuture::raiseResultError(array) called at [<arcanist>/src/future/exec/ExecFuture.php:340]
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100] #1 <#3> ExecFuture::resolvex() called at [<phorge>/src/applications/repository/daemon/PhabricatorRepositoryPullLocalDaemon.php:496]
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100] #2 phlog(PhutilProxyException) called at [<phorge>/src/applications/repository/daemon/PhabricatorRepositoryPullLocalDaemon.php:503]
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100] #3 PhabricatorRepositoryPullLocalDaemon::resolveUpdateFuture(PhabricatorRepository, ExecFuture, integer) called at [<phorge>/src/applications/repository/daemon/PhabricatorRepositoryPullLocalDaemon.php:272]
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100] #4 PhabricatorRepositoryPullLocalDaemon::run() called at [<phorge>/src/infrastructure/daemon/PhutilDaemon.php:216]
Daemon 30319 STDE [Sat, 19 Oct 2024 04:01:27 +0100] #5 PhutilDaemon::execute() called at [<phorge>/scripts/daemon/exec/exec_daemon.php:131]