Another takeaway from the Crash Of May:
- The daemon logs are kept in a DB table.
- There's a GC loop that runs once in a while and cleans up a limited number of DB rows (limited so that it doesn't block everything else on the system). It runs in a daemon.
- A problem in Nuance involved a retry, which caused more logs to be written on every daemon loop than the GC was collecting.
- This eventually ate all the disk space on the DB and killed the install.
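
To make the failure mode concrete, here's a rough sketch of the arithmetic. The function name and every number you'd feed it are made up for illustration; I don't have the real write or collection rates, and it ignores index overhead and storage-engine details.

```python
def loops_until_full(rows_written_per_loop, rows_collected_per_loop,
                     bytes_per_row, free_bytes):
    """Return how many daemon loops it takes to use up free_bytes,
    or None if the GC keeps up. All inputs are hypothetical estimates."""
    net_rows = rows_written_per_loop - rows_collected_per_loop
    if net_rows <= 0:
        return None  # GC collects at least as fast as logs are written
    net_bytes = net_rows * bytes_per_row
    # Ceiling division: the loop that crosses the limit counts too.
    return -(-free_bytes // net_bytes)
```

The point is just that a bounded GC batch means any sustained write rate above the batch size grows the table without limit, however slowly.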
The original reason for keeping these logs in the DB is (I assume) to simplify installs: daemons already have access to the DB, and they might not have write access to anything else. However, this decision was made about 15 years ago, and maybe things have changed since?
A simple patch moving forward would be to expose an alert (Setup Warning) for very large tables, for a very large log table, or for low disk space on the DB host (is that something we can query MySQL for?).
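
For the table-size part, MySQL's `information_schema.TABLES` has `DATA_LENGTH` and `INDEX_LENGTH` columns (in bytes, and only estimates for InnoDB). Here's a minimal sketch of what such a check could look like. The function name, the threshold, and the message format are placeholders, not anything that exists today. Free space on the host is a separate question; as far as I know `information_schema` doesn't report it directly.

```python
# Query against MySQL's information_schema. Sizes are in bytes and are
# approximate for InnoDB tables.
TABLE_SIZE_QUERY = """
SELECT TABLE_SCHEMA, TABLE_NAME, DATA_LENGTH + INDEX_LENGTH AS total_bytes
FROM information_schema.TABLES
WHERE TABLE_TYPE = 'BASE TABLE'
ORDER BY total_bytes DESC
"""


def oversized_tables(rows, limit_bytes):
    """Return warning strings for tables larger than limit_bytes.

    rows: iterable of (schema, table, total_bytes) tuples, as returned
    by TABLE_SIZE_QUERY. limit_bytes is a hypothetical threshold.
    """
    warnings = []
    for schema, table, total_bytes in rows:
        # total_bytes can be NULL/None for some engines; skip those.
        if total_bytes is not None and total_bytes > limit_bytes:
            warnings.append(f"{schema}.{table} is {total_bytes / 2**30:.1f} GiB")
    return warnings
```

The query string can be run through whatever DB connection is already available; the check itself only needs the result rows.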
We can also expose some data from the GC loop: GC engines always report whether they think there's more work to do at the end of a cycle, and if that's been consistently true for a long time, there's probably a problem.
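
A minimal sketch of that idea, assuming we can hook into the end of each GC cycle. The class name, method names, and threshold are all hypothetical and don't correspond to existing code.

```python
import time


class GCBacklogMonitor:
    """Tracks how long a GC engine has continuously reported more work."""

    def __init__(self, max_backlog_seconds, clock=time.monotonic):
        self.max_backlog_seconds = max_backlog_seconds  # hypothetical threshold
        self.clock = clock
        self.backlog_since = None  # when the current backlog streak started

    def record_cycle(self, more_work):
        """Call at the end of each GC cycle with the engine's report."""
        if more_work:
            if self.backlog_since is None:
                self.backlog_since = self.clock()
        else:
            # The engine caught up, so the streak is over.
            self.backlog_since = None

    def is_stuck(self):
        """True if every cycle has reported more work for too long."""
        if self.backlog_since is None:
            return False
        return self.clock() - self.backlog_since >= self.max_backlog_seconds
```

A single cycle that reports "no more work" resets the streak, so short bursts of logging wouldn't trigger the warning; only a GC that never catches up would.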
On a larger scale, we can reconsider how we store the Daemon logs.