Page MenuHomePhorge

File reference not removed when file is deleted
Closed, ResolvedPublic

Description

Description of the problem:

Sometime, pages like D25018 (but not limited to Differential diffs) have a bunch of "Unknown Object" in the Referenced Files section in the "right sidebar".

Example:

Phorge error Unknown objects.png (624×340 px, 46 KB)

Why?

Probably because they are expired temporarily generated files attached to the diff, but now these files were deleted.

Why?

I assume these references should be deleted since the file "attachment" reference indicates whenever a file is visible to an object (an object may be whatever, like a task).

Why?

The general idea underlying the "attachment" system is that you can have a file with minimal privileges, and a task with specific visibility privileges, and you can attach the file to the task, so that task viewers can also see the file, and you don't have to keep the view privileges in manual sync.

The database schema phabricator_file.file_attachment so contains this reference between <filePHID, objectPHID>. See the schema:

$ mysql phabricator_file
$ SHOW CREATE TABLE file_attachment;
CREATE TABLE `file_attachment` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `objectPHID` varbinary(64) NOT NULL,
  `filePHID` varbinary(64) NOT NULL,
  `attacherPHID` varbinary(64) DEFAULT NULL,
  `attachmentMode` varchar(32) NOT NULL,
  `dateCreated` int(10) unsigned NOT NULL,
  `dateModified` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `key_object` (`objectPHID`,`filePHID`),
  KEY `key_file` (`filePHID`)
)

So it's quite self-evident that, if the related file is removed, the entry in the file_attachment table should be removed as well.

Also, if the object related to the objectPHID is deleted, there is no need to keep this orphan entry in the database.

Event Timeline

I'll try to force the GC to run, maybe this will delete them.

In T15110#2655, @avivey wrote:

I'll try to force the GC to run, maybe this will delete them.

Tried that, bad refs are still there.

Relevant old upstream comment (from an unrelated task) that describes this problem as "ghosts in the UI":

  • Destroying Files: See PHI2217. Destroying an object doesn't destroy corresponding FileAttachment objects, so you can end up with ghosts in the UI.

A bit more than Normal, since it reflects on a database with orphan elements that creates "ghosts in the UI".

Unrelated but @aklapper I'm curious about the situation in WMF. If you can maybe report something like this:

USE phabricator_file;
SELECT COUNT(*) FROM FROM file_attachment
  WHERE NOT EXISTS
  (SELECT *
    FROM file
    WHERE phid=file_attachment.filePHID);

It would be super-interesting. Also with a bit of additional context:

EXPLAIN SELECT COUNT(*) FROM phabricator_file.file_attachment;

I'd be worried about DB performance running such a query as our DB has dozens of millions of files... (This is basically SELECT COUNT(*) FROM file_attachment fa WHERE fa.filePHID NOT IN (SELECT f.phid FROM file); in my understanding?)

Yep. In all cases it's at least a full-table-scan on file_attachment. Probably a bit more RAM-efficient with the NOT EXISTS thing since it does not need to have all file PHIDs in memory.