(Note: this was moved to a task from this Ponder question.)
I'm trying to import a (really old…) Mercurial repository in 'observe' mode into a recent Phorge (cloned a week ago). The repository is large (~175k commits), and 60 of those commits fail to import. That is not a bad ratio, but I would still like to get the repository into a 'fully imported' state. I know I can manually mark it as 'imported', but I'm not sure what the consequences would be, e.g. whether I should expect missing files. Ideally, I would get the remaining 60 commits to import.
Under /daemon/, I can see PhabricatorRepositoryMercurialCommitChangeParserWorker jobs with high failure counts. This is what the repository tool tells me:
root@ec6149cd0a0b:/var/www/phorge/phorge# ./bin/repository importing R3
R3:f60bb794a270 Change, Publish
R3:5eeae9954ae8 Change, Publish
R3:773043ec1398 Change, Publish
[…]
They are all stuck at the Change, Publish steps. Let's see what the phd log says:
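To look at every stuck commit rather than a random sample, the hashes can be pulled out of the importing output. The following is a minimal Python sketch; it assumes each pending commit is printed on its own line as "<monogram>:<hash> <step>, <step>", exactly as in the output above. The names parse_importing_output and PendingCommit are hypothetical, not part of Phorge.

```python
import re
from typing import List, NamedTuple

# Assumed line shape, taken from the output above: "<monogram>:<hash> <Step>, <Step>"
_PENDING_LINE = re.compile(r"^(?P<repo>[A-Za-z0-9]+):(?P<commit>[0-9a-f]+)\s+(?P<steps>\S.*)$")


class PendingCommit(NamedTuple):
    repo: str          # repository monogram, e.g. "R3"
    commit: str        # abbreviated commit hash
    steps: List[str]   # import steps still pending, e.g. ["Change", "Publish"]


def parse_importing_output(text: str) -> List[PendingCommit]:
    """Extract pending commits from `bin/repository importing` output.

    Lines that do not match the assumed shape (the shell prompt, "[…]",
    blank lines) are skipped.
    """
    pending = []
    for line in text.splitlines():
        match = _PENDING_LINE.match(line.strip())
        if match is None:
            continue
        steps = [step.strip() for step in match.group("steps").split(",")]
        pending.append(PendingCommit(match.group("repo"), match.group("commit"), steps))
    return pending
```

For example, the three commit lines shown above parse to the hashes f60bb794a270, 5eeae9954ae8 and 773043ec1398, each with the steps ['Change', 'Publish'].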
Daemon 26 STDE [Mon, 24 Apr 2023 14:14:05 +0000] #13 PhutilDaemon::execute() called at [<phorge>/scripts/daemon/exec/exec_daemon.php:131]
Daemon 26 STDE [Mon, 24 Apr 2023 14:14:05 +0000] [2023-04-24 14:14:05] EXCEPTION: (PhutilProxyException) Error while executing Task ID 644343. {>} (AphrontCharacterSetQueryException) Attempting to construct a query using a non-utf8 string when utf8 is expected. Use the `%B` conversion to escape binary strings data. at [<phorge>/src/infrastructure/storage/connection/mysql/AphrontBaseMySQLDatabaseConnection.php:418]
Daemon 26 STDE [Mon, 24 Apr 2023 14:14:05 +0000] arcanist(head=master, ref.master=08dfffd5caf7), phorge(head=master, ref.master=b587865ce78a)
Daemon 26 STDE [Mon, 24 Apr 2023 14:14:05 +0000] #0 <#2> AphrontBaseMySQLDatabaseConnection::validateUTF8String(string) called at [<phorge>/src/infrastructure/storage/connection/mysql/AphrontMySQLiDatabaseConnection.php:12]
Daemon 26 STDE [Mon, 24 Apr 2023 14:14:05 +0000] #1 <#2> AphrontMySQLiDatabaseConnection::escapeUTF8String(string) called at [<phorge>/src/infrastructure/storage/xsprintf/qsprintf.php:266]
Daemon 26 STDE [Mon, 24 Apr 2023 14:14:05 +0000] #2 <#2> xsprintf_query(array, string, integer, string, integer) called at [<arcanist>/src/xsprintf/xsprintf.php:82]
Daemon 26 STDE [Mon, 24 Apr 2023 14:14:05 +0000] #3 <#2> xsprintf(string, array, array) called at [<phorge>/src/infrastructure/storage/xsprintf/PhutilQueryString.php:31]
Daemon 26 STDE [Mon, 24 Apr 2023 14:14:05 +0000] #4 <#2> PhutilQueryString::__construct(AphrontMySQLiDatabaseConnection, array) called at [<phorge>/src/infrastructure/storage/xsprintf/qsprintf.php:78]
Daemon 26 STDE [Mon, 24 Apr 2023 14:14:05 +0000] #5 <#2> qsprintf(AphrontMySQLiDatabaseConnection, string, string, string) called at [<phorge>/src/applications/repository/worker/commitchangeparser/PhabricatorRepositoryCommitChangeParserWorker.php:69]
Daemon 26 STDE [Mon, 24 Apr 2023 14:14:05 +0000] #6 <#2> PhabricatorRepositoryCommitChangeParserWorker::lookupOrCreatePaths(array) called at [<phorge>/src/applications/repository/worker/commitchangeparser/PhabricatorRepositoryMercurialCommitChangeParserWorker.php:255]
Daemon 26 STDE [Mon, 24 Apr 2023 14:14:05 +0000] #7 <#2> PhabricatorRepositoryMercurialCommitChangeParserWorker::parseCommitChanges(PhabricatorRepository, PhabricatorRepositoryCommit) called at [<phorge>/src/applications/repository/worker/commitchangeparser/PhabricatorRepositoryCommitChangeParserWorker.php:36]
Daemon 26 STDE [Mon, 24 Apr 2023 14:14:05 +0000] #8 <#2> PhabricatorRepositoryCommitChangeParserWorker::parseCommit(PhabricatorRepository, PhabricatorRepositoryCommit) called at [<phorge>/src/applications/repository/worker/PhabricatorRepositoryCommitParserWorker.php:72]
Daemon 26 STDE [Mon, 24 Apr 2023 14:14:05 +0000] #9 <#2> PhabricatorRepositoryCommitParserWorker::doWork() called at [<phorge>/src/infrastructure/daemon/workers/PhabricatorWorker.php:124]
Daemon 26 STDE [Mon, 24 Apr 2023 14:14:05 +0000] #10 <#2> PhabricatorWorker::executeTask() called at [<phorge>/src/infrastructure/daemon/workers/storage/PhabricatorWorkerActiveTask.php:160]
Daemon 26 STDE [Mon, 24 Apr 2023 14:14:05 +0000] #11 <#2> PhabricatorWorkerActiveTask::executeTask() called at [<phorge>/src/infrastructure/daemon/workers/PhabricatorTaskmasterDaemon.php:22]
Daemon 26 STDE [Mon, 24 Apr 2023 14:14:05 +0000] #12 PhabricatorTaskmasterDaemon::run() called at [<phorge>/src/infrastructure/daemon/PhutilDaemon.php:219]
Daemon 26 STDE [Mon, 24 Apr 2023 14:14:05 +0000] #13 PhutilDaemon::execute() called at [<phorge>/scripts/daemon/exec/exec_daemon.php:131]
Daemon 26 FAIL [Mon, 24 Apr 2023 14:14:05 +0000] Process exited with error 255.
To my untrained eye, this looks like the worker is trying to store a file path in the database (lookupOrCreatePaths() in the trace) and chokes on the encoding of that path. Indeed, when I look at the problematic commits in my repository, they contain files with German "Umlaute" ("ü", "ö", "ä", …) in their file names. I did not verify this for all 60 failed commits, only for a few picked at random. The commits are mostly from 1999 (did I mention the repo is old?). I assume these file names are encoded in CP-1252, because back in 1999 they were most likely created on Windows.
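The hypothesis can be checked outside Phorge. The exception is raised by AphrontBaseMySQLDatabaseConnection::validateUTF8String(); the Python sketch below is not Phorge's code, it only applies a comparable "is this valid UTF-8?" test to raw path bytes and lists the offending paths of one changeset. It assumes a local clone of the repository, the hg binary on PATH, and that Mercurial prints file names as the raw bytes it stored. The function names are hypothetical.

```python
import subprocess
from typing import List


def is_valid_utf8(path: bytes) -> bool:
    """Return True if the raw path bytes decode as UTF-8."""
    try:
        path.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def as_cp1252(path: bytes) -> str:
    """Render raw path bytes for humans, ASSUMING (unverified) they are CP-1252."""
    return path.decode("cp1252", errors="replace")


def non_utf8_paths(repo_dir: str, rev: str) -> List[bytes]:
    """List paths touched by `rev` whose names are not valid UTF-8.

    Runs `hg status --change REV --no-status --print0` inside the clone, which
    prints the changed file names (relative to the repository root, since
    --cwd points there) separated by NUL bytes. Assumes `hg` is on PATH.
    """
    result = subprocess.run(
        ["hg", "--cwd", repo_dir, "status", "--change", rev, "--no-status", "--print0"],
        check=True,
        capture_output=True,
    )
    return [p for p in result.stdout.split(b"\0") if p and not is_valid_utf8(p)]
```

In CP-1252, "ü" is the single byte 0xFC, which never occurs in valid UTF-8, so any path containing it fails the check. Running non_utf8_paths() on each failing hash (such as f60bb794a270) should return a non-empty list if the umlaut theory holds. Note that for German umlauts CP-1252 and ISO-8859-1 use identical bytes, so this cannot tell those two encodings apart.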
I'm not sure how to reproduce this with a test repository, because I honestly don't know how to create a non-UTF-8 file name and force it into Mercurial in 2023. Apart from the file names, I see nothing else special about these changesets.
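One way to build such a test repository, sketched in Python under the assumption of a Linux filesystem (e.g. ext4, which accepts arbitrary bytes other than "/" and NUL in file names) and Mercurial on PATH: write the file through a bytes path so the name is never re-encoded as UTF-8, then let hg add and hg commit record it. Mercurial treats file names as byte strings, so it should store the CP-1252 name unchanged. The function names, the sample name Übersicht.txt and the commit author are hypothetical.

```python
import os
import shutil
import subprocess
import tempfile


def write_cp1252_file(directory: str, name: str, content: bytes = b"test\n") -> bytes:
    """Create a file whose on-disk name is the CP-1252 encoding of `name`.

    Using a bytes path stops Python from encoding the name as UTF-8.
    Assumes a Linux filesystem such as ext4; other platforms (e.g. macOS
    with APFS) may reject non-UTF-8 names.
    """
    raw_path = os.path.join(os.fsencode(directory), name.encode("cp1252"))
    with open(raw_path, "wb") as fh:
        fh.write(content)
    return raw_path


def make_test_repo(name: str = "Übersicht.txt") -> str:
    """Create a throwaway hg repository with one commit adding a CP-1252-named file."""
    if shutil.which("hg") is None:
        raise RuntimeError("Mercurial (hg) is not on PATH")
    repo_dir = tempfile.mkdtemp(prefix="hg-cp1252-")
    subprocess.run(["hg", "init", repo_dir], check=True)
    write_cp1252_file(repo_dir, name)
    # With no file arguments, `hg add` schedules all untracked files.
    subprocess.run(["hg", "--cwd", repo_dir, "add"], check=True)
    subprocess.run(
        ["hg", "--cwd", repo_dir, "commit",
         "-m", "Add file with CP-1252 name",
         "-u", "Test User <test@example.com>"],
        check=True,
    )
    return repo_dir
```

If the encoding theory is right, importing the repository created by make_test_repo() into a test Phorge instance should trigger the same AphrontCharacterSetQueryException in the change parser.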