Page MenuHomePhorge

Duplicated Query search tokens also create duplicate SQL query constructs
Open, Needs TriagePublic

Description

This seems avoidable and would make SQL SELECT queries in the Global Search easier to understand/debug in Dark Console.

This may also theoretically decrease the likeliness to run T15831: Long search query can trigger raw "Too many tables; MariaDB can only use 61 tables in a join".

I'm not sure how often such user or bot input happens though and if that justifies the additional performance decrease due to additional code executed.
So feel free to decline.

Event Timeline

  1. Go to http://phorge.localhost/search/query/advanced/
  2. In the Query field enter z a a "a phrase" a a a a ~a -b =foo title:u which includes duplicate values
  3. Click Search button

Debug patch:

[acko@foo phorge (dupQuerySearchTerms $|u=)]$ git diff
diff --git a/src/applications/search/compiler/PhutilSearchQueryCompiler.php b/src/applications/search/compiler/PhutilSearchQueryCompiler.php
index 39479a4334..08acc6619e 100644
--- a/src/applications/search/compiler/PhutilSearchQueryCompiler.php
+++ b/src/applications/search/compiler/PhutilSearchQueryCompiler.php
@@ -147,9 +147,24 @@ final class PhutilSearchQueryCompiler
           new PhutilNumber($query_bytes),
           new PhutilNumber($maximum_bytes)));
     }
+phlog('DEBUG: query before: ' . $query);
+    // Remove duplicate tokens but keep quoted tokens intact thus no explode()
+    $query_pieces = preg_split('/("[^"]*")|\h+/', $query, -1,
+      PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
+    $query_pieces = array_unique($query_pieces);
+    $query_pieces = implode(" ", $query_pieces);
+phlog('DEBUG: query  after: ' . $query_pieces);
+phlog('DEBUG: query before length: ' . strlen($query));
+phlog('DEBUG: query  after length: ' . strlen($query_pieces));
 
     $query = phutil_utf8v($query);
+phlog('DEBUG: utf8v query before input: ' . json_encode($query));
+    $query_pieces = phutil_utf8v($query_pieces);
+phlog('DEBUG: utf8v query after  input: ' . json_encode($query_pieces));
     $length = count($query);
+phlog('DEBUG: utf8v query before input length: ' . $length);
+    $length_qp = count($query_pieces);
+phlog('DEBUG: utf8v query  after input length: ' . $length_qp);
 
     $enable_functions = $this->getEnableFunctions();

Debug output:

'DEBUG: query before: z a a "a phrase" a a a a ~a -b =foo title:u' at [PhutilSearchQueryCompiler.php:150]
'DEBUG: query  after: z a "a phrase" ~a -b =foo title:u' at [PhutilSearchQueryCompiler.php:156]
'DEBUG: query before length: 43' at [PhutilSearchQueryCompiler.php:157]
'DEBUG: query  after length: 33' at [PhutilSearchQueryCompiler.php:158]
'DEBUG: utf8v query before input: ["z"," ","a"," ","a"," ","\"","a"," ","p","h","r","a","s","e","\""," ","a"," ","a"," ","a"," ","a"," ","~","a"," ","-","b"," ","=","f","o","o"," ","t","i","t","l","e",":","u"]' at [PhutilSearchQueryCompiler.php:161]
'DEBUG: utf8v query after  input: ["z"," ","a"," ","\"","a"," ","p","h","r","a","s","e","\""," ","~","a"," ","-","b"," ","=","f","o","o"," ","t","i","t","l","e",":","u"]' at [PhutilSearchQueryCompiler.php:163]
'DEBUG: utf8v query before input length: 43' at [PhutilSearchQueryCompiler.php:165]
'DEBUG: utf8v query  after input length: 33' at [PhutilSearchQueryCompiler.php:167]

For the resulting SQL queries, check the return values of buildFerretSelectClause(), buildFerretJoinClause(), buildFerretWhereClause() in PhabricatorCursorPagedPolicyAwareQuery.

If somebody can share the resulting query, to compare it A/B, that would be super