Page MenuHomePhorge

Typeahead datasource query for repos should match substrings
Closed, ResolvedPublic

Description

Our company uses Phorge for many small software projects for many different customers. So we have roughly 500 repos. Our developers have to search for repos in a fast manner. They try to use the main search and hope for typeahead results, but quite often the repository is not found. The problem: the search term is only matched as a prefix of repository name, callsign or slug.

See: https://we.phorge.it/source/phorge/browse/master/src/applications/repository/query/PhabricatorRepositoryQuery.php;ad0052fcd643cb6b88453a6c8a11b5593056e154$650

Our devs often can't remember the entire name of the repo, just a succinct keyword of the title will come to mind. So my proposal: Make the name query to a substring query to find repos faster. This would align this datasource type with people, project and wiki searches, which work with substrings too.

Is this a reasonable proposal? I am open for discussion.

Event Timeline

Just to illustrate my point - here are the repositories of phorge: https://we.phorge.it/diffusion/

Search for Conduit Client or Stickers in the global search, you won't find R4 or rQR. That is counter intuitive.

No interest in this? I can send a diff to differential and be mauled there 🦁

avivey subscribed.

Yeah, totally reasonable feature request.
Do you think you can implement?
This might involve the Ferret engine (like this thing, or maybe there's a simpler approach for the query (title LIKE %text% in the Query class?).

edit: I mean, the Ferret is probably the "best" solution and will make the global search work, but it was a bit confusing the get working. this commit is titled "make global search work for <thing>", maybe it will help.

I also don't fully understand the "Phabricator way" to create full text queries still using indexes. That is my main concern.

There is no ferret involved in my fix. I have already implemented it in our company instance and the devs are loving it. The change is in this line: https://we.phorge.it/source/phorge/browse/master/src/applications/repository/query/PhabricatorRepositoryQuery.php$650

r.name LIKE %> to r.name LIKE %~

This is a reasonable change in my view, performance wise too.

Yeah, that's probably good - that's the query for typeaheads and probably global search, but not for other cases.

I guess Ferret/Fullsearch is a "more modern" implementation of this.

I have searched through the code a little bit and there is another system for typeahaed results. It is token based. So, how does it work?

  1. There is an extra token table, for example for people names: https://we.phorge.it/source/phorge/browse/master/src/applications/people/storage/PhabricatorUserSchemaSpec.php$8
  2. This table is updated every time a user is saved like this: https://we.phorge.it/source/phorge/browse/master/src/applications/people/storage/PhabricatorUser.php$495
  3. The query works with a join clause: https://we.phorge.it/source/phorge/browse/master/src/applications/people/query/PhabricatorPeopleQuery.php$225

So yeah... the right "phabricator/phorge" way would be to use tokens for repo names. So... should I take the hard way? But there would be inline sql involved as far as I can see 😓

I suspect that the code in People is the oldest one - this stuff blames to 2011! and does explicit sql stuff!

Ferret/FullTextEngine doesn't require any SQL and also supports re-creating indexes (it does require creating 2 new tables I think).

rP99c9df96b4ffbf7 (2015) is the big "convert to Full Text Search" commit, but looks like it's not about Ferret (2017?).

I suspect that the code in People is the oldest one - this stuff blames to 2011! and does explicit sql stuff!

Yeah, so no suitable way. I did go deeper in the code and have discovered, that the repository storage class itself is already ferret/fulltext ready. So maybe I can make something out of this. The wiki document typeahead search is using it too: https://we.phorge.it/source/phorge/browse/master/src/applications/phriction/typeahead/PhrictionDocumentDatasource.php

Okay, I have tried to wrap my head around the ferret engine. Problem is: the repo callsign is not part of the indexed material. So I would have to construct some query like (just a schema): WHERE ferret-query like '%term%' OR callsign like 'term%' - but that is not possible with the phabricator query engine. The ferret query parts are separatly created and every other where clause is merged with AND. I don't think, there is a clean way out of this. You could make TWO queries: The old one and one with the ferret engine. And then merge the results and remove doubles. But I don't know... just doesn't feel right.

a hack that might work: add a field in the document that's called "typeahead-text" or something, and put both title and callsign into it (and maybe number and short-name as well). Then use that field in the Datasource.
Since this is a full-text field, whatever the user types will fit this field...

Okay, my proposal:

  1. Change the fulltext engine for repos: https://we.phorge.it/source/phorge/browse/master/src/applications/repository/search/PhabricatorRepositoryFulltextEngine.php
    • add the slug and the callsign to the title field of the fulltext document (now it is only the name, but why not add these crucial field informations?)
    • remove the slug from the body field
  2. Create and apply a patch to reindex all repos.
  3. Use the ferret engine for datasource typed repo searches!

I am very sorry for my wild stream of thought. Maybe @20after4 can say something to this, because he has implemented the fulltext search feature for repos.

go for it.

There may already be a feature somewhere to automatically trigger a re-index because the code changed - I'll try to look for one this week.

edit: looks like it's just "add an autopatch" like resources/sql/autopatches/20191028.uriindex.01.rebuild.php. No worries.

go for it.

There may already be a feature somewhere to automatically trigger a re-index because the code changed - I'll try to look for one this week.

edit: looks like it's just "add an autopatch" like resources/sql/autopatches/20191028.uriindex.01.rebuild.php. No worries.

Sounds good. I can add it in my diff and test it beforehand.

bekay claimed this task.

This is now resolved.