R1:a1d9a2389db4
R1:a1d9a2389db4
Improve Ferret engine indexing performance for large blocks of text
Summary:
See PHI87. Ref T12974. Currently, we do a lot more work here than we need to: we call `phutil_utf8_strtolower()` on each token, but can do it once at the beginning on the whole block.
Additionally, since ngrams don't care about order, we only need to convert unique tokens into ngrams. This saves us some `phutil_utf8v()`. These calls can be slow for large inputs.
Test Plan:
- Created a ~4MB task description.
- Ran…
Summary:
See PHI87. Ref T12974. Currently, we do a lot more work here than we need to: we call `phutil_utf8_strtolower()` on each token, but can do it once at the beginning on the whole block.
Additionally, since ngrams don't care about order, we only need to convert unique tokens into ngrams. This saves us some `phutil_utf8v()`. These calls can be slow for large inputs.
Test Plan:
- Created a ~4MB task description.
- Ran…
Repository: R1 hydra
Commit Date: Sep 27 2017