R1:4005a465f7d0

Make Ferret indexing more robust (UTF8, exception handling)

Summary:
Ref T12819. Two minor improvements from live data:

- Tokenize in a UTF8-aware way.
- When one document fails to index, kill the transaction explicitly (rather than leaving it hanging) so we don't cause other failures later.
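
The two changes above can be illustrated with a minimal sketch. This is not the code from the revision (Phabricator is written in PHP); it is a Python approximation that assumes an n-gram style index stored in SQL. The names `tokenize_ngrams`, `index_documents`, and the `ngrams` table are hypothetical and exist only for illustration.

```python
import sqlite3


def tokenize_ngrams(text, n=3):
    """Split text into lowercase n-grams by code point, not by byte."""
    # A Python str is a sequence of Unicode code points, so slicing here
    # can never cut a multi-byte UTF-8 sequence in half.
    grams = []
    for word in text.lower().split():
        padded = " " + word + " "
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


def index_documents(conn, documents):
    """Index each document in its own transaction; return the ids that failed."""
    failed = []
    for doc_id, text in documents:
        try:
            conn.execute("BEGIN")
            for gram in tokenize_ngrams(text):
                conn.execute(
                    "INSERT INTO ngrams (doc_id, gram) VALUES (?, ?)",
                    (doc_id, gram),
                )
            conn.execute("COMMIT")
        except Exception:
            # Kill the open transaction explicitly so the next document
            # starts from a clean state instead of inheriting a hung one.
            if conn.in_transaction:
                conn.execute("ROLLBACK")
            failed.append(doc_id)
    return failed


# Hypothetical usage: the second document is malformed (no text) and fails,
# but the documents before and after it are still indexed.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE ngrams (doc_id INTEGER, gram TEXT)")
failed = index_documents(conn, [(1, "naïve"), (2, None), (3, "wörld")])
```

Without the explicit rollback, the failed document would leave a transaction open on the connection, and the `BEGIN` for the next document would fail in turn, which is the kind of follow-on failure the second bullet describes.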

Test Plan: Created some UTF8 documents locally, indexed them, got clean results.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18487
Repository: R1 hydra
Commit Date: Aug 28 2017