diff --git a/src/docs/userguide/utf8.diviner b/src/docs/userguide/utf8.diviner
new file mode 100644
index 0000000000..fe7f91dbd9
--- /dev/null
+++ b/src/docs/userguide/utf8.diviner
@@ -0,0 +1,62 @@
+@title User Guide: UTF-8 and Character Encoding
+@group userguide
+
+How Phabricator handles character encodings.
+
+= Overview =
+
+Phabricator stores all internal text data as UTF-8, processes all text data
+as UTF-8, outputs in UTF-8, and expects all inputs to be UTF-8. Principally,
+this means that you should write your source code in UTF-8. In most cases this
+does not require you to change anything, because ASCII text is a subset of
+UTF-8.
+
+= Detecting and Repairing Files =
+
+We recommend writing source files only in ASCII text, but Phabricator fully
+supports UTF-8 source files. However, it does not currently convert between
+encodings, so if you have source files which are not valid UTF-8 you may run
+into issues.
+
+If you have a project which isn't valid UTF-8 because a few files have random
+binary nonsense in them, there is a script in libphutil which can help you
+identify and fix them:
+
+ project/ $ libphutil/scripts/utils/utf8.php
+
+Generally, run this script on all source files with "-t" to find files with bad
+byte ranges, and then run it without "-t" on each file to identify where there
+are problems. For example:
+
+ project/ $ find . -type f -name '*.c' -print0 | xargs -0 -n256 ./utf8.php -t
+ ./hello_world.c
+
+If this command exits without output, you're in good shape: all the files that
+were checked are valid UTF-8. Any file it prints (here, hello_world.c) has
+problems you need to repair. Find them by omitting the "-t" flag:
+
+ project/ $ ./utf8.php hello_world.c
+ FAIL hello_world.c
+
+ 3 main()
+ 4 {
+ 5 printf ("Hello World<0xE9><0xD6>!\n");
+ 6 }
+ 7
+
+This shows the offending bytes on line 5 (in the actual console display, they'll
+be highlighted). Often a codebase will mostly be valid UTF-8 but have a few
+scattered files that have other things in them, like curly quotes which someone
+copy-pasted from Word into a comment. In these cases, you can just manually
+identify and fix the problems pretty easily.
+
+If you have a prohibitively large number of UTF-8 issues in your source code,
+Phabricator doesn't include any default tools to help you process them in a
+systematic way. You could hack up ##utf8.php## as a starting point, or use other
+tools to batch-process your source files.
+
+NOTE: If you have a project which uses a //different encoding// for source
+files, there is no easy way to get it working with Phabricator or Arcanist right
+now. If it's not reasonable to switch to UTF-8, tell us more about your use case
+and we can evaluate supporting it. Since tools like Git don't work well with
+other encodings, the prevailing assumption is that this is a rare situation.
\ No newline at end of file
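
The guide above suggests using other tools when `utf8.php` is not a good fit. As a rough illustration of the detection step only, here is a minimal Python sketch that reports where a byte string stops being valid UTF-8. It is not how `utf8.php` works internally; the function name `find_invalid_utf8` and the sample input are hypothetical, chosen to mirror the `hello_world.c` example.

```python
from typing import List, Tuple


def find_invalid_utf8(data: bytes) -> List[Tuple[int, int]]:
    """Return (line_number, byte_offset) for each invalid UTF-8 sequence.

    Line numbers are 1-based; byte offsets are 0-based from the start of data.
    Hypothetical helper for illustration, not part of libphutil.
    """
    problems = []
    pos = 0
    while pos < len(data):
        try:
            # Try to decode everything from the current position onward.
            data[pos:].decode("utf-8")
            break  # The remainder is valid; nothing more to report.
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice we decoded.
            bad = pos + err.start
            line = data.count(b"\n", 0, bad) + 1
            problems.append((line, bad))
            # Resume scanning just past the offending bytes.
            pos = pos + err.end
    return problems


if __name__ == "__main__":
    # Assumed sample: a C file with two stray non-UTF-8 bytes on line 3.
    sample = b'int main()\n{\n  printf("Hello World\xe9\xd6!\\n");\n}\n'
    for line, offset in find_invalid_utf8(sample):
        print("invalid byte 0x%02X on line %d (offset %d)"
              % (sample[offset], line, offset))
```

The sketch re-decodes the remainder of the input after every error, which is simple but slow for files with many bad bytes. To scan a whole tree, read each file in binary mode (`open(path, "rb")`) and pass its contents to the function.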
