Size

20 KB

	diff --git a/src/docs/contributor/database.diviner b/src/docs/contributor/database.diviner
	index 59a7dc2b2c..aaea485dc6 100644
	--- a/src/docs/contributor/database.diviner
	+++ b/src/docs/contributor/database.diviner
	@@ -1,211 +1,211 @@
	@title Database Schema
	@group developer

	This document describes key components of the database schema and should answer
	questions like how to store new types of data.

	Database System
	===============

	Phabricator uses MySQL or another MySQL-compatible database (like MariaDB
	or Amazon RDS).

	-Phabricator the InnoDB table engine. The only exception is the
	+Phabricator uses the InnoDB table engine. The only exception is the
	`search_documentfield` table which uses MyISAM because MySQL doesn't support
	fulltext search in InnoDB (recent versions do, but we haven't added support
	yet).

	We are unlikely to ever support other incompatible databases like PostgreSQL or
	SQLite.

	PHP Drivers
	===========

	Phabricator supports [[ http://www.php.net/book.mysql | MySQL ]] and
	[[ http://www.php.net/book.mysqli | MySQLi ]] PHP extensions.

	Databases
	=========

	Each Phabricator application has its own database. The names are prefixed by
	`phabricator_` (this is configurable).

	Phabricator uses a separate database for each application. To understand why,
	see @{article:Why does Phabricator need so many databases?}.

	Connections
	===========

	Phabricator specifies if it will use any opened connection just for reading or
	also for writing. This allows opening write connections to a primary and read
	connections to a replica in primary/replica setups (which are not actually
	supported yet).

	Tables
	======

	Most table names are prefixed by their application names. For example,
	Differential revisions are stored in database `phabricator_differential` and
	table `differential_revision`. This generally makes queries easier to recognize
	and understand.

	The exception is a few tables which share the same schema over different
	databases such as `edge`.

	We use lower-case table names with words separated by underscores.

	Column Names
	============

	Phabricator uses `camelCase` names for columns. The main advantage is that they
	directly map to properties in PHP classes.

	Don't use MySQL reserved words (such as `order`) for column names.

	Data Types
	==========

	Phabricator defines a set of abstract data types (like `uint32`, `epoch`, and
	`phid`) which map to MySQL column types. The mapping depends on the MySQL
	version.

	Phabricator uses `utf8mb4` character sets where available (MySQL 5.5 or newer),
	and `binary` character sets in most other cases. The primary motivation is to
	allow 4-byte unicode characters to be stored (the `utf8` character set, which
	is more widely available, does not support them). On newer MySQL, we use
	`utf8mb4` to take advantage of improved collation rules.

	Phabricator stores dates with an `epoch` abstract data type, which maps to
	`int unsigned`. Although this makes dates less readable when browsing the
	database, it makes date and time manipulation more consistent and
	straightforward in the application.

	We don't use the `enum` data type because each change to the list of possible
	values requires altering the table (which is slow with big tables). We use
	numbers (or short strings in some cases) mapped to PHP constants instead.

	JSON and Other Serialized Data
	==============================

	Some data don't require structured access -- we don't need to filter or order by
	them. We store these data as text fields in JSON format. This approach has
	several advantages:

	- If we decide to add another unstructured field then we don't need to alter
	the table (which is slow for big tables in MySQL).
	- Table structure is not cluttered by fields which could be unused most of the
	time.

	An example of such usage can be found in column
	`differential_diffproperty.data`.

	Primary Keys
	============

	-Most tables have auto-increment column named `id`. Adding an ID column is
	+Most tables have an auto-increment column named `id`. Adding an ID column is
	appropriate for most tables (even tables that have another natural unique key),
	as it improves consistency and makes it easier to perform generic operations
	on objects.

	For example, @{class:LiskMigrationIterator} allows you to very easily apply a
	migration to a table using a constant amount of memory provided the table has
	an `id` column.

	Indexes
	======

	Create all indexes necessary for fast query execution in most cases. Don't
	create indexes which are not used. You can analyze queries @{article:Using
	DarkConsole}.

	Older MySQL versions are not able to use indexes for tuple search:
	`(a, b) IN ((%s, %d), (%s, %d))`. Use `AND` and `OR` instead:
	`((a = %s AND b = %d) OR (a = %s AND b = %d))`.

	Foreign Keys
	============

	We don't use foreign keys because they're complicated and we haven't experienced
	significant issues with data inconsistency that foreign keys could help prevent.
	Empirically, we have witnessed first hand as `ON DELETE CASCADE` relationships
	accidentally destroy huge amounts of data. We may pursue foreign keys
	eventually, but there isn't a strong case for them at the present time.

	PHIDs
	=====

	-Each globally referencable object in Phabricator has its associated PHID
	+Each globally referencable object in Phabricator has an associated PHID
	("Phabricator ID") which serves as a global identifier, similar to a GUID.
	We use PHIDs for referencing data in different databases.

	-We use both autoincrementing IDs and global PHIDs because each is useful in
	-different contexts. Autoincrementing IDs are meaningfully ordered and allow
	+We use both auto-incrementing IDs and global PHIDs because each is useful in
	+different contexts. Auto-incrementing IDs are meaningfully ordered and allow
	us to construct short, human-readable object names (like `D2258`) and URIs.
	Global PHIDs allow us to represent relationships between different types of
	objects in a homogeneous way.

	For example, infrastructure like "subscribers" can be implemented easily with
	PHID relationships: different types of objects (users, projects, mailing lists)
	are permitted to subscribe to different types of objects (revisions, tasks,
	etc). Without PHIDs, we would need to add a "type" column to avoid ID collision;
	using PHIDs makes implementing features like this simpler.

	Transactions
	============

	Transactional code should be written using transactions. Example of such code is
	-inserting multiple records where one doesn't make sense without the other or
	+inserting multiple records where one doesn't make sense without the other, or
	selecting data later used for update. See chapter in @{class:LiskDAO}.

	Advanced Features
	=================

	We don't use MySQL advanced features such as triggers, stored procedures or
	events because we like expressing the application logic in PHP more than in SQL.
	Some of these features (especially triggers) can also cause a great deal of
	confusion, and are generally more difficult to debug, profile, version control,
	update, and understand than application code.

	Schema Denormalization
	======================

	Phabricator uses schema denormalization sparingly. Avoid denormalization unless
	there is a compelling reason (usually, performance) to denormalize.

	Schema Changes and Migrations
	=============================

	To create a new schema change or migration:

	Create a database patch. Database patches go in
	`resources/sql/autopatches/`. To change a schema, use a `.sql` file and write
	in SQL. To perform a migration, use a `.php` file and write in PHP. Name your
	file `YYYYMMDD.patchname.ext`. For example, `20141225.christmas.sql`.

	Keep patches small. Most schema change statements are not transactional. If
	a patch contains several SQL statements and fails partway through, it normally
	can not be rolled back. When a user tries to apply the patch again later, the
	first statement (which, for example, adds a column) may fail (because the column
	already exists). This can be avoided by keeping patches small (generally, one
	statement per patch).

	Use namespace and character set variables. When defining a `.sql` patch,
	you should use these variables instead of hard-coding namespaces or character
	set names:

	| Variable | Meaning | Notes |
	|---|---|---|
	-| {$NAMESPACE} | Storage Namespace | Defaults to `phabricator` |
	-| {$CHARSET} | Default Charset | Mostly used to specify table charset |
	-| {$COLLATE_TEXT} | Text Collation | For most text (case-sensitive) |
	-| {$COLLATE_SORT} | Sort Collation | For sortable text (case-insensitive) |
	-| {$CHARSET_FULLTEXT} | Fulltext Charset | Specify explicitly for fulltext |
	-| {$COLLATE_FULLTEXT} | Fulltext Collate | Specify explicitly for fulltext |
	+| `{$NAMESPACE}` | Storage Namespace | Defaults to `phabricator` |
	+| `{$CHARSET}` | Default Charset | Mostly used to specify table charset |
	+| `{$COLLATE_TEXT}` | Text Collation | For most text (case-sensitive) |
	+| `{$COLLATE_SORT}` | Sort Collation | For sortable text (case-insensitive) |
	+| `{$CHARSET_FULLTEXT}` | Fulltext Charset | Specify explicitly for fulltext |
	+| `{$COLLATE_FULLTEXT}` | Fulltext Collate | Specify explicitly for fulltext |


	Test your patch. Run `bin/storage upgrade` to test your patch.

	See Also
	========

	- @{class:LiskDAO}
	diff --git a/src/docs/flavor/project_history.diviner b/src/docs/flavor/project_history.diviner
	index bfdbe2682e..c3b5363d50 100644
	--- a/src/docs/flavor/project_history.diviner
	+++ b/src/docs/flavor/project_history.diviner
	@@ -1,60 +1,60 @@
	@title Phabricator Project History
	@group lore

	A riveting tale of adventure. In this document, I refer to worldly and
	sophisticated engineer Evan Priestley as "I", which is only natural as I am he.

	This document is mostly just paragraph after paragraph of self-aggrandizement.

	= In The Beginning =

	I wrote the original version of Differential in one night at a Facebook
	Hackathon in April or May 2007, along with Luke Shepard. I joined the company in
	April and code review was already an established and mostly-mandatory part of
	the culture, but it happened over email and was inefficient and hard to keep
	track of. I remember feeling like I was spending a lot of time waiting for code
	review to happen, which was a major motivator for building the tool.

	The original name of the tool was "Diffcamp". Some time earlier there had been
	an attempt to create a project management tool that was a sort of hybrid between
	Trac and Basecamp called "Traccamp". Since we were writing the code review tool
	at the height of the brief popularity Traccamp enjoyed, we integrated and called
	the new tool Diffcamp even though it had no relation to Basecamp. Traccamp fell
	by the wayside shortly thereafter and was eventually removed.

	However, Diffcamp didn't share its fate. We spent some more time working on it
	and got good enough to win hearts and minds over emailing diffs around and was
	soon the de facto method of code review at Facebook.

	= The Long Bloat =

	For the next two and a half years, Diffcamp grew mostly organically and gained a
	number of features like inline commenting, CLI support and git support (Facebook
	was 100% SVN in early 2007 but 90%+ of Engineers worked primarily in git with
	SVN bridging by 2010). As these patches were contributed pretty much randomly,
	it also gained a lot of performance problems, usability issues, and bugs.

	Through 2007 and 2008 I worked mostly on frontend and support infrastructure;
	among other things, I wrote a static resource management system called Haste. In
	2009 I worked on the Facebook Lite site, where I built the Javelin Javascript
	library and an MVC-flavored framework called Alite.

	But by early 2010, Diffcamp was in pretty bad shape. Two years of having random
	features grafted onto it without real direction had left it slow and difficult
	to use. Internal feedback on the tool was pretty negative, with a lot of
	complaints about performance and stability. The internal XTools team had made
	inroads at fixing these problems in late 2009, but they were stretched thin and
	the tool had become a sprawling landscape of architectural and implementation
	problems.

	= Differential =

	I joined the new Dev Tools team around February 2010 and took over Diffcamp. I
	renamed it to Differential, moved it to a new Alite-based infrastructure with
	Javelin, and started making it somewhat less terrible. I eventually wrote
	-Diffusion and build Herald to replace a very difficult-to-use predecessor. These
	+Diffusion and built Herald to replace a very difficult-to-use predecessor. These
	tools were less negatively received than the older versions. By December 2010 I
	started open sourcing them; Haste became //Celerity// and Alite became
	//Aphront//. I wrote Maniphest to track open issues with the project in January
	-or February and we open sourced Phabricator in late April, shortly after I left
	-Facebook.
	+or February, left Facebook in April, and shortly after, we open sourced
	+Phabricator.
	diff --git a/src/docs/flavor/things_you_should_do_now.diviner b/src/docs/flavor/things_you_should_do_now.diviner
	index b4681bd0ca..0d3b4135ba 100644
	--- a/src/docs/flavor/things_you_should_do_now.diviner
	+++ b/src/docs/flavor/things_you_should_do_now.diviner
	@@ -1,138 +1,138 @@
	@title Things You Should Do Now
	@group sundry

	Describes things you should do now when building software, because the cost to
	do them increases over time and eventually becomes prohibitive or impossible.


	= Overview =

	If you're building a hot new web startup, there are a lot of decisions to make
	about what to focus on. Most things you'll build will take about the same amount
	of time to build regardless of what order you build them in, but there are a few
	technical things which become vastly more expensive to fix later.

	If you don't do these things early in development, they'll become very hard or
	impossible to do later. This is basically a list of things that would have saved
	Facebook huge amounts of time and effort down the road if someone had spent
	a tiny amount of time on them earlier in the development process.

	See also @{article:Things You Should Do Soon} for things that scale less
	drastically over time.


	= Start IDs At a Gigantic Number =

	If you're using integer IDs to identify data or objects, don't start your
	IDs at 1. Start them at a huge number (e.g., 2^33) so that no object ID will
	ever appear in any other role in your application (like a count, a natural
	index, a byte size, a timestamp, etc). This takes about 5 seconds if you do it
	before you launch and rules out a huge class of nasty bugs for all time. It
	becomes incredibly difficult as soon as you have production data.

	The kind of bug that this causes is accidental use of some other value as an ID:

	COUNTEREXAMPLE
	// Load the user's friends, returns a map of friend_id => true
	$friend_ids = user_get_friends($user_id);

	// Get the first 8 friends.
	$first_few_friends = array_slice($friend_ids, 0, 8);

	// Render those friends.
	render_user_friends($user_id, array_keys($first_few_friends));

	Because array_slice() in PHP discards array indices and renumbers them, this
	doesn't render the user's first 8 friends but the users with IDs 0 through 7,
	e.g. Mark Zuckerberg (ID 4) and Dustin Moskovitz (ID 6). If you have IDs in this
	range, sooner or later something that isn't an ID will get treated like an ID
	and the operation will be valid and cause unexpected behavior. This is
	completely avoidable if you start your IDs at a gigantic number.


	= Only Store Valid UTF-8 =

	For the most part, you can ignore UTF-8 and unicode until later. However, there
	is one aspect of unicode you should address now: store only valid UTF-8 strings.

	Assuming you're storing data internally as UTF-8 (this is almost certainly the
	right choice and definitely the right choice if you have no idea how unicode
	works), you just need to sanitize all the data coming into your application and
	make sure it's valid UTF-8.

	If your application emits invalid UTF-8, other systems (like browsers) will
	break in unexpected and interesting ways. You will eventually be forced to
	ensure you emit only valid UTF-8 to avoid these problems. If you haven't
	sanitized your data, you'll basically have two options:

	- do a huge migration on literally all of your data to sanitize it; or
	- forever sanitize all data on its way out on the read pathways.

	As of 2011 Facebook is in the second group, and spends several milliseconds of
	CPU time sanitizing every display string on its way to the browser, which
	multiplies out to hundreds of servers worth of CPUs sitting in a datacenter
	paying the price for the invalid UTF-8 in the databases.

	You can likely learn enough about unicode to be confident in an implementation
	which addresses this problem within a few hours. You don't need to learn
	everything, just the basics. Your language probably already has a function which
	does the sanitizing for you.


	= Never Design a Blacklist-Based Security System =

	When you have an alternative, don't design security systems which are default
	permit, blacklist-based, or otherwise attempt to enumerate badness. When
	Facebook launched Platform, it launched with a blacklist-based CSS filter, which
	basically tried to enumerate all the "bad" parts of CSS and filter them out.
	This was a poor design choice and lead to basically infinite security holes for
	all time.

	It is very difficult to enumerate badness in a complex system and badness is
	often a moving target. Instead of trying to do this, design whitelist-based
	security systems where you list allowed things and reject anything you don't
	understand. Assume things are bad until you verify that they're OK.

	It's tempting to design blacklist-based systems because they're easier to write
	and accept more inputs. In the case of the CSS filter, the product goal was for
	users to just be able to use CSS normally and feel like this system was no
	different from systems they were familiar with. A whitelist-based system would
	reject some valid, safe inputs and create product friction.

	But this is a much better world than the alternative, where the blacklist-based
	system fails to reject some dangerous inputs and creates //security holes//. It
	//also// creates product friction because when you fix those holes you break
	existing uses, and that backward-compatibility friction makes it very difficult
	to move the system from a blacklist to a whitelist. So you're basically in
	trouble no matter what you do, and have a bunch of security holes you need to
	unbreak immediately, so you won't even have time to feel sorry for yourself.

	Designing blacklist-based security is one of the worst now-vs-future tradeoffs
	you can make. See also "The Six Dumbest Ideas in Computer Security":

	http://www.ranum.com/security/computer_security/


	= Fail Very Loudly when SQL Syntax Errors Occur in Production =

	This doesn't apply if you aren't using SQL, but if you are: detect when a query
	fails because of a syntax error (in MySQL, it is error 1064). If the failure
	happened in production, fail in the loudest way possible. (I implemented this in
	2008 at Facebook and had it just email me and a few other people directly. The
	system was eventually refined.)

	This basically creates a high-signal stream that tells you where you have SQL
	injection holes in your application. It will have some false positives and could
	theoretically have false negatives, but at Facebook it was pretty high signal
	considering how important the signal is.

	Of course, the real solution here is to not have SQL injection holes in your
	application, ever. As far as I'm aware, this system correctly detected the one
	SQL injection hole we had from mid-2008 until I left in 2011, which was in a
	hackathon project on an underisolated semi-production tier and didn't use the
	query escaping system the rest of the application does.

	Hopefully, whatever language you're writing in has good query libraries that
	can handle escaping for you. If so, use them. If you're using PHP and don't have
	-a solution in place yet, the Phabricator implementation of qsprintf() is similar
	-to Facebook's system and was successful there.
	+a solution in place yet, the Phabricator implementation of `qsprintf()` is
	+similar to Facebook's system and was successful there.
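
The "fail very loudly" advice in the final section of the diff above can be illustrated with a minimal sketch. This is not Phabricator's or Facebook's implementation. It is written in Python purely for illustration, and the names `QueryError`, `run_query_loudly`, `alert`, and the `is_production` flag are hypothetical. A real database driver would raise its own exception type carrying the error code.

```python
from typing import Callable, List

# MySQL reports SQL syntax errors with error code 1064.
MYSQL_SYNTAX_ERROR = 1064


class QueryError(Exception):
    """Hypothetical stand-in for a driver exception that carries an error code."""

    def __init__(self, errno: int, message: str) -> None:
        super().__init__(message)
        self.errno = errno


def run_query_loudly(
    execute: Callable[[str], object],
    sql: str,
    is_production: bool,
    alert: Callable[[str], None],
) -> object:
    """Run `sql` via `execute` and alert on syntax errors in production.

    The original error is always re-raised so callers still see the failure.
    """
    try:
        return execute(sql)
    except QueryError as err:
        if is_production and err.errno == MYSQL_SYNTAX_ERROR:
            # A syntax error in production often means unescaped input
            # reached the query, so treat it as a possible injection hole.
            alert(f"SQL syntax error in production: {sql!r}: {err}")
        raise


def fake_execute(sql: str) -> object:
    """Fake executor (assumption for the demo) that always reports a syntax error."""
    raise QueryError(MYSQL_SYNTAX_ERROR, "You have an error in your SQL syntax")


alerts: List[str] = []
try:
    run_query_loudly(
        fake_execute,
        "SELECT * FROM user WHERE name = 'o'reilly'",
        True,
        alerts.append,
    )
except QueryError:
    pass  # The failure still propagates; the alert is a side channel.
```

Here `alert` simply appends to a list. In practice it would be whatever channel counts as loud for the team, such as the direct email the document describes.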

File Metadata

Mime Type: text/x-diff
Expires: Fri, Feb 7, 7:43 AM (1 d, 14 h)
Storage Engine: blob
Storage Format: Raw Data
Storage Handle: 34077
Default Alt Text: (20 KB)
