diff --git a/src/docs/user/cluster/cluster_repositories.diviner b/src/docs/user/cluster/cluster_repositories.diviner
index c5179666a7..eb9a4f4ede 100644
--- a/src/docs/user/cluster/cluster_repositories.diviner
+++ b/src/docs/user/cluster/cluster_repositories.diviner
@@ -1,112 +1,198 @@
@title Cluster: Repositories
@group intro
Configuring Phabricator to use multiple repository hosts.
Overview
========
WARNING: This feature is a very early prototype; the features this document
describes are mostly speculative fantasy.
If you use Git or Mercurial, you can deploy Phabricator with multiple
repository hosts, configured so that each host is readable and writable. The
advantages of doing this are:
- you can completely survive the loss of repository hosts;
- reads and writes can scale across multiple machines; and
- read and write performance across multiple geographic regions may improve.
This configuration is complex, and many installs do not need to pursue it.
-This configuration is not currently supported with Subversion.
+This configuration is not currently supported with Subversion or Mercurial.
Repository Hosts
================
Repository hosts must run a complete, fully configured copy of Phabricator,
-including a webserver. If you make repositories available over SSH, they must
-also run a properly configured `sshd`.
+including a webserver. They must also run a properly configured `sshd`.
Generally, these hosts will run the same set of services and configuration that
web hosts run. If you prefer, you can overlay these services and put web and
-repository services on the same hosts.
+repository services on the same hosts. See @{article:Clustering Introduction}
+for some guidance on overlaying services.
When a user requests information about a repository that can only be satisfied
by examining a repository working copy, the webserver receiving the request
will make an HTTP service call to a repository server which hosts the
repository to retrieve the data it needs. It will use the result of this query
to respond to the user.
How Reads and Writes Work
=========================
Phabricator repository replicas are multi-master: every node is readable and
writable, and a cluster of nodes can (almost always) survive the loss of any
arbitrary subset of nodes so long as at least one node is still alive.
Phabricator maintains an internal version for each repository, and increments
it when the repository is mutated.
Before responding to a read, replicas make sure their version of the repository
is up to date (no node in the cluster has a newer version of the repository).
If it isn't, they block the read until they can complete a fetch.
Before responding to a write, replicas obtain a global lock, perform the same
version check and fetch if necessary, then allow the write to continue.
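As a rough illustration of these rules, here is a toy in-memory model. Several pieces of it are hypothetical and exist only for this sketch:

- the `Node` and `Cluster` classes;
- a single `threading.Lock` standing in for the global write lock;
- a list copy standing in for a fetch.

This is a sketch under those assumptions, not how Phabricator implements replication.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical repository replica: an internal version plus its commits."""
    name: str
    version: int = 0
    commits: list = field(default_factory=list)


class Cluster:
    """Toy multi-master cluster showing version-checked reads and writes."""

    def __init__(self, names):
        self.nodes = {name: Node(name) for name in names}
        # Stand-in for the global write lock (assumption: one lock per repository).
        self.write_lock = threading.Lock()

    def _latest(self):
        # The node holding the newest version of the repository.
        return max(self.nodes.values(), key=lambda n: n.version)

    def _sync(self, node):
        # Bring this node up to date by copying from the newest node
        # (a stand-in for a real fetch).
        leader = self._latest()
        if leader.version > node.version:
            node.commits = list(leader.commits)
            node.version = leader.version

    def read(self, name):
        node = self.nodes[name]
        self._sync(node)  # never answer from a stale copy
        return list(node.commits)

    def write(self, name, commit):
        with self.write_lock:
            node = self.nodes[name]
            self._sync(node)  # same version check as a read
            node.commits.append(commit)
            node.version += 1  # every mutation bumps the internal version
            return node.version
```

For example, after `cluster.write("X", "c1")`, calling `cluster.read("Y")` first brings node Y up to date and then returns `["c1"]`.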
+Additionally, repositories passively check other nodes for updates and
+replicate changes in the background. After you push a change to a repositroy,
+it will usually spread passively to all other repository nodes within a few
+minutes.
+
+Even if passive replication is slow, the active replication makes acknowledged
+changes sequential to all observers: after a write is acknowledged, all
+subsequent reads are guaranteed to see it. The system does not permit stale
+reads, and you do not need to wait for a replication delay to see a consistent
+view of the repository no matter which node you ask.
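You can picture the background behavior as a periodic sweep in which every replica that is behind the newest one catches up. This sketch assumes a hypothetical `Replica` record and a `passive_replication_pass()` helper. It ignores scheduling, networking, and partial failures.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica state: internal version plus commit list."""
    name: str
    version: int = 0
    commits: list = field(default_factory=list)


def passive_replication_pass(replicas):
    """One background sweep: every replica behind the newest one catches up.

    Returns the names of the replicas that were updated during this pass.
    """
    leader = max(replicas, key=lambda r: r.version)
    updated = []
    for replica in replicas:
        if replica.version < leader.version:
            # Stand-in for fetching the missing changes from the leader.
            replica.commits = list(leader.commits)
            replica.version = leader.version
            updated.append(replica.name)
    return updated
```

If nothing has changed, running the pass again does nothing. That property is what makes it safe to repeat on a timer.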
+
HTTP vs HTTPS
=============
Intracluster requests (from the daemons to repository servers, or from
webservers to repository servers) are permitted to use HTTP, even if you have
set `security.require-https` in your configuration.
It is common to terminate SSL at a load balancer and use plain HTTP beyond
that, and the `security.require-https` feature is primarily focused on making
client browser behavior more convenient for users, so it does not apply to
intracluster traffic.
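Put another way, the policy works roughly as follows. `uri_permitted()` is a hypothetical helper written only to restate the rule above. It is not a Phabricator function or configuration API.

```python
from urllib.parse import urlsplit


def uri_permitted(uri, require_https, intracluster):
    """Restate the HTTPS rule (hypothetical helper, not a Phabricator API).

    `security.require-https` constrains client-facing URIs, but it is not
    applied to intracluster traffic.
    """
    if intracluster:
        # Intracluster requests may use plain HTTP regardless of the setting.
        return True
    return urlsplit(uri).scheme.lower() == "https" or not require_https
```

The first branch is the reason the setting can't catch a repository host that was accidentally bound with "http".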
Using HTTP within the cluster leaves you vulnerable to attackers who can
observe traffic within a datacenter, or observe traffic between datacenters.
This is normally very difficult, but within reach of state-level adversaries
like the NSA.
If you are concerned about these attackers, you can terminate HTTPS on
repository hosts and bind to them with the "https" protocol. Just be aware that
the `security.require-https` setting won't prevent you from making
configuration mistakes, as it doesn't cover intracluster traffic.
Other mitigations are possible, but securing a network against the NSA and
similar agents of other rogue nations is beyond the scope of this document.
+Monitoring Replication
+======================
+
+You can review the current status of a repository on cluster nodes in
+{nav Diffusion > (Repository) > Manage Repository > Cluster Configuration}.
+
+This screen shows all the configured devices which are hosting the repository
+and the version each device has available.
+
+**Version**: When a repository is mutated by a push, Phabricator increases
+an internal version number for the repository. This column shows which version
+is on disk on the corresponding node.
+
+After a change is pushed, the node which received the change will have a larger
+version number than the other nodes. The change should be passively replicated
+to the remaining nodes after a brief period of time, although this can take
+a while if the change was large or the network connection between nodes is
+slow or unreliable.
+
+You can click the version number to see the corresponding push logs for that
+change. The logs contain details about what was changed, and can help you
+identify whether replication is slow because a change is large or for some other
+reason.
+
+**Writing**: This shows that the node is currently holding a write lock. This
+normally means that it is actively receiving a push, but can also mean that
+there was a write interruption. See "Write Interruptions" below for details.
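If you want to reason about this table in a script, the interpretation above amounts to something like the following. The `(device, version, writing)` row format and the `summarize_cluster_status()` name are assumptions made for illustration. Phabricator does not expose such a function.

```python
def summarize_cluster_status(rows):
    """Summarize a Cluster Configuration-style table.

    `rows` is a hypothetical list of (device, version, writing) tuples, one per
    node hosting the repository.
    """
    newest = max(version for _, version, _ in rows)
    # Nodes holding the newest version are the current leaders.
    leaders = sorted(device for device, version, _ in rows if version == newest)
    # How many internal versions each lagging node is behind (not a commit count).
    lagging = {
        device: newest - version
        for device, version, _ in rows
        if version < newest
    }
    # Nodes holding the write lock: an active push, or a possible interruption.
    writing = sorted(device for device, _, is_writing in rows if is_writing)
    return {"leaders": leaders, "lagging": lagging, "writing": writing}
```

A node that stays in `writing` long after any push has finished is worth investigating.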
+
+
+Write Interruptions
+===================
+
+A repository cluster can be put into an inconsistent state by an interruption
+in a brief window immediately after a write.
+
+Phabricator cannot commit changes to a working copy (stored on disk) and to
+the global state (stored in a database) atomically, so there is a narrow window
+between committing these two different states when some tragedy (like a
+lightning strike) can befall a server, leaving the global and local views of
+the repository state divergent.
+
+In these cases, Phabricator fails into a "frozen" state where further writes
+are not permitted until the failure is investigated and resolved.
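You can model the failure window with a toy write sequence. This is a hypothetical sketch, built on two assumptions:

- a database-side "writing" marker is set before the on-disk commit;
- the marker is cleared only after the global version is updated.

Phabricator's real bookkeeping may differ. In this model, a crash between steps 2 and 3 leaves the marker set, and the model then refuses further writes. That mirrors the "frozen" behavior described above.

```python
class InterruptedWrite(Exception):
    """Simulated crash (the "lightning strike") partway through a write."""


class RepositoryState:
    """Toy model of one node: on-disk version vs. database (global) version."""

    def __init__(self):
        self.disk_version = 0
        self.db_version = 0
        self.db_write_marker = False  # hypothetical "Writing" flag in the database

    @property
    def frozen(self):
        # A leftover marker means a write started but never finished, so the
        # disk and database views may disagree.
        return self.db_write_marker

    def write(self, crash_after_disk=False):
        if self.frozen:
            raise RuntimeError("repository is frozen; investigate before writing")
        self.db_write_marker = True           # 1. record that a write is in progress
        self.disk_version += 1                # 2. commit to the working copy on disk
        if crash_after_disk:
            raise InterruptedWrite()          # crash inside the non-atomic window
        self.db_version = self.disk_version   # 3. commit the global state
        self.db_write_marker = False          # 4. release the marker
```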
+
+TODO: Complete the support tooling and provide recovery instructions.
+
+
+Loss of Leaders
+===============
+
+A more straightforward failure condition is the loss of all servers in a
+cluster which have the most up-to-date copy of a repository. For example:
+
+ - There is a cluster setup with two nodes, X and Y.
+ - A new change is pushed to server X.
+ - Before the change can propagate to server Y, lightning strikes server X
+ and destroys it.
+
+Here, all of the "leader" nodes with the most up-to-date copy of the repository
+have been lost. Phabricator will refuse to serve this repository because it
+cannot serve it consistently, and cannot accept writes without data loss.
+
+The most straightforward way to resolve this issue is to restore any leader to
+service. The change will be able to replicate to other nodes once a leader
+comes back online.
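The serving rule reduces to a simple check. In this sketch, `can_serve()` and the `{device: (version, is_alive)}` input are hypothetical. The point is only that the newest version recorded for the repository must still be present on at least one live node.

```python
def can_serve(nodes):
    """Decide whether a repository can be served consistently.

    `nodes` maps a hypothetical device name to (version, is_alive). The
    repository is servable only while at least one node holding the newest
    known version is alive.
    """
    newest = max(version for version, _ in nodes.values())
    return any(alive and version == newest for version, alive in nodes.values())
```

In the two-node scenario above, losing X after the push makes `can_serve({"X": (5, False), "Y": (4, True)})` return `False`. Bringing X back online makes it return `True` again.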
+
+If you are unable to restore a leader, or are unsure whether you can restore
+one quickly, you can use the monitoring console to review which changes are
+present on the leaders but not present on the followers by examining the
+push logs.
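Conceptually, this review is a filter over the push log. The entry format (a `version` and a `ref` per push) and the `changes_missing_on_followers()` helper are assumptions made for illustration. In practice, you examine the push logs through the monitoring console.

```python
def changes_missing_on_followers(push_log, follower_version):
    """Return push-log entries that no surviving follower has received.

    `push_log` is a hypothetical list of dicts with "version" and "ref" keys;
    `follower_version` is the newest version any surviving follower holds.
    """
    missing = [entry for entry in push_log if entry["version"] > follower_version]
    # Oldest first, so changes can be reviewed in the order they were pushed.
    return sorted(missing, key=lambda entry: entry["version"])
```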
+
+TODO: Complete the support tooling and provide recovery instructions.
+
+
Backups
=======
Even if you configure clustering, you should still consider retaining separate
backup snapshots. Replicas protect you from data loss if you lose a host, but
they do not let you rewind time to recover from data mutation mistakes.
If something issues a `--force` push that destroys branch heads, the mutation
will propagate to the replicas.
You may be able to manually restore the branches by using tools like the
Phabricator push log or the Git reflog, so it is less important to retain
repository snapshots than database snapshots, but it is still possible for
data to be lost permanently, especially if you don't notice the problem for
some time.
Retaining separate backup snapshots will improve your ability to recover more
data more easily in a wider range of disaster situations.
Next Steps
==========
Continue by:
- returning to @{article:Clustering Introduction}.