LEAVE CLUSTER
Table of Contents
- Putting a node in maintenance
- Clearing maintenance state
- Removing a node from a cluster
- Interrupt node removal
- Checking removal status
Sometimes a node has to be removed from a cluster, either temporarily (e.g., to perform maintenance tasks) or permanently. It is important to remove the node gracefully, via the provided AIS APIs. In particular, if the node is a storage target, it contains user data that must be rebalanced to the remaining nodes in the cluster.
Two kinds of node removal are supported:
- temporary removal, e.g. for node maintenance. In this case, the node remains in the cluster but it stops responding to client requests. Temporary removal can be done in two ways: start-maintenance keeps the node running, and shutdown stops the node;
- permanent removal, e.g. node decommission. decommission disables a node and starts moving all its objects to other targets. When rebalancing finishes, the primary proxy automatically removes the node from the cluster. On leaving the cluster, the node erases its AIS metadata and optionally deletes all user data.
Putting a node in maintenance
To temporarily take a node out of the cluster, put a node in maintenance (CLI). Nodes undergoing maintenance remain in the cluster map, as shown in the above diagram.
In general, when storage targets leave (or join) the cluster, the current primary (leader) proxy transactionally creates the next updated version of the cluster map and synchronizes the new map across the entire cluster so that each and every node gets the new version. This results in each AIS target starting to traverse its locally stored content, recomputing object locations, and sending at least some of the objects to their respective new locations. Object migration is then carried out via an intra-cluster optimized communication mechanism and over a separate physical or logical network, if provisioned.
For more details about the rebalancing process, see REBALANCE.
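To see why a new cluster map causes only some of the objects to move, consider the toy sketch below. It assumes a rendezvous (highest-random-weight) placement rule purely for illustration; this document does not state which placement rule AIS uses, and the function name owner and the target IDs t1, t2, t3 are hypothetical. Each object is assigned to the target whose hash of "target:object" is the highest, so removing a target changes the location of only those objects that the removed target owned.

```shell
# Toy rendezvous-hashing placement. Illustration only, NOT the actual AIS algorithm.
# owner OBJECT TARGET...: prints the target with the highest hash of "target:object".
owner() {
  obj=$1
  shift
  best=
  best_score=-1
  for t in "$@"; do
    # cksum prints "<crc> <size>" for stdin; the CRC serves as the score
    score=$(printf '%s' "$t:$obj" | cksum | cut -d' ' -f1)
    if [ "$score" -gt "$best_score" ]; then
      best=$t
      best_score=$score
    fi
  done
  echo "$best"
}

owner obj-1 t1 t2 t3   # placement with three targets in the map
owner obj-1 t1 t3      # placement after t2 leaves; differs only if t2 was the owner
```

Running the two calls over many object names shows that an object changes location only when its previous owner is the target that left.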
$ ais cluster add-remove-nodes start-maintenance 59262t8087
Node "59262t8087" is in maintenance mode
Started rebalance "g1", use 'ais show job xaction g1' to monitor the progress
Alternatively, you can shut down the node to stop all AIS services on the node after putting it in maintenance mode:
$ ais cluster add-remove-nodes shutdown 59262t8087
Node "59262t8087" is in maintenance mode
Started rebalance "g1", use 'ais show job xaction g1' to monitor the progress
If the node is a target, after a quick preparation, the cluster will rebalance. When the rebalance finishes, it is safe to turn the node off.
If the node does not contain any important data, you can skip rebalancing with --no-rebalance, so the node is safe to switch off shortly after putting it in maintenance:
$ ais cluster add-remove-nodes start-maintenance 59262t8087 --no-rebalance
Node "59262t8087" is in maintenance
Clearing maintenance state
Once a node is put in maintenance mode, the cluster keeps it in this state until you notify the cluster that the node is ready to use. If the node was shut down, you first have to restart or power it on and wait until it registers with the primary proxy. After getting the notification, the cluster clears the maintenance state and starts a rebalance:
$ ais cluster add-remove-nodes stop-maintenance 59262t8087
Node "59262t8087" maintenance stopped
Started rebalance "g3", use 'ais show job xaction g3' to monitor the progress
To skip the automatic rebalance, pass the --no-rebalance flag.
It is recommended to leave automatic rebalance enabled, but there are some cases in which it is safe to skip rebalancing:
- all buckets are empty
- maintenance was started with --no-rebalance and no object was added or updated during maintenance
- objects can be refetched from remote sources, e.g., all buckets are remote AIS, HTTP, or Cloud ones. In this case, targets redownload missing objects, which can incur extra cost for Cloud traffic
- you are going to stop maintenance for more than one node. In this case, bring all nodes except the last one back to the cluster with the --no-rebalance flag, and let the last node start the automatic rebalance
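The last case in the list above can be sketched as a small helper. This is a minimal sketch under stated assumptions: the function name stop_maintenance_cmds is hypothetical, and placing --no-rebalance after the node ID mirrors the start-maintenance example earlier in this document and is assumed to work the same way for stop-maintenance. The helper only prints the commands so that they can be reviewed before being run.

```shell
# stop_maintenance_cmds NODE_ID...: prints one stop-maintenance command per node.
# Every node except the last gets --no-rebalance (assumed flag placement),
# so that a single rebalance starts when the last node returns.
stop_maintenance_cmds() {
  total=$#
  i=0
  for node in "$@"; do
    i=$((i + 1))
    if [ "$i" -lt "$total" ]; then
      echo "ais cluster add-remove-nodes stop-maintenance $node --no-rebalance"
    else
      echo "ais cluster add-remove-nodes stop-maintenance $node"
    fi
  done
}
```

For example, stop_maintenance_cmds 59262t8087 93683t8084 prints two commands, and only the second one triggers the rebalance.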
Removing a node from a cluster
To completely remove the node from the cluster, decommission the node (CLI):
$ ais cluster add-remove-nodes decommission 59262t8087
Node "59262t8087" is in maintenance
Started rebalance "g1", use 'ais show job xaction g1' to monitor the progress
When the rebalance finishes, the cluster removes the node automatically from the list. On unregistering, the node erases its AIS metadata. Skipping rebalance runs quick preparations and removes the node from the cluster immediately:
$ ais cluster add-remove-nodes decommission --no-rebalance 59262t8087
Node "59262t8087" removed from the cluster
Note that decommission cleans up all AIS metadata and stops the node, whereas shutdown only stops AIS services.
If the node is a target, the node will be shut down after the rebalance has finished. Otherwise, if the node is a proxy, the node will shut down immediately.
$ ais cluster add-remove-nodes shutdown 59262t8087
Node "59262t8087" is being shut down
Started rebalance "g1", use 'ais show job xaction g1' to monitor the progress
Note that you cannot interrupt a node removal when rebalance is skipped, because the node leaves the cluster immediately. If you remove a node by mistake, you have to join it manually with the ais cluster add-remove-nodes join command.
Interrupt node removal
While rebalance is running, you can interrupt the node removal to get the node back to the cluster.
If the node is a target and the --no-rebalance flag is not set, rebalance starts automatically once the node is registered with the cluster and the cluster clears the node’s maintenance state.
$ ais cluster add-remove-nodes stop-maintenance 59262t8087
Node "59262t8087" maintenance stopped
Started rebalance "g3", use 'ais show job xaction g3' to monitor the progress
The node starts accepting all requests after it joins the cluster and the cluster clears the node’s maintenance state. You do not have to wait until the rebalance is done.
Checking removal status
Putting a node in maintenance does NOT automatically power off the node.
When a node is put in maintenance mode, AIS only runs a rebalance. Manually check the cluster health (ais show cluster target) to ensure that it is safe to turn the node off. Besides checking xaction progress, the removal status can be monitored with ais cluster status.
In the example below it is safe to turn off the node (the column REBALANCE states that the rebalance has already finished and the node is labeled maintenance):
$ ais show cluster target
TARGET       MEM USED %   MEM AVAIL   CAP USED %   CAP AVAIL   CPU USED %   REBALANCE   UPTIME   STATUS
59262t8087   0.13%        31.28GiB    16%          2.435TiB    0.00%        finished    31m      maintenance
93683t8084   0.13%        31.28GiB    16%          2.435TiB    0.12%        finished    31m      online
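The manual check above can be scripted. The following is a minimal sketch, assuming the output layout shown in this document, where REBALANCE, UPTIME, and STATUS are the last three columns; the helper name safe_to_power_off is hypothetical. It reads the output of ais show cluster target on standard input so that it can also be tried against saved output.

```shell
# safe_to_power_off NODE_ID: reads `ais show cluster target` output on stdin.
# Succeeds only if the node's row shows REBALANCE "finished" and STATUS "maintenance".
# Assumes REBALANCE, UPTIME, STATUS are the last three columns, as in the example above.
safe_to_power_off() {
  awk -v id="$1" '
    NF >= 3 && $1 == id && $(NF-2) == "finished" && $NF == "maintenance" { ok = 1 }
    END { exit !ok }
  '
}
```

A possible use is ais show cluster target | safe_to_power_off 59262t8087 && echo "safe to turn off", repeated until the check succeeds.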
For decommissioning nodes, the status looks like this while the rebalance is running:
$ ais show cluster target
TARGET       MEM USED %   MEM AVAIL   CAP USED %   CAP AVAIL   CPU USED %   REBALANCE   UPTIME   STATUS
59262t8087   0.13%        31.28GiB    16%          2.435TiB    0.00%        running     31m      decommission
93683t8084   0.13%        31.28GiB    16%          2.435TiB    0.12%        running     31m      online
On finishing the rebalance, the primary proxy removes the node automatically:
$ ais show cluster target
TARGET       MEM USED %   MEM AVAIL   CAP USED %   CAP AVAIL   CPU USED %   REBALANCE   UPTIME
93683t8084   0.13%        31.28GiB    16%          2.435TiB    0.12%        running     31m
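A decommission can be treated as complete once the node no longer appears in the target list. A minimal sketch of that check follows, assuming the node ID is the first column of the output, as in the examples above; the helper name node_listed is hypothetical.

```shell
# node_listed NODE_ID: reads `ais show cluster target` output on stdin.
# Succeeds if NODE_ID appears in the first column, fails otherwise.
node_listed() {
  awk -v id="$1" '$1 == id { found = 1 } END { exit !found }'
}
```

For example, ais show cluster target | node_listed 59262t8087 || echo "node removed" reports when the primary proxy has removed the node.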