There are many scenarios where you have to perform planned maintenance on your cluster, such as software upgrades, machine upgrades, or security patches that require a restart. YugabyteDB performs rolling upgrades: nodes are taken offline one at a time, upgraded, and restarted, with zero downtime for the universe as a whole. The following sections show how YugabyteDB stays resilient during planned maintenance and continues serving requests without interruption.

## Setup

Consider a setup where YugabyteDB is deployed in a single region (us-east-1) across 3 zones (a, b, c), with 6 nodes (1-6) and tablet leaders and followers distributed across the 3 zones.

The application typically connects to all the nodes in the cluster, as shown in the following illustration.

All illustrations adhere to the legend outlined in [Legend for illustrations](../../../contribute/docs/docs-layout#legend-for-illustrations).

![Single region, 3 zones, 6 nodes](/images/explore/fault-tolerance/node-upgrades-setup.png)

## Upgrading a node

When upgrading a node or performing maintenance on it, the first step is to take it offline.

In the following illustration, node 4 has been chosen for the upgrade.

![Upgrade a single node](/images/explore/fault-tolerance/node-upgrades-take-offline.png)

### Leaders move

If the node to be upgraded hosts any tablet leaders, they must first be moved off the node so that there is no service disruption. Stopping the node automatically triggers a leader election with a hint to choose a new leader outside the zone where the node is located. This is repeated for every leader on the node.

Even though the followers on this node will soon go offline, writes are not affected. Every tablet still has replicas in the other zones, and those replicas form a majority.

In the following illustration, the follower for tablet-4 on node-2, located in zone-a, has been elected as the new leader. The replica of tablet-4 on node-4 has been downgraded to a follower.

![Leader movement](/images/explore/fault-tolerance/node-upgrades-leader-move.png)

### Node goes offline

After the leaders are moved off the node, YugabyteDB takes the node offline. Connections already established to the node start timing out (the default TCP timeout is about 15 seconds), and new connections to the node cannot be established.

![Take node offline](/images/explore/fault-tolerance/node-upgrades-node-offline.png)

At this point, you can perform your maintenance, install new software, or upgrade the hardware. There is no service disruption during this period, because all the tablets have active leaders.

### Bring the node online

After completing the upgrade and any required maintenance, restart the node.

The node is automatically added back into the cluster. The cluster detects that leaders and followers are unbalanced and triggers a rebalance and leader election, which distributes leaders and followers evenly again. All the nodes in the cluster are then fully functional and can start taking load.

In the following illustration, notice that the tablet followers on node-4 are updated with the latest data and are made leaders.

![Back online](/images/explore/fault-tolerance/node-upgrades-back-online.png)

During this entire process, there is neither data loss nor service disruption.
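The walkthrough above can be modeled with a small toy simulation. The following Python sketch is illustrative only and is not YugabyteDB's Raft implementation or load balancer. Several details are hypothetical assumptions:

- The node-to-zone mapping: nodes 1-2 in zone a, 3-4 in zone b, 5-6 in zone c.
- The tablet placement.
- The `Cluster` class and its `take_offline`, `has_quorum`, and `bring_online` methods.
- The tie-break rule: least-loaded candidate first, then lowest node ID.

Replica catch-up after the restart is not modeled. The sketch walks through the three steps in order:

1. Leaders move off node 4, preferring a replica in another zone.
2. Node 4 goes offline, and every tablet keeps a majority of its three replicas.
3. Leaders are rebalanced when node 4 returns.

```python
from collections import Counter

# Hypothetical node-to-zone mapping (assumption): two nodes per zone in us-east-1.
ZONE = {1: "a", 2: "a", 3: "b", 4: "b", 5: "c", 6: "c"}

# Hypothetical tablet placement with a replication factor of 3:
# tablet id -> (replica nodes, one per zone; initial leader node).
PLACEMENT = {
    1: ((1, 3, 5), 1),
    2: ((2, 4, 6), 2),
    3: ((1, 3, 6), 3),
    4: ((2, 4, 5), 4),
    5: ((1, 4, 5), 5),
    6: ((2, 3, 6), 6),
}


class Cluster:
    """Toy model of tablet leaders and replicas; not YugabyteDB's implementation."""

    def __init__(self):
        self.replicas = {t: set(nodes) for t, (nodes, _) in PLACEMENT.items()}
        self.leader = {t: ldr for t, (_, ldr) in PLACEMENT.items()}
        self.offline = set()

    def leader_counts(self):
        """Number of tablet leaders hosted by each live node."""
        counts = Counter({n: 0 for n in ZONE if n not in self.offline})
        counts.update(self.leader.values())
        return counts

    def has_quorum(self, tablet):
        """True while a majority of the tablet's replicas is live (Raft write quorum)."""
        live = self.replicas[tablet] - self.offline
        return len(live) > len(self.replicas[tablet]) // 2

    def take_offline(self, node):
        # Step 1: move every leader off the node before stopping it, preferring a
        # live replica in another zone (least-loaded first, lowest id on ties).
        for tablet, ldr in sorted(self.leader.items()):
            if ldr != node:
                continue
            counts = self.leader_counts()
            candidates = [
                n for n in self.replicas[tablet]
                if n != node and n not in self.offline and ZONE[n] != ZONE[node]
            ]
            self.leader[tablet] = min(candidates, key=lambda n: (counts[n], n))
        # Step 2: the node now goes offline; its replicas stop participating.
        self.offline.add(node)

    def bring_online(self, node):
        # The node rejoins (replica catch-up is not modeled), then leaders are
        # rebalanced greedily from the busiest node to the least-busy node.
        self.offline.discard(node)
        while True:
            counts = self.leader_counts()
            busiest = max(counts, key=lambda n: (counts[n], -n))
            idlest = min(counts, key=lambda n: (counts[n], n))
            if counts[busiest] - counts[idlest] <= 1:
                return
            movable = [t for t, ldr in sorted(self.leader.items())
                       if ldr == busiest and idlest in self.replicas[t]]
            if not movable:
                return  # No shared tablet to hand over; stop rather than loop forever.
            self.leader[movable[0]] = idlest


if __name__ == "__main__":
    cluster = Cluster()
    cluster.take_offline(4)
    print("leaders while node 4 is down:", cluster.leader)
    print("every tablet keeps a majority:", all(cluster.has_quorum(t) for t in cluster.replicas))
    cluster.bring_online(4)
    print("leaders per node after rebalance:", dict(cluster.leader_counts()))
```

With this hypothetical placement, tablet 4's leadership moves to node 2 in zone a while node 4 is down. After the rebalance, each of the six nodes leads exactly one tablet again. The greedy rebalance, which moves one leader at a time from the busiest node to the least-busy node that holds a replica, is a simplification chosen for readability.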