
|          Product           |  Affected Versions  |  Related Issues   | Fixed In |
| :------------------------- | :------------------ | :---------------- | :------- |
| {{<product "ybdb, yba">}}  | {{<release "2.18, 2.20">}} | {{<issue 21491>}} | {{<release "2.20.2.1, 2.18.7.0">}}      |

## Description

Upgrades from prior versions (other than v2.14 and v2.16) to v2.18 or v2.20 can fail due to a race condition during the post-upgrade phase. The yb-tservers themselves can be healthy and their Raft configurations can remain intact, but they fail to heartbeat to the yb-master.
The race has a low probability, but it can occur when YugabyteDB Anywhere executes the post-upgrade action of un-blacklisting yb-tservers at the same time as yb-master runs the background task that generates the `universe_uuid` field. The issue is less likely in v2.20 because its post-upgrade actions take much longer than in v2.18, which significantly reduces the probability of hitting it.

Releases v2.14 and v2.16 are not impacted by this issue, because in those releases the flag `master_enable_universe_uuid_heartbeat_check` is not auto-promoted, so the functionality is OFF by default until you explicitly turn it ON.

## Mitigation

Set the `master_enable_universe_uuid_heartbeat_check` flag on yb-master to false. You can apply this change as a non-rolling, non-restart YugabyteDB Anywhere upgrade after the database upgrade is complete.
After the flag change is applied, upgrade to a release with the fix and re-enable the flag.
Re-enabling the flag requires running a [yb-ts-cli](../../../admin/yb-ts-cli/) command to clear the `universe_uuid` on all nodes. After the `universe_uuid` is cleared, the flag can be re-enabled on yb-master.

## Details

The `universe_uuid` field was added to `ClusterConfig` as part of [#17904](https://github.com/yugabyte/yugabyte-db/commit/fb98e56488f70ce4940861127f5ce724fb1acc14). This is essentially an identity for the universe which all the yb-tservers inherit from the yb-master as part of the heartbeat. If set, this value is not meant to change on either the yb-tservers or yb-masters and provides a way for the yb-master to reject any heartbeats from a different universe.
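
The heartbeat check described above can be pictured with a minimal Python sketch. It is an illustration only, not the yb-master implementation: `TServer` and `handle_heartbeat` are hypothetical names, and the rule that a yb-tserver with an empty `universe_uuid` adopts the master's value is an assumption based on the description of inheritance.

```python
from dataclasses import dataclass


@dataclass
class TServer:
    """Hypothetical stand-in for a yb-tserver's view of its universe."""
    universe_uuid: str = ""  # empty until inherited from the yb-master


def handle_heartbeat(master_uuid: str, tserver: TServer) -> bool:
    """Return True if the heartbeat is accepted, False if it is rejected."""
    if not tserver.universe_uuid:
        # Assumed behavior: on first contact the yb-tserver inherits
        # the master's universe identity.
        tserver.universe_uuid = master_uuid
        return True
    # Once set, the value must match; a mismatch is treated as a
    # heartbeat from a different universe.
    return tserver.universe_uuid == master_uuid


ts = TServer()
first = handle_heartbeat("uuid-A", ts)   # yb-tserver inherits "uuid-A"
second = handle_heartbeat("uuid-A", ts)  # same universe, accepted
third = handle_heartbeat("uuid-B", ts)   # master identity differs, rejected
```

In this sketch the yb-tserver never changes its stored value after the first heartbeat, so any later change of identity on the master side makes every subsequent heartbeat fail.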

For universes upgrading from an older release to one that includes the preceding commit, the catalog manager generates a new `universe_uuid` and propagates it to the yb-tservers. However, when the catalog manager persists the `universe_uuid` in ClusterConfig, it does not increment the ClusterConfig version number.
As a result, the following race is possible:

1. The cluster is upgraded to a release with commit [fb98e56](https://github.com/yugabyte/yugabyte-db/commit/fb98e56488f70ce4940861127f5ce724fb1acc14), and the `master_enable_universe_uuid_heartbeat_check` feature is enabled by flag promotion.
1. YugabyteDB Anywhere reads the cluster configuration (ClusterConfig) at version 'X'.
1. The catalog manager background thread runs, generates a new `universe_uuid`, persists it in ClusterConfig, and propagates it to all the yb-tservers.
1. YugabyteDB Anywhere updates the ClusterConfig to un-blacklist nodes, using `ChangeMasterClusterConfigRequestPB` with the version 'X' that it read in Step 2.
1. The update from Step 4 succeeds because the ClusterConfig version 'X' on disk matches the version 'X' in the request, effectively overwriting the `universe_uuid` generated in Step 3.
1. The catalog manager background thread runs again and, because the `universe_uuid` is now empty, generates a new one.

After the catalog manager generates the new `universe_uuid` in Step 6, yb-master starts rejecting heartbeats from all the yb-tservers, because they keep reporting the previous `universe_uuid` that was generated in Step 3.
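
The six steps can be replayed in a small, single-threaded Python simulation. This is a sketch under stated assumptions, not YugabyteDB code: `ClusterConfig`, `Master`, and `run_race` are hypothetical names, the version check is modeled as a plain equality test, and the `bump_version` switch is only an illustration of what changes if the version were incremented when the `universe_uuid` is persisted.

```python
import uuid
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ClusterConfig:
    """Hypothetical, simplified model of the persisted cluster configuration."""
    version: int
    universe_uuid: str = ""
    blacklist: tuple = ()


class Master:
    def __init__(self, bump_version: bool):
        self.bump_version = bump_version
        self.config = ClusterConfig(version=1, blacklist=("ts-1",))

    def background_generate_uuid(self) -> None:
        """Background task: generate a universe_uuid if none is set."""
        if self.config.universe_uuid:
            return
        version = self.config.version + (1 if self.bump_version else 0)
        self.config = replace(
            self.config, universe_uuid=str(uuid.uuid4()), version=version
        )

    def change_cluster_config(self, requested: ClusterConfig) -> bool:
        """Apply an update only if the caller's version matches the stored one."""
        if requested.version != self.config.version:
            return False
        self.config = replace(requested, version=requested.version + 1)
        return True


def run_race(bump_version: bool):
    master = Master(bump_version)
    snapshot = master.config                     # Step 2: read at version 'X'
    master.background_generate_uuid()            # Step 3: first universe_uuid
    tserver_uuid = master.config.universe_uuid   # yb-tservers inherit this value
    # Steps 4-5: un-blacklist using the stale snapshot (empty universe_uuid).
    accepted = master.change_cluster_config(replace(snapshot, blacklist=()))
    master.background_generate_uuid()            # Step 6: regenerates if empty
    heartbeats_rejected = tserver_uuid != master.config.universe_uuid
    return accepted, heartbeats_rejected


buggy = run_race(bump_version=False)
guarded = run_race(bump_version=True)
```

With `bump_version=False` the stale update is accepted and the yb-tservers end up holding a `universe_uuid` that the master no longer recognizes. With `bump_version=True` the stale update is refused by the version check, so the first `universe_uuid` survives.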
