
|          Product           |  Affected Versions  |  Related Issues   | Fixed In |
| :------------------------- | :------------------ | :---------------- | :------- |
| {{<product "ybdb,yba">}} | {{<release "2.20">}}, {{<release "2024.1">}}, {{<release "2024.2">}} | {{<issue 26910>}} | {{<release "2.20.11.0">}}, {{<release "2024.2.4.0">}} |

## Description

In YugabyteDB, a snapshot is a consistent state of data taken across all nodes in a cluster at a point in time. The final phase of a snapshot operation involves several disk writes. If ulimits are incorrectly configured, the YB-TServer process may crash during these writes, which can result in loss of data on that node.

## Mitigation

To mitigate the issue, perform _one_ of the following:

- Ensure ulimits are correctly configured. For more details, see [Set ulimits](../../../deploy/manual-deployment/system-config/#set-ulimits).
- Upgrade to a version that includes the fix: {{<release "2.20.11.0">}} or {{<release "2024.2.4.0">}}.

Note that until the correct ulimit configuration is set, snapshot operations remain at risk. Snapshots are taken during backups and when Point-in-Time Recovery (PITR) snapshot schedules are enabled. This includes setting up xCluster, which triggers a backup, and transactional xCluster, which enables PITR snapshot schedules on both clusters.
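
As a rough aid for the first mitigation, the following Python sketch compares the soft resource limits of the current process against a set of minimums. The function name `find_low_limits` and the values in `EXAMPLE_MINIMUMS` are illustrative assumptions, not values taken from this advisory; use the values from the Set ulimits page for your deployment.

```python
import resource

# Illustrative minimums only (an assumption for this sketch).
# Replace them with the values from the "Set ulimits" documentation.
EXAMPLE_MINIMUMS = {
    "RLIMIT_NOFILE": 1048576,  # open files
    "RLIMIT_NPROC": 12000,     # max user processes
}


def find_low_limits(minimums):
    """Return (name, soft_limit, minimum) for each soft limit below its minimum."""
    problems = []
    for name, minimum in minimums.items():
        soft, _hard = resource.getrlimit(getattr(resource, name))
        # An unlimited soft limit always satisfies the minimum.
        if soft != resource.RLIM_INFINITY and soft < minimum:
            problems.append((name, soft, minimum))
    return problems


if __name__ == "__main__":
    for name, soft, minimum in find_low_limits(EXAMPLE_MINIMUMS):
        print(f"{name}: soft limit {soft} is below {minimum}")
```

This checks only the limits inherited by the Python process itself, so run it as the same user and in the same way the YB-TServer is started. On Linux, the limits of an already running process can be read from `/proc/<pid>/limits`.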

## Details

A snapshot operation is performed on all tablets in the database during operations such as backups and PITR snapshot schedules.

As part of applying [SNAPSHOT_OP](../../../manage/backup-restore/snapshot-ysql/) (the internal snapshot operation applied on each tablet), YugabyteDB updates the flushed frontier in the RocksDB manifest file via `Tablet::ModifyFlushedFrontier`. At this point, previous write operations have been applied to the RocksDB instance, but they might not have been flushed to SST files yet. If ulimits are incorrectly set, the SST flush can fail, causing the YB-TServer process to crash. Upon restart, the process reads the manifest data and incorrectly assumes the flush has already completed successfully, instead of retrying the operation. This can result in loss of data on that node.

In the fixed versions, YugabyteDB performs a synchronous flush of RocksDB before calling `Tablet::ModifyFlushedFrontier` to update the flushed frontier in the manifest file.
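
The ordering problem can be illustrated with a toy model. The following Python sketch is not YugabyteDB code: the `ToyTablet` class, its methods, and the assumption that a restart replays only logged writes newer than the flushed frontier are simplifications made for illustration.

```python
class ToyTablet:
    """Toy model of a tablet: a durable log, an in-memory table, and flushed data."""

    def __init__(self):
        self.log = []              # durable record of (op_id, key, value)
        self.memtable = {}         # in memory only; lost on crash
        self.sst = {}              # durable flushed data
        self.flushed_frontier = 0  # durable; highest op_id assumed to be flushed

    def write(self, key, value):
        op_id = len(self.log) + 1
        self.log.append((op_id, key, value))
        self.memtable[key] = value

    def flush(self, fails=False):
        if fails:
            raise OSError("flush failed, for example because a resource limit was hit")
        self.sst.update(self.memtable)
        self.memtable.clear()

    def snapshot_op(self, flush_first, flush_fails=False):
        op_id = len(self.log)
        if flush_first:
            # Fixed ordering: flush synchronously, then advance the frontier.
            self.flush(fails=flush_fails)
            self.flushed_frontier = op_id
        else:
            # Affected ordering: the frontier is advanced before the flush succeeds.
            self.flushed_frontier = op_id
            self.flush(fails=flush_fails)

    def crash_and_restart(self):
        # In-memory data is lost; only writes past the frontier are replayed.
        self.memtable = {}
        for op_id, key, value in self.log:
            if op_id > self.flushed_frontier:
                self.memtable[key] = value

    def read(self, key):
        return self.memtable.get(key, self.sst.get(key))


def survives_failed_flush(flush_first):
    """Return True if a write is still readable after a failed flush and restart."""
    tablet = ToyTablet()
    tablet.write("k", "v")
    try:
        tablet.snapshot_op(flush_first=flush_first, flush_fails=True)
    except OSError:
        pass  # stands in for the process crash
    tablet.crash_and_restart()
    return tablet.read("k") == "v"
```

In this model, `survives_failed_flush(False)` returns `False` because the frontier already claims the write was flushed, so the restart skips it. `survives_failed_flush(True)` returns `True` because the frontier is left unchanged when the flush fails, so the write is replayed.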
