Description of problem:
Today the backup and restore operators assume that the backup state is valid. While this assumption is usually true, the data can be corrupted on disk during backup, or in storage before restore. To mitigate this risk, we should take the hash output of `etcdctl snapshot status` and persist it as part of the backup resources. The docs should state that this information must be stored separately from the release (similar to encryption keys). It should then be possible to verify that the hashes match on restore.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Run a DR backup.
2. Swap out the etcd state file with one from another backup, using the same name.
3. Restore.

Actual results:
Restore will happily use any backup as long as the name is as expected.

Expected results:
Validation of backup consistency (`etcdctl snapshot status`) is run against the snapshot during backup and before restore. The hash from the backup is persisted and validated during restore.

Additional info:
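To illustrate the requested persist-and-verify flow, here is a minimal sketch. It is not the operator's implementation: it uses a plain SHA-256 of the snapshot file as a stand-in for the hash reported by `etcdctl snapshot status`, and the function names and the checksum file are hypothetical.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_checksum(snapshot: Path, checksum_file: Path) -> str:
    """At backup time: persist the snapshot checksum in a separate file.

    Assumption: checksum_file lives somewhere other than the backup itself.
    """
    checksum = file_sha256(snapshot)
    checksum_file.write_text(checksum + "\n")
    return checksum


def verify_checksum(snapshot: Path, checksum_file: Path) -> bool:
    """At restore time: recompute the checksum and compare with the stored one."""
    expected = checksum_file.read_text().strip()
    return file_sha256(snapshot) == expected
```

With a check like this, swapping in a file from another backup under the same name (step 2 above) would make `verify_checksum` return `False`, so the restore could be refused before it touches the cluster.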
For 4.8, we decided not to store the checksum during the backup. That means there is no checksum to check against during the restore. The current PR only makes sure that the backup database is not corrupted, by running the status check against the database. So, for testing purposes:
1. Run a DR backup.
2. Corrupt the etcd db file (on Linux, use `truncate` to cut off the last few blocks of the database file).
3. Attempt to restore.
4. The restore attempt should fail with a validation error.
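For test automation, the corruption in step 2 can also be scripted. The following is a rough sketch only; the helper name and the byte count are hypothetical, and it does the same thing as shrinking the file with the `truncate` utility.

```python
import os


def truncate_tail(path: str, nbytes: int) -> int:
    """Drop the last nbytes of a file to simulate a corrupted snapshot.

    Returns the new file size. The size is clamped at zero so that asking
    for more bytes than the file holds simply empties it.
    """
    size = os.path.getsize(path)
    new_size = max(size - nbytes, 0)
    os.truncate(path, new_size)
    return new_size
```

Run it only against a copy of the backup, for example `truncate_tail("/path/to/copy/of/snapshot.db", 8192)`, where the path is a placeholder, then attempt the restore from that copy.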
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438