Created attachment 1774523 [details] Backup logs Description of problem: VM backup will always fail if user starts preview to older snapshot Version-Release number of selected component (if applicable): Ovirt-4.4.5 vdsm-4.40.50.10-1.el8ev.x86_64 ovirt-imageio-daemon-2.1.1-1.el8ev.x86_64 libvirt-daemon-6.6.0-13.2.module+el8.3.1+10483+85317cf0.x86_64 qemu-kvm-5.1.0-21.module+el8.3.1+10464+8ad18d1a.x86_64 How reproducible: Steps to Reproduce: 1.Make full vm backup 2.Create snapshot snap1 3.Make incremental backup 4.Create snapshot snap2 5.Make incremental backup 6.Preview snapshot snap1 7.Start vm backup Actual results: Vm backup failed on every next run Expected results: Full vm backup started Additional info: []VDSError:{code='null', message='checkpoint inconsistent: missing or broken bitmap 'd62d5072-4ef0-4397-9802-e9d6c91f819e' for disk 'sda''}, log id: 64ac0bd5 2021-04-22 12:55:00,858+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-231) [ed1d3e40-cd6c-4c2c-a258-ff322f31e798] EVENT_ID: VM_INCREMENTAL_BACKUP_FAILED_FULL_VM_BACKUP_NEEDED(10,791), Incremental backup 29c5900a-4a23-40b5-b7f3-c96b714f687c for VM Cent8-t1_restored2 failed. Full backup is now needed (User: admin@internal-authz). 2021-04-22 12:55:00,874+02 WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (default task-231) [ed1d3e40-cd6c-4c2c-a258-ff322f31e798] Trying to release a shared lock for key: '16a6c58b-4fbf-4862-8fd7-e71b4e70d5e9VM' , but lock does not exist 2021-04-22 12:55:00,874+02 WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (default task-231) [ed1d3e40-cd6c-4c2c-a258-ff322f31e798] Trying to release exclusive lock which does not exist, lock key: 'e432953d-3ba9-4b71-bf6a-51390d281b83DISK' 2021-04-22 12:55:00,874+02 INFO
Hi Yury, Can you please elaborate on what is the specific flow. It seems to me that you are trying to start an incremental backup while the VM is in 'preview' that is based on a checkpoint that was take only in a later snapshot and basically doesn't exists. For example: Steps to Reproduce: 1.Make full VM backup - backup_1 2.Create snapshot snap1 3.Make an incremental backup - backup_2 (based on backup_1) 4.Create snapshot snap2 5.Make an incremental backup - backup_3 (based on backup_2) 6.Preview snapshot snap1 -> contains only backup_1 checkpoint 7.Start VM incremental backup - backup_4 (based on backup_3) ==> Failed, which is expected since the checkpoint of backup_3 doesn't exists when previewing snap1, only backup_1 checkpoint.
Hello, Eyal. > It seems to me that you are trying to start an incremental backup while the VM is in 'preview' that is based on a checkpoint that was take > only in a later snapshot and basically doesn't exists. You're right. But, backup application can't control user actions, and it doesn't know anything about what snapshot does user select. I don't expect error in this way. if vm can't be backed incremnetal, let's do a full backup
(In reply to Yury.Panchenko from comment #2) > I don't expect error in this way. if vm can't be backed incremnetal, let's > do a full backup We cannot determine for the user / backup application the type of the backup that will be taken in case of a failure. In this case, you should identify the specific error (VM_INCREMENTAL_BACKUP_FAILED_FULL_VM_BACKUP_NEEDED(10,791)) and retry to take a full backup for the VM. There is one problem that I am thinking of, In case of a VM in preview and a failure to redefine a checkpoint as you mentioned, the entire checkpoints chain will be removed. I am not sure if we need to avoid it or not in case of a VM in preview mode, failure can be due to a real issue also. Nir, what do you think?
Hello, Eyal. In this case, i've had another problem with error output. That event: VM_INCREMENTAL_BACKUP_FAILED_FULL_VM_BACKUP_NEEDED(10,791), doesn't return in the web response. In our app it looks like [2021-05-05] [16:11:13.861] [162498] [Info ] [RestCli] Response data (400 Bad Request): [2021-05-05] [16:11:13.862] [162498] [Debug] [JsonReporter] Event: 2, Text: Creating incremental backup failed: Reason: 'Operation Failed', Detail: '[]' Can you check that please? > We cannot determine for the user / backup application the type of the backup that will be taken in case of a failure. What option does user have? He want's to create backup. Of course, he will restart backup operation and make full. I don't see any other option.
(In reply to Yury.Panchenko from comment #4) > Hello, Eyal. > In this case, i've had another problem with error output. > That event: VM_INCREMENTAL_BACKUP_FAILED_FULL_VM_BACKUP_NEEDED(10,791), > doesn't return in the web response. > In our app it looks like > [2021-05-05] [16:11:13.861] [162498] [Info ] [RestCli] Response data (400 > Bad Request): > [2021-05-05] [16:11:13.862] [162498] [Debug] [JsonReporter] Event: 2, Text: > Creating incremental backup failed: Reason: 'Operation Failed', Detail: '[]' > Can you check that please? The error will not return in the response, it should be fetched from the engine events. The events will include the created backup ID and can be filtered according to it. You can see and en example for it using the Python SDK - https://github.com/oVirt/ovirt-engine-sdk/blob/master/sdk/examples/backup_vm.py#L437 > > > > We cannot determine for the user / backup application the type of the backup that will be taken in case of a failure. > What option does user have? He want's to create backup. > Of course, he will restart backup operation and make full. I don't see any > other option. Right, but the user needs to initiate it, we cannot add those assumptions.
(In reply to Eyal Shenitzky from comment #3) Preview is a very special case. Basically we have 2 backup systems managing the same vm at the same time. Assuming we have vm with these snapshots: A < B < C During preview the chain may be: A < D < B < C Creating incremental backup at this point is likey to create corrupted backup. Creating full backup in this state is correct, but this backup cannot be used if the user undo the preview operation and the VM go back to the original state: A < B < C Preview is a temporary state. It can end in 2 ways: 1. User commit the operation - we drop the rest of the chain: A < D In this case the backup application must create new full backup. 2. User undo the operation - we drop the preview volume A < B < C In this case the backup application can continue to do incremental backup. So I think the right way is to block backup in preview state. Full or incremental backup will fail with special error (VM_IN_PREVIEW). This can be implemented in 4.4.7. Until we implement this the backup application can check using the API if the vm is in preview mode.
Hello Nir and Eyal, > So I think the right way is to block backup in preview state. Full or > incremental backup will fail with special error (VM_IN_PREVIEW). > This can be implemented in 4.4.7. Until we implement this the backup > application can check using the API if the vm is in preview mode. I can't agree with that. Backup application can't resolve that situation. So, it adds abbility to lost critical data when vm is in preview mode. App can do a full backup in preview mode. if user reverts snapshot - ok, then app will do new full backup. If user commits snapshot the app just continue with incremental backup. I've got an additional subcase - So, I repeated the orignal case but in the end, vm left in poweroff state. After that, i start next incremental backup The backup created and start NBD server failed 2021-05-11 11:45:21,360+02 INFO [org.ovirt.engine.core.bll.storage.backup.StartVmBackupCommand] (default task-1442) [22ea70bb-b1e6-4896-8044-7c74199e2eac] Running command: StartVmBackupCommand internal: false. Entities affected : ID: 323c918c-ceb5-47f2-aaea-26a1249fd2bd Type: VMAction group BACKUP_DISK with role type ADMIN, ID: 8113d4f9-7792-4e02-bfc9-0fccfb1cd1f3 Type: DiskAction group BACKUP_DISK with role type ADMIN 2021-05-11 11:45:40,022+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-1443) [92c2ba48-72c0-4e6a-afdb-7e98d10f490e] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM pan-rhv44rel1 command StartNbdServerVDS failed: Bitmap does not exist: "{'bitmap': '1ccb732b-a7fe-4c97-afe6-67f4f33e5cc8'}"
Created attachment 1781972 [details] Logs for the subcase
(In reply to Yury.Panchenko from comment #7) > Hello Nir and Eyal, > > > So I think the right way is to block backup in preview state. Full or > > incremental backup will fail with special error (VM_IN_PREVIEW). > > This can be implemented in 4.4.7. Until we implement this the backup > > application can check using the API if the vm is in preview mode. > > I can't agree with that. > Backup application can't resolve that situation. > So, it adds abbility to lost critical data when vm is in preview mode. VM in preview mode is like a *new* vm being created. No critical data is added in preview mode. When a user commit, the vm becomes a normal vm and at this point you can backup the vm. > App can do a full backup in preview mode. if user reverts snapshot - ok, > then app will do new full backup. > If user commits snapshot the app just continue with incremental backup. I don't think it is useful to try to backup vm in temporary state, and the way preview mode is implemented will make this hard. > I've got an additional subcase - > So, I repeated the orignal case but in the end, vm left in poweroff state. > After that, i start next incremental backup > The backup created and start NBD server failed > 2021-05-11 11:45:21,360+02 INFO > [org.ovirt.engine.core.bll.storage.backup.StartVmBackupCommand] (default > task-1442) [22ea70bb-b1e6-4896-8044-7c74199e2eac] Running command: > StartVmBackupCommand internal: false. Entities affected : ID: > 323c918c-ceb5-47f2-aaea-26a1249fd2bd Type: VMAction group BACKUP_DISK with > role type ADMIN, ID: 8113d4f9-7792-4e02-bfc9-0fccfb1cd1f3 Type: DiskAction > group BACKUP_DISK with role type ADMIN > > 2021-05-11 11:45:40,022+02 ERROR > [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] > (default task-1443) [92c2ba48-72c0-4e6a-afdb-7e98d10f490e] EVENT_ID: > VDS_BROKER_COMMAND_FAILURE(10,802), VDSM pan-rhv44rel1 command > StartNbdServerVDS failed: Bitmap does not exist: "{'bitmap': > '1ccb732b-a7fe-4c97-afe6-67f4f33e5cc8'}" I'm not sure what is the case, you need to provide detailed instructions so we can reproduce your flow. But the error means that the requested bitmap does not exist. This is likely result of starting vm in preview mode. For example: 1. create snapshot 1 A < B 2. full backup (creating checkpoint 1) A < B (checkpoint 1) 3. create snapshot 2 (checkpoint 1 continue to track new snapshot) A <- B (checkpoint 1) <- C (checkpoint 1, checkpoint 2) 4. incremental backup (creating checkpoint 2, exist only in image C) A <- B (checkpoint 1) <- C (checkpoint 1, checkpoint 2) 5. user preview reverting to snapshot 1 A <- D We still have this chain on storage: B (checkpoint 1) <- C (checkpoint 1, checkpoint 2) But it is not used by the vm. Note that D does cannot have checkpoint 1 or 2. At this point incremental backup is not possible (cold or live). If you want to treat preview state as new vm, you will have to do full backup at this point. If a user undo the preview mode, you will have to drop the full backup and continue to do incremental backup using the old vm state.
Hello, Nir. > VM in preview mode is like a *new* vm being created. No critical data > is added in preview mode. When a user commit, the vm becomes a normal > vm and at this point you can backup the vm. So, ok. I agree with you. If vm in temporal stage we can skip backup.
Verified. Steps to Reproduce: 1.Make full VM backup 2.Create snapshot snap1 3.Make incremental backup 4.Create snapshot snap2 5.Make incremental backup 6.Preview snapshot snap1 7.Start VM backup(Full/incremental) -> Action should be blocked "[Cannot backup VM. VM is previewing a Snapshot.] 8)commit 9)Incremental backup to see that the checkpoint was removed: 2021-06-30 19:06:11,715+03 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-72) [] Operation Failed: [Cannot backup VM. Checkpoint ID 9591d7d7-0c3b-43cc-819b-350501352180 doesn't exist.] 10)Start Full backup of the VM - > Should succeed. 2021-06-30 19:44:28,817+03 INFO [org.ovirt.engine.core.bll.storage.backup.StartVmBackupCommand] (default task-77) [6cfed041-b7ba-4524-972a-855ed757d6a7] Running command: StartVmBackupCommand internal: false. Entities affected : ID: 7943cec4-0af0-4912-9d86-a1714742018a Type: VMAction group BACKUP_DISK with role type ADMIN, ID: fe0d84bb-f605-4bdd-8694-18b2ecb27728 Type: DiskAction group BACKUP_DISK with role type ADMIN 2021-06-30 19:44:28,837+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-77) [6cfed041-b7ba-4524-972a-855ed757d6a7] EVENT_ID: VM_BACKUP_STARTED(10,790), Backup b2ddfbd2-90f1-4dbc-ada4-700cf0946a82 for VM New_VM started <creation_date>2021-06-30T19:44:28.817+03:00</creation_date> <phase>ready</phase> <to_checkpoint_id>9eac6c09-7a81-411a-91b2-c94c6b5a4d22</to_checkpoint_id>
This bugzilla is included in oVirt 4.4.7 release, published on July 6th 2021. Since the problem described in this bug report should be resolved in oVirt 4.4.7 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days