Bug 2001136
| Summary: | when transfer image fails due to filesystem full the backup is not finalized | | |
|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | Gianluca Cecchi <gianluca.cecchi> |
| Component: | BLL.Storage | Assignee: | Eyal Shenitzky <eshenitz> |
| Status: | CLOSED NOTABUG | QA Contact: | Avihai <aefrat> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.4.7.7 | CC: | bugs, nsoffer, sgordon, tnisan |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-09-09 12:34:18 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Gianluca Cecchi
2021-09-03 21:57:09 UTC
If I try to stop the backup I get:

```
[g.cecchi@ovmgr1 ~]$ /usr/share/doc/python3-ovirt-engine-sdk4/examples/backup_vm.py -c ovmgr1 stop dc386237-1e98-40c8-9d3d-45658163d1e2 68f83141-9d03-4cb0-84d4-e71fdd8753bb
[   0.0 ] Finalizing backup '68f83141-9d03-4cb0-84d4-e71fdd8753bb'
Traceback (most recent call last):
  File "/usr/share/doc/python3-ovirt-engine-sdk4/examples/backup_vm.py", line 493, in <module>
    main()
  File "/usr/share/doc/python3-ovirt-engine-sdk4/examples/backup_vm.py", line 176, in main
    args.command(args)
  File "/usr/share/doc/python3-ovirt-engine-sdk4/examples/backup_vm.py", line 262, in cmd_stop
    stop_backup(connection, args.backup_uuid, args)
  File "/usr/share/doc/python3-ovirt-engine-sdk4/examples/backup_vm.py", line 345, in stop_backup
    backup_service.finalize()
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/services.py", line 33869, in finalize
    return self._internal_action(action, 'finalize', None, headers, query, wait)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 299, in _internal_action
    return future.wait() if wait else future
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 55, in wait
    return self._code(response)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 296, in callback
    self._check_fault(response)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 132, in _check_fault
    self._raise_error(response, body)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 118, in _raise_error
    raise error
ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[Cannot stop VM backup. The VM backup is not in READY phase, backup phase is FINALIZING. Please try again when the backup is in READY phase.]". HTTP response code is 409.
[g.cecchi@ovmgr1 ~]$
```
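The 409 above is expected while the backup is stuck in the FINALIZING phase: per the fault detail, finalize() may only be called on a backup in the READY phase. A client that wants to stop a backup robustly can poll the phase and only act when the wanted phase is reached. Below is a minimal generic sketch of such a wait loop; the SDK wiring for fetching the phase (e.g. via the backup service's get() in ovirtsdk4) is an assumption to adapt, so the getter is passed in as a callable:

```python
import time

def wait_for_phase(get_phase, wanted="ready", timeout=60, interval=1):
    """Poll a backup-phase getter until it reports the wanted phase.

    get_phase is any zero-argument callable returning the current phase
    as a string; in a real client it would wrap the oVirt SDK call that
    fetches the backup entity (assumed wiring, adapt to your SDK version).
    Raises RuntimeError if the phase is not reached within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        phase = get_phase()
        if phase == wanted:
            return phase
        time.sleep(interval)
    raise RuntimeError(f"backup did not reach phase {wanted!r} in {timeout}s")

# Example with a fake getter simulating a backup that becomes READY:
phases = iter(["starting", "ready"])
print(wait_for_phase(lambda: next(phases), wanted="ready", interval=0))  # prints: ready
```

In this bug's scenario the loop would time out rather than succeed, since the backup never leaves FINALIZING, which is exactly the stuck state being reported.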
Looking in the engine log:

```
$ grep 68f83141-9d03-4cb0-84d4-e71fdd8753bb engine.log
...
```

1. Backup started

```
2021-09-03 15:31:11,551+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(default task-50) [b302cff2-fb05-4f10-9d02-aa03b10b10e1] EVENT_ID: VM_BACKUP_STARTED(10,790),
Backup 68f83141-9d03-4cb0-84d4-e71fdd8753bb for VM c8server started (User: tekka@mydomain@mydomain).
```

2. Creating scratch disk for disk c8_data_c8server1

```
2021-09-03 15:31:12,550+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CreateVolumeVDSCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-39) [b302cff2-fb05-4f10-9d02-aa03b10b10e1]
START, CreateVolumeVDSCommand( CreateVolumeVDSCommandParameters:{storagePoolId='ef17cad6-7724-4cd8-96e3-9af6e529db51',
ignoreFailoverLimit='false', storageDomainId='1de66cb8-c9f6-49cd-8a35-d5da8adad570',
imageGroupId='a6ce101a-f7ce-4944-93a5-e71f32dd6c12', imageSizeInBytes='21474836480', volumeFormat='COW',
newImageId='33aa1bac-4152-492d-8a4a-b6d6c0337fec', imageType='Sparse',
newImageDescription='{"DiskAlias":"VM c8server backup 68f83141-9d03-4cb0-84d4-e71fdd8753bb scratch disk for c8_data_c8server1","DiskDescription":"Backup 68f83141-9d03-4cb0-84d4-e71fdd8753bb scratch disk"}',
imageInitialSizeInBytes='1073741824', imageId='00000000-0000-0000-0000-000000000000',
sourceImageGroupId='00000000-0000-0000-0000-000000000000', shouldAddBitmaps='false'}), log id: 164ff0c7
```

3. Creating scratch disk for disk c8_bootdisk_c8server1

```
2021-09-03 15:31:12,880+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CreateVolumeVDSCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-39) [b302cff2-fb05-4f10-9d02-aa03b10b10e1]
START, CreateVolumeVDSCommand( CreateVolumeVDSCommandParameters:{storagePoolId='ef17cad6-7724-4cd8-96e3-9af6e529db51',
ignoreFailoverLimit='false', storageDomainId='1de66cb8-c9f6-49cd-8a35-d5da8adad570',
imageGroupId='c9521211-8e24-46ae-aa2e-6f76503527dc', imageSizeInBytes='21474836480', volumeFormat='COW',
newImageId='48244767-c8dc-4821-be21-935207068e69', imageType='Sparse',
newImageDescription='{"DiskAlias":"VM c8server backup 68f83141-9d03-4cb0-84d4-e71fdd8753bb scratch disk for c8_bootdisk_c8server1","DiskDescription":"Backup 68f83141-9d03-4cb0-84d4-e71fdd8753bb scratch disk"}',
imageInitialSizeInBytes='1073741824', imageId='00000000-0000-0000-0000-000000000000',
sourceImageGroupId='00000000-0000-0000-0000-000000000000', shouldAddBitmaps='false'}), log id: 367fe98d
```

We can grep for the scratch disk UUIDs:
- a6ce101a-f7ce-4944-93a5-e71f32dd6c12
- c9521211-8e24-46ae-aa2e-6f76503527dc

But let's first understand what happens to this backup...

4. Backup was started

```
2021-09-03 15:31:29,883+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.StartVmBackupVDSCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-32) [b302cff2-fb05-4f10-9d02-aa03b10b10e1]
START, StartVmBackupVDSCommand(HostName = ov200, VmBackupVDSParameters:{hostId='cc241ec7-64fc-4c93-8cec-9e0e7005a49d',
backupId='68f83141-9d03-4cb0-84d4-e71fdd8753bb', requireConsistency='false'}), log id: 154dbdc5

{96b0e701-7595-4f04-8569-fb1c72e6f8e0=nbd:unix:/run/vdsm/backup/68f83141-9d03-4cb0-84d4-e71fdd8753bb:exportname=sdb,
 33b0f6fb-a855-465d-a628-5fce9b64496a=nbd:unix:/run/vdsm/backup/68f83141-9d03-4cb0-84d4-e71fdd8753bb:exportname=sda}

<domaincheckpoint>
  <description>checkpoint for backup '68f83141-9d03-4cb0-84d4-e71fdd8753bb'</description>
```

The next step is creating an image transfer for downloading the disks. Based on your mail:

```
[ 157.8 ] Image transfer 'ccc386d3-9f9d-4727-832a-56d355d60a95' is ready
```

We can follow the image transfer UUID ccc386d3-9f9d-4727-832a-56d355d60a95:

5. Creating image transfer

```
2021-09-03 15:33:46,892+02 INFO [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand]
(default task-48) [a79ec359-6da7-4b21-a018-1a9360a2f7d8] Creating ImageTransfer entity for command
'ccc386d3-9f9d-4727-832a-56d355d60a95', proxyEnabled: true
```

6. Image transfer is ready

```
2021-09-03 15:33:46,922+02 INFO [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater]
(default task-48) [a79ec359-6da7-4b21-a018-1a9360a2f7d8] Updating image transfer
ccc386d3-9f9d-4727-832a-56d355d60a95 (image 33b0f6fb-a855-465d-a628-5fce9b64496a) phase to Transferring
```

The next step is finalizing this transfer, after data was downloaded, or download failed...

7. Image transfer finalized

```
2021-09-03 15:35:34,141+02 INFO [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater]
(default task-48) [cb1714ef-e36c-4984-aee8-3ca2185be724] Updating image transfer
ccc386d3-9f9d-4727-832a-56d355d60a95 (image 33b0f6fb-a855-465d-a628-5fce9b64496a) phase to Finalizing Success
```

8. Image transfer marked as finished...

```
2021-09-03 15:35:42,337+02 INFO [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-87) [a79ec359-6da7-4b21-a018-1a9360a2f7d8]
Updating image transfer ccc386d3-9f9d-4727-832a-56d355d60a95 (image 33b0f6fb-a855-465d-a628-5fce9b64496a)
phase to Finished Success
```

9. Backup is stopped

```
99-4fd7-8ab3-e2a0f4b13f06] Running command: StopVmBackupCommand internal: false. Entities affected :
ID: dc386237-1e98-40c8-9d3d-45658163d1e2 Type: VMAction group BACKUP_DISK with role type ADMIN,
ID: 96b0e701-7595-4f04-8569-fb1c72e6f8e0 Type: DiskAction group BACKUP_DISK with role type ADMIN,
ID: 33b0f6fb-a855-465d-a628-5fce9b64496a Type: DiskAction group BACKUP_DISK with role type ADMIN
```

10. Stopping backup on the host

```
2021-09-03 15:35:43,364+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.StopVmBackupVDSCommand]
(default task-48) [893250d4-f599-4fd7-8ab3-e2a0f4b13f06] FINISH, StopVmBackupVDSCommand, return: , log id: 228ad88b
2021-09-03 15:35:43,374+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(default task-48) [893250d4-f599-4fd7-8ab3-e2a0f4b13f06] EVENT_ID: VM_BACKUP_FINALIZED(10,794),
Backup 68f83141-9d03-4cb0-84d4-e71fdd8753bb for VM c8server finalized (User: tekka@mydomain@mydomain).
```

11. Error in the backup command - not sure why...
```
2021-09-03 15:35:44,390+02 INFO [org.ovirt.engine.core.utils.transaction.TransactionSupport]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-7) [b302cff2-fb05-4f10-9d02-aa03b10b10e1]
transaction rolled back
2021-09-03 15:35:44,390+02 ERROR [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-7) [b302cff2-fb05-4f10-9d02-aa03b10b10e1]
Command 'StartVmBackup' id: '68f83141-9d03-4cb0-84d4-e71fdd8753bb' with children
[c98edaa9-e3e9-48bc-a775-39d09f34f17c, 0767fbc6-d966-4d0d-bd96-33da00dd7797] failed when attempting
to perform the next operation, marking as 'ACTIVE'
Template.java:1050)
... 30 more
```

12. Errors writing to database - no space left

```
2021-09-03 15:35:45,422+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-54) [b302cff2-fb05-4f10-9d02-aa03b10b10e1]
EVENT_ID: VM_BACKUP_FAILED(10,792), Backup 68f83141-9d03-4cb0-84d4-e71fdd8753bb for VM c8server failed
(User: tekka@mydomain@mydomain).
2021-09-03 15:35:45,422+02 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-54) [b302cff2-fb05-4f10-9d02-aa03b10b10e1]
Failed invoking callback end method 'onFailed' for command '68f83141-9d03-4cb0-84d4-e71fdd8753bb' with
exception 'CallableStatementCallback; SQL [{call updatestatusofimagesbyimagegroupid(?, ?)}]
ERROR: could not extend file "base/16385/1398189_fsm": No space left on device
  Hint: Check free disk space.
  Where: SQL statement "UPDATE images SET imageStatus = v_status WHERE image_group_id = v_image_group_id"
PL/pgSQL function updatestatusofimagesbyimagegroupid(uuid,integer) line 3 at SQL statement;
nested exception is org.postgresql.util.PSQLException: ERROR: could not extend file "base/16385/1398189_fsm":
No space left on device
  Hint: Check free disk space.
  Where: SQL statement "UPDATE images SET imageStatus = v_status WHERE image_group_id = v_image_group_id"
PL/pgSQL function updatestatusofimagesbyimagegroupid(uuid,integer) line 3 at SQL statement',
the callback is marked for end method retries
```

This seems to be the root cause of the engine failure: the engine cannot write to the database, so it cannot complete handling of the backup command.

Back to the scratch disks: if we grep the UUIDs we see:

```
$ grep a6ce101a-f7ce-4944-93a5-e71f32dd6c12 engine.log
...
2021-09-03 15:36:19,625+02 INFO [org.ovirt.engine.core.bll.storage.backup.RemoveScratchDisksCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-70) [b302cff2-fb05-4f10-9d02-aa03b10b10e1]
Teardown 'a6ce101a-f7ce-4944-93a5-e71f32dd6c12/33aa1bac-4152-492d-8a4a-b6d6c0337fec' on the VM host
'cc241ec7-64fc-4c93-8cec-9e0e7005a49d'
2021-09-03 15:36:20,423+02 INFO [org.ovirt.engine.core.bll.storage.disk.RemoveDiskCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-70) [4dd3b3c7] Lock Acquired to object
'EngineLock:{exclusiveLocks='[a6ce101a-f7ce-4944-93a5-e71f32dd6c12=DISK]', sharedLocks=''}'
2021-09-03 15:36:20,429+02 INFO [org.ovirt.engine.core.bll.storage.disk.RemoveDiskCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-70) [4dd3b3c7] Running command:
RemoveDiskCommand internal: true. Entities affected : ID: a6ce101a-f7ce-4944-93a5-e71f32dd6c12
Type: DiskAction group DELETE_DISK with role type USER
2021-09-03 15:36:20,483+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-70) [4dd3b3c7] START,
DeleteImageGroupVDSCommand( DeleteImageGroupVDSCommandParameters:{storagePoolId='ef17cad6-7724-4cd8-96e3-9af6e529db51',
ignoreFailoverLimit='false', storageDomainId='1de66cb8-c9f6-49cd-8a35-d5da8adad570',
imageGroupId='a6ce101a-f7ce-4944-93a5-e71f32dd6c12', postZeros='false', discard='false',
forceDelete='false'}), log id: 63ed36b8
2021-09-03 15:36:21,041+02 INFO [org.ovirt.engine.core.bll.storage.disk.RemoveDiskCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-70) [4dd3b3c7] Lock freed to object
'EngineLock:{exclusiveLocks='[a6ce101a-f7ce-4944-93a5-e71f32dd6c12=DISK]', sharedLocks=''}'
```

```
$ grep c9521211-8e24-46ae-aa2e-6f76503527dc engine.log
...
2021-09-03 15:36:20,084+02 INFO [org.ovirt.engine.core.bll.storage.backup.RemoveScratchDisksCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-70) [b302cff2-fb05-4f10-9d02-aa03b10b10e1]
Teardown 'c9521211-8e24-46ae-aa2e-6f76503527dc/48244767-c8dc-4821-be21-935207068e69' on the VM host
'cc241ec7-64fc-4c93-8cec-9e0e7005a49d'
2021-09-03 15:36:21,043+02 INFO [org.ovirt.engine.core.bll.storage.disk.RemoveDiskCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-70) [79f71902] Lock Acquired to object
'EngineLock:{exclusiveLocks='[c9521211-8e24-46ae-aa2e-6f76503527dc=DISK]', sharedLocks=''}'
2021-09-03 15:36:21,048+02 INFO [org.ovirt.engine.core.bll.storage.disk.RemoveDiskCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-70) [79f71902] Running command:
RemoveDiskCommand internal: true. Entities affected : ID: c9521211-8e24-46ae-aa2e-6f76503527dc
Type: DiskAction group DELETE_DISK with role type USER
2021-09-03 15:36:21,067+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-70) [79f71902] START,
DeleteImageGroupVDSCommand( DeleteImageGroupVDSCommandParameters:{storagePoolId='ef17cad6-7724-4cd8-96e3-9af6e529db51',
ignoreFailoverLimit='false', storageDomainId='1de66cb8-c9f6-49cd-8a35-d5da8adad570',
imageGroupId='c9521211-8e24-46ae-aa2e-6f76503527dc', postZeros='false', discard='false',
forceDelete='false'}), log id: 5670f62c
2021-09-03 15:36:21,518+02 INFO [org.ovirt.engine.core.bll.storage.disk.RemoveDiskCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-70) [79f71902] Lock freed to object
'EngineLock:{exclusiveLocks='[c9521211-8e24-46ae-aa2e-6f76503527dc=DISK]', sharedLocks=''}'
```

So both scratch disks were removed as expected, and the only remaining issue is the backup stuck in the finalizing state.

Because the root cause is no space on the database disk, caused by user error (filling up the engine disk by mistake), I don't think we can do much about this.

Closing as not a bug, since filling the engine disk is a user error. The engine must have a working database to make progress.
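Since the root cause was the filesystem holding the engine database running out of space ("could not extend file ... No space left on device"), the practical mitigation is to monitor free space on that filesystem and alert before PostgreSQL hits the limit. A minimal monitoring sketch using only the Python standard library; the path and the 5 GiB threshold are illustrative assumptions to adapt per deployment (on an engine host you would likely point this at the PostgreSQL data directory, e.g. /var/lib/pgsql):

```python
import shutil

def check_free_space(path, min_free_bytes=5 * 1024**3):
    """Return (free_bytes, ok): whether the filesystem holding `path`
    has at least `min_free_bytes` free.

    Both the default 5 GiB threshold and the path passed below are
    assumptions for illustration, not values taken from the bug report.
    """
    usage = shutil.disk_usage(path)
    return usage.free, usage.free >= min_free_bytes

# Check the root filesystem as a stand-in for the database filesystem:
free, ok = check_free_space("/", min_free_bytes=0)
print(f"free bytes: {free}, above threshold: {ok}")
```

Wired into cron or a monitoring agent, a failing check would surface well before an UPDATE on the images table fails with ENOSPC and leaves a backup stuck in FINALIZING as seen here.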