Description of problem:

Version-Release number of selected component (if applicable):
oVirt Engine ver. 4.4.1, 4.4.2

Remove managed block disks from Cinderlib Storage domain (after making snapshots of VM) fails with EVENT_ID: USER_FINISHED_FAILED_REMOVE_DISK(2,015)

2020-09-23 08:59:29,144+02 INFO [org.ovirt.engine.core.bll.storage.disk.RemoveDiskCommand] (default task-9) [442dbd9c-3e72-4bc1-9964-675c7c836da4] Lock Acquired to object 'EngineLock:{exclusiveLocks='[63759b73-5198-415a-b81a-a5f881fdea02=DISK]', sharedLocks=''}'
2020-09-23 08:59:29,250+02 INFO [org.ovirt.engine.core.bll.storage.disk.RemoveDiskCommand] (EE-ManagedThreadFactory-engine-Thread-15297) [442dbd9c-3e72-4bc1-9964-675c7c836da4] Running command: RemoveDiskCommand internal: false. Entities affected : ID: 63759b73-5198-415a-b81a-a5f881fdea02 Type: DiskAction group DELETE_DISK with role type USER
2020-09-23 08:59:29,314+02 INFO [org.ovirt.engine.core.bll.storage.disk.managedblock.RemoveManagedBlockStorageDiskCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [442dbd9c-3e72-4bc1-9964-675c7c836da4] Running command: RemoveManagedBlockStorageDiskCommand internal: true.
2020-09-23 08:59:43,709+02 ERROR [org.ovirt.engine.core.common.utils.cinderlib.CinderlibExecutor] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [442dbd9c-3e72-4bc1-9964-675c7c836da4] cinderlib execution failed:
2020-09-23 08:59:43,759+02 ERROR [org.ovirt.engine.core.bll.storage.disk.managedblock.RemoveManagedBlockStorageDiskCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [442dbd9c-3e72-4bc1-9964-675c7c836da4] Transaction rolled-back for command 'org.ovirt.engine.core.bll.storage.disk.managedblock.RemoveManagedBlockStorageDiskCommand'.
2020-09-23 08:59:43,870+02 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-15297) [442dbd9c-3e72-4bc1-9964-675c7c836da4] EVENT_ID: USER_FINISHED_FAILED_REMOVE_DISK(2,015), Failed to remove disk SimpleBackup_Disk1 from storage domain cinderlib1 (User: admin@internal-authz).
2020-09-23 08:59:43,890+02 INFO [org.ovirt.engine.core.bll.storage.disk.RemoveDiskCommand] (EE-ManagedThreadFactory-engine-Thread-15297) [442dbd9c-3e72-4bc1-9964-675c7c836da4] Lock freed to object 'EngineLock:{exclusiveLocks='[63759b73-5198-415a-b81a-a5f881fdea02=DISK]', sharedLocks=''}'

How reproducible:

Steps to Reproduce:
1. Make a snapshot of a VM with a disk on a Managed Block Storage domain
2. Remove the snapshot of the VM
3. Start removing the disk from the Managed Block Storage domain

Actual results:
Failed to remove disk SimpleBackup_Disk1 from storage domain cinderlib1 (User: admin@internal-authz).

Expected results:
Disk "Disk name" was successfully removed from domain cinderlib1 (User admin@internal-authz).

Additional info:
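For reference, the reproduction steps above can also be driven through the oVirt REST API. This is only a rough sketch; the engine FQDN, password, VM id, snapshot id and disk id below are placeholders, not values from this report:

Create a snapshot of the VM:
# curl -s -k -u admin@internal:PASSWORD -H 'Content-Type: application/xml' -d '<snapshot><description>mbs-test</description></snapshot>' https://ENGINE_FQDN/ovirt-engine/api/vms/VM_ID/snapshots

Remove that snapshot once it is ready:
# curl -s -k -u admin@internal:PASSWORD -X DELETE https://ENGINE_FQDN/ovirt-engine/api/vms/VM_ID/snapshots/SNAPSHOT_ID

Remove the disk from the Managed Block Storage domain:
# curl -s -k -u admin@internal:PASSWORD -X DELETE https://ENGINE_FQDN/ovirt-engine/api/disks/DISK_ID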
Created attachment 1716237 [details] ovirt engine log
The documentation text flag should only be set after 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.
Thanks for the report. Can you confirm that this happened on a VM that was previously running? It looks like the issue occurs when there was a failure in detaching, so the volume will still have a reference in the volume_attachments table.
Are you using Ceph?
Can you share /var/log/ovirt-engine/cinderlib/cinderlib.log?
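One way to check for such a leftover reference is to look directly at the cinderlib database on the engine machine. The following is only a sketch and the names are assumptions: the database name (ovirt_cinderlib is the engine-setup default) and the exact table/column names come from the upstream Cinder schema that cinderlib's DB persistence reuses, so they may differ on your setup:

# su - postgres
$ psql ovirt_cinderlib
ovirt_cinderlib=# \dt
ovirt_cinderlib=# SELECT id, volume_id, attach_status FROM volume_attachment;

A row that is still present (or not marked as detached) for the volume you are trying to remove would support the stale-attachment theory.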
Created attachment 1719602 [details]
cinderlib log file

Cinderlib log while trying to remove the disk after making a snapshot
Yes - that happened on a VM that was previously running, and yes, we are using Ceph 14.2.
I added the log file from cinderlib.
WBR
Mike
(In reply to mandre38 from comment #4)
> Created attachment 1719602 [details]
> cinderlib log file
>
> Cinderlib log while trying to remove the disk after making a snapshot

The issue seems different; it looks like the snapshot failed to be removed. Can you share the full engine and cinderlib logs? I can't see the snapshot removal failure in the attached logs.
Additionally, which cinderlib version are you using? Did you install it with the rpm from a recent OpenStack repo (Train and above) or with pip?
Created attachment 1719652 [details]
ovirt engine log

Hmmmm... I just tested it (make a new managed block disk -> add it to the VM -> make a snapshot -> remove the snapshot -> remove the disk via the oVirt web interface) once again, and removing the disk was OK!!! Before opening this bug, I tested it on 4 different VMs on 2 different hosts and removing the disk failed with the same error...
Including the full log anyway...
For info:
cinderlib was installed via pip and the version is 2.0.0
(In reply to mandre38 from comment #7)
> Created attachment 1719652 [details]
> ovirt engine log
>
> Hmmmm... I just tested it (make a new managed block disk -> add it to the VM -> make a snapshot -> remove the snapshot -> remove the disk via the oVirt web interface) once again, and removing the disk was OK!!! Before opening this bug, I tested it on 4 different VMs on 2 different hosts and removing the disk failed with the same error...
> Including the full log anyway...
> For info:
> cinderlib was installed via pip and the version is 2.0.0

Can you start the VM and power it off before removing the snapshot? I believe the issue is related to a failure when unmapping the volume from the host.

Since you are using Ceph, you might be hitting bug 1755801, where multipath prevents the rbd device from being unmapped. It can be worked around by adding rbd to the multipath blacklist, like this:

# cat /etc/multipath.conf.d/rbd.conf
blacklist {
        devnode "^(rbd)[0-9]*"
}

But we need to confirm this is indeed the issue.
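For completeness, after creating that file the multipath configuration has to be reloaded for the blacklist to take effect. A rough sketch of one way to apply and verify it (exact output will vary with the device-mapper-multipath version):

# systemctl reload multipathd
# multipathd show config | grep -B 1 -A 2 rbd

The devnode regex should then appear in the effective blacklist section, and multipath should stop creating maps on top of /dev/rbd* devices.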
OK, now I actually looked at the logs you attached, and the logs show this is indeed the issue:

2020-09-24 16:45:14,935+02 WARN [org.ovirt.engine.core.vdsbroker.vdsbroker.DetachManagedBlockStorageVolumeVDSCommand] (ForkJoinPool-1-worker-19) [7e6b1dc1] Unexpected return value: Status [code=926, message=Managed Volume Helper failed.: ('Error executing helper: Command [\'/usr/bin/python3\', \'/usr/libexec/vdsm/managedvolume-helper\', \'detach\'] failed with rc=1 out=b\'\' err=b\'oslo.privsep.daemon: Running privsep helper: [\\\'sudo\\\', \\\'privsep-helper\\\', \\\'--privsep_context\\\', \\\'os_brick.privileged.default\\\', \\\'--privsep_sock_path\\\', \\\'/tmp/tmpbz8pnmf7/privsep.sock\\\']\\noslo.privsep.daemon: Spawned new privsep daemon via rootwrap\\noslo.privsep.daemon: privsep daemon starting\\noslo.privsep.daemon: privsep process running with uid/gid: 0/0\\noslo.privsep.daemon: privsep process running with capabilities (eff/prm/inh): CAP_SYS_ADMIN/CAP_SYS_ADMIN/none\\noslo.privsep.daemon: privsep daemon running as pid 9725\\nTraceback (most recent call last):\\n File "/usr/libexec/vdsm/managedvolume-helper", line 154, in <module>\\n sys.exit(main(sys.argv[1:]))\\n File "/usr/libexec/vdsm/managedvolume-helper", line 77, in main\\n args.command(args)\\n File "/usr/libexec/vdsm/managedvolume-helper", line 149, in detach\\n ignore_errors=False)\\n File "/usr/lib/python3.6/site-packages/vdsm/storage/nos_brick.py", line 121, in disconnect_volume\\n run_as_root=True)\\n File "/usr/local/lib/python3.6/site-packages/os_brick/executor.py", line 52, in _execute\\n result = self.__execute(*args, **kwargs)\\n File "/usr/local/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py", line 169, in execute\\n return execute_root(*cmd, **kwargs)\\n File "/usr/local/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line 247, in _wrap\\n return self.channel.remote_call(name, args, kwargs)\\n File "/usr/local/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 224, in remote_call\\n raise exc_type(*result[2])\\noslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.\\nCommand: rbd unmap /dev/rbd/rbdpool/volume-c3ccd5c8-ecd0-439b-8071-bb8c3277364a --conf /tmp/brickrbd_ja992li0 --id admin --mon_host 149.148.216.38:6789 --mon_host 149.148.216.39:6789 --mon_host 149.148.216.37:6789\\nExit code: 16\\nStdout: \\\'\\\'\\nStderr: \\\'rbd: sysfs write failed\\\\nrbd: unmap failed: (16) Device or resource busy\\\\n\\\'\\n\'',)]

We already have the bug mentioned in the previous comment (bug 1755801) to add the blacklisting as part of vdsm's configuration. I will use this bug to improve the workflow: add better logs, and add an additional detach attempt when removing, if needed, so that retries are possible.
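As a side note for anyone already stuck with such a busy device, the sequence below is a rough sketch of how one might confirm that a multipath map is holding the rbd device and release it manually; the device and map names are only examples, not values taken from this report:

List the mapped rbd devices on the host:
# rbd showmapped

Check whether multipath built a map on top of the rbd device:
# multipath -ll
# dmsetup ls --tree

Flush the offending map, then retry the unmap that previously failed with "Device or resource busy":
# multipath -f EXAMPLE_MAP_NAME
# rbd unmap /dev/rbd0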
I added the blacklist in /etc/multipath.conf.d/rbd.conf, started the VM, then powered it off, then removed the snapshot, and then removed the disk without a problem:

2020-10-07 11:41:24,281+02 INFO [org.ovirt.engine.core.common.utils.cinderlib.CinderlibExecutor] (default task-69) [dbf360da-109a-41b3-a222-e956f73e42e5] cinderlib output:
2020-10-07 11:41:24,353+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-69) [dbf360da-109a-41b3-a222-e956f73e42e5] EVENT_ID: USER_REMOVE_SNAPSHOT(342), Snapshot 'Snap5' deletion for VM 'TestVM_cinderlib' was initiated by admin@internal-authz.
2020-10-07 11:41:26,665+02 INFO [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-10) [dbf360da-109a-41b3-a222-e956f73e42e5] Command 'RemoveSnapshot' id: 'd7835ce6-e670-4af8-92d0-f7695cc5dd9f' child commands '[6a495488-8093-46ca-82d1-5a428aa516ea]' executions were completed, status 'SUCCEEDED'
2020-10-07 11:41:27,685+02 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-2) [dbf360da-109a-41b3-a222-e956f73e42e5] Ending command 'org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand' successfully.
2020-10-07 11:41:27,710+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-2) [dbf360da-109a-41b3-a222-e956f73e42e5] EVENT_ID: USER_REMOVE_SNAPSHOT_FINISHED_SUCCESS(356), Snapshot 'Snap5' deletion for VM 'TestVM_cinderlib' has been completed.
2020-10-07 11:41:52,008+02 INFO [org.ovirt.engine.core.bll.storage.disk.RemoveDiskCommand] (default task-69) [8d8c12e3-2cf4-4990-8e60-6516497186ed] Lock Acquired to object 'EngineLock:{exclusiveLocks='[a3d65025-d2ae-4602-83fc-51cf50c2778a=DISK]', sharedLocks='[97f6a75f-7693-440c-b337-b3470a4fc94c=VM]'}'
2020-10-07 11:41:52,558+02 INFO [org.ovirt.engine.core.bll.storage.disk.RemoveDiskCommand] (EE-ManagedThreadFactory-engine-Thread-48149) [8d8c12e3-2cf4-4990-8e60-6516497186ed] Running command: RemoveDiskCommand internal: false. Entities affected : ID: a3d65025-d2ae-4602-83fc-51cf50c2778a Type: DiskAction group DELETE_DISK with role type USER
2020-10-07 11:41:52,580+02 INFO [org.ovirt.engine.core.bll.storage.disk.managedblock.RemoveManagedBlockStorageDiskCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-10) [] Running command: RemoveManagedBlockStorageDiskCommand internal: true.
2020-10-07 11:41:57,132+02 INFO [org.ovirt.engine.core.common.utils.cinderlib.CinderlibExecutor] (EE-ManagedExecutorService-commandCoordinator-Thread-10) [] cinderlib output:
2020-10-07 11:41:57,181+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-48149) [8d8c12e3-2cf4-4990-8e60-6516497186ed] EVENT_ID: USER_FINISHED_REMOVE_DISK_ATTACHED_TO_VMS(2,042), Disk TestVM_cinderlib_Disk22gb associated to the VMs TestVM_cinderlib was successfully removed from domain cinderlib1 (User admin@internal-authz).
2020-10-07 11:41:57,202+02 INFO [org.ovirt.engine.core.bll.storage.disk.RemoveDiskCommand] (EE-ManagedThreadFactory-engine-Thread-48149) [8d8c12e3-2cf4-4990-8e60-6516497186ed] Lock freed to object 'EngineLock:{exclusiveLocks='[a3d65025-d2ae-4602-83fc-51cf50c2778a=DISK]', sharedLocks='[97f6a75f-7693-440c-b337-b3470a4fc94c=VM]'}'
2020-10-07 11:41:58,754+02 INFO [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-44) [8d8c12e3-2cf4-4990-8e60-6516497186ed] Command 'RemoveManagedBlockStorageDisk' id: '7f3829cb-83d1-479a-acad-62af7d6c64b3' child commands '[]' executions were completed, status 'SUCCEEDED'
2020-10-07 11:41:59,770+02 INFO [org.ovirt.engine.core.bll.storage.disk.managedblock.RemoveManagedBlockStorageDiskCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-62) [8d8c12e3-2cf4-4990-8e60-6516497186ed] Ending command 'org.ovirt.engine.core.bll.storage.disk.managedblock.RemoveManagedBlockStorageDiskCommand' successfully.
This bugzilla is included in the oVirt 4.5.0 release, published on April 20th 2022.

Since the problem described in this bug report should be resolved in the oVirt 4.5.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.