Created attachment 916296
vdsm+engine logs

Description of problem:

Deactivating and then activating an LV on a running guest can lead to several failures that seem to originate from a race condition.
On a running guest with deactivated block disks, we expect the LVs to be deactivated as well, but when this bug appears the host reports the LVs as activated.

[root@camel-vdsc ~]# lvs
  LV                                   VG                                   Attr       LSize   Pool Origin Data%  Move Log Cpy%Sync Convert
  21a9407a-089a-4321-a86c-04eb41b58866 70dfdfa3-653c-4656-a831-d73a1681b068 -wi-a-----   1.00g
  5c7a16ca-6c01-433f-a402-a176a36e466c 70dfdfa3-653c-4656-a831-d73a1681b068 -wi-a-----   2.00g
  5e737af0-aa6f-419d-b48e-156d42c83bdf 70dfdfa3-653c-4656-a831-d73a1681b068 -wi-a-----   2.00g
  lv_root                              vg0                                  -wi-ao---- 224.88g
  lv_swap                              vg0                                  -wi-ao----   7.81g

When trying to remove those disks, their state becomes illegal.

From oVirt's log:

2014-07-08 10:03:53,098 ERROR [org.ovirt.engine.core.bll.RemoveImageCommand] (org.ovirt.thread.pool-8-thread-33) [15e14103] Command org.ovirt.engine.core.bll.RemoveImageCommand throw Vdc Bll exception

Repeating the removal operation corrupts data in the psql tables.

engine=# SELECT volume_format,image_group_id,creation_date,_update_date,active,it_guid FROM images;
 volume_format | image_group_id |     creation_date      | _update_date | active |               it_guid
---------------+----------------+------------------------+--------------+--------+--------------------------------------
             4 |                | 2008-04-01 00:00:00+03 |              | t      | 00000000-0000-0000-0000-000000000000
(1 row)

The images table is empty apart from this single row.

The lvs command shows activated LVs, and when executing fuser:

[root@camel-vdsc ~]# fuser -kuc /dev/70dfdfa3-653c-4656-a831-d73a1681b068/21a9407a-089a-4321-a86c-04eb41b58866
/dev/70dfdfa3-653c-4656-a831-d73a1681b068/21a9407a-089a-4321-a86c-04eb41b58866: 11601(qemu)

we see that the qemu process still uses/locks the LV.

[root@camel-vdsc ~]# lvs -o lv_name,lv_tags
  LV                                   LV Tags
  21a9407a-089a-4321-a86c-04eb41b58866 PU_00000000-0000-0000-0000-000000000000,MD_5,IU__remove_me_d17878dc-55c5-4297-9934-9d2f19adb996
  5c7a16ca-6c01-433f-a402-a176a36e466c MD_6,PU_00000000-0000-0000-0000-000000000000,IU__remove_me_029fd296-9588-46ce-95a6-cd365c9df48c
  5e737af0-aa6f-419d-b48e-156d42c83bdf MD_4,PU_00000000-0000-0000-0000-000000000000,IU__remove_me_2a382a92-0e4c-473f-b5c8-0540386188e9

Adding the lv_tags field shows that the string "remove_me" has been added to the image ID.

The async_tasks table is not cleared either:

engine=# SELECT task_type,task_id,command_id,action_type,started_at FROM async_tasks;
 task_type |               task_id                |              command_id              | action_type |         started_at
-----------+--------------------------------------+--------------------------------------+-------------+----------------------------
         5 | b5627649-074f-4afc-b3bf-a995fcd7bab3 | 6a32e70d-c5d2-45bf-ab5c-1fe7166efa94 |         230 | 2014-07-08 10:04:00.236+03
         5 | 803f412d-4de8-4319-ba3e-51463e8f74f5 | 41e9c6bf-f237-4907-a4dd-4d825bfd1c50 |         230 | 2014-07-08 10:04:12.018+03
         5 | 3946f603-68bf-4d2a-a326-f86f94a601ce | 033357a8-8f55-4a31-92c9-9b5f35e09a1b |         230 | 2014-07-08 10:04:23.273+03
(3 rows)

Then, if we try to remove the block domain, the operation fails with bll and irs exceptions.

From the oVirt engine log:

2014-07-08 10:07:18,797 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp--127.0.0.1-8702-7) [42184087] Failed in FormatStorageDomainVDS method
2014-07-08 10:07:18,798 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp--127.0.0.1-8702-7) [42184087] Command org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand return value StatusOnlyReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=508, mMessage=Volume Group remove error: ('VG 70dfdfa3-653c-4656-a831-d73a1681b068 remove failed.',)]]
2014-07-08 10:07:18,805 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp--127.0.0.1-8702-7) [42184087] HostName = vdsc
2014-07-08 10:07:18,809 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp--127.0.0.1-8702-7) [42184087] Command FormatStorageDomainVDSCommand(HostName = vdsc, HostId = e61ee2aa-fa3c-49fc-a803-533663f6b9c1, storageDomainId=70dfdfa3-653c-4656-a831-d73a1681b068) execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Volume Group remove error: ('VG 70dfdfa3-653c-4656-a831-d73a1681b068 remove failed.',), code = 508
2014-07-08 10:07:18,822 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp--127.0.0.1-8702-7) [42184087] FINISH, FormatStorageDomainVDSCommand, log id: 19707ea1
2014-07-08 10:07:18,829 ERROR [org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand] (ajp--127.0.0.1-8702-7) [42184087] Command org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Volume Group remove error: ('VG 70dfdfa3-653c-4656-a831-d73a1681b068 remove failed.',), code = 508 (Failed with error VolumeGroupRemoveError and code 508)
2014-07-08 10:07:18,850 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp--127.0.0.1-8702-7) [42184087] Correlation ID: 575f094, Job ID: 89047a49-fbf6-49a5-b740-faf56f3f02c6, Call Stack: null, Custom Event ID: -1, Message: Failed to remove Storage Domain ISCSI. (User: admin)

And from vdsm's log:

Thread-19::ERROR::2014-07-08 10:07:21,304::task::866::Storage.TaskManager.Task::(_setError) Task=`0b751b9e-e9b1-4e1a-8110-7d757c3238e0`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2760, in formatStorageDomain
    self._recycle(sd)
  File "/usr/share/vdsm/storage/hsm.py", line 2706, in _recycle
    dom.format(dom.sdUUID)
  File "/usr/share/vdsm/storage/blockSD.py", line 900, in format
    lvm.removeVG(sdUUID)
  File "/usr/share/vdsm/storage/lvm.py", line 940, in removeVG
    raise se.VolumeGroupRemoveError("VG %s remove failed." % vgName)

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
Setup: have a DC with NFS as master, and an iSCSI domain.
1. Create a VM + 4 disks on iSCSI
2. Run the VM
3. Deactivate all the disks quickly
4. After all disks are deactivated, try to activate them
5. Wait for a UI error box
6. Remove the disks (they all become illegal), then remove them again
7. Put the iSCSI domain into maintenance and remove it (fails)

Actual results:
Multiple failures and exceptions, which lead to data loss.

Expected results:
Removing an image or a block domain should succeed, according to oVirt's docs.

Additional info:
Important note: another ERROR also appears in vdsm's log every few seconds; please read BZ #1116826 first.
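The host-side checks in the description (LV still active, LV device open, "remove_me" tag present) can be automated. The following is only a minimal sketch, not part of vdsm or the engine: it assumes the report was produced with something like `lvs --noheadings --separator '|' -o lv_name,vg_name,lv_attr,lv_tags`, and the names `LvState` and `parse_lvs` are hypothetical. It relies on the lv_attr convention that the 5th character is 'a' for an active LV and the 6th is 'o' for an open device. The sample input is adapted from the output above into that separator format.

```python
from typing import List, NamedTuple


class LvState(NamedTuple):
    name: str
    vg: str
    active: bool           # 5th lv_attr character is 'a'
    in_use: bool           # 6th lv_attr character is 'o' (device open)
    pending_removal: bool  # some tag contains "_remove_me_"


def parse_lvs(output: str) -> List[LvState]:
    """Parse '|'-separated lvs output with the columns
    lv_name, vg_name, lv_attr, lv_tags (hypothetical helper)."""
    states = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        name, vg, attr, tags = (field.strip() for field in line.split("|"))
        states.append(LvState(
            name=name,
            vg=vg,
            active=len(attr) > 4 and attr[4] == "a",
            in_use=len(attr) > 5 and attr[5] == "o",
            pending_removal=any("_remove_me_" in t for t in tags.split(",")),
        ))
    return states


# Sample adapted from the lvs output in this report.
SAMPLE = """\
  21a9407a-089a-4321-a86c-04eb41b58866|70dfdfa3-653c-4656-a831-d73a1681b068|-wi-a-----|PU_00000000-0000-0000-0000-000000000000,MD_5,IU__remove_me_d17878dc-55c5-4297-9934-9d2f19adb996
  lv_root|vg0|-wi-ao----|
"""

if __name__ == "__main__":
    for lv in parse_lvs(SAMPLE):
        if lv.active and lv.pending_removal:
            print("LV %s/%s is marked for removal but still active" % (lv.vg, lv.name))
```

An LV that is both active and tagged for removal matches the inconsistent state described in this report.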
*** note *** Happens with virtio block disks only
What do you mean in steps 3 and 4? Hot unplug and plug via the UI?
(In reply to Nir Soffer from comment #2)
> What do you mean in steps 3 and 4?
> Hot unplug and plug via the UI?

Yes. Actually, time is not a factor here; the LVs are not activated because of how virtio disks behave (they cannot be hotplugged unless an OS is installed on the guest).

Updating Steps to Reproduce:
Setup: have a DC with NFS as master, and an iSCSI domain.
1. Create a VM + 4 virtio disks on iSCSI
2. Run the VM
3. Deactivate all the disks one by one from the UI
4. After all disks are deactivated, try to activate them (again from the UI)
5. Wait for a UI error box
6. Remove the disks (they all become illegal), then remove them again
7. Put the iSCSI domain into maintenance and remove it (fails)
Expected results: Hotplugging virtio disks should be blocked with a CNA when no OS is installed on the guest.
The engine has no insight into whether a guest OS exists, or whether it supports hotplugging. Additionally, there's no business use case for running (and hot [un]plugging) VMs with no guest OS.