Bug 1660265
| Summary: | [BLOCKED] LSM's diskReplicateFinish failed on libvirtError: internal error: unable to execute QEMU command 'block-job-complete': Block job 'drive-ua....' not found | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Bimal Chollera <bcholler> |
| Component: | vdsm | Assignee: | Benny Zlotnik <bzlotnik> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Avihai <aefrat> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.1.11 | CC: | aefrat, aperotti, aromito, bcholler, dfediuck, ebenahar, fkust, frolland, hhan, jcoscia, lleistne, lsurette, mjankula, mkalinin, mprivozn, pkrempa, srevivo, tnisan, ycui |
| Target Milestone: | ovirt-4.3.8 | Keywords: | Automation, Regression |
| Target Release: | 4.3.1 | | |
| Hardware: | x86_64 | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1597019 | Environment: | |
| Last Closed: | 2019-10-29 16:43:09 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1597019, 1643707 | | |
| Bug Blocks: | 1417161 | | |
| Attachments: | | | |
Description (Bimal Chollera, 2018-12-18 01:25:24 UTC)
Seems a libvirt issue. Moving to 4.2.9.

Hi Bimal, I am a libvirt QE. Recently I have been looking into the LSM problem BZ1643707. Could you please summarize the following information to help me reproduce it:

1. What is the latest ovirt/libvirt/vdsm/qemu-kvm version on which the customers encountered the bug?
2. What are the LSM source and destination SD types in the customers' tickets? Any special settings in the SDs?
3. For the reproducing steps in comment 0, are steps 2 and 4 necessary? I see the LSM is on vm2, so I think the steps related to vm1 can be removed.

Hi Han,

The following are the versions and SD information:

~~~
ovirt-engine-4.2.7.4-0.1.el7ev.noarch
libvirt-client-3.2.0-14.el7_4.3.x86_64
vdsm-4.19.31-1.el7ev.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.5.x86_64
Both source and destination SD types are FC
~~~

~~~
ovirt-engine-4.2.7.4-0.1.el7ev.noarch
libvirt-4.5.0-10.el7_6.3.x86_64
vdsm-4.20.43-1.el7ev.x86_64
qemu-kvm-rhev-2.12.0-18.el7_6.1.x86_64
Both source and destination SD types are iSCSI
~~~

~~~
ovirt-engine-4.1.11.2-0.1.el7.noarch
libvirt-4.5.0-10.el7_6.2.x86_64
vdsm-4.20.43-1.el7ev.x86_64
qemu-kvm-rhev-2.12.0-18.el7_6.1.x86_64
Both source and destination SD types are FC
~~~

This was also seen with a couple of different end users. Yes, I think we can eliminate the vm1 steps; I found those steps in the upstream BZ.

I have tried several times to reproduce this but have not been successful. One thing I noticed was that if SELinux was disabled (with "setenforce 0"), the migration worked. I am speculating here, but I suspect SELinux was disabled when the disks were created and later enabled, which causes the LSM to fail. We also noticed extra LV symlinks on both SDs; see comment 3 (https://bugzilla.redhat.com/show_bug.cgi?id=1660265#c3) in the BZ.

Thanks Bimal. I tried to reproduce it on:
ovirt-4.2.8.1-0.1.el7ev
vdsm-4.20.46-1.el7ev.x86_64
libvirt-4.5.0-10.el7_6.4.x86_64
qemu-kvm-rhev-2.12.0-18.el7_6.3.x86_64
Steps:
1. Create a DC, add a cluster, add a host, add two iSCSI SDs, and create a VM on one of the iSCSI domains
2. LSM the VM's disk from one iSCSI SD to the other
# ./ovirt-move.py -g ff -S rhv2 -D rhv1
LSM failed due to another issue:
2019-02-12 20:29:03,828-0500 ERROR (tasks/7) [storage.Volume] Failed to create volume /rhev/data-center/mnt/blockSD/90ab8104-4656-4f73-a81f-1c30f27d6a0a/images/ecac83c8-17f6-4802-8157-3b9a0b09964e/994197b3-e12d-4f14-8237-d84742f1a95e: Cannot create Logical Volume: 'vgname=90ab8104-4656-4f73-a81f-1c30f27d6a0a lvname=994197b3-e12d-4f14-8237-d84742f1a95e err=[\' Logical Volume "994197b3-e12d-4f14-8237-d84742f1a95e" already exists in volume group "90ab8104-4656-4f73-a81f-1c30f27d6a0a"\']' (volume:1209)
2019-02-12 20:29:03,828-0500 ERROR (tasks/7) [storage.Volume] Unexpected error (volume:1246)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 1206, in create
initialSize=initialSize)
File "/usr/lib/python2.7/site-packages/vdsm/storage/blockVolume.py", line 505, in _create
initialTags=(sc.TAG_VOL_UNINIT,))
File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 1161, in createLV
raise se.CannotCreateLogicalVolume(vgName, lvName, err)
CannotCreateLogicalVolume: Cannot create Logical Volume: 'vgname=90ab8104-4656-4f73-a81f-1c30f27d6a0a lvname=994197b3-e12d-4f14-8237-d84742f1a95e err=[\' Logical Volume "994197b3-e12d-4f14-8237-d84742f1a95e" already exists in volume group "90ab8104-4656-4f73-a81f-1c30f27d6a0a"\']'
2019-02-12 20:29:03,828-0500 ERROR (tasks/7) [storage.Image] Unexpected error (image:420)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/storage/image.py", line 403, in _createTargetImage
srcVolUUID=volParams['parent'])
File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 820, in createVolume
initialSize=initialSize)
File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 1206, in create
initialSize=initialSize)
File "/usr/lib/python2.7/site-packages/vdsm/storage/blockVolume.py", line 505, in _create
initialTags=(sc.TAG_VOL_UNINIT,))
File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 1161, in createLV
raise se.CannotCreateLogicalVolume(vgName, lvName, err)
CannotCreateLogicalVolume: Cannot create Logical Volume: 'vgname=90ab8104-4656-4f73-a81f-1c30f27d6a0a lvname=994197b3-e12d-4f14-8237-d84742f1a95e err=[\' Logical Volume "994197b3-e12d-4f14-8237-d84742f1a95e" already exists in volume group "90ab8104-4656-4f73-a81f-1c30f27d6a0a"\']'
2019-02-12 20:29:03,830-0500 ERROR (tasks/7) [storage.TaskManager.Task] (Task='e2d24bc8-9241-448f-b5f0-8a8ac678086a') Unexpected error (task:875)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
return fn(*args, **kargs)
File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 336, in run
return self.cmd(*self.argslist, **self.argsdict)
File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
return method(self, *args, **kwargs)
File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1733, in cloneImageStructure
img.cloneStructure(sdUUID, imgUUID, dstSdUUID)
File "/usr/lib/python2.7/site-packages/vdsm/storage/image.py", line 637, in cloneStructure
self._createTargetImage(sdCache.produce(dstSdUUID), sdUUID, imgUUID)
File "/usr/lib/python2.7/site-packages/vdsm/storage/image.py", line 403, in _createTargetImage
srcVolUUID=volParams['parent'])
File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 820, in createVolume
Elad, have you seen a problem like this before? Is there any workaround for it?
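The "already exists" error above points at a stale LV left over from an earlier failed attempt. As a rough illustration (this is a toy model, not vdsm code; all class and method names here are hypothetical), the LV create path is not idempotent, so a retried LSM trips over the leftover volume until it is cleaned up:

```python
# Toy sketch of the non-idempotent create path behind
# CannotCreateLogicalVolume (NOT vdsm code; names are invented).

class CannotCreateLogicalVolume(Exception):
    pass

class FakeVolumeGroup:
    """Stand-in for an LVM volume group holding a set of LV names."""
    def __init__(self, name):
        self.name = name
        self.lvs = set()

    def create_lv(self, lv_name):
        # Mirrors lvm.createLV: creating an existing LV is an error.
        if lv_name in self.lvs:
            raise CannotCreateLogicalVolume(
                'Logical Volume "%s" already exists in volume group "%s"'
                % (lv_name, self.name))
        self.lvs.add(lv_name)

    def remove_lv(self, lv_name):
        self.lvs.discard(lv_name)

vg = FakeVolumeGroup("90ab8104-4656-4f73-a81f-1c30f27d6a0a")
vg.create_lv("994197b3-e12d-4f14-8237-d84742f1a95e")      # first attempt
try:
    vg.create_lv("994197b3-e12d-4f14-8237-d84742f1a95e")  # retry fails
except CannotCreateLogicalVolume as e:
    print(e)

# A retry only succeeds once the stale LV is removed first:
vg.remove_lv("994197b3-e12d-4f14-8237-d84742f1a95e")
vg.create_lv("994197b3-e12d-4f14-8237-d84742f1a95e")
```

This matches the traceback: the destination VG already contains the LV from the earlier run, so `createLV` raises before the image clone can proceed.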
Created attachment 1534286 [details]
script and vdsm/ovirt log
Seems like bug 1650985.

Michal, it looks like a libvirt bug. Do you have any idea how to reproduce it?

(In reply to Han Han from comment #26)
> Michal, it looks like a libvirt bug. Do you have any idea how to reproduce it?

None, sorry. But I feel like there are two or even more problems discussed here (guessing from the different error messages reported here):

~~~
libvirtError: internal error: unable to execute QEMU command 'block-job-complete': Block job 'drive-ua-fee6faec-ebfc-440a-8e88-84fe4ecf68d1' not found
libvirtError: internal error: unable to execute QEMU command 'block-job-complete': The active block job 'drive-ua-288a92ab-90f5-4d31-be8b-ea7f2c15c092' cannot be completed
libvirtError: internal error: unable to execute QEMU command 'block-job-complete': Block job 'drive-virtio-disk0' not found
~~~

However, Peter might have more insight here.

From the described messages and the cross-linked customer cases it looks like https://bugzilla.redhat.com/show_bug.cgi?id=1643707. Given that I was not able to find any useful libvirtd debug logs, I can't really tell what's happening. It is fishy, though, that 'block-job-complete' was called in situations where the job is missing or cannot be completed, as libvirt makes sure the job is in the ready state before sending the command; neither should have happened. Unfortunately, without debug logs or a reproducer it's very hard to figure out what's happening.

Avihai, can you guys please help with logs according to comment #28?

(In reply to Tal Nisan from comment #29)
> Avihai, can you guys please help with logs according to comment #28?

Hi Tal, I did not encounter this issue in later runs (I checked with the Bug 1597019 automation TCs) and it did not occur. @Elad, correct me if I'm wrong. See also Elad's comment #11 on Bug 1597019: this issue has not been seen for a long time now. What about those who reported this issue, is it still seen there?
I see the customer tickets attached here were updated today, so I guess there are customers who still face this.

Sync for product update in Jira.

I see. So we are in the same boat of having no libvirt logs. See our request to get those logs back: bz#1749634. For now, Antonio, it sounds like the best approach is to enable libvirt logging via virt-admin and capture the missing libvirt logs.

Closing as INSUFFICIENT_DATA due to the missing libvirt logs requested in bug 1643707. Please reopen once the issue is reproduced and logs can be provided.
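For reference, libvirt debug logging can be enabled either persistently in `/etc/libvirt/libvirtd.conf` (followed by a libvirtd restart) or at runtime with `virt-admin`. The filter string below is one common choice for chasing QEMU block-job issues; adjust levels and paths to taste:

```
# /etc/libvirt/libvirtd.conf -- persistent debug logging
log_filters="1:qemu 1:libvirt 3:security 3:event 3:util"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"

# Or at runtime, without restarting libvirtd:
#   virt-admin daemon-log-filters "1:qemu 1:libvirt"
#   virt-admin daemon-log-outputs "1:file:/var/log/libvirt/libvirtd.log"
```

Level 1 on the `qemu` and `libvirt` sources captures the QMP traffic (including the 'block-job-complete' exchange in question) at the cost of verbose logs.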