Bug 1534950
| Summary: | [Backup restore API] Start VM, with attached snapshot disk on block based storage, fails on libvirtError | | |
|---|---|---|---|
| Product: | [oVirt] vdsm | Reporter: | Elad <ebenahar> |
| Component: | Core | Assignee: | Eyal Shenitzky <eshenitz> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Elad <ebenahar> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.20.13 | CC: | amureini, bugs, ebenahar, nsoffer, tnisan |
| Target Milestone: | ovirt-4.2.1 | Keywords: | Automation, Regression |
| Target Release: | --- | Flags: | rule-engine: ovirt-4.2+, rule-engine: blocker+ |
| Hardware: | x86_64 | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | v4.20.16 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-02-12 11:46:36 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1526815 | | |
| Attachments: | | | |
Elad, can you please test whether this also reproduces on RHEL 7.4?

It occurs only with block storage (it passed with NFS). I'll test with RHEL 7.4 as well.

Created attachment 1382412 [details]
logs-RHEL7.4

Reproduced on RHEL 7.4:
2018-01-17 15:12:04,465+0200 ERROR (vm/a73fcf41) [virt.vm] (vmId='a73fcf41-3024-4f54-b783-b47f57d1406d') The vm start process failed (vm:917)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 846, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2746, in _run
    dom.createWithFlags(flags)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1069, in createWithFlags
    if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
libvirtError: unsupported configuration: native I/O needs either no disk cache or directsync cache mode, QEMU will fallback to aio=threads
2018-01-17 15:12:04,515+0200 INFO (vm/a73fcf41) [virt.vm] (vmId='a73fcf41-3024-4f54-b783-b47f57d1406d') Changed state to Down: unsupported configuration: native I/O needs either no disk cache or directsync cache mode, QEMU will fallback to aio=threads (code=1) (vm:1636)
rhvm-4.2.1.1-0.1.el7.noarch
vdsm-4.20.13-1.el7ev.x86_64
libvirt-3.2.0-14.el7_4.7.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.13.x86_64
In org.ovirt.engine.core.vdsbroker.builder.vminfo.LibvirtVmXmlBuilder#writeDiskDriver, the "io" attribute is calculated according to the disk's storage type (file vs block vs network). In the case of a transient snapshot, we create a file layer on top of whatever the original type was, so the original IO type may no longer be applicable.

Elad, can you weigh in on the "Regression" flag here, please? When was the last time we saw this working properly? If the attached patch is correct, it looks like this hasn't been working for a while, at the very least since 4.1.

This bug reproduces in 4.1.

(In reply to Eyal Shenitzky from comment #6)
> This bug reproduces in 4.1

Removing the blocker flag based on this assessment and marking it as an exception instead. It's a nasty bug that we need to fix, but if it went unnoticed for the entire 4.1 version (at least!), we shouldn't block on it. I also removed the Regression keyword to prevent the bot from restoring the blocker flag.

As Eyal mentioned, the bug reproduces in 4.1, but I'm almost 100% sure that it doesn't in older versions, as it breaks the basic functionality of the backup restore API.

Elad, did you test this with vdsm from ovirt-4.1? Eyal tested with cluster version 4.1, which is not the same.

I think this is a regression caused by [1]. Before this patch, we checked the disk type late, after the transient disk was created, so we detected the correct disk type (file). After this patch, we get the disk type when preparing the original disk. Since the code creating the transient disk on top of the original disk did not change the disk type, we use the disk type of the original disk.

You can verify that this is a regression by running automation with the patch before [1]. Running with the patch [2] fixing this issue would also be nice, since we don't have any test for this code in vdsm.

[1] https://gerrit.ovirt.org/#/c/85307/
[2] https://gerrit.ovirt.org/#/c/86687/

Right, on a 4.1 setup the bug doesn't occur.
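The disk-type confusion described above can be restated as a minimal Python sketch. The helper names (`effective_disk_type`, `aio_mode`) are hypothetical illustrations, not the actual vdsm or engine code, and the block-implies-native rule is an assumption based on the comment above:

```python
def effective_disk_type(original_type, is_transient):
    """Disk type QEMU actually opens. A transient snapshot places a
    file layer on top of the original disk, so the effective type is
    'file' regardless of the original storage type."""
    return "file" if is_transient else original_type


def aio_mode(disk_type):
    """Pick the value for the 'io' driver attribute: block devices
    get native AIO, files get the thread-pool fallback (assumed rule,
    per the analysis above)."""
    return "native" if disk_type == "block" else "threads"


# The bug: deriving 'io' from the original type ignores the transient
# file layer, so a block-based snapshot disk gets io='native' and
# libvirt rejects the domain XML.
buggy = aio_mode("block")                              # "native"
fixed = aio_mode(effective_disk_type("block", True))   # "threads"
```

With the fix, the disk type is taken after the transient file layer is accounted for, so the backup VM's snapshot disk is driven with `aio=threads` and the start succeeds.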
Marking as a regression.

This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Elad is testing the fix; we will merge once we get confirmation that automation passes with the fix.

Created attachment 1385538 [details]
logs-with fix

Using [1], backup VM start, with snapshot disk attached, works properly.

2018-01-24 13:50:59,059+0200 INFO (jsonrpc/0) [api.virt] START create(vmParams={'xml': '<?xml version="1.0" encoding="UTF-8"?><domain type="kvm" xmlns:ovirt-tune="http://ovirt.org/vm/tune/1.0" xmlns:ovirt-vm="http://ovirt.org/vm/1.0"><name>backup_vm_TestCase6178_2413503280</name><uuid>85cf58ed-4b21-4026-9d7c-410aa28dad1a</uuid><memory>1048576</memory><currentMemory>1048576</currentMemory><maxMemory slots="16">4194304</maxMemory><vcpu current="1">16</vcpu><sysinfo type="smbios"><system><entry name="manufacturer">oVirt</entry><entry name="product">

[1] https://gerrit.ovirt.org/#/c/86687/

vdsm-4.20.14-1.el7ev.x86_64
libvirt-3.9.0-7.el7.x86_64

Start of a VM, with attached snapshot disk on block storage (tested with iSCSI), succeeded. Used:
vdsm-4.20.17-1.el7ev.x86_64
libvirt-3.9.0-7.el7.x86_64
qemu-kvm-rhev-2.10.0-18.el7.x86_64
rhvm-4.2.1.3-0.1.el7.noarch

This bugzilla is included in the oVirt 4.2.1 release, published on Feb 12th 2018. Since the problem described in this bug report should be resolved in the oVirt 4.2.1 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.
Created attachment 1381944 [details]
logs

Description of problem:
A basic backup restore API scenario, starting a VM that has a snapshot disk attached, fails with an unsupported configuration libvirt error.

Version-Release number of selected component (if applicable):
vdsm-4.20.13-1.el7ev.x86_64
libvirt-3.9.0-7.el7.x86_64
qemu-kvm-rhev-2.10.0-16.el7.x86_64
sanlock-3.6.0-1.el7.x86_64
selinux-policy-3.13.1-183.el7.noarch
kernel-3.10.0-823.el7.x86_64 #1 SMP Wed Dec 13 21:17:45 EST 2017 x86_64 x86_64 x86_64 GNU/Linux
RHEL 7.5

How reproducible:
Always

Steps to Reproduce:
1. Create a VM (source VM) with a disk attached (storage type doesn't matter)
2. Create a snapshot of the source VM
3. Create a second VM (backup VM)
4. Attach the disk snapshot of the source VM to the backup VM (via REST)
5. Start the backup VM

Actual results:
Starting the backup VM fails:

2018-01-16 11:16:32,161+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-1768) [] EVENT_ID: USER_FAILED_RUN_VM(54), Failed to run VM backup_vm_TestCase6178_1611152802 (User: admin@internal-authz).
2018-01-16 11:16:28,156+0200 ERROR (vm/00eab462) [virt.vm] (vmId='00eab462-c7a1-442d-911e-322872b533d2') The vm start process failed (vm:917)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 846, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2746, in _run
    dom.createWithFlags(flags)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1099, in createWithFlags
    if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
libvirtError: unsupported configuration: native I/O needs either no disk cache or directsync cache mode, QEMU will fallback to aio=threads

2018-01-16 11:16:28,157+0200 INFO (vm/00eab462) [virt.vm] (vmId='00eab462-c7a1-442d-911e-322872b533d2') Changed state to Down: unsupported configuration: native I/O needs either no disk cache or directsync cache mode, QEMU will fallback to aio=threads (code=1) (vm:1636)

libvirtd.log:
2018-01-16 09:16:28.125+0000: 18311: error : qemuBuildDriveStrValidate:1616 : unsupported configuration: native I/O needs either no disk cache or directsync cache mode, QEMU will fallback to aio=threads

Expected results:
VM start should succeed

Additional info:
logs
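The constraint libvirt enforces (quoted in the error above) can be restated as a small Python check. `validate_io_cache` is a hypothetical helper that mirrors the rule reported by `qemuBuildDriveStrValidate`, not libvirt's actual code, and the cache-mode names are the standard libvirt values:

```python
def validate_io_cache(io, cache):
    """Reject the combination libvirt refuses: io='native' requires
    either no disk cache (cache='none') or cache='directsync'."""
    if io == "native" and cache not in ("none", "directsync"):
        raise ValueError(
            "unsupported configuration: native I/O needs either no disk "
            "cache or directsync cache mode, "
            "QEMU will fallback to aio=threads")


validate_io_cache("native", "none")        # accepted
validate_io_cache("native", "directsync")  # accepted
validate_io_cache("threads", "writeback")  # accepted: threads has no cache constraint
# validate_io_cache("native", "writeback") would raise ValueError,
# which is the combination the transient snapshot disk hit here.
```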