Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1534950

Summary: [Backup restore API] Start VM, with attached snapshot disk on block based storage, fails on libvirtError
Product: [oVirt] vdsm    Reporter: Elad <ebenahar>
Component: Core    Assignee: Eyal Shenitzky <eshenitz>
Status: CLOSED CURRENTRELEASE    QA Contact: Elad <ebenahar>
Severity: high    Docs Contact:
Priority: unspecified
Version: 4.20.13    CC: amureini, bugs, ebenahar, nsoffer, tnisan
Target Milestone: ovirt-4.2.1    Keywords: Automation, Regression
Target Release: ---    Flags: rule-engine: ovirt-4.2+
rule-engine: blocker+
Hardware: x86_64
OS: Unspecified
Whiteboard:
Fixed In Version: v4.20.16    Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:    Environment:
Last Closed: 2018-02-12 11:46:36 UTC    Type: Bug
Regression: ---    Mount Type: ---
Documentation: ---    CRM:
Verified Versions:    Category: ---
oVirt Team: Storage    RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---    Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1526815    
Attachments:
Description      Flags
logs             none
logs-RHEL7.4     none
logs-with fix    none

Description Elad 2018-01-16 10:47:47 UTC
Created attachment 1381944 [details]
logs

Description of problem:
A basic backup restore API scenario, starting a VM while it has a snapshot disk attached, fails with an "unsupported configuration" libvirt error.

Version-Release number of selected component (if applicable):
vdsm-4.20.13-1.el7ev.x86_64
libvirt-3.9.0-7.el7.x86_64
qemu-kvm-rhev-2.10.0-16.el7.x86_64
sanlock-3.6.0-1.el7.x86_64
selinux-policy-3.13.1-183.el7.noarch
kernel - 3.10.0-823.el7.x86_64 #1 SMP Wed Dec 13 21:17:45 EST 2017 x86_64 x86_64 x86_64 GNU/Linux
RHEL 7.5


How reproducible:
Always

Steps to Reproduce:
1. Create a VM (source VM) with a disk attached (storage type doesn't matter)
2. Create a snapshot of the source VM
3. Create a second VM (backup VM)
4. Attach the snapshot disk of the source VM to the backup VM (via REST)
5. Start the backup VM
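Step 4 is done through the REST API by POSTing a disk_attachment that references the snapshot. As a rough sketch (the helper name and the IDs are placeholders, not taken from this bug), the request body sent to /ovirt-engine/api/vms/{backup_vm_id}/diskattachments would look like this:

```python
import xml.etree.ElementTree as ET

def build_disk_attachment_xml(disk_id, snapshot_id, interface="virtio"):
    """Build the XML body for attaching a snapshot disk to a VM via
    POST /ovirt-engine/api/vms/{vm_id}/diskattachments (oVirt 4.x REST API).
    disk_id/snapshot_id are placeholder UUIDs."""
    attachment = ET.Element("disk_attachment")
    disk = ET.SubElement(attachment, "disk", id=disk_id)
    # Referencing a snapshot makes the engine attach the disk's
    # snapshot image rather than the active layer.
    ET.SubElement(disk, "snapshot", id=snapshot_id)
    ET.SubElement(attachment, "interface").text = interface
    ET.SubElement(attachment, "active").text = "true"
    return ET.tostring(attachment, encoding="unicode")

body = build_disk_attachment_xml("disk-uuid", "snap-uuid")
```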

Actual results:

Start backup VM fails:


2018-01-16 11:16:32,161+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-1768) [] EVENT_ID: USER_FAILED_RUN_VM(54), Failed to run VM backup_vm_TestCase6178_1611152802 (User: admin@internal-authz).



2018-01-16 11:16:28,156+0200 ERROR (vm/00eab462) [virt.vm] (vmId='00eab462-c7a1-442d-911e-322872b533d2') The vm start process failed (vm:917)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 846, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2746, in _run
    dom.createWithFlags(flags)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1099, in createWithFlags
    if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
libvirtError: unsupported configuration: native I/O needs either no disk cache or directsync cache mode, QEMU will fallback to aio=threads
2018-01-16 11:16:28,157+0200 INFO  (vm/00eab462) [virt.vm] (vmId='00eab462-c7a1-442d-911e-322872b533d2') Changed state to Down: unsupported configuration: native I/O needs either no disk cache or directsync cache mode, QEMU will fallback to aio=threads (code=1) (vm:1636)



libvirtd.log:


2018-01-16 09:16:28.125+0000: 18311: error : qemuBuildDriveStrValidate:1616 : unsupported configuration: native I/O needs either no disk cache or directsync cache mode, QEMU will fallback to aio=threads
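The failing check is a simple rule in libvirt's qemuBuildDriveStrValidate: aio=native requires O_DIRECT, which only the "none" and "directsync" cache modes provide. A minimal sketch of that rule (illustrative Python, not libvirt code):

```python
def native_io_allowed(cache_mode):
    """aio=native bypasses the host page cache via O_DIRECT, so libvirt
    only permits it with cache modes that also bypass the page cache."""
    return cache_mode in ("none", "directsync")

# Block-storage disks normally run with cache='none', so io='native'
# is valid there. The transient snapshot layer is a file whose cache
# mode is not 'none'/'directsync', so inheriting io='native' from the
# original block disk makes libvirt reject the domain XML.
```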



Expected results:
VM start should succeed

Additional info:
logs

Comment 1 Tal Nisan 2018-01-16 11:00:14 UTC
Elad, can you please test if this reproduces also on RHEL 7.4?

Comment 2 Elad 2018-01-16 11:01:39 UTC
Occurs only with block storage (passed with NFS)

I'll test also with RHEL7.4

Comment 3 Elad 2018-01-17 13:32:17 UTC
Created attachment 1382412 [details]
logs-RHEL7.4

Reproduced on RHEL7.4:

2018-01-17 15:12:04,465+0200 ERROR (vm/a73fcf41) [virt.vm] (vmId='a73fcf41-3024-4f54-b783-b47f57d1406d') The vm start process failed (vm:917)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 846, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2746, in _run
    dom.createWithFlags(flags)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1069, in createWithFlags
    if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
libvirtError: unsupported configuration: native I/O needs either no disk cache or directsync cache mode, QEMU will fallback to aio=threads
2018-01-17 15:12:04,515+0200 INFO  (vm/a73fcf41) [virt.vm] (vmId='a73fcf41-3024-4f54-b783-b47f57d1406d') Changed state to Down: unsupported configuration: native I/O needs either no disk cache or directsync cache mode, QEMU will fallback to aio=threads (code=1) (vm:1636)




rhvm-4.2.1.1-0.1.el7.noarch
vdsm-4.20.13-1.el7ev.x86_64
libvirt-3.2.0-14.el7_4.7.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.13.x86_64

Comment 4 Allon Mureinik 2018-01-21 09:54:40 UTC
In org.ovirt.engine.core.vdsbroker.builder.vminfo.LibvirtVmXmlBuilder#writeDiskDriver, the "io" attribute is calculated according to the disk's storage type (file vs block vs network). In the case of a transient snapshot, we create a file layer on top of whatever the original type was, and the IO type may no longer be applicable.
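The logic described above can be sketched as follows (illustrative Python, not the actual Java in LibvirtVmXmlBuilder; the function name is hypothetical). The "io" attribute follows the disk's recorded storage type, so a stale "block" type on a disk that now sits under a file-based transient layer produces an invalid io/cache combination:

```python
def io_for_disk_type(disk_type):
    """Pick the QEMU AIO mode from the disk's recorded storage type:
    block/network devices get native AIO, file-backed disks get the
    thread pool."""
    return "native" if disk_type in ("block", "network") else "threads"

# Original disk on block storage: io='native' with cache='none' -> valid.
# After the transient snapshot adds a file layer, the recorded type is
# still 'block', so io='native' is emitted alongside a cache mode that
# is no longer 'none', and libvirt rejects the domain.
```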

Comment 5 Allon Mureinik 2018-01-23 11:09:04 UTC
Elad - can you weigh in on the "Regression" flag here, please?
When was the last time we saw this working properly? If the attached patch is correct, it looks like this hasn't been working for a while, at the very least since 4.1

Comment 6 Eyal Shenitzky 2018-01-23 11:12:35 UTC
This bug reproduces in 4.1

Comment 7 Allon Mureinik 2018-01-23 11:18:22 UTC
(In reply to Eyal Shenitzky from comment #6)
> This bug reproduces in 4.1

Removing the blocker flag based on this assessment and marking as exception instead.
It's a nasty bug that we need to fix, but if it went unnoticed for the entire 4.1 version (at least!), we shouldn't block on it.
I also removed the Regression keyword to prevent the bot from returning the blocker flag.

Comment 8 Elad 2018-01-23 11:27:51 UTC
As Eyal mentioned, the bug reproduces in 4.1, but I'm almost certain it does not occur in older versions, since it breaks the basic functionality of the backup restore API.

Comment 9 Nir Soffer 2018-01-23 22:31:14 UTC
Elad, did you test this with vdsm from ovirt-4.1? Eyal tested with cluster version
4.1, which is not the same.

I think this is a regression caused by [1].

Before this patch, we checked the disk type late, after the transient disk was
created, so we detected the correct disk type (file). After this patch,
we get the disk type when preparing the original disk. Since the code creating
the transient disk on top of the original disk did not change the disk type,
we use the disk type of the original disk.

You can verify that this is a regression by running automation with the patch
before [1].

Running the automation with the patch [2] that fixes this issue would also be
helpful, since we don't have any tests for this code in vdsm.

[1] https://gerrit.ovirt.org/#/c/85307/
[2] https://gerrit.ovirt.org/#/c/86687/
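The shape of the fix in [2] can be sketched like this (illustrative Python, heavily simplified from vdsm's actual drive code; the class and function names are assumptions): when the transient qcow2 file layer is created on top of the original disk, the drive's disk type must be switched to "file" so that later XML generation emits a matching io mode:

```python
class Drive:
    """Minimal stand-in for vdsm's drive object."""
    def __init__(self, disk_type, transient=False):
        self.disk_type = disk_type   # 'block' or 'file'
        self.transient = transient

def create_transient_disk(drive):
    """Put a temporary qcow2 file on top of the original disk.
    The crucial part of the fix: the active layer is now a file, so
    the drive's disk type must reflect that; without this, io='native'
    from the original block disk leaks into the domain XML."""
    drive.transient = True
    drive.disk_type = "file"
    return drive

drive = create_transient_disk(Drive("block"))
```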

Comment 10 Elad 2018-01-24 10:01:57 UTC
Right, on 4.1 setup, the bug doesn't occur. 
Marking as a regression

Comment 11 Red Hat Bugzilla Rules Engine 2018-01-24 10:02:04 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 12 Nir Soffer 2018-01-24 11:38:14 UTC
Elad is testing the fix, we will merge once we get a confirmation that automation
pass with the fix.

Comment 13 Elad 2018-01-24 12:00:25 UTC
Created attachment 1385538 [details]
logs-with fix

Using [1], starting the backup VM with a snapshot disk attached works properly.

2018-01-24 13:50:59,059+0200 INFO  (jsonrpc/0) [api.virt] START create(vmParams={'xml': '<?xml version="1.0" encoding="UTF-8"?><domain type="kvm" xmlns:ovirt-tune="http://ovirt.org/vm/tune/1.0" xmlns:ovirt-vm="http://ovirt.org/vm/1.0"><name>backup_vm_TestCase6178_2413503280</name><uuid>85cf58ed-4b21-4026-9d7c-410aa28dad1a</uuid><memory>1048576</memory><currentMemory>1048576</currentMemory><maxMemory slots="16">4194304</maxMemory><vcpu current="1">16</vcpu><sysinfo type="smbios"><system><entry name="manufacturer">oVirt</entry><entry name="product">


[1]
https://gerrit.ovirt.org/#/c/86687/

vdsm-4.20.14-1.el7ev.x86_64
libvirt-3.9.0-7.el7.x86_64

Comment 14 Elad 2018-01-25 16:26:33 UTC
Start of a VM, with attached snapshot disk on block storage (tested with iSCSI) succeeded.

Used:
vdsm-4.20.17-1.el7ev.x86_64
libvirt-3.9.0-7.el7.x86_64
qemu-kvm-rhev-2.10.0-18.el7.x86_64
rhvm-4.2.1.3-0.1.el7.noarch

Comment 15 Sandro Bonazzola 2018-02-12 11:46:36 UTC
This bugzilla is included in oVirt 4.2.1 release, published on Feb 12th 2018.

Since the problem described in this bug report should be
resolved in oVirt 4.2.1 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.