Bug 1546971 - Cannot modify vNIC profile of running VM
Summary: Cannot modify vNIC profile of running VM
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.5
Hardware: x86_64
OS: Linux
high
high
Target Milestone: rc
: ---
Assignee: Jiri Denemark
QA Contact: chhu
URL:
Whiteboard:
Depends On:
Blocks: 1557922
TreeView+ depends on / blocked
 
Reported: 2018-02-20 08:42 UTC by Michael Burman
Modified: 2018-10-30 09:54 UTC (History)
18 users (show)

Fixed In Version: libvirt-4.3.0-1.el7
Doc Type: Bug Fix
Doc Text:
Previously, the check for a non-unique device boot order did not properly handle updates of existing devices when a new device was attached to a guest. Consequently, updating any device with a specified boot order failed. With this update, the duplicity check detects correctly handles updates and ignores the original device, which avoids reporting false conflicts. As a result, updating a device with a boot order succeeds.
Clone Of:
: 1557922 (view as bug list)
Environment:
Last Closed: 2018-10-30 09:52:39 UTC
Target Upstream Version:


Attachments (Terms of Use)
Logs (468.86 KB, application/x-gzip)
2018-02-20 08:42 UTC, Michael Burman
no flags Details


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:3113 None None None 2018-10-30 09:54:42 UTC

Description Michael Burman 2018-02-20 08:42:32 UTC
Created attachment 1398112 [details]
Logs

Description of problem:
Can't set new profile for VM's vNIC - no unplug involved(working properly on 3.6). 

If trying to choose a new profile for the VM's vNIC we failing with:

2018-02-20 10:36:16,922+0200 WARN  (jsonrpc/6) [virt.vm] (vmId='f8abc451-ad7d-4916-ae0b-fca5cfa643bb') Request failed: <?xml version='1.0' encoding='utf-8'?>
<interface type="bridge">
    <address bus="0x00" domain="0x0000" function="0x0" slot="0x03" type="pci" />
    <mac address="00:00:00:00:00:50" />
    <model type="virtio" />
    <source bridge="ovirtmgmt" />
    <filterref filter="vdsm-no-mac-spoofing" />
    <link state="up" />
    <boot order="2" />
    <bandwidth />
</interface>
 (vm:3106)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 3101, in setLinkAndNetwork
    libvirt.VIR_DOMAIN_AFFECT_LIVE)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2739, in updateDeviceFlags
    if ret == -1: raise libvirtError ('virDomainUpdateDeviceFlags() failed', dom=self)
libvirtError: unsupported configuration: boot order 2 is already used by another device
2018-02-20 10:36:16,923+0200 WARN  (jsonrpc/6) [virt.vm] (vmId='f8abc451-ad7d-4916-ae0b-fca5cfa643bb') Rolling back link and net for: net0 (vm:3115)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 3110, in setLinkAndNetwork
    raise SetLinkAndNetworkError(str(e))
SetLinkAndNetworkError: unsupported configuration: boot order 2 is already used by another device
2018-02-20 10:36:16,927+0200 ERROR (jsonrpc/6) [api] FINISH updateDevice error=unsupported configuration: boot order 2 is already used by another device (api:127)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 117, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 372, in updateDevice
    return self.vm.updateDevice(params)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 3174, in updateDevice
    return self._updateInterfaceDevice(params)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 3064, in _updateInterfaceDevice
    custom, specParams):
  File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 3117, in setLinkAndNetwork
    libvirt.VIR_DOMAIN_AFFECT_LIVE)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2739, in updateDeviceFlags
    if ret == -1: raise libvirtError ('virDomainUpdateDeviceFlags() failed', dom=self)
libvirtError: unsupported configuration: boot order 2 is already used by another device

Version-Release number of selected component (if applicable):
vdsm-4.20.18-1.el7ev.x86_64
vdsm-4.19.46-1.el7ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create network n1 and attach to host
1. Run VM with 1 vNIC(ovirtmgmt) 
2. Edit VM's vNIC and choose n1 profile(don't unplug prior this)

3. Now first unplug and then choose n1 profile and plug back
4. Edit the VM's vNIC again and do the same thing as on step 2, choose ovirtmgmt profile without unplugging first

Actual results:
2. Failed with General Exception and libvirt error on vdsm side
4. Succeeded!
 
Expected results:
2. Should succeed like on step 4, this operation working as expected on 3.6 for example, but not on 4.1/4.2 and it's why this is a regression on vdsm side. 

Additional info:
on 3.6 working fine with the same flow

Comment 1 Dan Kenigsberg 2018-02-21 08:38:45 UTC
please provider libvirt version and try with an older one. It smells like a libvirt bug.

Comment 2 Red Hat Bugzilla Rules Engine 2018-02-21 08:40:19 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 3 Michael Burman 2018-02-21 14:03:04 UTC
(In reply to Dan Kenigsberg from comment #1)
> please provider libvirt version and try with an older one. It smells like a
> libvirt bug.

- libvirt versions in which it is working properly - 

libvirt-client-3.2.0-14.el7_4.9.x86_64 (rhel 7.4)
libvirt-client-2.0.0-10.el7_3.11.x86_64 (rhel 7.3) 

On vdsm 3.6 - vdsm-4.17.44-2.el7ev.noarch (both in engine 3.6 and RHV 4.2!!!)

- libvirt version in which it doesn't works - 

libvirt-client-3.9.0-13.el7.x86_64 (rhel 7.5) 
On vdsm 4.1/4.2 (in engine 4.1/4.2!)

So yes, looks like a regression on libvirt side indeed. 
Dan, can you moce it to the correct component please?

Comment 6 Dan Kenigsberg 2018-02-21 21:36:09 UTC
Michael, can you share your domxml before the attempt to modify the <interface>?

Comment 7 Jiri Denemark 2018-02-22 07:55:17 UTC
Indeed, broken by v3.7.0-14-gc57f3fd2f8:

commit c57f3fd2f8999d17e01272246b26d31179ab7b6e
Author:     Michal Privoznik <mprivozn@redhat.com>
AuthorDate: Tue Sep 5 16:24:14 2017 +0200
Commit:     Michal Privoznik <mprivozn@redhat.com>
CommitDate: Wed Sep 6 11:19:30 2017 +0200

    conf: Validate device on update-device
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1439991
    
    Whenever a device is being updated via
    virDomainUpdateDeviceFlags() API, we parse the device XML and
    ideally run some generic checks to validate the configuration
    (e.g. if device defines per-device boot order but the domain has
    os/boot element already). Well, that's the theory - due to a
    missing check we've jumped early from that check function.
    
    Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
    Reviewed-by: Erik Skultety <eskultet@redhat.com>

Comment 8 Michael Burman 2018-02-22 12:49:32 UTC
(In reply to Dan Kenigsberg from comment #6)
> Michael, can you share your domxml before the attempt to modify the
> <interface>?

<interface type='bridge'>
      <mac address='00:00:00:00:00:50'/>
      <source bridge='net-2'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <filterref filter='vdsm-no-mac-spoofing'/>
      <boot order='2'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>

Comment 9 Jiri Denemark 2018-02-22 14:22:37 UTC
Patches sent upstream for review: https://www.redhat.com/archives/libvir-list/2018-February/msg01089.html

Comment 10 Jiri Denemark 2018-02-22 14:26:07 UTC
An easy reproducer:

1. start a domain with an interface device with <boot order='...'/>
2. take the <interface> element from domain XML and change it (e.g., add <link state="up"/> into it)
3. try to update the device using the modified interface XML from step 3

Comment 11 Jiri Denemark 2018-02-22 14:28:09 UTC
BTW, don't forget to verify bug 1439991 is still fixed after applying the patches for this bug.

Comment 12 Jiri Denemark 2018-02-23 11:05:14 UTC
This is now fixed upstream by:

commit e3497f3fc895c427ce35188f3ddb2bb47a853be9
Refs: v4.0.0-318-ge3497f3fc8
Author:     Jiri Denemark <jdenemar@redhat.com>
AuthorDate: Thu Feb 22 13:22:39 2018 +0100
Commit:     Jiri Denemark <jdenemar@redhat.com>
CommitDate: Fri Feb 23 11:52:44 2018 +0100

    lxc: Drop useless check in live device update

    Checking the new device definition makes little sense when lxc driver
    does not support live device update at all.

    Signed-off-by: Jiri Denemark <jdenemar@redhat.com>

commit b6a264e8550d4add3946ec2fd9ae31a76fbf16fe
Refs: v4.0.0-319-gb6a264e855
Author:     Jiri Denemark <jdenemar@redhat.com>
AuthorDate: Thu Feb 22 13:30:27 2018 +0100
Commit:     Jiri Denemark <jdenemar@redhat.com>
CommitDate: Fri Feb 23 11:52:44 2018 +0100

    Pass oldDev to virDomainDefCompatibleDevice on device update

    When calling virDomainDefCompatibleDevice to check a new device during
    device update, we need to pass the original device which is going to be
    updated in addition to the new device. Otherwise, the function can
    report false conflicts.

    The new argument is currently ignored by virDomainDefCompatibleDevice,
    but this will change in the following patch.

    https://bugzilla.redhat.com/show_bug.cgi?id=1546971

    Signed-off-by: Jiri Denemark <jdenemar@redhat.com>

commit edae027cfe02be4863dcef1e7f0ea0564766e312
Refs: v4.0.0-320-gedae027cfe
Author:     Jiri Denemark <jdenemar@redhat.com>
AuthorDate: Thu Feb 22 13:51:36 2018 +0100
Commit:     Jiri Denemark <jdenemar@redhat.com>
CommitDate: Fri Feb 23 11:52:44 2018 +0100

    qemu: Fix updating device with boot order

    Commit v3.7.0-14-gc57f3fd2f8 prevented adding a <boot order='x'/>
    element to an inactive domain with global <boot dev='...'/> element.
    However, as a result of that change updating any device with boot order
    would fail with 'boot order X is already used by another device', where
    "another device" is in fact the device which is being updated.

    To fix this we have to ignore the device which we're about to update
    when checking for boot order conflicts.

    https://bugzilla.redhat.com/show_bug.cgi?id=1546971

    Signed-off-by: Jiri Denemark <jdenemar@redhat.com>

Comment 15 yalzhang@redhat.com 2018-02-28 02:50:23 UTC
Test on upstream libvirt, hotplug interface will cause libvirtd crash, please help to check, Thank you!

Reproduce this bug on libvirt-3.9.0-13.el7.x86_64 as comment 10:
1. start a domain with an interface device with <boot order='2'/>

2. take the <interface> element from domain XML and change it (e.g., add <link state="up"/> into it)

3. try to update the device using the modified interface XML from step 3
# virsh update-device rhel interface.xml
error: Failed to update device from interface.xml
error: unsupported configuration: boot order 2 is already used by another device

Test on upstream libvirt-4.1.0-1.el7.x86_64 as comment 10, update succeed in step 3:

# virsh update-device rhel interface.xml
Device updated successfully

# virsh dumpxml rhel | grep /interface -B9
    <interface type='network'>
      <mac address='52:54:00:1b:a5:b9'/>
      <source network='default' bridge='virbr0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <link state='up'/>
      <boot order='2'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>

But with libvirt-4.1.0-1.el7.x86_64, hotplug interface/device will cause libvirtd crash.
# virsh attach-interface rhel network default 
error: Disconnected from qemu:///system due to end of file
error: Failed to attach interface
error: End of file while reading data: Input/output error

Comment 16 Luyao Huang 2018-02-28 03:06:00 UTC
Backtrace:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fe6b5031700 (LWP 4892)]
virDomainDeviceGetInfo (device=device@entry=0x0) at conf/domain_conf.c:3565
3565	    switch ((virDomainDeviceType) device->type) {
(gdb) bt
#0  virDomainDeviceGetInfo (device=device@entry=0x0) at conf/domain_conf.c:3565
#1  0x00007fe6c63691e6 in virDomainDefCompatibleDevice (def=0x7fe6a017d320, dev=dev@entry=0x7fe68c002890, oldDev=oldDev@entry=0x0) at conf/domain_conf.c:27420
#2  0x00007fe6a831b920 in qemuDomainAttachDeviceLiveAndConfig (flags=<optimized out>, xml=<optimized out>, driver=0x7fe6a0100f60, vm=0x7fe6a017bbd0) at qemu/qemu_driver.c:8499
#3  qemuDomainAttachDeviceFlags (dom=<optimized out>, xml=<optimized out>, flags=1) at qemu/qemu_driver.c:8558
...

And this issue is introduced by the patch which mentioned in comment 12. I think libvirt need check if &oldDev is NULL when call virDomainDeviceGetInfo() in function virDomainDefCompatibleDevice().

Comment 17 John Ferlan 2018-03-01 16:26:32 UTC
I tripped across the issues from Comment 15 & 16 doing something similar... 

See: https://www.redhat.com/archives/libvir-list/2018-March/msg00002.html

This was just pushed upstream as:

$ git describe 5535856f0e31aa6abf4cda11b5c53c0f164680f0
v4.1.0-rc2-8-g5535856f0
$ git show 5535856f0e31aa6abf4cda11b5c53c0f164680f0
Author: John Ferlan <jferlan@redhat.com>
Date:   Thu Mar 1 07:08:32 2018 -0500

    conf: Fix crash in virDomainDefCompatibleDevice
    
    Commit id 'edae027c' blindly assumed that the passed @oldDev
    parameter would not be NULL when calling virDomainDeviceGetInfo;
    however, commit id 'b6a264e8' passed NULL for AttachDevice
    callers under the premise that there wouldn't be a device
    to check/update against.
    
    Signed-off-by: John Ferlan <jferlan@redhat.com>

Comment 21 chhu 2018-07-11 09:08:05 UTC
Reproduced on packages:
libvirt-3.9.0-14.el7.x86_64
qemu-kvm-rhev-2.10.0-21.el7.x86_64

Verified on packages:
libvirt-4.5.0-1.el7.x86_64
qemu-kvm-rhev-2.12.0-7.el7.x86_64

Test steps:
1. Update "link state" of interface - PASS(can update)
2. Update "boot order" of interface when vm is active - PASS(not support)
3. update "boot order" of interface when vm is inactive - PASS(can update)
4. Hotplug with "boot order" with os/boot elements(negative) - PASS(unsupported)
5. Hotplug with duplicate "boot order"(negative) - PASS(report error)
6. Hotplug with "boot order" - PASS
7. Hotplug without "boot order" - PASS
8. Coldplug for 4,5,6,7 - PASS
9. Update elements or attributes other than "boot order" when vm is inactive - PASS(can update)
10. Invalid "boot order"(negative) when vm is active or inactive - PASS(report error)
11. Test cover bug1439991 - PASS

Comment 31 errata-xmlrpc 2018-10-30 09:52:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3113


Note You need to log in before you can comment on or make changes to this bug.