Bug 1551971

Summary: [OVN] cannot start VM with ovn network
Product: [oVirt] vdsm
Reporter: Michael Burman <mburman>
Component: General
Assignee: Francesco Romani <fromani>
Status: CLOSED CURRENTRELEASE
QA Contact: Michael Burman <mburman>
Severity: urgent
Docs Contact:
Priority: high
Version: 4.20.19
CC: ahadas, amusil, bugs, danken, fromani, lveyde, mburman, michal.skrivanek, mkalfon
Target Milestone: ovirt-4.2.2
Keywords: Regression
Target Release: ---
Flags: rule-engine: ovirt-4.2+, rule-engine: blocker+
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: vdsm v4.20.22
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-03-29 11:14:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1535006
Attachments:
  Description               Flags
  Logs                      none
  sr-iov vm failed to run   none

Description Michael Burman 2018-03-06 09:37:23 UTC
Created attachment 1404699 [details]
Logs

Description of problem:
[OVN] - ovn is broken on latest d/s build - can't start VM with ovn network.

Starting a VM with an ovn network on 4.2.2.2-0.1.el7 fails with the generic error:

2018-03-06 11:32:54,015+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-4) [] EVENT_ID: VM_DOWN_ERROR(119), VM V3 is down with error. Exit message: Cannot get interface MTU on 'ovn_test_custom': No such device.

2018-03-06 11:32:52,011+0200 ERROR (vm/5cdbe981) [virt.vm] (vmId='5cdbe981-039c-44cf-95cd-84081e5bd688') The vm start process failed (vm:940)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 869, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2832, in _run
    dom.createWithFlags(flags)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1099, in createWithFlags
    if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
libvirtError: Cannot get interface MTU on 'ovn_test_custom': No such device


Version-Release number of selected component (if applicable):
4.2.2.2-0.1.el7

How reproducible:
100%

Steps to Reproduce:
1. Try to start VM with ovn network on latest d/s build

Actual results:
Failed

Expected results:
Must work

Comment 1 Dan Kenigsberg 2018-03-06 12:25:37 UTC
Adding some logging to /usr/libexec/vdsm/hooks/before_device_create/ovirt_provider_ovn_hook shows that it is executed and does its job. It seems that vdsm ignores its output and uses the Engine-generated device XML instead. Thus I believe that this is a recent virt regression.

final iface <interface type="bridge"><address bus="0x00" domain="0x0000" function="0x0" slot="0x03" type="pci"/><mac address="00:00:00:00:00:20"/><model type="virtio"/><source bridge="br-int"/><filterref filter="vdsm-no-mac-spoofing"/><boot order="2"/><alias name="ua-77b53de8-97d5-4c90-96ca-6ab12b34f96f"/><virtualport type="openvswitch"><parameters interfaceid="11572e42-47fb-4372-9b3d-7a86aa21081c"/></virtualport></interface>

The problem does not occur on hotplug.
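
As a rough way to tell the two variants apart, the sketch below parses an interface XML and checks whether it carries what the hook produced (source bridge "br-int" plus an "openvswitch" virtualport). The helper names are made up for illustration and are not part of vdsm. The second XML snippet is an assumption: it guesses that the Engine-generated device points at the logical network name, based only on the 'ovn_test_custom' name in the libvirt error.

```python
import xml.etree.ElementTree as ET

# Interface XML as logged by the hook ("final iface"), joined onto one line.
HOOK_IFACE_XML = (
    '<interface type="bridge">'
    '<address bus="0x00" domain="0x0000" function="0x0" slot="0x03" type="pci"/>'
    '<mac address="00:00:00:00:00:20"/>'
    '<model type="virtio"/>'
    '<source bridge="br-int"/>'
    '<filterref filter="vdsm-no-mac-spoofing"/>'
    '<boot order="2"/>'
    '<alias name="ua-77b53de8-97d5-4c90-96ca-6ab12b34f96f"/>'
    '<virtualport type="openvswitch">'
    '<parameters interfaceid="11572e42-47fb-4372-9b3d-7a86aa21081c"/>'
    '</virtualport>'
    '</interface>'
)

# ASSUMPTION: a guess at what an unmodified device could look like, with the
# source bridge set to the logical network name seen in the libvirt error.
ASSUMED_ENGINE_IFACE_XML = (
    '<interface type="bridge">'
    '<mac address="00:00:00:00:00:20"/>'
    '<model type="virtio"/>'
    '<source bridge="ovn_test_custom"/>'
    '</interface>'
)


def describe_iface(iface_xml):
    """Extract the source bridge, virtualport type and interface id."""
    iface = ET.fromstring(iface_xml)
    source = iface.find("source")
    vport = iface.find("virtualport")
    params = vport.find("parameters") if vport is not None else None
    return {
        "bridge": source.get("bridge") if source is not None else None,
        "virtualport": vport.get("type") if vport is not None else None,
        "interfaceid": params.get("interfaceid") if params is not None else None,
    }


def looks_like_hook_output(iface_xml):
    """True if the interface is attached to br-int through Open vSwitch."""
    info = describe_iface(iface_xml)
    return info["bridge"] == "br-int" and info["virtualport"] == "openvswitch"


if __name__ == "__main__":
    print(describe_iface(HOOK_IFACE_XML))
    print(looks_like_hook_output(ASSUMED_ENGINE_IFACE_XML))
```

Running the same check against the domain XML that vdsm actually passes to libvirt would show which of the two variants reached createWithFlags.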

Comment 2 Red Hat Bugzilla Rules Engine 2018-03-06 12:25:42 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 3 Michael Burman 2018-03-06 12:35:53 UTC
vmfex is affected as well; the wrong interface type is passed in the XML.

Comment 5 Michael Burman 2018-03-07 09:55:55 UTC
(In reply to Francesco Romani from comment #4)
> Patch https://gerrit.ovirt.org/#/c/88547/ is supposed to fix this. RPMs:
> http://jenkins.ovirt.org/job/vdsm_master_build-artifacts-on-demand-el7-x86_64/821/

Hi Francesco,
I have verified the ovn + vmfex flow with this patch (will give you +1).

Comment 6 Michael Burman 2018-03-15 09:06:25 UTC
Hi

A VM with an SR-IOV vNIC can't start either. Is it the same issue/bug?

2018-03-15 10:27:05,788+0200 ERROR (vm/f9bd0e85) [virt.vm] (vmId='f9bd0e85-54e5-46ec-9ae7-5a61861120a9') The vm start process failed (vm:940)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 869, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2829, in _run
    dom = self._connection.defineXML(domxml)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3676, in defineXML
    if ret is None:raise libvirtError('virDomainDefineXML() failed', conn=self)
libvirtError: XML error: non unique alias detected: ua-04c2decd-4e33-4023-84de-a2205c777af7
2018-03-15 10:27:05,789+0200 INFO  (vm/f9bd0e85) [virt.vm] (vmId='f9bd0e85-54e5-46ec-9ae7-5a61861120a9') Changed state to Down: XML error: non unique alias detected: ua-04c2decd-4e33-4023-84de-a2205c777af7 (code=1) (vm:1677)

Comment 7 Francesco Romani 2018-03-15 09:17:16 UTC
(In reply to Michael Burman from comment #6)
> Hi
> 
> A VM with an SR-IOV vNIC can't start either. Is it the same issue/bug?
> 
> 2018-03-15 10:27:05,788+0200 ERROR (vm/f9bd0e85) [virt.vm]
> (vmId='f9bd0e85-54e5-46ec-9ae7-5a61861120a9') The vm start process failed
> (vm:940)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 869, in
> _startUnderlyingVm
>     self._run()
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2829, in _run
>     dom = self._connection.defineXML(domxml)
>   File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py",
> line 130, in wrapper
>     ret = f(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92,
> in wrapper
>     return func(inst, *args, **kwargs)
>   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3676, in
> defineXML
>     if ret is None:raise libvirtError('virDomainDefineXML() failed',
> conn=self)
> libvirtError: XML error: non unique alias detected:
> ua-04c2decd-4e33-4023-84de-a2205c777af7
> 2018-03-15 10:27:05,789+0200 INFO  (vm/f9bd0e85) [virt.vm]
> (vmId='f9bd0e85-54e5-46ec-9ae7-5a61861120a9') Changed state to Down: XML
> error: non unique alias detected: ua-04c2decd-4e33-4023-84de-a2205c777af7
> (code=1) (vm:1677)

This is likely caused by some recent changes in Engine about management of aliases. Let's ping Arik for more insights.

Comment 8 Arik 2018-03-15 09:24:36 UTC
(In reply to Francesco Romani from comment #7)
> (In reply to Michael Burman from comment #6)
> 
> This is likely caused by some recent changes in Engine about management of
> aliases. Let's ping Arik for more insights.

Michael, can you please attach the engine and vdsm logs of this failure?

Comment 9 Michael Burman 2018-03-15 09:31:16 UTC
(In reply to Arik from comment #8)
> (In reply to Francesco Romani from comment #7)
> > (In reply to Michael Burman from comment #6)
> > 
> > This is likely caused by some recent changes in Engine about management of
> > aliases. Let's ping Arik for more insights.
> 
> Michael, can you please attach the engine and vdsm logs of this failure?

Sure thing. The versions are the latest:
4.2.2.2-0.1.el7
vdsm-4.20.20-1.el7ev.x86_64

Comment 10 Michael Burman 2018-03-15 09:34:07 UTC
Created attachment 1408384 [details]
sr-iov vm failed to run

Comment 11 Arik 2018-03-15 09:54:07 UTC
(In reply to Michael Burman from comment #10)
> Created attachment 1408384 [details]
> sr-iov vm failed to run

Thanks.
The XML we generate seems valid.
The fix for the recent issue with user-aliases we reported to libvirt seems specific to unplugging and then plugging a device with the same user-alias, but its investigation led to a few other changes in that area in libvirt. Can we test this flow against a libvirt version that includes all those recent changes?
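
One way to double-check a domain XML for the condition libvirt complains about is to count the alias names in it. The sketch below is only an illustration: the function name and the sample XML are hypothetical and are not taken from the attached logs, apart from the alias value quoted in the error. It does not explain why libvirt rejects a given XML.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# HYPOTHETICAL sample, built only to exercise the check below. It reuses the
# alias from the libvirt error on two devices.
SAMPLE_DOMXML = (
    '<domain type="kvm">'
    '<name>sample</name>'
    '<devices>'
    '<interface type="hostdev">'
    '<alias name="ua-04c2decd-4e33-4023-84de-a2205c777af7"/>'
    '</interface>'
    '<hostdev mode="subsystem" type="pci">'
    '<alias name="ua-04c2decd-4e33-4023-84de-a2205c777af7"/>'
    '</hostdev>'
    '<interface type="bridge">'
    '<alias name="ua-77b53de8-97d5-4c90-96ca-6ab12b34f96f"/>'
    '</interface>'
    '</devices>'
    '</domain>'
)


def duplicate_aliases(domxml):
    """Return the sorted alias names that appear more than once."""
    root = ET.fromstring(domxml)
    counts = Counter(
        alias.get("name")
        for alias in root.iter("alias")
        if alias.get("name")
    )
    return sorted(name for name, seen in counts.items() if seen > 1)


if __name__ == "__main__":
    for name in duplicate_aliases(SAMPLE_DOMXML):
        print("duplicate alias:", name)
```

An empty result for the XML sent by Engine, together with the error still being raised, would point at a later stage changing the XML or at libvirt itself.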

Comment 12 Michael Burman 2018-03-15 10:05:35 UTC
(In reply to Arik from comment #11)
> (In reply to Michael Burman from comment #10)
> > Created attachment 1408384 [details]
> > sr-iov vm failed to run
> 
> Thanks.
> The XML we generate seems valid.
> The fix for the recent issue with user-aliases we reported to libvirt seems
> specific to unplugging and then plugging a device with the same user-alias,
> but its investigation led to a few other changes in that area in libvirt. Can
> we test this flow against a libvirt version that includes all those recent
> changes?

Are you saying that this bug is blocked on libvirt as well (I saw Francesco uploaded patches here)? Then we need to change the summary and wait for the libvirt fix. I'm wondering if we need a new bug for this specific issue.
Neither this bug nor the new SR-IOV issue involves unplug/plug.

Comment 13 Michael Burman 2018-03-15 10:24:56 UTC
Anyhow, even if the new libvirt fixes this issue, it deserves its own bug for tracking. Reporting a new bug for the SR-IOV flow.

Comment 14 Michael Burman 2018-03-15 10:29:31 UTC
Arik, the issue reproduces with the new libvirt, libvirt-3.9.0-14.el7.x86_64

2018-03-15 12:27:16,562+0200 ERROR (vm/f9bd0e85) [virt.vm] (vmId='f9bd0e85-54e5-46ec-9ae7-5a61861120a9') The vm start process failed (vm:940)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 869, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2829, in _run
    dom = self._connection.defineXML(domxml)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3676, in defineXML
    if ret is None:raise libvirtError('virDomainDefineXML() failed', conn=self)
libvirtError: XML error: non unique alias detected: ua-04c2decd-4e33-4023-84de-a2205c777af7
2018-03-15 12:27:16,566+0200 INFO  (vm/f9bd0e85) [virt.vm] (vmId='f9bd0e85-54e5-46ec-9ae7-5a61861120a9') Changed state to Down: XML error: non unique alias detected: ua-04c2decd-4e33-4023-84de-a2205c777af7 (code=1) (vm:1677)

I have reported a new bug, BZ 1556828.

Comment 15 Michael Burman 2018-03-18 10:00:53 UTC
Verified on - 4.2.2.4-0.1.el7 and vdsm-4.20.22-1.el7ev.x86_64
with
libvirt-client-3.9.0-14.el7.x86_64
libvirt-daemon-3.9.0-14.el7.x86_64

The OVN and vmfex flows are fixed now.

Comment 16 Francesco Romani 2018-03-21 08:55:19 UTC
no doc_text required

Comment 17 Sandro Bonazzola 2018-03-29 11:14:34 UTC
This bugzilla is included in oVirt 4.2.2 release, published on March 28th 2018.

Since the problem described in this bug report should be
resolved in oVirt 4.2.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.