Bug 1199782
Summary: same pci addr is stored for two vNICs if they are plugged to a running VM one at a time

Product: Red Hat Enterprise Virtualization Manager
Component: ovirt-engine
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Version: 3.5.1
Target Milestone: ovirt-3.6.2
Target Release: 3.6.2
Hardware: x86_64
OS: Linux
Reporter: Michael Burman <mburman>
Assignee: Marcin Mirecki <mmirecki>
QA Contact: Michael Burman <mburman>
CC: bazulay, danken, gklein, lpeer, lsurette, mburman, myakove, rbalakri, Rhev-m-bugs, srevivo, ykaul, ylavi
Doc Type: Bug Fix
Type: Bug
oVirt Team: Network
Last Closed: 2016-04-20 01:28:43 UTC

The first collision I see in the libvirt log is while running the VM with two NICs sharing the same PCI address, one with an external network. Could you reproduce on a deployment with no external networks? And state exactly the steps to reproduce?

Hi Lior, I wrote the exact steps above in the Description. Like I said, it is not related to external networks. Yes, I did manage to reproduce without an external network.

Can I see logs from such a deployment?

Created attachment 999346 [details]
new logs
Yes, sure Lior, logs from such a deployment attached.

Lior, the same for 3.6.0-0.0.master.20150307182246.git2776c91.el6:

2015-03-09 09:50:28,080 ERROR [org.ovirt.engine.core.bll.network.vm.ActivateDeactivateVmNicCommand] (ajp--127.0.0.1-8702-3) [65101358] Command 'org.ovirt.engine.core.bll.network.vm.ActivateDeactivateVmNicCommand' failed: VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to HotPlugNicVDS, error = internal error: Attempted double use of PCI slot 0000:00:03.0 (may need "multifunction='on'" for device on function 0), code = 49 (Failed with error ACTIVATE_NIC_FAILED and code 49)

No external networks. Very simple steps to reproduce.

Created attachment 999442 [details]
3.6 logs
The same for 3.5.0-0.33.el6ev (ASYNC).

Which OS does the guest run? (If there's no guest, or the OS does not support hot-plug, it's NOTABUG.) Does the guest report the unplugged NIC? What's its state?

Hi Dan, RHEL 6.5, 6.6 and RHEL 7. The guest doesn't report the unplugged NIC.

libvirt-1.1.1-29.el7_0.7.x86_64
vdsm-4.16.8.1-7.el7ev.x86_64
vdsm-4.16.12-2.el7ev.x86_64

Created attachment 999515 [details]
New fail logs for Lior
Dan, the latest logs are from a run I conducted together with Michael; it's well controlled. We created VM lior and ran it with one vNIC, whose state was dumped into lior.xml. Then we hot-unplugged nic1 (MAC *22:02), hot-plugged nic2 (MAC *22:03), and then tried to hot-plug nic1 again. Somehow nic2 got the PCI slot that had been allocated to nic1, 0x03. As far as I could see, neither engine, nor vdsm, nor libvirt "asked" for that slot, so it seems to me that the guest OS (RHEL, according to Michael either 7* or 6.6) was the one that re-allocated it to nic2. Do you agree with the analysis? Who can we talk to on the platform side?

nic1's address is kept allocated on Engine, but it is completely freed and forgotten in libvirt once the unplug has succeeded. I do not see a way to solve this in libvirt or vdsm. Engine might be able to blank out the PCI address of the unplugged nic1 if it notices that that address is already taken by another device (but only then, since we DO like to persist the former address of nic1). Another option is complete control of PCI addresses in Engine, which is not a simple feature to add.

Is this consistent with previous behavior?... Didn't RHEL use to not allocate the same PCI address to another network interface (unless it had no choice)? I vaguely remember this behavior from our discussions concerning vNIC ordering.

(In reply to Lior Vernia from comment #15)
> Is this consistent with previous behavior?... Didn't RHEL use to not
> allocate the same PCI address to another network interface (unless it had no
> choice)? I vaguely remember this behavior from our discussions concerning
> vNIC ordering.

You might be recalling the RHEL *guest*'s persistence of the pciaddr+mac->guest_nicname mapping, which is kept on the guest disk in case the vNIC is plugged again. The allocation of PCI addresses happens in libvirt; I don't believe that it has ever attempted to maintain a history of previously-installed devices.
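The "blank out the stale address" option suggested above could look roughly like the following. This is a minimal sketch in plain Python, not actual ovirt-engine code (the engine is Java); the helper name `release_stale_pci_addresses` and the dict layout are invented for illustration only.

```python
def release_stale_pci_addresses(nics):
    """Clear the persisted PCI address of any unplugged vNIC whose
    address has meanwhile been re-allocated to a plugged device.

    Each nic is a dict {"name", "plugged", "address"}. The address of a
    plugged nic is authoritative (it reflects what libvirt reported), so
    only the unplugged side is ever blanked.
    """
    in_use = {n["address"] for n in nics if n["plugged"] and n["address"]}
    for n in nics:
        if not n["plugged"] and n["address"] in in_use:
            # Forget the former address; the next plug gets a fresh slot.
            n["address"] = None
    return nics
```

On the next hot-plug the engine would then omit the address element and let libvirt pick a free slot, exactly as it does for a brand-new vNIC.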
BTW, I am guessing that the very same bug can happen regardless of *hot*plugging:

- run a VM with nic1; pci1 is persisted in engine.
- stop the VM. unplug nic1. plug nic2.
- run VM with nic2; pci2 is allocated by libvirt, and is most likely to equal pci1.
- plug both nics and attempt to run the VM. I expect Engine is sending the same address to both nics, which breaks in libvirt.

Is that the case, Michael?

Yes, that is the case. Failed to run the VM. libvirt.log:

XML error: Attempted double use of PCI slot 0000:00:03.0 (may need "multifunction='on'" for device on function 0)

<interface type="bridge">
  <address bus="0x00" domain="0x0000" function="0x0" slot="0x03" type="pci"/>
  <mac address="00:1a:4a:16:88:5c"/>
  <model type="virtio"/>
  <source bridge="rhevm"/>
  <link state="up"/>
  <bandwidth/>
</interface>
<interface type="bridge">
  <address bus="0x00" domain="0x0000" function="0x0" slot="0x03" type="pci"/>
  <mac address="00:1a:4a:16:88:5e"/>
  <model type="virtio"/>
  <source bridge="rhevm"/>
  <link state="up"/>
  <bandwidth/>
</interface>

If we indeed do not want the engine to stop caring about previous PCI addresses, then this should probably be solved by exposing vNIC PCI address management to users - which seems like the right way to go for Bug 1108926 as well.

Lowering priority as there's an easy workaround - just remove the vNIC and re-create it.
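A collision like the one in the XML above can be detected before the domain is ever submitted to libvirt. A minimal sketch with the Python stdlib XML parser; this is not part of vdsm, and the helper name `find_duplicate_pci_slots` is ours:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def find_duplicate_pci_slots(devices_xml):
    """Map each PCI address used by more than one device to the list of
    devices (identified by MAC where available) that claim it."""
    root = ET.fromstring(devices_xml)
    seen = defaultdict(list)
    for dev in root:
        addr = dev.find("address")
        if addr is None or addr.get("type") != "pci":
            continue
        # Strip the "0x" prefixes to build the canonical DDDD:BB:SS.F form.
        key = "%s:%s:%s.%s" % (addr.get("domain")[2:], addr.get("bus")[2:],
                               addr.get("slot")[2:], addr.get("function")[2:])
        mac = dev.find("mac")
        seen[key].append(mac.get("address") if mac is not None else dev.tag)
    return {a: devs for a, devs in seen.items() if len(devs) > 1}
```

Run on the two colliding interfaces above, it would report slot 0000:00:03.0 claimed by both MACs, mirroring the libvirt error.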
Tested and failed QA on 3.6.1.2-0.1.el6 and vdsm-4.17.13-1.el7ev.noarch:

Thread-356::ERROR::2015-12-10 12:32:59,754::vm::758::virt.vm::(_startUnderlyingVm) vmId=`404f96db-b224-4163-a21e-eeb8eb084d7b`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 702, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 1889, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 124, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3611, in createXML
    if ret is None: raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: XML error: Attempted double use of PCI slot 0000:00:03.0 (may need "multifunction='on'" for device on function 0)

I tested my steps from the description above:
1. Run VM and add vNIC with 'rhevm' profile
2. HotUnplug vNIC
3. Add new vNIC to VM with 'rhevm' profile
4. Try to HotPlug back the first vNIC, and somehow it succeeded.

I tested Dan's steps from comment 16:
- run a VM with nic1; pci1 is persisted in engine.
- stop the VM. unplug nic1. plug nic2.
- run VM with nic2; pci2 is allocated by libvirt, and is most likely to equal pci1.
- plug both nics and attempt to run the VM. I expect Engine is sending the same address to both nics, which breaks in libvirt.

And it failed with the libvirtError, same as the original report.

Created attachment 1104298 [details]
vdsm log
The fix is only for hot-plugging/hot-unplugging. The stop/start case would require another patch. We can wait for that patch until 3.6.2. I think the problem can touch not only the NICs, but also other PCI devices (like disks).

Verified on 3.6.2-0.1.el6
Created attachment 999287 [details]
Logs

Description of problem:
Can't hot-plug a vNIC: "Error while executing action Edit VM Interface properties: Failed to activate VM Network Interface" because of a libvirtError: internal error: Attempted double use of PCI slot.

libvirt.log:

<interface type="bridge">
  <address bus="0x00" domain="0x0000" function="0x0" slot="0x03" type="pci"/>
  <mac address="00:1a:4a:16:88:5f"/>
  <model type="virtio"/>
  <source bridge="rhevm"/>
  <link state="up"/>
  <boot order="2"/>
  <bandwidth/>
</interface>
<interface type="bridge">
  <address bus="0x00" domain="0x0000" function="0x0" slot="0x08" type="pci"/>
  <mac address="00:1a:4a:16:88:60"/>
  <model type="virtio"/>
  <source bridge="qbrb966e777-a4"/>
  <link state="up"/>
  <boot order="3"/>
  <bandwidth/>
  <target dev="tapb966e777-a4"/>
</interface>
<interface type="bridge">
  <address bus="0x00" domain="0x0000" function="0x0" slot="0x03" type="pci"/>
  <mac address="00:1a:4a:16:88:61"/>
  <model type="virtio"/>
  <source bridge="br-int"/>
  <link state="up"/>
  <boot order="4"/>
  <bandwidth/>
  <virtualport type="openvswitch">
    <parameters interfaceid="8be1902c-1eb3-4001-b28b-0044c4bd3773"/>
  </virtualport>
</interface>

vdsm.log:

Thread-1138::ERROR::2015-03-08 11:43:30,806::vm::3421::vm.Vm::(hotplugNic) vmId=`038dd653-dc16-48df-a06b-40338a7c98f3`::Hotplug failed
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 3419, in hotplugNic
    self._dom.attachDevice(nicXml)
  File "/usr/share/vdsm/virt/vm.py", line 689, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 111, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 419, in attachDevice
    if ret == -1: raise libvirtError('virDomainAttachDevice() failed', dom=self)
libvirtError: internal error: Attempted double use of PCI slot 0000:00:03.0 (may need "multifunction='on'" for device on function 0)

Version-Release number of selected component (if applicable):
3.5.1-0.1.el6ev

How reproducible:
100%

Steps to Reproduce:
1. Run VM and add vNIC with 'rhevm' profile
2. HotUnplug vNIC
3. Add new vNIC to VM with 'rhevm' profile
4. Try to HotPlug back the first vNIC

Actual results:
Fails with error: Error while executing action Edit VM Interface properties: Failed to activate VM Network Interface.

Expected results:
Operation should succeed.
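The failure sequence in the steps above can be condensed into a toy model of the two bookkeepers involved: libvirt, which allocates slots and forgets them on unplug, and Engine, which persists them. The `LibvirtModel` class, the slot range, and the starting slot 3 are assumptions made purely for this sketch; only the error message mirrors the real one.

```python
class LibvirtModel:
    """Toy allocator: hands out the lowest free PCI slot, starting at 3."""
    def __init__(self):
        self.used = set()

    def attach(self, slot=None):
        if slot is None:  # no address requested: pick the lowest free slot
            slot = next(s for s in range(3, 32) if s not in self.used)
        if slot in self.used:
            raise RuntimeError(
                "Attempted double use of PCI slot 0000:00:%02x.0" % slot)
        self.used.add(slot)
        return slot

    def detach(self, slot):
        self.used.discard(slot)  # libvirt forgets the address entirely


libvirt = LibvirtModel()
engine = {}                        # Engine's persisted nic -> slot map

engine["nic1"] = libvirt.attach()  # step 1: nic1 gets slot 3, Engine persists it
libvirt.detach(engine["nic1"])     # step 2: hot-unplug; libvirt frees slot 3
engine["nic2"] = libvirt.attach()  # step 3: new nic2 also gets slot 3

try:                               # step 4: re-plug nic1 with its persisted slot
    libvirt.attach(engine["nic1"])
    collision = None
except RuntimeError as err:
    collision = str(err)
```

Both NICs end up with slot 3, and the re-plug of nic1 raises the double-use error, matching the behavior in the logs.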