Bug 1199782 - same pci addr is stored for two vNICs if they are plugged to a running VM one at a time
Summary: same pci addr is stored for two vNICs if they are plugged to a running VM one at a time
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.5.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-3.6.2
Target Release: 3.6.2
Assignee: Marcin Mirecki
QA Contact: Michael Burman
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-03-08 09:53 UTC by Michael Burman
Modified: 2016-04-20 01:28 UTC
CC List: 12 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-04-20 01:28:43 UTC
oVirt Team: Network
Target Upstream Version:
Embargoed:


Attachments
Logs (461.80 KB, application/x-gzip) - 2015-03-08 09:53 UTC, Michael Burman
new logs (686.78 KB, application/x-gzip) - 2015-03-08 15:13 UTC, Michael Burman
3.6 logs (1.06 MB, application/x-gzip) - 2015-03-09 07:58 UTC, Michael Burman
New fail logs for Lior (1.15 MB, application/x-gzip) - 2015-03-09 12:41 UTC, Michael Burman
vdsm log (560.09 KB, application/x-gzip) - 2015-12-10 10:49 UTC, Michael Burman


Links
System        ID     Status     Summary                                                                                Last Updated
oVirt gerrit  46973  ABANDONED  engine: same pci addr is stored for two vNICs                                          2020-04-29 11:19:07 UTC
oVirt gerrit  48342  MERGED     engine: check if pci slot is used by managed device during vm nic hotplug              2020-04-29 11:19:07 UTC
oVirt gerrit  48473  MERGED     vm: check operation result for vm nic hotunplug                                        2020-04-29 11:19:08 UTC
oVirt gerrit  49455  MERGED     engine: check if pci slot is used by managed device during vm nic hotplug              2020-04-29 11:19:08 UTC
oVirt gerrit  49638  MERGED     engine: check if pci slot is used by managed device during vm nic hotplug              2020-04-29 11:19:08 UTC
oVirt gerrit  49669  MERGED     vm: check operation result for vm nic hotunplug                                        2020-04-29 11:19:08 UTC
oVirt gerrit  50387  MERGED     engine: check hotplugged nics for duplicated pci address also when vm is not running   2020-04-29 11:19:09 UTC
oVirt gerrit  50403  ABANDONED  engine: Check vm devices for duplicated pci slots before vm startup                    2020-04-29 11:19:09 UTC
oVirt gerrit  50812  MERGED     engine: check hotplugged nics for duplicated pci address also when vm is not running   2020-04-29 11:19:10 UTC

Description Michael Burman 2015-03-08 09:53:12 UTC
Created attachment 999287 [details]
Logs

Description of problem:
Can't HotPlug vNIC - "Error while executing action Edit VM Interface properties: Failed to activate VM Network Interface", because of libvirtError: internal error: Attempted double use of PCI slot.

libvirt.log:

 <interface type="bridge">
                        <address bus="0x00" domain="0x0000" function="0x0" slot="0x03" type="pci"/>
                        <mac address="00:1a:4a:16:88:5f"/>
                        <model type="virtio"/>
                        <source bridge="rhevm"/>
                        <link state="up"/>
                        <boot order="2"/>
                        <bandwidth/>
                </interface>
                <interface type="bridge">
                        <address bus="0x00" domain="0x0000" function="0x0" slot="0x08" type="pci"/>
                        <mac address="00:1a:4a:16:88:60"/>
                        <model type="virtio"/>
                        <source bridge="qbrb966e777-a4"/>
                        <link state="up"/>
                        <boot order="3"/>
                        <bandwidth/>
                        <target dev="tapb966e777-a4"/>
                </interface>
                <interface type="bridge">
                        <address bus="0x00" domain="0x0000" function="0x0" slot="0x03" type="pci"/>
                        <mac address="00:1a:4a:16:88:61"/>
                        <model type="virtio"/>
                        <source bridge="br-int"/>
                        <link state="up"/>
                        <boot order="4"/>
                        <bandwidth/>
                        <virtualport type="openvswitch">
                                <parameters interfaceid="8be1902c-1eb3-4001-b28b-0044c4bd3773"/>
                        </virtualport>


vdsm.log:

Thread-1138::ERROR::2015-03-08 11:43:30,806::vm::3421::vm.Vm::(hotplugNic) vmId=`038dd653-dc16-48df-a06b-40338a7c98f3`::Hotplug failed
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 3419, in hotplugNic
    self._dom.attachDevice(nicXml)
  File "/usr/share/vdsm/virt/vm.py", line 689, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 111, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 419, in attachDevice
    if ret == -1: raise libvirtError ('virDomainAttachDevice() failed', dom=self)
libvirtError: internal error: Attempted double use of PCI slot 0000:00:03.0 (may need "multifunction='on'" for device on function 0)


  
Version-Release number of selected component (if applicable):
3.5.1-0.1.el6ev

How reproducible:
100%

Steps to Reproduce:
1. Run VM and add vNIC with 'rhevm' profile
2. HotUnplug vNIC 
3. Add new vNIC to VM with 'rhevm' profile
4. Try to HotPlug back the first vNIC 

Actual results:
Fail with error:
Error while executing action Edit VM Interface properties: Failed to activate VM Network Interface.

Expected results:
Operation should succeed.
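
For reference, a minimal libvirt-python sketch of the failing call path (an illustration only, not the actual VDSM code; the domain name 'test-vm' and the interface XML are placeholders): pinning the new vNIC to a PCI slot that is already occupied makes virDomainAttachDevice() fail with the same "Attempted double use of PCI slot" error shown in the traceback above.

import libvirt

# Placeholder interface XML: the explicit <address> element pins the vNIC to
# PCI slot 0x03, mimicking the stale address that gets re-sent on hotplug.
INTERFACE_XML = """
<interface type='bridge'>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
  <mac address='00:1a:4a:16:88:5f'/>
  <model type='virtio'/>
  <source bridge='rhevm'/>
  <link state='up'/>
</interface>
"""

conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('test-vm')       # placeholder domain name
try:
    dom.attachDevice(INTERFACE_XML)      # live hotplug, the same libvirt call VDSM makes
except libvirt.libvirtError as e:
    print('hotplug failed: %s' % e)      # e.g. "Attempted double use of PCI slot 0000:00:03.0"
finally:
    conn.close()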

Comment 1 Lior Vernia 2015-03-08 14:01:55 UTC
The first collision I see in the libvirt log is while running the VM with two NICs sharing the same PCI address, one with an external network.

Could you reproduce on a deployment with no external networks? And state exactly the steps to reproduce?

Comment 2 Michael Burman 2015-03-08 14:11:36 UTC
Hi Lior,

I wrote the exact steps above in the Description.

Like I said, this is not related to the external network. Yes, I did manage to reproduce it without an external network.

Comment 3 Lior Vernia 2015-03-08 14:42:10 UTC
Can I see logs from such a deployment?

Comment 4 Michael Burman 2015-03-08 15:13:11 UTC
Created attachment 999346 [details]
new logs

Comment 5 Michael Burman 2015-03-08 15:14:00 UTC
Yes, sure Lior, logs from such a deployment are attached.

Comment 6 Michael Burman 2015-03-09 07:53:38 UTC
Lior,

The same for 3.6.0-0.0.master.20150307182246.git2776c91.el6

2015-03-09 09:50:28,080 ERROR [org.ovirt.engine.core.bll.network.vm.ActivateDeactivateVmNicCommand] (ajp--127.0.0.1-8702-3) [65101358] Command 'org.ovirt.engine.core.bll.network.vm.ActivateDeactivateVmNicCommand' failed: VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to HotPlugNicVDS, error = internal error: Attempted double use of PCI slot 0000:00:03.0 (may need "multifunction='on'" for device on function 0), code = 49 (Failed with error ACTIVATE_NIC_FAILED and code 49)

- No external networks. Very simple steps to reproduce.

Comment 7 Michael Burman 2015-03-09 07:58:27 UTC
Created attachment 999442 [details]
3.6 logs

Comment 8 Michael Burman 2015-03-09 08:46:22 UTC
The same for 3.5.0-0.33.el6ev(ASYNC)

Comment 9 Dan Kenigsberg 2015-03-09 10:52:55 UTC
Which OS does the guest run? (if there's no guest, or the OS does not support hot-plug, it's NOTABUG).

Does the guest report the unplugged nic? What's its state?

Comment 10 Michael Burman 2015-03-09 11:56:51 UTC
Hi Dan,

RHEL 6.5, 6.6 and RHEL 7.

Guest doesn't report the unplugged nic.

Comment 11 Michael Burman 2015-03-09 12:03:31 UTC
libvirt-1.1.1-29.el7_0.7.x86_64
vdsm-4.16.8.1-7.el7ev.x86_64
vdsm-4.16.12-2.el7ev.x86_64

Comment 12 Michael Burman 2015-03-09 12:41:55 UTC
Created attachment 999515 [details]
New fail logs for Lior

Comment 13 Lior Vernia 2015-03-09 12:47:36 UTC
Dan, the latest logs are from a run I conducted together with Michael; it was well controlled. We created VM lior and ran it with one vNIC; its state was dumped into lior.xml. Then we hot-unplugged nic1 (MAC *22:02), hot-plugged nic2 (MAC *22:03), and then tried to hot-plug nic1 again.

Somehow nic2 got the PCI slot that had been allocated to nic1, 0x03. As far as I could see, neither the engine, nor vdsm, nor libvirt "asked" for that slot - so it seems to me like the guest OS (RHEL, according to Michael either 7* or 6.6) was the one that re-allocated it to nic2.

Do you agree with the analysis? Who can we talk to on the platform side?

Comment 14 Dan Kenigsberg 2015-03-09 13:04:58 UTC
nic1's address is kept allocated on Engine, but it is completely free and forgotten in libvirt once unplug has succeeded.

I do not see a way to solve this in libvirt or vdsm. The Engine might be able to blank out the PCI address of the unplugged nic1 if it notices that that address is already taken by another device (but only then, since we DO like to persist the former address of nic1).

Another option is complete control of PCI addresses in Engine, which is not a simple feature to add.
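
An illustrative sketch of this blank-out idea in Python (the actual fix would live in the Java engine; the nic and device objects here are hypothetical, assumed to carry an address string and an is_managed flag): before hotplug, drop the persisted address of the NIC being plugged if another managed device of the running VM already occupies that slot, and let libvirt allocate a fresh one.

def address_in_use(vm_devices, pci_address):
    """Return True if another managed device of the VM already holds this PCI address."""
    return any(dev.address == pci_address
               for dev in vm_devices
               if dev.is_managed and dev.address)

def prepare_nic_for_hotplug(nic, vm_devices):
    # Blank out a stale, now-duplicated slot; libvirt will then pick a free one,
    # and the new address can be persisted back after a successful hotplug.
    if nic.address and address_in_use(vm_devices, nic.address):
        nic.address = None
    return nic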

Comment 15 Lior Vernia 2015-03-09 13:13:46 UTC
Is this consistent with previous behavior?... Didn't RHEL previously avoid allocating the same PCI address to another network interface (unless it had no choice)? I vaguely remember this behavior from our discussions concerning vNIC ordering.

Comment 16 Dan Kenigsberg 2015-03-09 14:39:28 UTC
(In reply to Lior Vernia from comment #15)
> Is this consistent with previous behavior?... Didn't RHEL use to not
> allocate the same PCI address to another network interface (unless it had no
> choice)? I vaguely remember this behavior from our discussions concerning
> vNIC ordering.

You might be recalling RHEL *guest*'s persistence of pciaddr+mac->guest_nicname mapping, which is persisted on the guest disk in case the vNIC is plugged again.

The allocation of PCI addresses happens in libvirt; I don't believe it has ever attempted to maintain a history of previously-installed devices.

BTW, I am guessing that the very same bug can happen regardless of *hot*plugging:
- run a VM with nic1; pci1 is persisted in engine.
- stop the VM. unplug nic1. plug nic2.
- run VM with nic2; pci2 is allocated by libvirt, and is most likely to equal pci1.
- plug both nics and attempt to run the VM. I expect Engine is sending the same address to both nics, which breaks in libvirt.

is that the case, Michael?

Comment 17 Michael Burman 2015-03-09 15:16:19 UTC
Yes, that is the case. Running the VM failed.

libvirt.log:
XML error: Attempted double use of PCI slot 0000:00:03.0 (may need "multifunction='on'" for device on function 0)

<interface type="bridge">
                        <address bus="0x00" domain="0x0000" function="0x0" slot="0x03" type="pci"/>
                        <mac address="00:1a:4a:16:88:5c"/>
                        <model type="virtio"/>
                        <source bridge="rhevm"/>
                        <link state="up"/>
                        <bandwidth/>
                </interface>
                <interface type="bridge">
                        <address bus="0x00" domain="0x0000" function="0x0" slot="0x03" type="pci"/>
                        <mac address="00:1a:4a:16:88:5e"/>
                        <model type="virtio"/>
                        <source bridge="rhevm"/>
                        <link state="up"/>
                        <bandwidth/>
                </interface>

Comment 18 Lior Vernia 2015-03-09 15:27:31 UTC
If we indeed do not want the engine to stop caring about previous PCI addresses, then this should probably be solved by exposing vNIC PCI address management to users - which seems like the right way to go for Bug 1108926 as well.

Comment 19 Lior Vernia 2015-03-10 13:11:00 UTC
Lowering priority as there's an easy workaround - just remove the vNIC and re-create it.

Comment 21 Michael Burman 2015-12-10 10:40:11 UTC
Tested and failed QA on 3.6.1.2-0.1.el6 and vdsm-4.17.13-1.el7ev.noarch

Thread-356::ERROR::2015-12-10 12:32:59,754::vm::758::virt.vm::(_startUnderlyingVm) vmId=`404f96db-b224-4163-a21e-eeb8eb084d7b`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 702, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 1889, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 124, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3611, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: XML error: Attempted double use of PCI slot 0000:00:03.0 (may need "multifunction='on'" for device on function 0)


I tested my steps from the description ^^
1. Run VM and add vNIC with 'rhevm' profile
2. HotUnplug vNIC 
3. Add new vNIC to VM with 'rhevm' profile
4. Try to HotPlug back the first vNIC 
and somehow they succeeded.

I tested Dan's steps from comment 16^^
- run a VM with nic1; pci1 is persisted in engine.
- stop the VM. unplug nic1. plug nic2.
- run VM with nic2; pci2 is allocated by libvirt, and is most likely to equal pci1.
- plug both nics and attempt to run the VM. I expect Engine is sending the same address to both nics, which breaks in libvirt.
And it failed with a libvirtError, the same as in the original report.

Comment 22 Michael Burman 2015-12-10 10:49:35 UTC
Created attachment 1104298 [details]
vdsm log

Comment 23 Marcin Mirecki 2015-12-10 11:13:01 UTC
The fix is only for hotplugging/hotunplugging.
The stop/start would require another patch.
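
A rough sketch of what such a startup-time check could look like (again with hypothetical device objects rather than actual engine code): scan the persisted devices for duplicated PCI addresses before building the domain XML and clear the duplicates so libvirt re-allocates them.

def clear_duplicate_pci_addresses(devices):
    """Keep the first device on each PCI address; clear the address of any duplicate."""
    seen = set()
    for dev in devices:
        if not dev.address:
            continue
        if dev.address in seen:
            dev.address = None   # duplicated slot; let libvirt choose a new one
        else:
            seen.add(dev.address)
    return devices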

Comment 24 Dan Kenigsberg 2015-12-10 12:59:26 UTC
We can wait for this patch until 3.6.2.

Comment 25 Marcin Mirecki 2015-12-11 08:44:12 UTC
I think the problem can affect not only NICs, but also other PCI devices (such as disks).

Comment 26 Michael Burman 2015-12-28 12:55:36 UTC
Verified on - 3.6.2-0.1.el6

