Bug 1412074 - [SR-IOV] - same pci addr is stored for two vNICs - libvirtError: XML error: Attempted double use of PCI slot 0000:00:08.0 (may need "multifunction='on'" for device on function 0)
Summary: [SR-IOV] - same pci addr is stored for two vNICs - libvirtError: XML error: Attempted double use of PCI slot 0000:00:08.0 (may need "multifunction='on'" for device on function 0)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Network
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-4.2.0
Target Release: 4.2.0
Assignee: Marcin Mirecki
QA Contact: Michael Burman
URL:
Whiteboard:
Depends On: 1471667 1472286
Blocks:
 
Reported: 2017-01-11 07:25 UTC by Michael Burman
Modified: 2017-12-20 11:08 UTC
CC: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-20 11:08:22 UTC
oVirt Team: Network
Embargoed:
rule-engine: ovirt-4.2+


Attachments
vdsm log (537.19 KB, application/x-gzip), 2017-01-11 07:25 UTC, Michael Burman
logs in debug (1.70 MB, application/x-gzip), 2017-02-05 12:57 UTC, Michael Burman


Links
oVirt gerrit 72064: MERGED, "engine: Avoid PCI address conflict when existing interface is in an ambiguous state." Last updated 2020-02-25 08:46:38 UTC

Description Michael Burman 2017-01-11 07:25:34 UTC
Created attachment 1239344 [details]
vdsm log

Description of problem:
[SR-IOV] - same pci addr is stored for two vNICs - libvirtError: XML error: Attempted double use of PCI slot 0000:00:08.0 (may need "multifunction='on'" for device on function 0)

This looks like the same bug we had in 3.6 for VirtIO vNICs (BZ 1199782),
but the fix was not applied to SR-IOV vNICs, and we are now facing the same issue there.

2017-01-11 08:46:49,539+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugNicVDSCommand] (default task-10) [7c2b3d00] Failed in 'HotPlugNicVDS' method
2017-01-11 08:46:49,552+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-10) [7c2b3d00] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM host_
mixed_1 command failed: XML error: Attempted double use of PCI slot 0000:00:08.0 (may need "multifunction='on'" for device on function 0)
2017-01-11 08:46:49,552+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugNicVDSCommand] (default task-10) [7c2b3d00] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugNicVDSCommand' return value 
'StatusOnlyReturn [status=Status [code=49, message=XML error: Attempted double use of PCI slot 0000:00:08.0 (may need "multifunction='on'" for device on function 0)]]'

2017-01-11 08:46:49,403 INFO  (jsonrpc/7) [virt.vm] (vmId='dd9b1f83-cb5d-4afe-81ea-224050a01531') Hotplug NIC xml: <?xml version='1.0' encoding='UTF-8'?>
<interface managed="no" type="hostdev">
    <address bus="0x00" domain="0x0000" function="0x0" slot="0x08" type="pci" />
    <mac address="00:1a:4a:16:20:7a" />
    <source>
        <address bus="5" domain="0" function="2" slot="16" type="pci" />
    </source>
    <driver name="vfio" />
</interface>
 (vm:2085)
2017-01-11 08:46:49,403 INFO  (jsonrpc/7) [virt.vm] (vmId='dd9b1f83-cb5d-4afe-81ea-224050a01531') Detaching device pci_0000_05_10_2 from the host. (network:214)
2017-01-11 08:46:49,457 ERROR (jsonrpc/7) [virt.vm] (vmId='dd9b1f83-cb5d-4afe-81ea-224050a01531') Hotplug failed (vm:2091)
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 2089, in hotplugNic
    self._dom.attachDevice(nicXml)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 69, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 123, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 941, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 536, in attachDevice
    if ret == -1: raise libvirtError ('virDomainAttachDevice() failed', dom=self)
libvirtError: XML error: Attempted double use of PCI slot 0000:00:08.0 (may need "multifunction='on'" for device on function 0)
2017-01-11 08:46:49,458 INFO  (jsonrpc/7) [vdsm.api] FINISH hotplugNic return={'status': {'message': 'XML error: Attempted double use of PCI slot 0000:00:08.0 (may need "multifunction=\'on\'" for device on functio
n 0)', 'code': 49}} (api:43)

Version-Release number of selected component (if applicable):
4.1.0-0.4.master.20170109222652.git53fd6cb.el7.centos

How reproducible:
100%

Steps to Reproduce:
1. Host with two VFs enabled
2. Start a VM with one SR-IOV vNIC
3. Unplug the vNIC
4. Add another SR-IOV vNIC
5. Try to plug the first vNIC back
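
For reference, a minimal sketch of these steps with the oVirt Python SDK (ovirtsdk4). The URL, credentials, and VM/NIC names are placeholders; nic7 is assumed to already exist unplugged (step 4 in the original adds a brand-new vNIC); and hitting the bug still requires beating the next VM-monitoring pass, so many attempts may be needed:

import ovirtsdk4 as sdk

# Placeholders: engine URL, credentials, VM and NIC names.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    insecure=True,
)
vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=myvm')[0]
nics_service = vms_service.vm_service(vm.id).nics_service()

def nic_service(name):
    # Look up a vNIC by name and return its per-NIC service.
    nic = next(n for n in nics_service.list() if n.name == name)
    return nics_service.nic_service(nic.id)

nic_service('nic2').deactivate()   # step 3: unplug the first SR-IOV vNIC
nic_service('nic7').activate()     # step 4: plug another SR-IOV vNIC
nic_service('nic2').activate()     # step 5: replug quickly; with the bug this
                                   # fails with "double use of PCI slot"
connection.close()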

Actual results:
Failed
libvirtError: XML error: Attempted double use of PCI slot 0000:00:08.0 (may need "multifunction='on'" for device on function 0)

Expected results:
The engine should not try to reuse a PCI address that may already be occupied by another device; the hot plug should succeed.

Comment 1 Michael Burman 2017-02-05 12:57:31 UTC
Created attachment 1247815 [details]
logs in debug

Comment 2 Marcin Mirecki 2017-02-10 09:30:30 UTC
During some debugging sessions we established that a possible cause of this bug is a delayed update of the device data in the DB by VM monitoring.
When a device is plugged, the "address" field is at first empty. It is only populated on the next VM-monitoring pass (it is also quite probable that the delay is not in VM monitoring itself, but in vdsm reading the data and passing it into the result of Host.getAllVmStats).
There is therefore a time window, between the plug and the VM-monitoring pass that updates the device, during which the "address" field remains empty even though the device already occupies a PCI slot. If we manage to plug in an unplugged device with the same PCI address during this window, we get the described error.
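
To make the window concrete, here is a toy Python model of the flow described above (illustrative only, not actual engine code): the stored address of the unplugged NIC goes stale the moment another device silently takes its slot.

# Toy model (illustrative, not engine code) of the race window.
db = {
    'nic2': {'is_plugged': False, 'address': 'slot=0x19'},  # kept from the last plug
}

def hotplug(name):
    # The engine sends the stored address (possibly stale) to vdsm/libvirt.
    # A newly added device starts with an empty address until VM monitoring
    # reports back what libvirt actually assigned.
    row = db.setdefault(name, {'is_plugged': False, 'address': ''})
    row['is_plugged'] = True
    return row['address']              # goes into the hot-plug XML

def monitoring_pass(name, libvirt_address):
    # Runs later: copies the address libvirt assigned into the DB row.
    db[name]['address'] = libvirt_address

hotplug('nic7')                        # libvirt picks slot 0x19 for nic7 ...
assert db['nic7']['address'] == ''     # ... but monitoring has not run yet
hotplug('nic2')                        # re-sends nic2's stored slot 0x19 to
                                       # libvirt: "double use of PCI slot"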

========================

Notes:
- We only managed to reproduce the bug on one specific environment; I did not manage to reproduce it on my own environment.
- It only occurred for SR-IOV devices.
- It took quite a lot of attempts to reproduce (since we rely on the time gap between the plug and VM monitoring).

=========================
Description of the debug session.
Steps:

Plug in NIC2.

Unplug NIC2.
The state of the DB will be (irrelevant rows removed):
engine=# select device_id, name, is_plugged, address from vm_device as d, vm_interface as i where d.device_id = i.id and d.type='interface' and d.device='hostdev' order by d.is_plugged, name, d.host_device;
              device_id               | name | is_plugged |                           address                            
--------------------------------------+------+------------+--------------------------------------------------------------
 508932a7-0e5f-42bd-9a10-db830b2fa804 | nic2 | f          | {slot=0x19, bus=0x00, domain=0x0000, type=pci, function=0x0} 

The address of nic2 is {slot=0x19, bus=0x00, domain=0x0000, type=pci, function=0x0}.
We will plug nic2 back in later to cause the conflict.

Now plug in another NIC (nic7 in this session) and use virsh to check whether it is using the same PCI address.
# virsh edit vm
This should show the new device XML to be:
    <interface type='hostdev'>
      <mac address='00:1a:4a:16:91:df'/>
      <driver name='vfio'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x05' slot='0x10' function='0x2'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x19' function='0x0'/>
    </interface>

Check the DB state; it should show the address of nic7 to be empty (you have to rush to be faster than VM monitoring):

              device_id               | name | is_plugged |                           address                            
--------------------------------------+------+------------+--------------------------------------------------------------
 508932a7-0e5f-42bd-9a10-db830b2fa804 | nic2 | f          | {slot=0x19, bus=0x00, domain=0x0000, type=pci, function=0x0} 
 24b6c16f-118d-4378-aed8-c263fabaa394 | nic7 | t          |  

Quickly plug in NIC2 again.
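
As a side note, instead of racing the monitoring pass by hand, the open window can be detected by polling the engine DB; below is a minimal sketch using psycopg2, where the connection parameters are placeholders and the query mirrors the one above:

import psycopg2

# Placeholders: adjust dbname/user/password/host for the actual engine DB.
conn = psycopg2.connect(dbname='engine', user='engine',
                        password='secret', host='localhost')
cur = conn.cursor()
cur.execute("""
    select i.name, d.is_plugged, d.address
      from vm_device as d, vm_interface as i
     where d.device_id = i.id
       and d.type = 'interface'
       and d.device = 'hostdev'
""")
for name, is_plugged, address in cur.fetchall():
    if is_plugged and not address:
        # The window is open: this device occupies a slot the DB
        # does not know about yet.
        print('window open: %s is plugged but has no recorded address' % name)
cur.close()
conn.close()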

==================

Results:
----------------------------
engine.log

{"jsonrpc": "2.0", "id": "bb379e89-bce5-42e9-bdaf-c3485a8bce7c", "error": {"message": "XML error: Attempted double use of PCI slot 0000:00:19.0 (may need \"multifunction='on'\" for device on function 0)", "code": 49}}\00
2017-02-07 14:56:53,780+02 DEBUG [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) [] Message received: {"jsonrpc": "2.0", "id": "bb379e89-bce5-42e9-bdaf-c3485a8bce7c", "error": {"message": "XML error: Attempted double use of PCI slot 0000:00:19.0 (may need \"multifunction='on'\" for device on function 0)", "code": 49}}
2017-02-07 14:56:53,781+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugNicVDSCommand] (default task-3) [4929fbc7] Failed in 'HotPlugNicVDS' method
2017-02-07 14:56:53,802+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-3) [4929fbc7] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM host_mixed_1 command HotPlugNicVDS failed: XML error: Attempted double use of PCI slot 0000:00:19.0 (may need "multifunction='on'" for device on function 0)
2017-02-07 14:56:53,802+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugNicVDSCommand] (default task-3) [4929fbc7] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugNicVDSCommand' return value 'StatusOnlyReturn [status=Status [code=49, message=XML error: Attempted double use of PCI slot 0000:00:19.0 (may need "multifunction='on'" for device on function 0)]]'
2017-02-07 14:56:53,802+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugNicVDSCommand] (default task-3) [4929fbc7] HostName = host_mixed_1
2017-02-07 14:56:53,802+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugNicVDSCommand] (default task-3) [4929fbc7] Command 'HotPlugNicVDSCommand(HostName = host_mixed_1, VmNicDeviceVDSParameters:{runAsync='true', hostId='3a461812-281d-48db-8b7a-bc7a5849c1fe', vm.vm_name='golden_env_mixed_virtio_0', nic='VmNetworkInterface:{id='508932a7-0e5f-42bd-9a10-db830b2fa804', networkName='sr', vnicProfileName='null', vnicProfileId='8e6e49d9-a52e-4ed1-b979-3286f84aab57', speed='1000', type='5', macAddress='00:1a:4a:16:91:e5', active='true', linked='true', portMirroring='false', vmId='1674025e-8abf-4b2a-b702-2731e0e88ce4', vmName='golden_env_mixed_virtio_0', vmTemplateId='null', QoSName='null', remoteNetworkName='null'}', vmDevice='VmDevice:{id='VmDeviceId:{deviceId='508932a7-0e5f-42bd-9a10-db830b2fa804', vmId='1674025e-8abf-4b2a-b702-2731e0e88ce4'}', device='hostdev', type='INTERFACE', bootOrder='0', specParams='[]', address='{slot=0x19, bus=0x00, domain=0x0000, type=pci, function=0x0}', managed='true', plugged='false', readOnly='false', deviceAlias='hostdev0', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='pci_0000_05_10_0'}'})' execution failed: VDSGenericException: VDSErrorException: Failed to HotPlugNicVDS, error = XML error: Attempted double use of PCI slot 0000:00:19.0 (may need "multifunction='on'" for device on function 0), code = 49
2017-02-07 14:56:53,803+02 DEBUG [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugNicVDSCommand] (default task-3) [4929fbc7] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to HotPlugNicVDS, error = XML error: Attempted double use of PCI slot 0000:00:19.0 (may need "multifunction='on'" for device on function 0), code = 49
	at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.createDefaultConcreteException(VdsBrokerCommand.java:76) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.createException(BrokerCommandBase.java:222) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:192) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugNicVDSCommand.executeVdsBrokerCommand(HotPlugNicVDSCommand.java:14) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:111) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:73) [vdsbroker.jar:]
	at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:]
	at org.ovirt.engine.core.vdsbroker.vdsbroker.DefaultVdsCommandExecutor.execute(DefaultVdsCommandExecutor.java:14) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:407) [vdsbroker.jar:]
	at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.runVdsCommand(VDSBrokerFrontendImpl.java:33) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.runVdsCommand(CommandBase.java:2158) [bll.jar:]
	at org.ovirt.engine.core.bll.network.vm.ActivateDeactivateVmNicCommand.executePlugOrUnplug(ActivateDeactivateVmNicCommand.java:285) [bll.jar:]
	at org.ovirt.engine.core.bll.network.vm.ActivateDeactivateVmNicCommand.plugNic(ActivateDeactivateVmNicCommand.java:255) [bll.jar:]
	at org.ovirt.engine.core.bll.network.vm.ActivateDeactivateVmNicCommand.executeVmCommand(ActivateDeactivateVmNicCommand.java:203) [bll.jar:]
	at org.ovirt.engine.core.bll.VmCommand.executeCommand(VmCommand.java:120) [bll.jar:]

--------------------------------------
vdsm.log

2017-02-07 14:56:53,254 INFO  (jsonrpc/6) [virt.vm] (vmId='1674025e-8abf-4b2a-b702-2731e0e88ce4') Hotplug NIC xml: <?xml version='1.0' encoding='UTF-8'?>
<interface managed="no" type="hostdev">
    <address bus="0x00" domain="0x0000" function="0x0" slot="0x19" type="pci" />
    <mac address="00:1a:4a:16:91:e5" />
    <source>
        <address bus="5" domain="0" function="0" slot="16" type="pci" />
    </source>
    <driver name="vfio" />
</interface>
 (vm:2086)
2017-02-07 14:56:53,254 INFO  (jsonrpc/6) [virt.vm] (vmId='1674025e-8abf-4b2a-b702-2731e0e88ce4') Detaching device pci_0000_05_10_0 from the host. (network:214)
2017-02-07 14:56:53,317 ERROR (jsonrpc/6) [virt.vm] (vmId='1674025e-8abf-4b2a-b702-2731e0e88ce4') Hotplug failed (vm:2092)
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 2090, in hotplugNic
    self._dom.attachDevice(nicXml)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 69, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 123, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 941, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 536, in attachDevice
    if ret == -1: raise libvirtError ('virDomainAttachDevice() failed', dom=self)
libvirtError: XML error: Attempted double use of PCI slot 0000:00:19.0 (may need "multifunction='on'" for device on function 0)
2017-02-07 14:56:53,321 INFO  (jsonrpc/6) [vdsm.api] FINISH hotplugNic return={'status': {'message': 'XML error: Attempted double use of PCI slot 0000:00:19.0 (may need "multifunction=\'on\'" for device on function 0)', 'code': 49}} (api:43)
2017-02-07 14:56:53,321 INFO  (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call VM.hotplugNic failed (error 49) in 0.20 seconds (__init__:515)

Comment 3 Dan Kenigsberg 2017-02-13 11:35:28 UTC
I don't quite like the suggested patch, even though it was approved by maintainers. It feels raceful, and can lose information (the old PCI address) when it is not really necessary. Let us take it into 4.2 only.

In any case, this bug, which pops up only during a quick plug/unplug scenario, is not really urgent.
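
For context, the idea behind the merged fix (gerrit 72064, "engine: Avoid PCI address conflict when existing interface is in an ambiguous state") can be summarized with a hedged sketch, continuing the toy Python model from comment 2; this is an illustration of the gerrit summary, not the actual engine code, and the cleared field is exactly the old-PCI-address information loss mentioned above:

# Hedged sketch of the fix's idea (names are illustrative, not engine code).
db = {
    'nic2': {'is_plugged': False, 'address': 'slot=0x19'},
    'nic7': {'is_plugged': True,  'address': ''},  # ambiguous: plugged, no address yet
}

def send_hotplug(name, address):
    # Stand-in for the vdsm/libvirt call; an empty address lets libvirt
    # auto-assign a free slot.
    pass

def plug(name):
    wanted = db[name]['address']
    # "Ambiguous state": some plugged device has no recorded address yet,
    # so it may already occupy `wanted` without the DB knowing.
    ambiguous = any(row['is_plugged'] and not row['address']
                    for row in db.values())
    if wanted and ambiguous:
        db[name]['address'] = ''   # drop the stale address (information loss)
    db[name]['is_plugged'] = True
    send_hotplug(name, db[name]['address'])

plug('nic2')   # no conflict: libvirt assigns a fresh slot instead of 0x19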

Comment 4 Dan Kenigsberg 2017-07-16 15:26:40 UTC
I did not like the patch, but the issue is not important enough to keep fighting for a pure solution. Let QE judge whether it is good enough for us.

Comment 5 Michael Burman 2017-07-26 10:59:47 UTC
Verified on - 4.2.0-0.0.master.20170725202023.git561151b.el7.centos and vdsm-4.20.1-241.giteb37c05.el7.centos.x86_64

Comment 6 Sandro Bonazzola 2017-12-20 11:08:22 UTC
This bugzilla is included in oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be
resolved in oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

