Bug 1526133 - [SR-IOV] hot-plug of vNIC on running VM fails with VDSErrorException
Summary: [SR-IOV] hot-plug of vNIC on running VM fails with VDSErrorException
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.20.9.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-4.2.1
Assignee: Milan Zamazal
QA Contact: Mor
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-12-14 20:06 UTC by Mor
Modified: 2018-02-12 11:47 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Hot plug of an SR-IOV network device did not work properly: the hot-plugged device was not displayed in the UI and could not be hot unplugged. This has been fixed, and SR-IOV network device hot plug should now work properly.
Clone Of:
Environment:
Last Closed: 2018-02-12 11:47:27 UTC
oVirt Team: Virt
Embargoed:
rule-engine: ovirt-4.2+
rule-engine: blocker+


Attachments
logs (1.64 MB, application/octet-stream)
2017-12-14 20:17 UTC, Mor


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 85484 0 master MERGED virt: Handle missing network parameter in Interface.get_metadata 2017-12-18 11:27:51 UTC
oVirt gerrit 85499 0 ovirt-4.2.0 ABANDONED virt: Handle missing network parameter in Interface.get_metadata 2017-12-20 14:30:54 UTC

Description Mor 2017-12-14 20:06:40 UTC
Description of problem:
Hot-plug of a vNIC on a VM running on an SR-IOV-capable NIC fails with a QEMU driver error.

NOTE: Found on Vdsm version 4.20.9.2-1 (not the version specified in this Bugzilla).

Version-Release number of selected component (if applicable):
RHV Version 4.2.0.2-0.1.el7
Vdsm vdsm-4.20.9.2-1.el7ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Allocate a VF on an SR-IOV-capable host NIC.
2. Create a network whose vNIC profile has only the PCI passthrough checkbox ticked.
3. Run a VM.
4. Hot-plug a new vNIC with that network attached (an SDK-based equivalent is sketched below).
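
For reference, step 4 can also be driven through the Python SDK. This is a minimal sketch, assuming ovirtsdk4 is installed and reachable engine credentials; the URL and password are placeholders, and the vNIC profile ID is the one from the engine log below:

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Placeholder connection details; replace with a real engine and credentials.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    insecure=True,
)

vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=golden_env_mixed_virtio_2')[0]

# Hot-plug a new vNIC using the passthrough vNIC profile; the profile ID
# is taken from the engine log in this report.
nics_service = vms_service.vm_service(vm.id).nics_service()
nics_service.add(
    types.Nic(
        name='nic2',
        vnic_profile=types.VnicProfile(id='b0f654f6-f68d-4e99-bac6-edf8a77bac92'),
    )
)

connection.close()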

Actual results:
The UI shows an error and the operation fails.

Expected results:
The vNIC should be hot-plugged successfully.

Additional info:

Vdsm log:
2017-12-14 21:52:00,858+0200 ERROR (jsonrpc/6) [virt.vm] (vmId='91d39add-56d2-463f-85bc-8dc806531ab3') Hotplug failed (vm:2829)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2826, in hotplugNic
    nic.setup()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vmdevices/network.py", line 345, in setup
    detach_detachable(self.hostdev)
  File "/usr/lib/python2.7/site-packages/vdsm/hostdev.py", line 595, in detach_detachable
    libvirt_device.detachFlags(None)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 5353, in detachFlags
    if ret == -1: raise libvirtError ('virNodeDeviceDetachFlags() failed')
libvirtError: Requested operation is not valid: PCI device 0000:05:10.6 is in use by driver QEMU, domain golden_env_mixed_virtio_2
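
The failing call boils down to a libvirt node-device detach. The following is a minimal standalone sketch of that call, not Vdsm's actual code; the lookup step is an assumption, since the traceback only shows the detachFlags() call:

import libvirt

conn = libvirt.open('qemu:///system')
# Look up the VF's node device by the name seen in the log.
dev = conn.nodeDeviceLookupByName('pci_0000_05_10_6')
# virNodeDeviceDetachFlags() raises libvirtError when the VF is already
# assigned to a running domain, which is exactly the "in use by driver
# QEMU" failure in the traceback above.
dev.detachFlags(None)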

engine log:
2017-12-14 21:39:35,106+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugNicVDSCommand] (default task-3) [3864387d] Failed in 'HotPlugNicVDS' method
2017-12-14 21:39:35,114+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-3) [3864387d] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_1 command HotPlugNicVDS failed: Requested operation is not valid: PCI device 0000:05:10.2 is in use by driver QEMU, domain golden_env_mixed_virtio_1
2017-12-14 21:39:35,114+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugNicVDSCommand] (default task-3) [3864387d] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugNicVDSCommand' return value '
VmInfoReturn:{status='Status [code=49, message=Requested operation is not valid: PCI device 0000:05:10.2 is in use by driver QEMU, domain golden_env_mixed_virtio_1]'}
'
2017-12-14 21:39:35,114+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugNicVDSCommand] (default task-3) [3864387d] HostName = host_mixed_1
2017-12-14 21:39:35,114+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugNicVDSCommand] (default task-3) [3864387d] Command 'HotPlugNicVDSCommand(HostName = host_mixed_1, VmNicDeviceVDSParameters:{hostId='6b8eb64a-1f31-4c42-bf72-5884169119e4', vm.vm_name='golden_env_mixed_virtio_2', nic='VmNetworkInterface:{id='f9199ffe-9ec1-4d66-8b5f-475bbecb4564', name='nic2', networkName='C1_sriov_vm2', vnicProfileName='null', vnicProfileId='b0f654f6-f68d-4e99-bac6-edf8a77bac92', speed='1000', type='5', macAddress='00:1a:4a:16:91:e1', active='true', linked='true', portMirroring='false', vmId='91d39add-56d2-463f-85bc-8dc806531ab3', vmName='null', vmTemplateId='null', QoSName='null', remoteNetworkName='null'}', vmDevice='VmDevice:{id='VmDeviceId:{deviceId='f9199ffe-9ec1-4d66-8b5f-475bbecb4564', vmId='91d39add-56d2-463f-85bc-8dc806531ab3'}', device='hostdev', type='INTERFACE', specParams='[]', address='', managed='true', plugged='true', readOnly='false', deviceAlias='', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='pci_0000_05_10_2'}'})' execution failed: VDSGenericException: VDSErrorException: Failed to HotPlugNicVDS, error = Requested operation is not valid: PCI device 0000:05:10.2 is in use by driver QEMU, domain golden_env_mixed_virtio_1, code = 49
2017-12-14 21:39:35,114+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugNicVDSCommand] (default task-3) [3864387d] FINISH, HotPlugNicVDSCommand, log id: 64089121
2017-12-14 21:39:35,119+02 ERROR [org.ovirt.engine.core.bll.network.vm.ActivateDeactivateVmNicCommand] (default task-3) [3864387d] Command 'org.ovirt.engine.core.bll.network.vm.ActivateDeactivateVmNicCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to HotPlugNicVDS, error = Requested operation is not valid: PCI device 0000:05:10.2 is in use by driver QEMU, domain golden_env_mixed_virtio_1, code = 49 (Failed with error ACTIVATE_NIC_FAILED and code 49)
2017-12-14 21:39:35,132+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-3) [3864387d] EVENT_ID: NETWORK_ACTIVATE_VM_INTERFACE_FAILURE(1,013), Failed to plug Network Interface nic2 (PCI Passthrough) to VM golden_env_mixed_virtio_2. (User: admin@internal-authz)

Comment 1 Mor 2017-12-14 20:17:46 UTC
Created attachment 1368141 [details]
logs

Comment 2 Dan Kenigsberg 2017-12-14 20:53:08 UTC
Milan, could you help me understand this repeating traceback? I suspect you did not test your recent changes to hotplug with sr-iov.

2017-12-14 22:01:04,117+0200 ERROR (jsonrpc/7) [virt.vm] (vmId='0fb617b0-ca53-4220-b558-2fe940ec0ab2') Error fetching vm stats (vm:1766)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 1762, in _getRunningVmStats
    vm_sample.interval)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vmstats.py", line 45, in produce
    networks(vm, stats, first_sample, last_sample, interval)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vmstats.py", line 322, in networks
    if nic.name.startswith('hostdev'):
AttributeError: name
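
The AttributeError means the stats loop touched a NIC object whose initialization never completed, so it lacks a `name` attribute entirely. An illustrative defensive rewrite of the failing check follows; the function shape is an assumption, not the shipped fix:

def iter_stats_nics(nic_devices):
    """Yield only NICs that are safe to collect interface stats for."""
    for nic in nic_devices:
        nic_name = getattr(nic, 'name', None)
        if nic_name is None or nic_name.startswith('hostdev'):
            # hostdev NICs have no kernel interface counters, and a
            # half-initialized device may lack `name` entirely; skip both.
            continue
        yield nic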

Comment 3 Red Hat Bugzilla Rules Engine 2017-12-14 20:53:15 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 4 Milan Zamazal 2017-12-15 09:37:21 UTC
This is what happens:

- SR-IOV NIC hotplug is called.
- The hotplug is called without the `network' parameter (network=null).
- The NIC is successfully hotplugged in libvirt.
- A metadata update follows in Vdsm, which doesn't expect `network' to be missing, resulting in an AttributeError.
- NIC device instance initialization in Vdsm remains incomplete, resulting in the vmstats errors.
- Subsequent hotplug attempts of the same NIC fail, apparently because the device is already present, but Engine doesn't know about it.

So the problem is not in my recent patches (which are not present in that version anyway) but in the missing `network' parameter in the metadata update. We don't know whether the parameter should be present on SR-IOV hotplug; depending on that, the bug is either in Engine or in the Vdsm metadata update.
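
The title of the merged gerrit patch ("virt: Handle missing network parameter in Interface.get_metadata") suggests a guard of roughly this shape. Attribute names and the return structure here are assumptions, not the actual merged code:

def interface_metadata(nic):
    # Hypothetical sketch of the fix: tolerate a missing `network`
    # attribute for hostdev (SR-IOV) interfaces.
    metadata = {'mac': getattr(nic, 'macAddr', None)}  # attribute name assumed
    network = getattr(nic, 'network', None)
    if network is not None:  # absent on SR-IOV passthrough hotplug
        metadata['network'] = network
    return metadata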

Comment 5 Dan Kenigsberg 2017-12-15 09:49:29 UTC
Mor, can you find the last Vdsm/Engine versions in which SR-IOV hotplug was still passing?

Comment 6 Milan Zamazal 2017-12-15 09:54:37 UTC
Is it correct that the `network' parameter is null on SR-IOV hotplug? If yes, we should fix the metadata update in Vdsm; if not, it should be fixed in Engine, and perhaps Vdsm should reject the hotplug in such a case.

Comment 7 Dan Kenigsberg 2017-12-15 10:18:28 UTC
(In reply to Milan Zamazal from comment #6)
> Is it correct that `network' parameter is null on SR-IOV hotplug?

Yes, I think so. AFAIR when `hostdev` is passed, `network` is not. Too bad that this (like the `hostdev` element itself) is not documented in lib/vdsm/api/vdsm-api.yml.

> If yes, we
> should fix metadata update in Vdsm

would you, Milan?
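
To make the shape concrete: an SR-IOV hotplug request identifies the VF via `hostdev` and carries no `network` name. The dict below is an illustrative reconstruction from the engine log above; the exact key set Engine sends is an assumption:

# Hypothetical vdsm-side device spec for the SR-IOV hotplug; values are
# taken from the engine log above, key names are assumptions.
nic_spec = {
    'type': 'interface',
    'device': 'hostdev',
    'hostdev': 'pci_0000_05_10_2',  # the VF's node-device name
    'macAddr': '00:1a:4a:16:91:e1',
    # note: no 'network' key, which is what the metadata update tripped on
}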

Comment 8 Milan Zamazal 2017-12-15 10:25:25 UTC
Thank you for the info, Dan; I'll try to make a fix.

Comment 9 Mor 2017-12-17 07:15:20 UTC
Dan, I tried to look for a successful run of the affected test case in Jenkins, but I couldn't find one on the downstream RHV 4.2 build.

Comment 10 Dan Kenigsberg 2017-12-17 08:29:39 UTC
(In reply to Mor from comment #9)
> Dan, I tried to look for a successful run of the affected test case in
> Jenkins, but I couldn't find one on the downstream RHV 4.2 build.

Mor, I'm not sure I understand your English. Would you please install Milan's vdsm from http://jenkins.ovirt.org/job/vdsm_master_build-artifacts-on-demand-el7-x86_64/703/ and test sriov hotplug on it?

Comment 11 Mor 2017-12-17 09:22:41 UTC
(In reply to Dan Kenigsberg from comment #10)
> (In reply to Mor from comment #9)
> > Dan, I tried to look for a successful run of the affected test case in
> > Jenkins, but I couldn't find one on the downstream RHV 4.2 build.
> 
> Mor, I'm not sure I understand your English. Would you please install
> Milan's vdsm from
> http://jenkins.ovirt.org/job/vdsm_master_build-artifacts-on-demand-el7-x86_64/703/
> and test sriov hotplug on it?

Which one should I check? There are two builds: fc26 and fcraw.

Comment 12 Dan Kenigsberg 2017-12-17 13:19:44 UTC
(In reply to Mor from comment #11)
> 
> Which one should I check? There are two builds: fc26 and fcraw.

Neither. My link leads you to an el7 build.

Anyway, we have verified the patch by applying it manually.

Comment 13 Dan Kenigsberg 2017-12-17 13:21:34 UTC
SR-IOV hotplug is an advanced feature; ykaul believes it should not block 4.2.0.

Comment 14 Mor 2018-01-07 09:55:28 UTC
Verified on RHV 4.2.1-0.2.el7

Comment 15 Sandro Bonazzola 2018-02-12 11:47:27 UTC
This bug is included in the oVirt 4.2.1 release, published on February 12th 2018.

Since the problem described in this bug report should be resolved in the oVirt 4.2.1 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

