Bug 1303825

Summary: Engine should block run of VM with devices that are PFs with enabled VFs
Product: [oVirt] ovirt-engine Reporter: Michael Burman <mburman>
Component: BLL.Virt Assignee: bugs <bugs>
Status: CLOSED WONTFIX QA Contact: Nisim Simsolo <nsimsolo>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.6.2.6 CC: bugs, mavital
Target Milestone: --- Flags: sbonazzo: ovirt-4.3-
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-01-03 12:49:38 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Virt RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
vdsm logs none

Description Michael Burman 2016-02-02 06:44:30 UTC
Created attachment 1120332
vdsm logs

Description of problem:
Engine should block run of VM with devices that are PFs with enabled VFs.

We shouldn't allow running a VM with PFs directly attached as host devices (via the Host Devices sub-tab) if those PFs have enabled VFs, in order to prevent the following undesired scenarios/behaviors.

1) A case in which one or more of the VFs of the PF are in use.
In such a case we can find ourselves trying to shut down the VM that is using one of the VFs and failing, because the node device (the VF) can no longer be found.

"VDSM puma22.scl.lab.tlv.redhat.com command failed: Node device not found: no node device with matching name 'pci_0000_05_10_4'"
"Failed to power off VM sr-vm1 (Host: puma22.scl.lab.tlv.redhat.com, User: admin@internal)."

2) A case in which a VM is run with a PF as a directly attached device and one of its VFs as a 'pci-passthrough' vNIC. Such a scenario causes an endless loop of "Node device not found" errors (the VF of the PF has disappeared and can't be found).

"VDSM puma22.scl.lab.tlv.redhat.com command failed: Node device not found: no node device with matching name 'pci_0000_05_10_4'"

This will trigger "device not found" errors in the logs every few seconds until the next reboot of the server (shutting down the VM won't help).

3) There is no real use case in which an admin/user would enable VFs on a PF and then run a VM with that PF as a directly attached device.
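
A minimal sketch of the requested check, written in Python for illustration only. The `HostDevice` class, its `enabled_vfs` field, and both function names are hypothetical and do not reflect the engine's actual data model or code. The idea is simply that a run request is rejected when any directly attached host device is a PF that currently has VFs enabled.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class HostDevice:
    """Hypothetical, simplified view of a host device (not the engine's model)."""
    name: str             # libvirt node device name, e.g. 'pci_0000_05_00_0'
    enabled_vfs: int = 0  # VFs currently enabled on this device (0 for non-PFs)


def pfs_blocking_run(attached: Iterable[HostDevice]) -> List[str]:
    """Return the names of directly attached devices that are PFs with enabled VFs."""
    return [dev.name for dev in attached if dev.enabled_vfs > 0]


def can_run_vm(attached: Iterable[HostDevice]) -> bool:
    """Allow the run only if no directly attached PF has enabled VFs."""
    return not pfs_blocking_run(attached)
```

With such a check, a VM whose attached devices include a PF with, say, 7 enabled VFs would be refused at run time, and the list returned by `pfs_blocking_run` could be used to build the error message shown to the user.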

Version-Release number of selected component (if applicable):
3.6.3-0.1


Additional info:
jsonrpc.Executor/5::ERROR::2016-02-02 08:27:33,238::__init__::526::jsonrpc.JsonRpcServer::(_serveRequest) Internal server error
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 521, in _serveRequest
    res = method(**params)
  File "/usr/share/vdsm/rpc/Bridge.py", line 277, in _dynamicMethod
    result = fn(*methodArgs)
  File "/usr/share/vdsm/API.py", line 346, in destroy
    res = v.destroy()
  File "/usr/share/vdsm/virt/vm.py", line 3887, in destroy
    result = self.doDestroy()
  File "/usr/share/vdsm/virt/vm.py", line 3905, in doDestroy
    return self.releaseVm()
  File "/usr/share/vdsm/virt/vm.py", line 3832, in releaseVm
    self._cleanup()
  File "/usr/share/vdsm/virt/vm.py", line 1694, in _cleanup
    self._reattachHostDevices()
  File "/usr/share/vdsm/virt/vm.py", line 1301, in _reattachHostDevices
    hostdev.reattach_detachable(dev_name)
  File "/usr/share/vdsm/hostdev.py", line 212, in reattach_detachable
    libvirt_device, device_params = _get_device_ref_and_params(device_name)
  File "/usr/share/vdsm/hostdev.py", line 151, in _get_device_ref_and_params
    nodeDeviceLookupByName(device_name)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 124, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 1313, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 4182, in nodeDeviceLookupByName
    if ret is None:raise libvirtError('virNodeDeviceLookupByName() failed', conn=self)
libvirtError: Node device not found: no node device with matching name 'pci_0000_05_10_4'
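
The device named in the error can be inspected directly on the host. The sketch below assumes libvirt's usual `pci_<domain>_<bus>_<slot>_<function>` node device naming and the standard Linux sysfs attribute `sriov_numvfs`. The helper names and the `sysfs_root` parameter are hypothetical and are not part of vdsm.

```python
import os
import re

# Assumed libvirt naming for PCI node devices: pci_<domain>_<bus>_<slot>_<function>
_NODEDEV_RE = re.compile(
    r"^pci_([0-9a-fA-F]{4})_([0-9a-fA-F]{2})_([0-9a-fA-F]{2})_([0-7])$"
)


def nodedev_to_pci_address(name: str) -> str:
    """Convert e.g. 'pci_0000_05_10_4' to the PCI address '0000:05:10.4'."""
    match = _NODEDEV_RE.match(name)
    if match is None:
        raise ValueError("not a PCI node device name: %r" % name)
    domain, bus, slot, function = match.groups()
    return "%s:%s:%s.%s" % (domain, bus, slot, function)


def enabled_vfs(pci_address: str, sysfs_root: str = "/sys/bus/pci/devices") -> int:
    """Read the number of enabled VFs for a device; 0 if it has no SR-IOV attribute."""
    path = os.path.join(sysfs_root, pci_address, "sriov_numvfs")
    try:
        with open(path) as f:
            return int(f.read().strip())
    except FileNotFoundError:
        # Device is absent or is not an SR-IOV capable PF.
        return 0
```

For the device in this traceback, `nodedev_to_pci_address("pci_0000_05_10_4")` gives `0000:05:10.4`. If that directory no longer exists under `/sys/bus/pci/devices`, the VF has gone away on the host, which is consistent with the lookup failure above.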


2016-02-02 08:24:40,869 ERROR [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (org.ovirt.thread.pool-6-thread-50) [7b6f76d2] VDS::destroy Failed destroying VM '75211a37-8ee4-45c8-a8e0-049f13ed29e4' in vds = '75694d24-1833-4864-8736-fd80bffb30f7' , error = 'org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to DestroyVDS, error = Node device not found: no node device with matching name 'pci_0000_05_10_4', code = -32603'

Comment 1 Michal Skrivanek 2016-02-03 12:52:31 UTC
let's see if we can do something in 4.0 timeframe. It's tricky.

Comment 2 Michal Skrivanek 2016-04-22 14:48:20 UTC
pushed out due to capacity reasons

Comment 3 Michal Skrivanek 2016-12-21 09:08:11 UTC
The bug was not addressed in time for 4.1. Postponing to 4.2

Comment 4 Michal Skrivanek 2017-08-22 07:52:20 UTC
The bug was not addressed in time for 4.2. Postponing to 4.3

Comment 5 Ryan Barry 2019-01-03 12:49:38 UTC
This will not be addressed in a reasonable timeframe. Please re-open if it's still important.