Bug 1428692 - hostdevListByCaps will fail if some device disappears during its runtime
Summary: hostdevListByCaps will fail if some device disappears during its runtime
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: ---
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.1.2
Target Release: 4.19.11
Assignee: Martin Polednik
QA Contact: Nisim Simsolo
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-03-03 07:33 UTC by Martin Polednik
Modified: 2017-10-05 16:16 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-05-23 08:14:35 UTC
oVirt Team: Virt
Embargoed:
rule-engine: ovirt-4.1+




Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 73460 0 master MERGED hostdev: when calculating hash, skip over devices are gone 2020-03-17 09:45:13 UTC
oVirt gerrit 73463 0 master MERGED hostdev: don't process devices that are gone 2020-03-17 09:45:13 UTC
oVirt gerrit 75038 0 ovirt-4.1 MERGED hostdev: when calculating hash, skip over devices are gone 2020-03-17 09:45:13 UTC
oVirt gerrit 75039 0 ovirt-4.1 MERGED hostdev: don't process devices that are gone 2020-03-17 09:45:13 UTC

Description Martin Polednik 2017-03-03 07:33:44 UTC
Description of problem:
VDSM does not properly handle disappearing devices.

How reproducible:
Race, "possibly".

Steps to Reproduce:
1. hostdevListByCaps
2. before 1. finishes, detach a NIC from the host
3. (race) you may or may not see the failure, depending on whether the device had already been processed when it was detached

Actual results:
2017-03-01 10:55:20,038 ERROR (jsonrpc/0) [jsonrpc.JsonRpcServer] Internal server error (__init__:552)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 547, in _handle_request
    res = method(**params)
  File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 202, in _dynamicMethod
    result = fn(*methodArgs)
  File "/usr/share/vdsm/API.py", line 1415, in hostdevListByCaps
    devices = hostdev.list_by_caps(caps)
  File "/usr/lib/python2.7/site-packages/vdsm/hostdev.py", line 478, in list_by_caps
    libvirt_devices = _get_devices_from_libvirt(flags)
  File "/usr/lib/python2.7/site-packages/vdsm/hostdev.py", line 447, in _get_devices_from_libvirt
    __device_tree_hash(libvirt_devices) == _last_alldevices_hash):
  File "/usr/lib/python2.7/site-packages/vdsm/hostdev.py", line 129, in __device_tree_hash
    current_hash.update(device.XMLDesc(0))
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 5426, in XMLDesc
    if ret is None: raise libvirtError ('virNodeDeviceGetXMLDesc() failed')
libvirtError: Node device not found: no node device with matching name 'net_vnet45_fe_1a_4a_23_14_af'
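
The traceback shows the failure coming from the XMLDesc() call inside the device tree hash. A minimal sketch of how such a hash could skip devices that are gone is shown here. It is not the merged patch: NodeDeviceGone and FakeNodeDevice are hypothetical stand-ins so the example runs without libvirt, the function name device_tree_hash is illustrative, and the choice of SHA-256 is an assumption. With the real binding one would presumably catch libvirt.libvirtError instead, ideally checking get_error_code() against libvirt.VIR_ERR_NO_NODE_DEVICE.

```python
import hashlib


class NodeDeviceGone(Exception):
    """Hypothetical stand-in for libvirtError 'Node device not found'."""


class FakeNodeDevice(object):
    """Hypothetical stand-in for a libvirt node device object."""

    def __init__(self, name, xml=None):
        self._name = name
        self._xml = xml  # None simulates a device that has disappeared

    def name(self):
        return self._name

    def XMLDesc(self, flags):
        if self._xml is None:
            raise NodeDeviceGone(
                'no node device with matching name %r' % self._name)
        return self._xml


def device_tree_hash(devices):
    """Hash the XML of every device that still exists, skipping gone ones."""
    current_hash = hashlib.sha256()  # assumption: any stable hash would do
    for device in devices:
        try:
            xml = device.XMLDesc(0)
        except NodeDeviceGone:
            # The device vanished between listing and querying; skip it
            # instead of failing the whole call.
            continue
        current_hash.update(xml.encode('utf-8'))
    return current_hash.hexdigest()
```

With this approach, a device that disappears mid-iteration contributes nothing to the hash, so the result equals the hash of the remaining devices.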

Expected results:
hostdevListByCaps output without the devices that are gone
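
As an illustration of the expected behaviour, the following sketch builds a listing that simply omits devices that vanish while they are being processed. It is an assumption-laden example, not VDSM code: NodeDeviceGone, FakeNodeDevice and list_present_devices are hypothetical names introduced here so that the snippet runs without libvirt.

```python
class NodeDeviceGone(Exception):
    """Hypothetical stand-in for libvirtError 'Node device not found'."""


class FakeNodeDevice(object):
    """Hypothetical stand-in for a libvirt node device object."""

    def __init__(self, name, xml=None):
        self._name = name
        self._xml = xml  # None simulates a device that has disappeared

    def name(self):
        return self._name

    def XMLDesc(self, flags):
        if self._xml is None:
            raise NodeDeviceGone(
                'no node device with matching name %r' % self._name)
        return self._xml


def list_present_devices(devices):
    """Map device name to its XML, leaving out devices that are gone."""
    present = {}
    for device in devices:
        try:
            present[device.name()] = device.XMLDesc(0)
        except NodeDeviceGone:
            # Detached while we were iterating: not an error, just omit it.
            continue
    return present
```

The caller then receives only the devices that were still present when queried, which matches the expected result stated above.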

Additional info:
Network devices are the ones most likely to disappear (e.g. detaching a net device's parent causes the net device to disappear).

Comment 1 Nisim Simsolo 2017-05-04 10:31:38 UTC
Verification build:
ovirt-engine-4.1.2-0.1.el7
vdsm-4.19.11-1.el7ev.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64
libvirt-client-2.0.0-10.el7_3.5.x86_64
sanlock-3.4.0-1.el7.x86_64

Verification scenario:
1. hostdevListByCaps
2. before 1. finishes, detach a NIC from the host using virsh nodedev-detach
3. Verify that the detached NIC is not listed. Open vdsm.log and verify that there are no ERRORs related to this action.

