Bug 1849275 - [SR-IOV] [i40e] SR-IOV information is missing because of 'block_path is None'
Summary: [SR-IOV] [i40e] SR-IOV information is missing because of 'block_path is None'
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: 4.40.19
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.4.1
Target Release: 4.40.21
Assignee: Milan Zamazal
QA Contact: msheena
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-06-20 08:31 UTC by Michael Burman
Modified: 2020-08-17 20:40 UTC (History)

Fixed In Version: vdsm-4.40.21
Doc Type: Bug Fix
Doc Text:
Previously, if the block path was unavailable for a storage block device on a host, the RHV Manager could not process host devices from that host. The current release fixes this issue. The Manager can process host devices even though a block path is missing.
Clone Of:
Environment:
Last Closed: 2020-07-08 08:25:08 UTC
oVirt Team: Virt
Embargoed:
sbonazzo: ovirt-4.4?
sbonazzo: planning_ack?
sbonazzo: devel_ack?
mburman: testing_ack+


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 109897 0 master MERGED hostdev: Omit block_path when it's empty 2020-08-24 13:53:37 UTC

Description Michael Burman 2020-06-20 08:31:09 UTC
Description of problem:
[SR-IOV] [i40e] SR-IOV information is missing because of 'block_path is None'

We are facing a known bug that was already fixed (BZ 1812586), and we now see it again in our SR-IOV automated tests.

engine doesn't like:
'scsi_14_0_0_5': {'params': {'capability': 'scsi', 'is_assignable': 'true', 'driver': 'sd', 'parent': 'scsi_target14_0_0', 'address': {'host': '14', 'bus': '0', 'target': '0', 'lun': '5'}, 'vendor': 'NETAPP', 'product': 'LUN C-Mode', 'udev_path': '/dev/sg6', 'block_path': None}}
in deviceList

Not sure why we see it again, but this is a pure regression and it blocks testing.
Please resolve ASAP.

Version-Release number of selected component (if applicable):
vdsm-4.40.19-1.el8ev.x86_64

How reproducible:
100% on i40e driver HW

Steps to Reproduce:
1. Run SR-IOV test on i40e host and enable VFs on the host

Actual results:
Operation failed and engine doesn't like:
'scsi_14_0_0_5': {'params': {'capability': 'scsi', 'is_assignable': 'true', 'driver': 'sd', 'parent': 'scsi_target14_0_0', 'address': {'host': '14', 'bus': '0', 'target': '0', 'lun': '5'}, 'vendor': 'NETAPP', 'product': 'LUN C-Mode', 'udev_path': '/dev/sg6', 'block_path': None}}
in deviceList

Expected results:
Must work as expected.

Additional info:
Same bug as BZ 1812586

Comment 2 Milan Zamazal 2020-06-22 06:46:16 UTC
The following code in Vdsm

    if params.get('udev_path'):
        mapping = _get_udev_block_mapping()
        params['block_path'] = mapping.get(params['udev_path'])

apparently doesn't expect that the udev path may be missing from `lsscsi -g' output. Engine doesn't like the fact that block_path is None. I guess the patches fixing https://bugzilla.redhat.com/1793550 had only disks in mind. Vdsm should probably omit block_path completely if it is not available; I must also check how such a case would be handled in Engine.
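A minimal sketch of the "omit block_path when it is not available" idea, not the actual Vdsm patch. The function name add_block_path is hypothetical, and the mapping is passed in as an argument here instead of being obtained from _get_udev_block_mapping(), so that the example is self-contained:

```python
def add_block_path(params, mapping):
    """Set params['block_path'] only when a block device is known.

    `mapping` is assumed to map udev (sg) paths to block device paths,
    e.g. {'/dev/sg1': '/dev/sda'}. This helper is illustrative only.
    """
    udev_path = params.get('udev_path')
    if udev_path:
        block_path = mapping.get(udev_path)
        # Leave the key out entirely instead of storing None, so the
        # consumer never sees 'block_path': None.
        if block_path:
            params['block_path'] = block_path
    return params


# Hypothetical data: /dev/sg6 has no entry in the mapping.
known = add_block_path({'udev_path': '/dev/sg1'}, {'/dev/sg1': '/dev/sda'})
unknown = add_block_path({'udev_path': '/dev/sg6'}, {'/dev/sg1': '/dev/sda'})
```

With this shape, a device whose udev path has no mapping entry is still reported, just without a block_path key.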

Comment 3 Milan Zamazal 2020-06-22 16:12:54 UTC
Looking into the logs, I can see there are many NETAPP devices reported, but only one of them has a non-null block path. Vdsm retrieves block_path from `lsscsi -g' output. Since one of the NETAPP devices, as well as another disk, has a block path, lsscsi probably works normally. But the devices with null block paths are reported by libvirt and not by `lsscsi -g'.
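To illustrate how a null block path can arise, here is a rough sketch of building the sg-to-block mapping from `lsscsi -g' text. It is not the Vdsm implementation. It assumes that each output line ends with the block device node followed by the sg device node, and that "-" stands for a missing node; the function name and the sample lines are made up for the example:

```python
def parse_lsscsi_generic(output):
    """Build a {sg_path: block_path} mapping from `lsscsi -g`-style text.

    Assumption: the last two whitespace-separated fields of each line
    are the block device node and the sg device node, and '-' marks a
    node that does not exist.
    """
    mapping = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        block_dev, sg_dev = fields[-2], fields[-1]
        if sg_dev == '-':
            continue
        mapping[sg_dev] = None if block_dev == '-' else block_dev
    return mapping


# Made-up sample lines, only to exercise the parser.
sample = (
    "[0:0:0:0] disk VENDOR MODEL 1.0 /dev/sda /dev/sg0\n"
    "[1:0:0:0] disk VENDOR MODEL 1.0 - /dev/sg1\n"
)
mapping = parse_lsscsi_generic(sample)
# A device known to libvirt but absent from the listing has no entry,
# so a plain .get() returns None for it.
missing = mapping.get('/dev/sg6')
```

The last line shows the situation described above: a lookup for a device that the listing does not contain yields None.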

When looking at it on Michael's machine, all the devices reported by libvirt have the block path listed in `lsscsi -g' and without any dangerous output. Still, Michael, could you please provide Vdsm logs from the tests before "'block_path': None" first appeared? Maybe there is some error from the block_path retrieval reported there.

Comment 6 Milan Zamazal 2020-06-23 07:34:02 UTC
Logs from failed test runs that could confirm there is no error in lsscsi output parsing in Vdsm are not available, and the issue can't currently be reproduced. Given the information available so far, I assume that:

- There is no problem in Vdsm lsscsi parsing, since it currently works.
- The mismatch between libvirt device listing and lsscsi listing is not a normal situation, since again, it currently works.

I think that in case the block path is missing, Vdsm should log an error and not report the device to Engine. Alternatively, we can do a similar thing on the Engine side. I don't know if it can cause any trouble on the Engine side, but if the device is invisible then it can't be reported. That would fix the automation, and if there is a problem with the workaround, it'll show there.

Unless there are objections, I'll implement the change in Vdsm.
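The proposal in this comment (log an error and skip the device) could look roughly like the sketch below. It is only an illustration under assumptions: the function name reportable_devices and the logger name are hypothetical, and the device dictionary layout simply mirrors the deviceList entry quoted in the description.

```python
import logging

log = logging.getLogger('hostdev_sketch')  # hypothetical logger name


def reportable_devices(devices):
    """Return only the devices that do not carry 'block_path': None."""
    result = {}
    for name, device in devices.items():
        params = device.get('params', {})
        if 'block_path' in params and params['block_path'] is None:
            # Log the problem and leave the device out of the report.
            log.error("Device %s has no block path, not reporting it", name)
            continue
        result[name] = device
    return result


# Layout mirrors the deviceList entry from the description; the second
# entry is made up for contrast.
devices = {
    'scsi_14_0_0_5': {'params': {'capability': 'scsi',
                                 'udev_path': '/dev/sg6',
                                 'block_path': None}},
    'scsi_0_0_0_0': {'params': {'capability': 'scsi',
                                'udev_path': '/dev/sg0',
                                'block_path': '/dev/sda'}},
}
filtered = reportable_devices(devices)
```

Devices that never had a block_path key, such as non-SCSI devices, pass through unchanged.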

Comment 7 Milan Zamazal 2020-06-23 15:50:28 UTC
There is also https://bugzilla.redhat.com/1801206 about null block path. Since Engine is supposed to handle null block paths in that bug, it should handle them also here. I.e. not skipping such devices, but accepting them without crashing in device processing. Further handling of those VMs, i.e. preventing them from starting, is going to be solved within the cited bug.

Comment 8 Michael Burman 2020-07-07 10:43:40 UTC
Verified on - vdsm-4.40.22-1.el8ev.x86_64 and rhvm-4.4.1.7-0.3.el8ev.noarch

Comment 9 Sandro Bonazzola 2020-07-08 08:25:08 UTC
This bugzilla is included in oVirt 4.4.1 release, published on July 8th 2020.

Since the problem described in this bug report should be resolved in oVirt 4.4.1 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

