Description of problem:
[SR-IOV] [i40e] SR-IOV information is missing because of 'block_path is None'.

We are facing a known bug that was already fixed (BZ 1812586) and now we see it again in our SR-IOV automated tests. Engine doesn't like:

'scsi_14_0_0_5': {'params': {'capability': 'scsi', 'is_assignable': 'true', 'driver': 'sd', 'parent': 'scsi_target14_0_0', 'address': {'host': '14', 'bus': '0', 'target': '0', 'lun': '5'}, 'vendor': 'NETAPP', 'product': 'LUN C-Mode', 'udev_path': '/dev/sg6', 'block_path': None}}

in deviceList.

Not sure why we see it again, but this is a pure regression and is test-blocking. Please resolve ASAP.

Version-Release number of selected component (if applicable):
vdsm-4.40.19-1.el8ev.x86_64

How reproducible:
100% on i40e driver HW

Steps to Reproduce:
1. Run the SR-IOV test on an i40e host and enable VFs on the host.

Actual results:
The operation fails and Engine doesn't like:

'scsi_14_0_0_5': {'params': {'capability': 'scsi', 'is_assignable': 'true', 'driver': 'sd', 'parent': 'scsi_target14_0_0', 'address': {'host': '14', 'bus': '0', 'target': '0', 'lun': '5'}, 'vendor': 'NETAPP', 'product': 'LUN C-Mode', 'udev_path': '/dev/sg6', 'block_path': None}}

in deviceList.

Expected results:
Must work as expected.

Additional info:
Same bug as BZ 1812586.
The following code in Vdsm:

    if params.get('udev_path'):
        mapping = _get_udev_block_mapping()
        params['block_path'] = mapping.get(params['udev_path'])

apparently doesn't expect that the udev path may not be present in `lsscsi -g' output. Engine doesn't like the fact that block_path is None. I guess the patches fixing https://bugzilla.redhat.com/1793550 had only disks in mind. Vdsm should probably omit block_path completely if it is not available; I also must check how such a case would be handled in Engine.
Looking into the logs, I can see there are many NETAPP devices reported, but only one of them has a non-null block path. Vdsm retrieves block_path from `lsscsi -g' output. Since one of the NETAPP devices, as well as another disk, has a block path, lsscsi probably works normally; the devices with null block paths are apparently reported by libvirt but not by `lsscsi -g'. When looking at it on Michael's machine now, all the devices reported by libvirt have their block paths listed in `lsscsi -g', without any suspicious output. Still, Michael, could you please provide Vdsm logs from the tests before "'block_path': None" first appeared? Maybe there is some error from the block_path retrieval reported there.
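For reference, building the udev-to-block-path mapping from `lsscsi -g' can be sketched as below. This is a hypothetical parser for illustration, not Vdsm's actual implementation; it assumes the last two whitespace-separated columns are the block device and the generic (sg) device, and that lsscsi prints '-' when a block device is missing.

```python
def parse_lsscsi_g(output):
    """Build a mapping of udev path (/dev/sg*) -> block path (/dev/sd*)
    from `lsscsi -g' output, skipping entries with no block device."""
    mapping = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        block_path, udev_path = fields[-2], fields[-1]
        # Entries whose block column is '-' have no block device and
        # therefore get no mapping entry (their lookup yields None).
        if udev_path.startswith('/dev/sg') and block_path.startswith('/dev/'):
            mapping[udev_path] = block_path
    return mapping


sample = """\
[14:0:0:4]   disk    NETAPP   LUN C-Mode   9000  /dev/sdb   /dev/sg5
[14:0:0:5]   disk    NETAPP   LUN C-Mode   9000  -          /dev/sg6
"""
# /dev/sg6 has no block device, so only /dev/sg5 appears in the mapping.
parse_lsscsi_g(sample)
# -> {'/dev/sg5': '/dev/sdb'}
```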
Logs from failed test runs that could confirm there is no error in lsscsi output parsing in Vdsm are not available, and the issue currently can't be reproduced. Given the information available so far, I assume that:
- There is no problem in Vdsm's lsscsi parsing, since it currently works.
- The mismatch between the libvirt device listing and the lsscsi listing is not a normal situation, since, again, it currently works.
I think that in case the block path is missing, Vdsm should log an error and not report the device to Engine. Alternatively, we could do a similar thing on the Engine side. I don't know if this can cause any trouble on the Engine side, but if the device is not reported then it is simply invisible to Engine. That would fix the automation, and if there is a problem with the workaround, it'll show there. Unless there are objections, I'll implement the change in Vdsm.
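The proposed Vdsm-side change (log an error and drop the device from the report) could be sketched as follows. The function name filter_scsi_devices and the exact device-dict shape are assumptions based on the deviceList excerpt in this report, not the real Vdsm API.

```python
import logging

log = logging.getLogger(__name__)


def filter_scsi_devices(devices):
    """Drop SCSI devices whose block path could not be resolved,
    logging an error for each, instead of reporting
    'block_path': None to Engine."""
    result = {}
    for name, dev in devices.items():
        params = dev.get('params', {})
        if params.get('udev_path') and params.get('block_path') is None:
            log.error('No block path found for device %s (udev path %s); '
                      'not reporting it to Engine',
                      name, params['udev_path'])
            continue
        result[name] = dev
    return result


devices = {
    'scsi_14_0_0_5': {'params': {'udev_path': '/dev/sg6',
                                 'block_path': None}},
    'scsi_14_0_0_4': {'params': {'udev_path': '/dev/sg5',
                                 'block_path': '/dev/sdb'}},
}
# Only the device with a resolved block path survives filtering.
filter_scsi_devices(devices)
# -> {'scsi_14_0_0_4': {'params': {'udev_path': '/dev/sg5', 'block_path': '/dev/sdb'}}}
```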
There is also https://bugzilla.redhat.com/1801206 about null block paths. Since Engine is supposed to handle null block paths in that bug, it should handle them here as well, i.e. not skip such devices, but accept them without crashing in device processing. Further handling of those VMs, i.e. preventing them from starting, is going to be solved within the cited bug.
Verified on vdsm-4.40.22-1.el8ev.x86_64 and rhvm-4.4.1.7-0.3.el8ev.noarch.
This bugzilla is included in oVirt 4.4.1 release, published on July 8th 2020. Since the problem described in this bug report should be resolved in oVirt 4.4.1 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.