Bug 2076304

Summary: VFIO refresh to v5.18
Product: Red Hat Enterprise Linux 9
Reporter: Alex Williamson <alex.williamson>
Component: kernel
Assignee: Alex Williamson <alex.williamson>
kernel sub component: KVM
QA Contact: Yanghang Liu <yanghliu>
Status: CLOSED ERRATA
Severity: unspecified
Priority: medium
CC: coli, jinzhao, juzhang, nilal, virt-maint, yanghliu, zhguo
Version: 9.1
Keywords: Triaged
Target Milestone: rc
Target Release: 9.1
Hardware: Unspecified
OS: Unspecified
Fixed In Version: kernel-5.14.0-96.el9
Last Closed: 2022-11-15 11:02:09 UTC
Type: Bug
Bug Blocks: 2076676, 2077294

Description Alex Williamson 2022-04-18 16:13:33 UTC
Description of problem:

Refresh vfio to v5.18

Comment 3 Yanghang Liu 2022-04-21 14:31:43 UTC
Hi Alex,


May I ask if you have any additional suggestions for testing this bug?


Is regression testing enough to verify this bug?

Comment 4 Alex Williamson 2022-04-25 15:09:37 UTC
(In reply to Yanghang Liu from comment #3)
> Hi Alex,
> 
> 
> May I ask if you have any additional suggestions for testing this bug?
> 
> 
> Is regression testing enough to verify this bug?

Yes, regression testing should be used.  Ideally we'd be able to test vGPU support, particularly SR-IOV backed vGPU, but we'll likely need a new driver drop from NVIDIA for that, as there are some differences here that break the NVIDIA GRID driver build.  I can provide a hacked driver for that, but obviously we need to wait for NVIDIA for an official build.  NIC support (PF and VF), NVMe, direct GPU assignment, etc. should all work as before.  Thanks

Comment 10 Yanghang Liu 2022-05-09 04:00:47 UTC
Pre-verify Test 

Test Env:
  host:
      5.14.0-78.mr701_220418_1703.el9.x86_64
      qemu-kvm-7.0.0-1.el9.x86_64
  guest:
      5.14.0-78.mr701_220418_1703.el9.x86_64
      Win11/Win2022

Test Case:
  RHEL7-11384	[SR-IOV] Start a vm with a VF -- PASS
  RHEL7-11388	[SR-IOV] Shutdown/Reboot a vm with a VF -- PASS
  RHEL7-11396	[SR-IOV] Hot-unplug a VF from a vm -- PASS
  RHEL7-11399	[SR-IOV] Hot-plug a VF into a vm -- Existing bug
  RHEL7-11408	[SR-IOV] Start two virtual machines, both of which have multiple VF(s) -- PASS
  RHEL7-11409	[SR-IOV] Start a vm with multiple VF(s) -- PASS
  RHEL7-11410	[SR-IOV] Start a vm with a PF and multiple VF(s) -- PASS
  RHEL7-11373	[SR-IOV][2M/1G(x86) 16M(ppc) hugepage] Shutdown a vm with multiple VF(s) -- PASS
  RHEL7-11374	[SR-IOV][2M/1G(x86) 16M(ppc) hugepage] Reboot a vm with multiple VF(s) -- PASS
  RHEL7-11375	[SR-IOV][2M/1G(x86) 16M(ppc) hugepage] Hot plug multiple VF(s) into a vm -- Existing bug
  RHEL7-11376	[SR-IOV][2M/1G(x86) 16M(ppc) hugepage] Hot unplug multiple VF(s) from a vm -- PASS
  RHEL7-11377	[SR-IOV][2M/1G(x86) 16M(ppc) hugepage] Start a vm with multiple VF(s) -- PASS
  RHEL7-11407	[SR-IOV] Start a vm with multifunction=on VF(s) -- PASS
  RHEL7-11395	[SR-IOV] Hot-unplug multiple VF(s) from a vm -- PASS
  RHEL7-11397	[SR-IOV] Hot-plug multiple VF(s) into a vm -- Existing bug
  RHEL7-11428	[vfio-pf] PF sanity test -- PASS
  RHEL7-11430	[vfio-pf] Shutdown a vm with a PF -- PASS
  RHEL7-11431	[vfio-pf] Reboot a vm with a PF -- PASS
  RHEL7-11434	[vfio] Hot unplug a PF from a vm -- PASS
  RHEL7-11435	[vfio] Hot plug a PF into a vm -- Existing bug
  RHEL7-11413	[vfio] Start a vm with a PF which has a specified pci address -- PASS
  RHEL7-11427	[vfio] Release the PF from vm and rebind PF back to host -- PASS
  RHEL7-11439	[vfio] Start a vm with multiple PFs -- PASS
  RHEL7-11440	[vfio] Start a vm with virtual network interface(s) and a PF -- PASS
  RHEL7-11441	[vfio] Hot unplug multiple PFs from a vm -- PASS
  RHEL7-11442	[vfio] Start a vm with multifunction=on PFs -- PASS
  RHEL7-11444	[vfio] [2M/1G(x86) 16M(ppc) hugepage] basic test -- PASS
  RHEL7-110508	[vfio] Hot unplug and hot plug a PF after a vhost=on virtio nic is unplugged -- Existing bug



The hot-plug PF/VF issue is tracked by the following two bugs:
Bug 2055123 - [Q35] Failed to hot-plug a device whose membar > 2M into the vm
Bug 2024818 - [Windows_vm][Q35+ OVMF] Some hot-plugged PF/VF can not find enough free resources that it can use

Comment 11 Yanghang Liu 2022-05-17 05:43:28 UTC
Hi Zhiyi,


Could you please update the test results for your part so that we can pre-verify this bug?

Comment 12 Guo, Zhiyi 2022-05-19 01:17:59 UTC
(In reply to Yanghang Liu from comment #11)
> Hi Zhiyi,
> 
> 
> Could you please help update the test result for your part so that we can
> pre-verify this bug ?

No vfio-related issues were found in my regression testing.

Test Env:
  host:
      5.14.0-78.mr701_220418_1703.el9.x86_64
      qemu-kvm-7.0.0-1.el9.x86_64
  guest:
      kernel-5.14.0-92.el9.x86_64
      Win10

Devices tested for GPU passthrough:
1x Nvidia A100
2x Nvidia A16

NVIDIA Driver used: 512.59/GRID 14.1 RC(512.78)

Devices tested for vGPU:
1x GRID A100-40C
8x NVIDIA A16-2Q
8x GRID RTX6000-3Q
4x i915-GVTg_V5_4

NVIDIA Driver used: GRID 14.1 (host driver 510.75, VM driver 512.78); patched GRID 14.0 was also tested

Comment 13 Yanghang Liu 2022-05-19 01:45:28 UTC
Pre-verified this bug based on Comment 10 and Comment 12.

Comment 17 Yanghang Liu 2022-05-24 04:08:21 UTC
(In reply to Yanghang Liu from comment #10)
The Regression Test Result for verifying this bug: PASS

Test Env:
  host:
      5.14.0-96.el9.x86_64
      qemu-kvm-7.0.0-4.el9.x86_64
  guest:
      5.14.0-96.el9.x86_64

> Test Case:
>   RHEL7-11384	[SR-IOV] Start a vm with a VF -- PASS
>   RHEL7-11388	[SR-IOV] Shutdown/Reboot a vm with a VF -- PASS
>   RHEL7-11396	[SR-IOV] Hot-unplug a VF from a vm -- PASS
>   RHEL7-11399	[SR-IOV] Hot-plug a VF into a vm -- Existing bug
>   RHEL7-11408	[SR-IOV] Start two virtual machines, both of which have multiple VF(s) -- PASS
>   RHEL7-11409	[SR-IOV] Start a vm with multiple VF(s) -- PASS
>   RHEL7-11410	[SR-IOV] Start a vm with a PF and multiple VF(s) -- PASS
>   RHEL7-11373	[SR-IOV][2M/1G(x86) 16M(ppc) hugepage] Shutdown a vm with multiple VF(s) -- PASS
>   RHEL7-11374	[SR-IOV][2M/1G(x86) 16M(ppc) hugepage] Reboot a vm with multiple VF(s) -- PASS
>   RHEL7-11375	[SR-IOV][2M/1G(x86) 16M(ppc) hugepage] Hot plug multiple VF(s) into a vm -- Existing bug
>   RHEL7-11376	[SR-IOV][2M/1G(x86) 16M(ppc) hugepage] Hot unplug multiple VF(s) from a vm -- PASS
>   RHEL7-11377	[SR-IOV][2M/1G(x86) 16M(ppc) hugepage] Start a vm with multiple VF(s) -- PASS
>   RHEL7-11407	[SR-IOV] Start a vm with multifunction=on VF(s) -- PASS
>   RHEL7-11395	[SR-IOV] Hot-unplug multiple VF(s) from a vm -- PASS
>   RHEL7-11397	[SR-IOV] Hot-plug multiple VF(s) into a vm -- Existing bug
>   RHEL7-11428	[vfio-pf] PF sanity test -- PASS
>   RHEL7-11430	[vfio-pf] Shutdown a vm with a PF -- PASS
>   RHEL7-11431	[vfio-pf] Reboot a vm with a PF -- PASS
>   RHEL7-11434	[vfio-pf] Hot unplug a PF from a vm -- PASS
>   RHEL7-11435	[vfio-pf] Hot plug a PF into a vm -- Existing bug
>   RHEL7-11413	[vfio-pf] Start a vm with a PF which has a specified pci address -- PASS
>   RHEL7-11427	[vfio-pf] Release the PF from vm and rebind PF back to host -- PASS
>   RHEL7-11439	[vfio-pf] Start a vm with multiple PFs -- PASS
>   RHEL7-11440	[vfio-pf] Start a vm with virtual network interface(s) and a PF -- PASS
>   RHEL7-11441	[vfio-pf] Hot unplug multiple PFs from a vm -- PASS
>   RHEL7-11442	[vfio-pf] Start a vm with multifunction=on PFs -- PASS
>   RHEL7-11444	[vfio-pf] [2M/1G(x86) 16M(ppc) hugepage] basic test -- PASS
>   RHEL7-110508	[vfio-pf] Hot unplug and hot plug a PF after a vhost=on virtio nic is unplugged -- Existing bug
> 
> 
> 
> The hot-plug PF/VF issue is tracked by the following two bugs:
> Bug 2055123 - [Q35] Failed to hot-plug a device whose membar > 2M into the vm
> Bug 2024818 - [Windows_vm][Q35+ OVMF] Some hot-plugged PF/VF can not find enough free resources that it can use

Comment 18 Guo, Zhiyi 2022-05-26 10:38:39 UTC
(In reply to Guo, Zhiyi from comment #12)
> (In reply to Yanghang Liu from comment #11)
> > Hi Zhiyi,
> > 
> > 
> > Could you please help update the test result for your part so that we can
> > pre-verify this bug ?
> 
> No vfio related issues found in my regression test.
> 
> Env used:
> Test Env:
>   host:
>       5.14.0-78.mr701_220418_1703.el9.x86_64
>       qemu-kvm-7.0.0-1.el9.x86_64
>   guest:
>       kernel-5.14.0-92.el9.x86_64
>       Win10
> 
> Devices tested for GPU passthrough:
> 1x Nvidia A100
> 2x Nvidia A16
> 
> NVIDIA Driver used: 512.59/GRID 14.1 RC(512.78)
> 
> Devices tested for vGPU:
> 1x GRID A100-40C
> 8x NVIDIA A16-2Q
> 8x GRID RTX6000-3Q
> 4x i915-GVTg_V5_4
> 
> NVIDIA Driver used: GRID 14.1, host driver 510.75, VM driver 512.78;(patched
> GRID 14.0 also tested)

GPU passthrough and vGPU tests pass; the host kernel used is 5.14.0-96.el9.x86_64.

Comment 19 Yanghang Liu 2022-05-26 10:43:07 UTC
Moving the bug status to VERIFIED based on Comment 17 and Comment 18.

Comment 21 errata-xmlrpc 2022-11-15 11:02:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: kernel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:8267
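
Editor's note: the "Fixed In Version" field of this bug names kernel-5.14.0-96.el9 as the first build carrying the refresh. The sketch below shows one way a host's kernel release string could be compared against that build. The function name has_vfio_refresh and the regular expression are illustrative assumptions, not part of any Red Hat tooling, and the parser assumes the usual el9 release layout (5.14.0-<build>[.extra].el9...).

```python
import re

# First build that shipped the refresh, per this bug's "Fixed In Version".
FIXED_BUILD = (5, 14, 0, 96)

# Assumed layout: <major>.<minor>.<patch>-<build>[.<extra>].el9[...]
# Examples: 5.14.0-96.el9.x86_64, 5.14.0-162.6.1.el9_1.x86_64
_RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)\.(?:.*\.)?el9")


def has_vfio_refresh(release: str) -> bool:
    """Return True if an el9 kernel release is at or past the fixed build.

    Hypothetical helper: it only compares version and build numbers, so it
    cannot recognise scratch or merge-request builds that carry the patches
    under a lower build number.
    """
    match = _RELEASE_RE.match(release)
    if match is None:
        raise ValueError(f"unrecognized el9 kernel release: {release!r}")
    return tuple(int(part) for part in match.groups()) >= FIXED_BUILD
```

On a running host the input would typically be platform.release() (the same string uname -r prints). Note that the pre-verification kernel used in Comment 10, 5.14.0-78.mr701_220418_1703.el9.x86_64, is reported as not fixed by this check even though that test build contained the change, which is the limitation called out in the docstring.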