Bug 2076304 - VFIO refresh to v5.18
Summary: VFIO refresh to v5.18
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: kernel
Version: 9.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: unspecified
Target Milestone: rc
Target Release: 9.1
Assignee: Alex Williamson
QA Contact: Yanghang Liu
URL:
Whiteboard:
Depends On:
Blocks: 2076676 2077294
 
Reported: 2022-04-18 16:13 UTC by Alex Williamson
Modified: 2022-11-15 12:31 UTC (History)

Fixed In Version: kernel-5.14.0-96.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-11-15 11:02:09 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Gitlab | redhat/centos-stream/src/kernel centos-stream-9 merge_requests 701 | 0 | None | None | None | 2022-04-19 18:55:39 UTC
Red Hat Issue Tracker | RHELPLAN-119137 | 0 | None | None | None | 2022-04-18 16:19:42 UTC
Red Hat Product Errata | RHSA-2022:8267 | 0 | None | None | None | 2022-11-15 11:02:30 UTC

Description Alex Williamson 2022-04-18 16:13:33 UTC
Description of problem:

Refresh vfio to v5.18


Comment 3 Yanghang Liu 2022-04-21 14:31:43 UTC
Hi Alex,


May I ask if you have any additional suggestions for testing this bug?


Is regression testing enough to verify this bug?

Comment 4 Alex Williamson 2022-04-25 15:09:37 UTC
(In reply to Yanghang Liu from comment #3)
> Hi Alex,
> 
> 
> May I ask if you have any additional suggestions for testing this bug?
> 
> 
> Is regression testing enough to verify this bug?

Yes, regression testing should be used.  Ideally we'd be able to test vGPU support, particularly SR-IOV backed vGPU, but we'll likely need a new driver drop from NVIDIA for that, as there are some differences here that break the NVIDIA GRID driver build.  I can provide a hacked driver for that, but obviously we need to wait for NVIDIA for an official build.  NIC support (PF and VF), NVMe, direct GPU assignment, etc. should all work as before.  Thanks
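
For the device-assignment regression checks mentioned above, a common first step on the host is confirming which kernel driver a PCI device is currently bound to. The following is only a minimal sketch, assuming the standard Linux sysfs layout in which /sys/bus/pci/devices/<BDF>/driver is a symlink to the bound driver. The helper name `bound_driver` and its `sysfs_root` parameter are illustrative and do not come from this bug or from any Red Hat test suite.

```python
import os


def bound_driver(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the driver bound to PCI device `bdf`, or None.

    `bdf` is the full PCI address, e.g. "0000:3b:00.0" (hypothetical example).
    `sysfs_root` is a parameter only so the helper can be pointed at a fake tree.
    """
    link = os.path.join(sysfs_root, bdf, "driver")
    # An unbound (or nonexistent) device has no "driver" symlink.
    if not os.path.islink(link):
        return None
    # The symlink target ends in the driver name, e.g. ".../drivers/vfio-pci".
    return os.path.basename(os.readlink(link))
```

A device intended for assignment to a guest would be expected to report "vfio-pci" here before the VM is started; after the device is released back to the host it should report its native driver again.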

Comment 10 Yanghang Liu 2022-05-09 04:00:47 UTC
Pre-verify Test 

Test Env:
  host:
      5.14.0-78.mr701_220418_1703.el9.x86_64
      qemu-kvm-7.0.0-1.el9.x86_64
  guest:
      5.14.0-78.mr701_220418_1703.el9.x86_64
      Win11/Win2022

Test Case:
  RHEL7-11384	[SR-IOV] Start a vm with a VF -- PASS
  RHEL7-11388	[SR-IOV] Shutdown/Reboot a vm with a VF -- PASS
  RHEL7-11396	[SR-IOV] Hot-unplug a VF from a vm -- PASS
  RHEL7-11399	[SR-IOV] Hot-plug a VF into a vm -- An existing bug
  RHEL7-11408	[SR-IOV] Start two virtual machines, both of which have multiple VF(s) -- PASS
  RHEL7-11409	[SR-IOV] Start a vm with multiple VF(s) -- PASS
  RHEL7-11410	[SR-IOV] Start a vm with a PF and multiple VF(s) -- PASS
  RHEL7-11373	[SR-IOV][2M/1G(x86) 16M(ppc) hugepage] Shutdown a vm with multiple VF(s) -- PASS
  RHEL7-11374	[SR-IOV][2M/1G(x86) 16M(ppc) hugepage] Reboot a vm with multiple VF(s) -- PASS
  RHEL7-11375	[SR-IOV][2M/1G(x86) 16M(ppc) hugepage] Hot plug multiple VF(s) into a vm -- An existing bug
  RHEL7-11376	[SR-IOV][2M/1G(x86) 16M(ppc) hugepage] Hot unplug multiple VF(s) from a vm -- PASS
  RHEL7-11377	[SR-IOV][2M/1G(x86) 16M(ppc) hugepage] Start a vm with multiple VF(s) -- PASS
  RHEL7-11407	[SR-IOV] Start a vm with multifunction=on VF(s) -- PASS
  RHEL7-11395	[SR-IOV] Hot-unplug multiple VF(s) from a vm -- PASS
  RHEL7-11397	[SR-IOV] Hot-plug multiple VF(s) into a vm -- An existing bug
  RHEL7-11428	[vfio-pf] PF sanity test -- PASS
  RHEL7-11430	[vfio-pf] Shutdown a vm with a PF -- PASS
  RHEL7-11431	[vfio-pf] Reboot a vm with a PF -- PASS
  RHEL7-11434	[vfio] Hot unplug a PF from a vm -- PASS
  RHEL7-11435	[vfio] Hot plug a PF into a vm -- An existing bug
  RHEL7-11413	[vfio] Start a vm with a PF which has a specified pci address -- PASS
  RHEL7-11427	[vfio] Release the PF from vm and rebind PF back to host -- PASS
  RHEL7-11439	[vfio] Start a vm with multiple PFs -- PASS
  RHEL7-11440	[vfio] Start a vm with virtual network interface(s) and a PF -- PASS
  RHEL7-11441	[vfio] Hot unplug multiple PFs from a vm -- PASS
  RHEL7-11442	[vfio] Start a vm with multifunction=on PFs -- PASS
  RHEL7-11444	[vfio] [2M/1G(x86) 16M(ppc) hugepage] basic test -- PASS
  RHEL7-110508	[vfio] Hot unplug and hot plug a PF after a vhost=on virtio nic is unplugged -- An existing bug



The hot-plug PF/VF issue is tracked by the following two bugs:
Bug 2055123 - [Q35] Failed to hot-plug a device whose membar > 2M into the vm
Bug 2024818 - [Windows_vm][Q35+ OVMF] Some hot-plugged PF/VF can not find enough free resources that it can use
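
A result list in the format above ("<case ID> <description> -- <result>") can be summarized mechanically. The following is a minimal sketch, not part of any Red Hat tooling: the function name `tally_results` and the "PASS"/"OTHER" buckets are assumptions made for illustration, and any result other than the literal string PASS is simply counted as "OTHER".

```python
import re
from collections import Counter

# Matches lines such as "RHEL7-11384<TAB>[SR-IOV] Start a vm with a VF -- PASS",
# optionally prefixed with a "> " quote marker.
LINE_RE = re.compile(r"^\s*(?:>\s*)?(RHEL7-\d+)\s*(.*?)\s*--\s*(.+?)\s*$")


def tally_results(text):
    """Count PASS and non-PASS entries in a pasted test-case list."""
    counts = Counter()
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip headings, blank lines, and free text
        result = match.group(3)
        counts["PASS" if result == "PASS" else "OTHER"] += 1
    return counts
```

Pasting the test-case list into a string and calling `tally_results` on it gives a quick count of how many cases passed and how many need to be cross-checked against the tracked hot-plug bugs.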

Comment 11 Yanghang Liu 2022-05-17 05:43:28 UTC
Hi Zhiyi,


Could you please help update the test results for your part so that we can pre-verify this bug?

Comment 12 Guo, Zhiyi 2022-05-19 01:17:59 UTC
(In reply to Yanghang Liu from comment #11)
> Hi Zhiyi,
> 
> 
> Could you please help update the test result for your part so that we can
> pre-verify this bug?

No vfio-related issues found in my regression testing.

Env used:
Test Env:
  host:
      5.14.0-78.mr701_220418_1703.el9.x86_64
      qemu-kvm-7.0.0-1.el9.x86_64
  guest:
      kernel-5.14.0-92.el9.x86_64
      Win10

Devices tested for GPU passthrough:
1x Nvidia A100
2x Nvidia 16

NVIDIA Driver used: 512.59/GRID 14.1 RC(512.78)

Devices tested for vGPU:
1x GRID A100-40C
8x NVIDIA A16-2Q
8x GRID RTX6000-3Q
4x i915-GVTg_V5_4

NVIDIA Driver used: GRID 14.1, host driver 510.75, VM driver 512.78 (patched GRID 14.0 also tested)

Comment 13 Yanghang Liu 2022-05-19 01:45:28 UTC
Pre-verified this bug based on Comment 10 and Comment 12.

Comment 17 Yanghang Liu 2022-05-24 04:08:21 UTC
(In reply to Yanghang Liu from comment #10)
The Regression Test Result for verifying this bug: PASS

Test Env:
  host:
      5.14.0-96.el9.x86_64
      qemu-kvm-7.0.0-4.el9.x86_64
  guest:
      5.14.0-96.el9.x86_64



> Test Case:
>   RHEL7-11384	[SR-IOV] Start a vm with a VF -- PASS
>   RHEL7-11388	[SR-IOV] Shutdown/Reboot a vm with a VF -- PASS
>   RHEL7-11396	[SR-IOV] Hot-unplug a VF from a vm -- PASS
>   RHEL7-11399	[SR-IOV] Hot-plug a VF into a vm -- An existing bug
>   RHEL7-11408	[SR-IOV] Start two virtual machines, both of which have multiple VF(s) -- PASS
>   RHEL7-11409	[SR-IOV] Start a vm with multiple VF(s) -- PASS
>   RHEL7-11410	[SR-IOV] Start a vm with a PF and multiple VF(s) -- PASS
>   RHEL7-11373	[SR-IOV][2M/1G(x86) 16M(ppc) hugepage] Shutdown a vm with multiple VF(s) -- PASS
>   RHEL7-11374	[SR-IOV][2M/1G(x86) 16M(ppc) hugepage] Reboot a vm with multiple VF(s) -- PASS
>   RHEL7-11375	[SR-IOV][2M/1G(x86) 16M(ppc) hugepage] Hot plug multiple VF(s) into a vm -- An existing bug
>   RHEL7-11376	[SR-IOV][2M/1G(x86) 16M(ppc) hugepage] Hot unplug multiple VF(s) from a vm -- PASS
>   RHEL7-11377	[SR-IOV][2M/1G(x86) 16M(ppc) hugepage] Start a vm with multiple VF(s) -- PASS
>   RHEL7-11407	[SR-IOV] Start a vm with multifunction=on VF(s) -- PASS
>   RHEL7-11395	[SR-IOV] Hot-unplug multiple VF(s) from a vm -- PASS
>   RHEL7-11397	[SR-IOV] Hot-plug multiple VF(s) into a vm -- An existing bug
>   RHEL7-11428	[vfio-pf] PF sanity test -- PASS
>   RHEL7-11430	[vfio-pf] Shutdown a vm with a PF -- PASS
>   RHEL7-11431	[vfio-pf] Reboot a vm with a PF -- PASS
>   RHEL7-11434	[vfio-pf] Hot unplug a PF from a vm -- PASS
>   RHEL7-11435	[vfio-pf] Hot plug a PF into a vm -- An existing bug
>   RHEL7-11413	[vfio-pf] Start a vm with a PF which has a specified pci address -- PASS
>   RHEL7-11427	[vfio-pf] Release the PF from vm and rebind PF back to host -- PASS
>   RHEL7-11439	[vfio-pf] Start a vm with multiple PFs -- PASS
>   RHEL7-11440	[vfio-pf] Start a vm with virtual network interface(s) and a PF -- PASS
>   RHEL7-11441	[vfio-pf] Hot unplug multiple PFs from a vm -- PASS
>   RHEL7-11442	[vfio-pf] Start a vm with multifunction=on PFs -- PASS
>   RHEL7-11444	[vfio-pf] [2M/1G(x86) 16M(ppc) hugepage] basic test -- PASS
>   RHEL7-110508	[vfio-pf] Hot unplug and hot plug a PF after a vhost=on virtio nic is unplugged -- An existing bug
> 
> 
> 
> The hot-plug PF/VF issue is tracked by the following two bugs:
> Bug 2055123 - [Q35] Failed to hot-plug a device whose membar > 2M into the vm
> Bug 2024818 - [Windows_vm][Q35+ OVMF] Some hot-plugged PF/VF can not find enough free resources that it can use

Comment 18 Guo, Zhiyi 2022-05-26 10:38:39 UTC
(In reply to Guo, Zhiyi from comment #12)
> (In reply to Yanghang Liu from comment #11)
> > Hi Zhiyi,
> > 
> > 
> > Could you please help update the test result for your part so that we can
> > pre-verify this bug?
> 
> No vfio-related issues found in my regression testing.
> 
> Env used:
> Test Env:
>   host:
>       5.14.0-78.mr701_220418_1703.el9.x86_64
>       qemu-kvm-7.0.0-1.el9.x86_64
>   guest:
>       kernel-5.14.0-92.el9.x86_64
>       Win10
> 
> Devices tested for GPU passthrough:
> 1x Nvidia A100
> 2x Nvidia 16
> 
> NVIDIA Driver used: 512.59/GRID 14.1 RC(512.78)
> 
> Devices tested for vGPU:
> 1x GRID A100-40C
> 8x NVIDIA A16-2Q
> 8x GRID RTX6000-3Q
> 4x i915-GVTg_V5_4
> 
> NVIDIA Driver used: GRID 14.1, host driver 510.75, VM driver 512.78 (patched
> GRID 14.0 also tested)

GPU passthrough and vGPU tests pass; the host kernel used is 5.14.0-96.el9.x86_64.

Comment 19 Yanghang Liu 2022-05-26 10:43:07 UTC
Moving the bug status to VERIFIED based on Comment 17 and Comment 18.

Comment 21 errata-xmlrpc 2022-11-15 11:02:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: kernel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:8267

