Bug 1731120

Summary: [OSP15] Kernel module vfio_iommu_type1 is not loaded on reboot
Product: Red Hat OpenStack Reporter: Vadim Khitrin <vkhitrin>
Component: openstack-tripleo-heat-templates Assignee: Piotr Kopec <pkopec>
Status: CLOSED ERRATA QA Contact: Vadim Khitrin <vkhitrin>
Severity: high Docs Contact:
Priority: high    
Version: 15.0 (Stein) CC: cfontain, dasmith, eglynn, fbaudin, jhakimra, kchamart, lyarwood, mbooth, mburns, mschuppe, pkopec, sbauza, sgordon, skramaja, supadhya, vromanso, yrachman
Target Milestone: rc Keywords: Patch, Triaged
Target Release: 15.0 (Stein)   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-10.6.1-0.20190819180520.6a38682.el8ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-09-21 11:24:01 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Vadim Khitrin 2019-07-18 11:15:29 UTC
Description of problem:
On SR-IOV capable deployments, vfio_iommu_type1 is not loaded after a compute node is rebooted, which causes guest instances with a VF/PF to fail to start/spawn.

Manually loading the kernel module allows guest instances to spawn successfully.
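
A minimal sketch of the manual workaround, assuming the standard systemd modules-load.d mechanism is acceptable on the node. This is not necessarily the fix shipped in the advisory; `persist_module` and `MODULES_LOAD_DIR` are hypothetical names used only for illustration, and the directory is overridable so the function can be tried without root.

```shell
#!/bin/sh
# Hypothetical helper (not part of TripleO): make a kernel module load on boot
# by listing it in a modules-load.d drop-in file.
# MODULES_LOAD_DIR is an assumption for illustration; the real location is
# /etc/modules-load.d, which systemd-modules-load reads at boot.
MODULES_LOAD_DIR="${MODULES_LOAD_DIR:-/etc/modules-load.d}"

persist_module() {
    module="$1"
    conf="$MODULES_LOAD_DIR/$module.conf"
    mkdir -p "$MODULES_LOAD_DIR" || return 1
    # modules-load.d format: one module name per line
    printf '%s\n' "$module" > "$conf"
}

# Usage on the compute node, as root:
#   modprobe vfio_iommu_type1          # load it now
#   persist_module vfio_iommu_type1    # load it again after reboot
```

The drop-in only covers the symptom on a single node; a redeploy may overwrite or ignore it.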

Version-Release number of selected component (if applicable):
Compose: RHOS_TRUNK-15.0-RHEL-8-20190714.n.0
rpm -qa | grep -E 'vswitch|nova'
python3-openvswitch2.11-2.11.0-14.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-12.el8fdp.noarch
python3-novaclient-13.0.1-0.20190617080642.ef842ca.el8ost.noarch
python3-nova-19.0.2-0.20190701170413.b01bc2f.el8ost.noarch
network-scripts-openvswitch2.11-2.11.0-14.el8fdp.x86_64
puppet-vswitch-10.4.1-0.20190614170518.ee3e6e1.el8ost.noarch
openstack-nova-compute-19.0.2-0.20190701170413.b01bc2f.el8ost.noarch
rhosp-openvswitch-2.11-0.3.el8ost.noarch
puppet-nova-14.4.1-0.20190605170411.17663a5.el8ost.noarch
python3-rhosp-openvswitch-2.11-0.3.el8ost.noarch
openstack-nova-common-19.0.2-0.20190701170413.b01bc2f.el8ost.noarch
openstack-nova-migration-19.0.2-0.20190701170413.b01bc2f.el8ost.noarch
openvswitch2.11-2.11.0-14.el8fdp.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Deploy SR-IOV capable setup
2. Reboot compute node
3. Attempt to spawn guest instance with VF/PF

Actual results:
vfio_iommu_type1 is not loaded, causing guest instances to not start

Expected results:
vfio_iommu_type1 is loaded and guest instances start successfully

Additional info:
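The check used in this report, `lsmod | grep vfio_iommu_type1`, also matches the `vfio` line whenever that line lists vfio_iommu_type1 in its "Used by" column. A stricter check compares only the module-name column. The sketch below reads /proc/modules; `module_loaded` and `PROC_MODULES` are hypothetical names, and the path is overridable so the function can be exercised against a sample file.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if a module appears in the first
# (name) column of /proc/modules, ignoring dependency columns.
# PROC_MODULES is an assumption for illustration and testing.
PROC_MODULES="${PROC_MODULES:-/proc/modules}"

module_loaded() {
    # Exit status 0 when an exact name match is found, 1 otherwise
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "$PROC_MODULES"
}

# Usage after rebooting the compute node:
#   if module_loaded vfio_iommu_type1; then echo loaded; else echo "NOT loaded"; fi
```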

Comment 2 Vadim Khitrin 2019-07-24 11:37:08 UTC
I forgot to mention that my compute role is the default 'ComputeOvsDpdkSriov'.

Comment 8 Yariv 2019-08-08 11:52:49 UTC
Failing with puddle RHOS_TRUNK-15.0-RHEL-8-20190725.n.1
Deployment with two computes; the parameter is up, but after reboot the module is not loaded.

[root@computeovsdpdksriov-1 ~]# lsmod| grep vfio_iommu_type1
vfio                   36864  7 vfio_iommu_type1,vfio_pci

[root@computeovsdpdksriov-0 ~]# lsmod| grep vfio_iommu_type1

Comment 12 Yariv 2019-08-15 16:23:45 UTC
[stack@undercloud-0 tempest]$ 
RHOS_TRUNK-15.0-RHEL-8-20190813.n.0
openstack-tripleo-heat-templates-10.6.1-0.20190812140519.2a684c0.el8ost.noarch

After fresh install
[root@compute-0 ~]# lsmod| grep vfio_iommu_type1
vfio_iommu_type1       22440  1 
vfio                   32657  7 vfio_iommu_type1,vfio_pci


After reboot
lsmod| grep vfio_iommu_type1

EMPTY

Comment 14 Yariv 2019-08-21 09:45:45 UTC
[stack@undercloud-0 tempest]$ 
RHOS_TRUNK-15.0-RHEL-8-20190819.n.1

openstack-tripleo-heat-templates-10.6.1-0.20190812140519.2a684c0.el8ost.noarch

After fresh install
[root@compute-0 ~]# lsmod| grep vfio_iommu_type1
vfio_iommu_type1       28672  1
vfio                   36864  7 vfio_iommu_type1,vfio_pci


After reboot
lsmod| grep vfio_iommu_type1
vfio_iommu_type1       28672  1
vfio                   36864  7 vfio_iommu_type1,vfio_pci

Comment 18 errata-xmlrpc 2019-09-21 11:24:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2811