Bug 1505873

Summary: Enabling SR-IOV does not work on Dell Poweredge M830
Product: Red Hat Enterprise Virtualization Manager
Reporter: Frank DeLorey <fdelorey>
Component: ovirt-host
Assignee: Alona Kaplan <alkaplan>
Status: CLOSED DUPLICATE
QA Contact: Pavel Stehlik <pstehlik>
Severity: high
Docs Contact:
Priority: medium
Version: 4.1.4
CC: fdelorey, gveitmic, mburman
Target Milestone: ovirt-4.1.8
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-10-27 05:29:01 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Network
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments (flags: none for all):
  messages file from host
  dmesg file from failing host
  lspci from failing host
  BIOS settings from host
  New dmesg after solving reinstall issue
  hostdevListByCaps

Description Frank DeLorey 2017-10-24 12:58:46 UTC
Created attachment 1342739 [details]
messages file from host

Description of problem:
Following the RHV 4.1 documentation to enable SR-IOV on a host does not work on this system.


Version-Release number of selected component (if applicable):

RHV 4.1.4
RHV-H: rhvh-4.1-0.20170808


How reproducible:
100%


Steps to Reproduce:
1. Verify that SR-IOV is enabled in the BIOS.
2. Edit the host and check the "Hostdev Passthrough & SR-IOV" option; this adds the kernel parameter intel_iommu=on.
3. After the host reboots, none of the network interfaces are marked with the SR-IOV icon.
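Step 2's effect can be sanity-checked from the host shell after the reboot. A minimal sketch: the function below only inspects a command-line string, so it runs anywhere; on a real host you would feed it "$(cat /proc/cmdline)".

```shell
# Check that the engine actually added intel_iommu=on to the booted kernel.
check_cmdline() {
    # $1: kernel command line string (on a real host: "$(cat /proc/cmdline)")
    case "$1" in
        *intel_iommu=on*) echo "intel_iommu=on present" ;;
        *)                echo "intel_iommu=on MISSING" ;;
    esac
}

# Illustrative invocation with a sample command line:
check_cmdline "BOOT_IMAGE=/vmlinuz ro crashkernel=auto rhgb quiet intel_iommu=on"
```

If the parameter is missing from the running kernel, the UI checkbox never took effect, which matches the symptom reported below.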

Actual results:
None of the host's network ports are marked with the SR-IOV icon.

Expected results:
The network ports should show SR-IOV as available

Additional info:

I am attaching the lspci output, dmesg, and messages files.

Comment 1 Frank DeLorey 2017-10-24 12:59:24 UTC
Created attachment 1342741 [details]
dmesg file from failing host

Comment 2 Frank DeLorey 2017-10-24 12:59:57 UTC
Created attachment 1342742 [details]
lspci from failing host

Comment 3 Frank DeLorey 2017-10-24 13:00:36 UTC
Created attachment 1342743 [details]
BIOS settings from host

Comment 4 Frank DeLorey 2017-10-24 15:35:35 UTC
Update from the customer:

I checked the passthrough & SR-IOV option, but I noticed that /etc/grub.cfg was not updated as I had expected...

The 'rhvh-4.1-0.20170808.0' entry, which I think is the default, does not include the intel_iommu=on option, while the "tboot 1.9.5" entry does.
Again, the only thing I did was check the box in the UI and reboot. Should I be using the tboot option instead?

### BEGIN /etc/grub.d/10_linux ###
menuentry 'rhvh-4.1-0.20170808.0' --class red --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-3.10.0-693.el7.x86_64-advanced-/dev/mapper/rhvh_nsc--cld--ulpst--0101-root' {
        load_video
        set gfxpayload=keep
        insmod gzio
        insmod part_msdos
        insmod ext2
        set root='hd0,msdos1'
        if [ x$feature_platform_search_hint = xy ]; then
          search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos1 --hint-efi=hd0,msdos1 --hint-baremetal=ahci0,msdos1 --hint='hd0,msdos1'  1998df2d-9c75-4190-97c2-3fae97584e4f
        else
          search --no-floppy --fs-uuid --set=root 1998df2d-9c75-4190-97c2-3fae97584e4f
        fi
        linux16 /rhvh-4.1-0.20170808.0+1/vmlinuz-3.10.0-693.el7.x86_64 root=/dev/rhvh_nsc-cld-ulpst-0101/rhvh-4.1-0.20170808.0+1 ro crashkernel=auto rd.lvm.lv=rhvh_nsc-cld-ulpst-0101/swap rd.lvm.lv=rhvh_nsc-cld-ulpst-0101/rhvh-4.1-0.20170808.0+1 rhgb quiet LANG=en_US.UTF-8 img.bootid=rhvh-4.1-0.20170808.0+1
        initrd16 /rhvh-4.1-0.20170808.0+1/initramfs-3.10.0-693.el7.x86_64.img
}

### END /etc/grub.d/10_linux ###

### BEGIN /etc/grub.d/20_linux_tboot ###
submenu "tboot 1.9.5" {
menuentry 'Red Hat Enterprise Linux GNU/Linux, with tboot 1.9.5 and Linux 3.10.0-693.el7.x86_64' --class red --class gnu-linux --class gnu --class os --class tboot {
        insmod part_msdos
        insmod ext2
        set root='hd0,msdos1'
        if [ x$feature_platform_search_hint = xy ]; then
          search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos1 --hint-efi=hd0,msdos1 --hint-baremetal=ahci0,msdos1 --hint='hd0,msdos1'  1998df2d-9c75-4190-97c2-3fae97584e4f
        else
          search --no-floppy --fs-uuid --set=root 1998df2d-9c75-4190-97c2-3fae97584e4f
        fi
        echo    'Loading tboot 1.9.5 ...'
        multiboot       /tboot.gz logging=serial,memory,vga
        echo    'Loading Linux 3.10.0-693.el7.x86_64 ...'
        module /vmlinuz-3.10.0-693.el7.x86_64 root=/dev/mapper/rhvh_nsc--cld--ulpst--0101-root ro crashkernel=auto rd.lvm.lv=rhvh_nsc-cld-ulpst-0101/root rd.lvm.lv=rhvh_nsc-cld-ulpst-0101/swap rhgb quiet intel_iommu=on
        echo    'Loading initial ramdisk ...'
        module /initramfs-3.10.0-693.el7.x86_64.img
}
}
### END /etc/grub.d/20_linux_tboot ###
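The mismatch between the two menuentries above can be checked mechanically by extracting a given parameter from each entry's kernel line and from /proc/cmdline. A sketch (pure string handling, so it is runnable anywhere; the sample command lines are abbreviated from the entries above):

```shell
# Extract one kernel parameter's value from a command-line string, so the
# default menuentry's linux16 line can be compared with /proc/cmdline.
kparam() {
    # $1: parameter name, $2: command-line string
    val=$(printf '%s\n' "$2" | tr ' ' '\n' | sed -n "s/^$1=//p")
    echo "${val:-unset}"
}

kparam intel_iommu "ro crashkernel=auto rhgb quiet"           # default entry  -> unset
kparam intel_iommu "ro rhgb quiet intel_iommu=on iommu=pt"    # tboot entry    -> on
```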

Comment 5 Frank DeLorey 2017-10-24 20:46:31 UTC
We found that the Installation Guide lists one of the parameters incorrectly: it shows intel_iommu=pt, but this should be iommu=pt. The incorrect parameter causes the host to fail to boot. We corrected this and reinstalled from the Host sub-tab. After rebooting, the node now boots and dmesg looks good, but still none of the network ports on this host show SR-IOV.

[root@nsc-cld-ulpst-0101 ~]# cat /var/log/dmesg | grep -i iommu | more
[    0.000000] Command line: BOOT_IMAGE=/rhvh-4.1-0.20170808.0+1/vmlinuz-3.10.0-693.el7.x86_64 root=/dev/rhvh_nsc-cld-ulpst-0101/rhvh-4.1-0.20170808
.0+1 ro crashkernel=auto rd.lvm.lv=rhvh_nsc-cld-ulpst-0101/swap rd.lvm.lv=rhvh_nsc-cld-ulpst-0101/rhvh-4.1-0.20170808.0+1 rhgb quiet LANG=en_US.UTF-
8 img.bootid=rhvh-4.1-0.20170808.0+1 intel_iommu=on iommu=pt
[    0.000000] Kernel command line: BOOT_IMAGE=/rhvh-4.1-0.20170808.0+1/vmlinuz-3.10.0-693.el7.x86_64 root=/dev/rhvh_nsc-cld-ulpst-0101/rhvh-4.1-0.2
0170808.0+1 ro crashkernel=auto rd.lvm.lv=rhvh_nsc-cld-ulpst-0101/swap rd.lvm.lv=rhvh_nsc-cld-ulpst-0101/rhvh-4.1-0.20170808.0+1 rhgb quiet LANG=en_
US.UTF-8 img.bootid=rhvh-4.1-0.20170808.0+1 intel_iommu=on iommu=pt
[    0.000000] DMAR: IOMMU enabled
[    0.861783] DMAR-IR: IOAPIC id 12 under DRHD base  0xfbffc000 IOMMU 2
[    0.861784] DMAR-IR: IOAPIC id 11 under DRHD base  0xe3ffc000 IOMMU 1
[    0.861785] DMAR-IR: IOAPIC id 10 under DRHD base  0xc7ffc000 IOMMU 0
[    0.861786] DMAR-IR: IOAPIC id 8 under DRHD base  0xabffc000 IOMMU 3
[    0.861787] DMAR-IR: IOAPIC id 9 under DRHD base  0xabffc000 IOMMU 3
[    5.624611] iommu: Adding device 0000:00:00.0 to group 0
[    5.624661] iommu: Adding device 0000:00:01.0 to group 1
[    5.624709] iommu: Adding device 0000:00:02.0 to group 2
[    5.624756] iommu: Adding device 0000:00:03.0 to group 3
[    5.624802] iommu: Adding device 0000:00:03.2 to group 4
[    5.624998] iommu: Adding device 0000:00:05.0 to group 5

This is becoming critical for this customer as it is holding up their deployments.
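Whether the kernel and driver actually expose SR-IOV on a port can be checked directly in sysfs, independently of the UI icon. A sketch using a mock directory so it is demonstrable anywhere; on a real host the path would be /sys/class/net/<nic>/device:

```shell
# A PF whose driver supports SR-IOV exposes sriov_totalvfs in its sysfs
# device directory; its absence means the kernel sees no SR-IOV capability.
sriov_capable() {
    # $1: the interface's sysfs device directory
    if [ -r "$1/sriov_totalvfs" ]; then
        echo "SR-IOV capable, max VFs: $(cat "$1/sriov_totalvfs")"
    else
        echo "no sriov_totalvfs: driver does not report SR-IOV"
    fi
}

# Mock demonstration; a real host would pass e.g. /sys/class/net/em1/device
mockdev=$(mktemp -d)
echo 63 > "$mockdev/sriov_totalvfs"
sriov_capable "$mockdev"
```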

Comment 6 Frank DeLorey 2017-10-24 21:07:23 UTC
Created attachment 1342951 [details]
New dmesg after solving reinstall issue

Comment 7 Dan Kenigsberg 2017-10-25 07:05:41 UTC
I suspect that this is a dup of bug 1474638. Would the customer upgrade to RHV-H-4.1.6 in order to verify that?

Also, lowering urgency: this is about a single customer not being able to use an advanced feature. Urgent severity should be reserved for widely affecting bugs or the crippling of a production deployment.

Comment 9 Frank DeLorey 2017-10-25 10:06:34 UTC
Dan, I do not believe we are hitting bug 1474638 yet: the PF is not showing SR-IOV, so we have not reached the point of creating the VFs. I will recommend the upgrade though, as we will most likely hit that bug eventually once the initial problem is fixed. The sos report from the host is here:

https://api.access.redhat.com/rs/cases/01953577/attachments/b2851de7-0593-4bd4-8766-4be59c8ba1ef
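For reference, once a PF does report SR-IOV, VFs are created by writing a count into sriov_numvfs (the kernel requires resetting it to 0 before changing a nonzero value). A sketch using a mock file stand-in, since the real path /sys/class/net/<nic>/device/sriov_numvfs requires the hardware:

```shell
# Sketch of VF creation; on a real host $dev would be the PF's sysfs
# device directory, e.g. /sys/class/net/em1/device.
create_vfs() {
    dev="$1"; count="$2"
    if [ -w "$dev/sriov_numvfs" ]; then
        echo 0 > "$dev/sriov_numvfs"        # reset first: required by the kernel
        echo "$count" > "$dev/sriov_numvfs"
        echo "requested $count VFs"
    else
        echo "device does not support SR-IOV"
    fi
}

mockvf=$(mktemp -d)
: > "$mockvf/sriov_numvfs"
create_vfs "$mockvf" 4
```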

Comment 10 Frank DeLorey 2017-10-25 18:04:00 UTC
Created attachment 1343362 [details]
hostdevListByCaps

Comment 15 Germano Veit Michel 2017-10-27 05:29:01 UTC

*** This bug has been marked as a duplicate of bug 1506887 ***