Bug 1901597 - [OSP16.2] Wrong default NovaLibvirtCPUMode set in nova.conf for nova-compute nodes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.1 (Train)
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: Alpha
Target Release: 16.2 (Train on RHEL 8.4)
Assignee: Martin Schuppert
QA Contact: James Parker
URL:
Whiteboard:
Depends On: 1901004
Blocks:
Reported: 2020-11-25 15:34 UTC by Martin Schuppert
Modified: 2021-09-15 07:10 UTC

Fixed In Version: openstack-tripleo-heat-templates-11.3.2-2.20201203014855.4304956.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of: 1901004
Environment:
Last Closed: 2021-09-15 07:10:23 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHEA-2021:3483 | 0 | None | None | None | 2021-09-15 07:10:51 UTC

Description Martin Schuppert 2020-11-25 15:34:22 UTC
+++ This bug was initially created as a clone of Bug #1901004 +++

Description of problem:

Mounting 1G huge pages fails on a RHEL guest:
mount -t hugetlbfs -o pagesize=1G none /dev/hugepages1G


Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform release 16.1.3 GA (Train)
RHOS-16.1-RHEL-8-20201110.n.1

How reproducible:

Always. Seen in the NFV performance CI regression runs.



Steps to Reproduce:
1. Deploy with director using the following templates (OVS-DPDK is used here, but it is not required).
 
https://code.engineering.redhat.com/gerrit/gitweb?p=nfv-qe.git;a=tree;f=ospd-16.1-vxlan-dpdk-sriov-ctlplane-dataplane-bonding-hybrid;h=247e16dd031230866fa91e9121305b6409b77d0c;hb=refs/heads/ci

2. Deploy a VM from a RHEL 7.6 image with the following extra specs:
aggregate_instance_extra_specs:flavor='dut_ag', hw:cpu_policy='dedicated', hw:emulator_threads_policy='share', hw:mem_page_size='1GB', hw:numa_cpus.1='0,1,2,3,4,5,6,7', hw:numa_mem.1='8192', hw:numa_nodes='1'

3. SSH to the guest and add the following kernel arguments to /etc/default/grub:
default_hugepagesz=1G hugepagesz=1G hugepages=2

4. Run grub2-mkconfig -o /boot/grub2/grub.cfg
This causes a kernel crash.
Delete the following line from /etc/fstab: nodev /mnt/huge hugetlbfs pagesize=1GB 0 0

5. Try the following:
mount -t hugetlbfs nodev /mnt/huge
mkdir /dev/hugepages1G
mount -t hugetlbfs -o pagesize=1G none /dev/hugepages1G

Actual results:
The following error is logged:
[  402.080533] hugetlbfs: Unsupported page size 1024 MB
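
The kernel message above can be cross-checked from sysfs inside the guest. The following is only a sketch: it assumes that a kernel which supports 1G pages exposes a hugepages-1048576kB directory under /sys/kernel/mm/hugepages, and the function name and its optional sysfs root argument are illustrative, not part of any tool.

```shell
# Check whether the running kernel registered a 1 GiB hugepage pool.
# The sysfs root is a parameter only so the check can be pointed at a test tree.
check_1g_pool() {
    sysfs_root="${1:-/sys}"
    pool="$sysfs_root/kernel/mm/hugepages/hugepages-1048576kB"
    if [ -d "$pool" ]; then
        # nr_hugepages holds the number of pages currently in the pool
        echo "1G pool present, nr_hugepages=$(cat "$pool/nr_hugepages")"
    else
        echo "no 1G pool registered"
        return 1
    fi
}
```

Running check_1g_pool with no argument inspects the live /sys tree. A "no 1G pool registered" result would be consistent with the mount error reported here.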



Expected results:
The mount should complete successfully.


Additional info:
A link to the sos report will be added in the first comment.

--- Additional comment from Miguel Angel Nieto on 2020-11-24 15:01:31 UTC ---

I have checked that with RHOS-16.1-RHEL-8-20201021.n.0 we had in the configuration:

 <cpu mode='custom' match='exact' check='full'>                                
    <model fallback='forbid'>Broadwell-IBRS</model>                             
    <vendor>Intel</vendor>                                                      
    <topology sockets='8' dies='1' cores='1' threads='2'/>                      
    <feature policy='require' name='vme'/>                                      
    <feature policy='require' name='ss'/>                                       
    <feature policy='require' name='vmx'/>                                      
    <feature policy='require' name='f16c'/>                                     
    <feature policy='require' name='rdrand'/>                                   
    <feature policy='require' name='hypervisor'/>                               
    <feature policy='require' name='arat'/>                                     
    <feature policy='require' name='tsc_adjust'/>                               
    <feature policy='require' name='umip'/>                                     
    <feature policy='require' name='stibp'/>                                    
    <feature policy='require' name='arch-capabilities'/>                        
    <feature policy='require' name='xsaveopt'/>                                 
    <feature policy='require' name='pdpe1gb'/>                                  
    <feature policy='require' name='abm'/>                                      
    <feature policy='require' name='ibpb'/>                                     
    <feature policy='require' name='skip-l1dfl-vmentry'/>                       
    <feature policy='require' name='pschange-mc-no'/>                           
    <numa>                                                                      
      <cell id='0' cpus='0-15' memory='8388608' unit='KiB' memAccess='shared'/> 
    </numa>                                                                     
  </cpu>

while in RHOS-16.1-RHEL-8-20201110.n.1 we have the configuration:
  <cpu mode='custom' match='exact' check='full'>                                
    <model fallback='forbid'>qemu64</model>                                     
    <topology sockets='8' dies='1' cores='1' threads='2'/>                      
    <feature policy='require' name='x2apic'/>                                   
    <feature policy='require' name='hypervisor'/>                               
    <feature policy='require' name='lahf_lm'/>                                  
    <feature policy='disable' name='svm'/>                                      
    <numa>                                                                      
      <cell id='0' cpus='0-15' memory='8388608' unit='KiB' memAccess='shared'/> 
    </numa>                                                                     
  </cpu>       

The pdpe1gb feature is missing in the guest, so I think it is not possible to create 1G hugepages.
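
A quick way to confirm this from inside a guest is to look for the flag in /proc/cpuinfo. This is a minimal sketch; the function name and the optional file argument are illustrative only.

```shell
# Report whether a cpuinfo-style file advertises the pdpe1gb CPU flag.
# Defaults to /proc/cpuinfo; a saved dump can be passed instead.
has_1g_pages() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # Keep only the "flags" lines, then match pdpe1gb as a whole word.
    if grep -E '^flags[[:space:]]*:' "$cpuinfo" | grep -qw pdpe1gb; then
        echo "pdpe1gb present"
    else
        echo "pdpe1gb missing"
        return 1
    fi
}
```

Called without arguments in the guest, it reads the live /proc/cpuinfo. For a guest built from the qemu64 definition shown above it should report the flag as missing.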

--- Additional comment from Miguel Angel Nieto on 2020-11-24 15:20:22 UTC ---

This is from /proc/cpuinfo in the guest for both versions:

RHOS-16.1-RHEL-8-20201021.n.0:
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 61
model name	: Intel Core Processor (Broadwell, IBRS)
stepping	: 2
microcode	: 0x1
cpu MHz		: 2199.998
cache size	: 16384 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat umip spec_ctrl intel_stibp arch_capabilities
bogomips	: 4399.99
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

RHOS-16.1-RHEL-8-20201110.n.1:
processor       : 0                                                             
vendor_id       : GenuineIntel                                                  
cpu family      : 6                                                             
model           : 13                                                            
model name      : QEMU Virtual CPU version 2.5+                                 
stepping        : 3                                                             
microcode       : 0x1                                                           
cpu MHz         : 2199.998                                                      
cache size      : 16384 KB                                                      
physical id     : 0                                                             
siblings        : 2                                                             
core id         : 0                                                             
cpu cores       : 1                                                             
apicid          : 0                                                             
initial apicid  : 0                                                             
fpu             : yes                                                           
fpu_exception   : yes                                                           
cpuid level     : 13                                                            
wp              : yes                                                           
flags           : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 ht syscall nx lm rep_good nopl xtopology eagerfpu pni cx16 x2apic hypervisor lahf_lm
bogomips        : 4399.99                                                       
clflush size    : 64                                                            
cache_alignment : 64                                                            
address sizes   : 46 bits physical, 48 bits virtual                             
power management:
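
To list exactly which flags the guest lost between the two builds, the two dumps above can be compared. This is a sketch that assumes each dump was saved to its own file; the function names are illustrative.

```shell
# Print the CPU flags of the first "flags" line in a cpuinfo-style file,
# one per line, sorted and de-duplicated.
extract_flags() {
    grep -m1 '^flags' "$1" | cut -d: -f2 | tr -s ' \t' '\n' | sed '/^$/d' | sort -u
}

# Print flags that appear in the first file but not in the second.
missing_flags() {
    old_flags=$(mktemp)
    new_flags=$(mktemp)
    extract_flags "$1" > "$old_flags"
    extract_flags "$2" > "$new_flags"
    # comm -23 keeps only the lines unique to the first sorted input
    comm -23 "$old_flags" "$new_flags"
    rm -f "$old_flags" "$new_flags"
}
```

For example, missing_flags cpuinfo-20201021.txt cpuinfo-20201110.txt (both file names are hypothetical) would print the flags that are present only in the older build's guest.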

--- Additional comment from Martin Schuppert on 2020-11-25 10:22:39 UTC ---

https://github.com/openstack/tripleo-heat-templates/commit/772b7398a7222e8b286848ba00c06006d6b68785 introduced THT parameters to set libvirt/cpu_mode. The patch wrongly defaults NovaLibvirtCPUMode to the string 'none'. As a result, puppet-nova does not handle the default cases correctly and sets libvirt/cpu_mode to none, which causes the qemu64 CPU model to be used.

A workaround should be to set NovaLibvirtCPUMode to host-model.
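
As a sketch of that workaround, the parameter could be set in an extra environment file passed to the deployment. The helper name and the file path below are assumptions for illustration; only the NovaLibvirtCPUMode parameter and the host-model value come from this bug.

```shell
# Hypothetical helper: write a heat environment file that pins the CPU mode.
write_cpu_mode_env() {
    env_file="$1"
    cat > "$env_file" <<'EOF'
parameter_defaults:
  # Workaround from this bug: do not rely on the broken 'none' default.
  NovaLibvirtCPUMode: host-model
EOF
}

# Example use (the path is an assumption), followed by the usual deploy command:
#   write_cpu_mode_env /home/stack/nova-cpu-mode.yaml
#   openstack overcloud deploy --templates <existing arguments> -e /home/stack/nova-cpu-mode.yaml
```

Existing instances would presumably need to be recreated or hard rebooted to get a new CPU definition. That is an assumption and was not verified in this bug.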

Comment 3 Karrar Fida 2020-12-22 17:12:30 UTC
Bug 1901004 was already verified by DFG NFV.

Comment 9 errata-xmlrpc 2021-09-15 07:10:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform (RHOSP) 16.2 enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2021:3483

