Bug 1698238 - Do not try and place measurement threads on offline cpus
Summary: Do not try and place measurement threads on offline cpus
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: rteval
Version: 7.6
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: John Kacur
QA Contact: Qiao Zhao
URL:
Whiteboard:
Depends On:
Blocks: 1655694 1720687
Reported: 2019-04-09 21:20 UTC by Clark Williams
Modified: 2019-08-06 12:40 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1720687
Environment:
Last Closed: 2019-08-06 12:40:21 UTC
Target Upstream Version:
Embargoed:


Attachments
Check whether a cpu is online (4.19 KB, patch)
2019-05-29 12:32 UTC, John Kacur
no flags
Change hackbench to use systopology (2.79 KB, patch)
2019-05-29 12:33 UTC, John Kacur
no flags


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2019:2063 | 0 | None | None | None | 2019-08-06 12:40:25 UTC

Description Clark Williams 2019-04-09 21:20:28 UTC
Description of problem:

Running rteval on a system with hyperthreads disabled via the 'nosmt' kernel boot command-line argument results in a crash while rteval tries to place a measurement thread on an offline CPU.


Version-Release number of selected component (if applicable):
rteval-2.14-11.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Add 'nosmt' to the kernel command line in GRUB (turns off hyperthreading)
2. Reboot
3. Run rteval; it crashes trying to use an offline CPU
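To confirm that the reboot actually left CPUs offline before starting rteval, the kernel's cpulist files can be read directly. The Python sketch below assumes the standard Linux sysfs files `/sys/devices/system/cpu/online` and `/sys/devices/system/cpu/offline`, which use the same range syntax lscpu prints (for example `0-19`); `parse_cpulist` and `cpu_state` are illustrative names, not rteval functions.

```python
import os


def parse_cpulist(text):
    """Expand a cpulist string such as '0-3,8,10-11' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # e.g. the 'offline' file is empty when no CPU is offline
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def cpu_state(sysfs_cpu_dir="/sys/devices/system/cpu"):
    """Return (online, offline) CPU lists as reported by sysfs."""
    def read(name):
        path = os.path.join(sysfs_cpu_dir, name)
        if not os.path.exists(path):
            return []  # not a Linux sysfs tree, or file absent
        with open(path) as f:
            return parse_cpulist(f.read())
    return read("online"), read("offline")


if __name__ == "__main__":
    online, offline = cpu_state()
    print("online:", online)
    print("offline:", offline)
```

On the reporter's machine this would be expected to show CPUs 0-19 online and 20-39 offline, matching the lscpu output later in the report.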

Additional info:

Backtrace and additional info:
[root@realtime-03 ~]# cd /tmp
[root@realtime-03 tmp]# rteval --duration=1m
got system topology: 2 node system (10 cores per node)
Traceback (most recent call last):
  File "/usr/bin/rteval", line 302, in <module>
    rteval.Prepare(rtevcfg.onlyload)
  File "/usr/lib/python2.7/site-packages/rteval/__init__.py", line 157, in Prepare
    self._measuremods.Setup(params)
  File "/usr/lib/python2.7/site-packages/rteval/modules/measurement/__init__.py", line 182, in Setup
    mp.Setup(modname)
  File "/usr/lib/python2.7/site-packages/rteval/modules/measurement/__init__.py", line 58, in Setup
    modobj = self._InstantiateModule(modname, self._cfg.GetSection(modname))
  File "/usr/lib/python2.7/site-packages/rteval/modules/__init__.py", line 417, in _InstantiateModule
    return self.__modules.InstantiateModule(modname, modcfg, modroot)
  File "/usr/lib/python2.7/site-packages/rteval/modules/__init__.py", line 332, in InstantiateModule
    return mod.create(modcfg, self.__logger)
  File "/usr/lib/python2.7/site-packages/rteval/modules/measurement/cyclictest.py", line 420, in create
    return Cyclictest(params, logger)
  File "/usr/lib/python2.7/site-packages/rteval/modules/measurement/cyclictest.py", line 212, in __init__
    self.__cyclicdata[core].description = info[core]['model name']
KeyError: '20'
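One reading of `KeyError: '20'` that fits the lscpu output below (CPUs 20-39 offline): `/proc/cpuinfo` lists only online CPUs, while `/sys/devices/system/cpu/` keeps a `cpuN` entry for offline CPUs as well, so a loop over every CPU id that indexes a table built from cpuinfo fails at the first offline id. A minimal reproduction with made-up data; `cpuinfo`, `cores` and `describe` are hypothetical stand-ins, not rteval's actual variables.

```python
# Hypothetical stand-ins:
# 'cpuinfo' mimics a per-CPU table built from /proc/cpuinfo (online CPUs only);
# 'cores' mimics a CPU list that still includes the offline CPUs 20-39.
cpuinfo = {str(n): {"model name": "Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz"}
           for n in range(20)}
cores = [str(n) for n in range(40)]


def describe(cores, cpuinfo):
    """Mirror the failing pattern: look up a description for every core id."""
    return {core: cpuinfo[core]["model name"] for core in cores}


try:
    describe(cores, cpuinfo)
except KeyError as err:
    print("KeyError:", err)  # prints: KeyError: '20'
```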
[root@realtime-03 tmp]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                40
On-line CPU(s) list:   0-19
Off-line CPU(s) list:  20-39
Thread(s) per core:    1
Core(s) per socket:    10
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
Stepping:              2
CPU MHz:               2297.477
BogoMIPS:              4594.95
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              25600K
NUMA node0 CPU(s):     0-9
NUMA node1 CPU(s):     10-19
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts spec_ctrl intel_stibp flush_l1d
[root@realtime-03 tmp]# cat /proc/cmdline 
BOOT_IMAGE=/vmlinuz-3.10.0-957.15.1.rt56.927skipktimersoftd1.el7.x86_64 root=/dev/mapper/rhel_realtime--03-root ro crashkernel=auto rd.lvm.lv=rhel_realtime-03/root rd.lvm.lv=rhel_realtime-03/swap console=ttyS1,115200n81 log_buf_len=1M nosmt isolcpus=10-19 LANG=en_US.UTF-8
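The command line confirms both `nosmt` and `isolcpus=10-19`. When collecting reports like this one, a small parser makes it easy to check for those parameters programmatically. `parse_cmdline` is a hypothetical helper; it splits on whitespace only and does not handle the quoted values the kernel also accepts.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {param: value}.

    Bare flags such as 'nosmt' map to None; a repeated key
    (e.g. rd.lvm.lv) keeps only its last value.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


# Excerpt of the /proc/cmdline shown above.
excerpt = "ro crashkernel=auto log_buf_len=1M nosmt isolcpus=10-19 LANG=en_US.UTF-8"
params = parse_cmdline(excerpt)
print("nosmt" in params, params.get("isolcpus"))  # prints: True 10-19
```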
[root@realtime-03 tmp]# rpm -q rteval
rteval-2.14-11.el7.noarch

Comment 2 John Kacur 2019-04-16 19:14:41 UTC
Wasn't able to reproduce this on the machine I tried.

To verify that nosmt took effect, I looked at the following (note that Thread(s) per core is 1, as you would expect):

lscpu 
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 26
Model name:            Intel(R) Xeon(R) CPU           E5506  @ 2.13GHz
Stepping:              5
CPU MHz:               2133.000
CPU max MHz:           2133.0000
CPU min MHz:           1600.0000
BogoMIPS:              4266.74
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              4096K
NUMA node0 CPU(s):     0-3
NUMA node1 CPU(s):     4-7
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm spec_ctrl intel_stibp flush_l1d

and
cat /sys/devices/system/cpu/cpu1/topology/thread_siblings_list
1

so it does seem that nosmt took effect.
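The `thread_siblings_list` check above can be scripted across all online CPUs. This sketch reads `topology/thread_siblings_list` under each `cpuN` directory; if every online CPU lists only itself, no SMT siblings are currently online. A processor without SMT hardware reports the same thing, so this check alone does not distinguish `nosmt` from a CPU that never had hyperthreads; an Off-line CPU(s) entry in lscpu, as in the original report, is the more direct sign that siblings were taken offline. `parse_cpulist` and `smt_active` are illustrative names.

```python
import os


def parse_cpulist(text):
    """Expand a cpulist string such as '0,20' or '0-1' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def smt_active(cpus, sysfs_cpu_dir="/sys/devices/system/cpu"):
    """Return True if any of the given CPUs reports more than one thread sibling."""
    for cpu in cpus:
        path = os.path.join(sysfs_cpu_dir, "cpu%d" % cpu,
                            "topology", "thread_siblings_list")
        with open(path) as f:
            if len(parse_cpulist(f.read())) > 1:
                return True
    return False
```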

Do you know what version of rt-tests you were using?

Comment 3 John Kacur 2019-05-24 11:42:08 UTC
I believe the problem is that, with CPU hotplug, there are entries for CPUs that are offline.

I have modified online_cpus() in misc.py to fix this, but that doesn't solve the problem yet: during the initialization of Cyclictest(), another configuration is passed that bypasses this function and simply calls expand_cpulist() with data that includes an offline CPU.

Getting closer.
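The comment above describes offline CPUs reaching Cyclictest through a path that calls expand_cpulist directly instead of online_cpus(). A defensive pattern for that situation is to intersect any expanded CPU list with the online set before placing threads, whichever path produced it. The sketch illustrates that idea only; `parse_cpulist` and `restrict_to_online` are hypothetical helpers and do not reproduce rteval's misc.py.

```python
def parse_cpulist(text):
    """Expand a cpulist string such as '0-3,8' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def restrict_to_online(requested, online):
    """Split requested CPUs into (usable, dropped) according to the online set."""
    online = set(online)
    usable = [cpu for cpu in requested if cpu in online]
    dropped = [cpu for cpu in requested if cpu not in online]
    return usable, dropped


# A configured list covering all 40 CPUs, with only 0-19 online.
usable, dropped = restrict_to_online(parse_cpulist("0-39"), range(20))
print(len(usable), dropped[0], dropped[-1])  # prints: 20 20 39
```

Returning the dropped CPUs, rather than silently discarding them, leaves room to log a warning when a user-supplied list names offline CPUs.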

Comment 4 John Kacur 2019-05-29 12:32:48 UTC
Created attachment 1574745
Check whether a cpu is online
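On Linux, a per-CPU way to check whether a CPU is online is the `cpuN/online` file in sysfs, which holds `1` or `0`. CPUs that cannot be hot-unplugged (commonly cpu0 on x86) have no `online` file, so a missing file is treated as online here. This is a sketch of the general technique, not the contents of attachment 1574745; `is_cpu_online` and `list_online_cpus` are hypothetical names.

```python
import os
import re


def is_cpu_online(cpu, sysfs_cpu_dir="/sys/devices/system/cpu"):
    """Return True if the kernel reports the given CPU as online."""
    cpu_dir = os.path.join(sysfs_cpu_dir, "cpu%d" % cpu)
    if not os.path.isdir(cpu_dir):
        return False  # no such CPU
    online_file = os.path.join(cpu_dir, "online")
    if not os.path.exists(online_file):
        return True  # not hot-pluggable, so it cannot be offline
    with open(online_file) as f:
        return f.read().strip() == "1"


def list_online_cpus(sysfs_cpu_dir="/sys/devices/system/cpu"):
    """Scan cpuN directories (skipping cpufreq, cpuidle, ...) and keep online ones."""
    cpus = []
    for name in os.listdir(sysfs_cpu_dir):
        m = re.match(r"cpu(\d+)$", name)
        if m and is_cpu_online(int(m.group(1)), sysfs_cpu_dir):
            cpus.append(int(m.group(1)))
    return sorted(cpus)
```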

Comment 5 John Kacur 2019-05-29 12:33:28 UTC
Created attachment 1574746
Change hackbench to use systopology
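The same concern applies to load placement: if a load module starts work per NUMA node, each node's CPU list should contain only online CPUs. A sketch of that idea, assuming the standard `/sys/devices/system/node/nodeN/cpulist` files; `online_cpus_by_node` is a hypothetical name, and intersecting explicitly with the online set (rather than trusting the node list alone) is a defensive choice made here, not something taken from the attachment.

```python
import os
import re


def parse_cpulist(text):
    """Expand a cpulist string such as '0-9,20-29' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def online_cpus_by_node(online, sysfs_node_dir="/sys/devices/system/node"):
    """Map NUMA node id -> sorted online CPUs on that node.

    Nodes left with no online CPUs (e.g. memory-only nodes) are omitted.
    """
    online = set(online)
    nodes = {}
    for name in os.listdir(sysfs_node_dir):
        m = re.match(r"node(\d+)$", name)
        if not m:
            continue  # skip files such as 'possible' or 'online'
        with open(os.path.join(sysfs_node_dir, name, "cpulist")) as f:
            cpus = [cpu for cpu in parse_cpulist(f.read()) if cpu in online]
        if cpus:
            nodes[int(m.group(1))] = cpus
    return nodes
```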

Comment 12 errata-xmlrpc 2019-08-06 12:40:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2063

