
Bug 613667

Summary: always print the number of triggered NMI during test at boot
Product: Red Hat Enterprise Linux 5
Reporter: Adrien Kunysz <akunysz>
Component: kernel
Assignee: Don Zickus <dzickus>
Status: CLOSED ERRATA
QA Contact: Petr Beňas <pbenas>
Severity: medium
Priority: high
Version: 5.6
CC: iannis, jruemker, jwest, jwilson, mbrodeur, pbenas, peterm, prarit, pstehlik, tao
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Clone Of: 578905
Last Closed: 2011-01-13 21:42:48 UTC
Bug Depends On: 578905, 659816
Attachments: proposed patch (flags: none)

Description Adrien Kunysz 2010-07-12 14:29:43 UTC
+++ This bug was initially created as a clone of Bug #578905 +++

Bug 578905 is about keeping nmi_watchdog enabled even if very few NMIs are triggered during the boot-time test. Consequently, that change will remove an error message that is sometimes useful for diagnosing hardware/firmware issues.

Could you please consider printing something like "NMI performance counter calibration for CPU#0: 1704->1708" for each CPU, regardless of the difference between counts[cpu] and cpu_pda(cpu)->__nmi_count? This way, even if nmi_watchdog remains enabled, we would have some data about comparative CPU speed at boot.

This change would help diagnose performance problems caused by one or more CPUs being unusually slow.
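
To illustrate the request, here is a minimal userspace sketch in Python (not kernel code, and not the attached patch). It models a boot-time check that always prints the before/after NMI counts for every CPU and separately records which CPUs look stuck. The function name `calibration_report`, the `stuck_threshold` parameter, its default value, and the counts for CPU#1 are assumptions made up for this example; only the message format and the 1704->1708 pair come from the description above.

```python
def calibration_report(before, after, stuck_threshold=5):
    """Return (lines, stuck_cpus) for per-CPU NMI counts sampled
    before and after the boot-time test.

    stuck_threshold is a hypothetical cut-off chosen for this sketch.
    """
    lines = []
    stuck_cpus = []
    for cpu in sorted(before):
        start, end = before[cpu], after[cpu]
        # Always report the counts, whether or not the CPU passes the check.
        lines.append(
            f"NMI performance counter calibration for CPU#{cpu}: {start}->{end}"
        )
        if end - start <= stuck_threshold:
            stuck_cpus.append(cpu)
    return lines, stuck_cpus


# CPU#0 uses the counts quoted in the description; CPU#1 is invented.
lines, stuck = calibration_report({0: 1704, 1: 900}, {0: 1708, 1: 920})
for line in lines:
    print(line)
print("CPUs with few NMIs:", stuck)
```

The point of the sketch is that the report line is emitted unconditionally, so the data stays available even when the watchdog is left enabled.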

Comment 1 Adrien Kunysz 2010-07-12 16:08:31 UTC
Created attachment 431209
proposed patch

Attaching an example patch showing how this could be implemented. It is completely untested (I haven't even tried to build with it).

Comment 2 Don Zickus 2010-08-16 19:00:44 UTC
Hi Adrien,

I put together a patch whose output looks like this:

....
Calibrating delay using timer specific routine.. 5586.44 BogoMIPS (lpj=2793222)
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 3
CPU: Processor Core ID: 0
CPU3: Thermal monitoring enabled (TM1)
                  Intel(R) Xeon(TM) CPU 2.80GHz stepping 01
Brought up 4 CPUs
CPU#0: NMI watchdog performance counter calibration - 272->292
CPU#1: NMI watchdog performance counter calibration - 128->148
CPU#2: NMI watchdog performance counter calibration - 137->158
CPU#3: NMI watchdog performance counter calibration - 61->82
NMI watchdog testing PASSED.
time.c: Using 14.318180 MHz WALL HPET GTOD HPET/TSC timer.
time.c: Detected 2793.208 MHz processor.
sizeof(vma)=176 bytes
sizeof(page)=56 bytes
....

Let me know if that looks ok.

Cheers,
Don
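
For anyone who wants to post-process these lines, the following is a rough Python sketch that extracts the per-CPU counts from boot log text in the format shown in this comment and computes how many NMIs each CPU received during the test. The helper name `parse_calibration` and the regular expression are assumptions written for this example; they are not part of the kernel patch.

```python
import re

# Matches lines such as:
#   CPU#0: NMI watchdog performance counter calibration - 272->292
LINE_RE = re.compile(
    r"^CPU#(\d+): NMI watchdog performance counter calibration - (\d+)->(\d+)$"
)


def parse_calibration(log_text):
    """Map each CPU number to the NMI count delta reported at boot."""
    deltas = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            cpu, start, end = (int(group) for group in match.groups())
            deltas[cpu] = end - start
    return deltas


sample = """Brought up 4 CPUs
CPU#0: NMI watchdog performance counter calibration - 272->292
CPU#1: NMI watchdog performance counter calibration - 128->148
CPU#2: NMI watchdog performance counter calibration - 137->158
CPU#3: NMI watchdog performance counter calibration - 61->82
NMI watchdog testing PASSED."""

print(parse_calibration(sample))  # {0: 20, 1: 20, 2: 21, 3: 21}
```

Lines that do not match the calibration format are skipped, so the whole dmesg output can be passed in unchanged.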

Comment 3 Adrien Kunysz 2010-08-17 07:31:04 UTC
This looks fine to me. Thank you.

Comment 5 RHEL Program Management 2010-08-27 18:09:38 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 7 Jarod Wilson 2010-09-10 21:40:32 UTC
in kernel-2.6.18-219.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcome.

Comment 9 Petr Beňas 2010-10-27 09:17:53 UTC
Verified in 2.6.18-219.el5.x86_64
Requested NMI info is present in dmesg.
# cat /var/log/dmesg | head -n 1170 | tail -n 20
CPU62: Thermal monitoring enabled (TM1)
Intel(R) Xeon(R) CPU           X7560  @ 2.27GHz stepping 06
SMP alternatives: switching to SMP code
Booting processor 63/64 APIC 0x77
Initializing CPU#63
Calibrating delay using timer specific routine.. 4522.08 BogoMIPS (lpj=2261042)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 256K
CPU: L3 cache: 24576K
CPU 63/77 -> Node 3
CPU: Physical Processor ID: 3
CPU: Processor Core ID: 11
CPU63: Thermal monitoring enabled (TM1)
Intel(R) Xeon(R) CPU           X7560  @ 2.27GHz stepping 06
Brought up 64 CPUs
CPU#0: NMI watchdog performance counter calibration - 4757->4777
CPU#1: NMI watchdog performance counter calibration - 68->88
CPU#2: NMI watchdog performance counter calibration - 67->87
CPU#3: NMI watchdog performance counter calibration - 68->88
CPU#4: NMI watchdog performance counter calibration - 67->87
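
As a sketch of how this output might be used for the diagnosis described in the original report, the Python snippet below compares per-CPU NMI deltas against the median and lists CPUs that fall well below it. The function name `slow_cpus`, the `ratio` cut-off, and the extra CPU#5 entry are assumptions invented for illustration; the deltas for CPU#0 to CPU#4 are computed from the lines above.

```python
from statistics import median


def slow_cpus(deltas, ratio=0.5):
    """Return CPUs whose NMI delta is below `ratio` times the median delta.

    `ratio` is an arbitrary cut-off chosen for this sketch.
    """
    typical = median(deltas.values())
    return sorted(cpu for cpu, delta in deltas.items() if delta < typical * ratio)


# Deltas taken from the verification output for CPU#0 to CPU#4.
deltas = {0: 4777 - 4757, 1: 88 - 68, 2: 87 - 67, 3: 88 - 68, 4: 87 - 67}
print(slow_cpus(deltas))  # []

# Hypothetical extra CPU that received far fewer NMIs during the test.
print(slow_cpus({**deltas, 5: 4}))  # [5]
```

Note that CPU#0 starts from a much higher absolute count than the others, so the comparison uses the difference between the two counts rather than the counts themselves.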

Comment 12 errata-xmlrpc 2011-01-13 21:42:48 UTC
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html