Bug 592024 - Periodic system preemption on HP z800 when NUMA is enabled
Status: CLOSED NOTABUG
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: realtime-kernel
Version: 1.2
Hardware: All   OS: Linux
Priority: medium   Severity: high
Assigned To: Red Hat Real Time Maintenance
QA Contact: David Sommerseth
Reported: 2010-05-13 13:35 EDT by Jon Thomas
Modified: 2016-05-22 19:30 EDT
CC: 6 users
Doc Type: Bug Fix
Last Closed: 2011-08-08 11:14:58 EDT

Attachments:
testcase (4.20 KB, text/plain), 2010-05-13 13:37 EDT, Jon Thomas

Description Jon Thomas 2010-05-13 13:35:19 EDT
A periodic system preemption of 20 uS is noticed when NUMA is enabled on an HP z800.  If NUMA is disabled in the BIOS, the preemption is no longer seen; however, that memory configuration has potential performance issues of its own.


->> The system BIOS does not seem to have an option to disable SMI
->> When memory interleaving is disabled, the 20 uS preemption goes away

The customer is concerned about the inefficient memory layout after disabling NUMA.

He has provided a test application that demonstrates the problem.

Test file: timerAccess.cpp

The boot option isolcpus=1-7 is used.

The test used for this is attached.  Basically, the High Res Timer (HRT) is read back to back and the results compared.  This operation should only take 100-200 nS to complete.  Periodically, the system preempts the process for a couple of microseconds.  On this HP workstation, when NUMA is enabled, the preemption time increases to over 20 uS.  The test application does set RT priority and processor affinity.  All CPUs except zero are isolated using the isolcpus=1-7 command on the kernel boot line.  Hardware interrupts and kernel interrupt threads are redirected to CPU 0.
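
The attached timerAccess.cpp is the authoritative testcase and is not reproduced here. As a rough sketch of the measurement described above (CPU number, RT priority and output format are illustrative choices, not taken from the attachment), the loop looks something like this:

#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1
#endif
#include <sched.h>
#include <time.h>
#include <cstdio>

// Difference between two timestamps, in microseconds.
static inline double diff_us(const timespec &a, const timespec &b)
{
    return (b.tv_sec - a.tv_sec) * 1e6 + (b.tv_nsec - a.tv_nsec) / 1e3;
}

int main()
{
    // Pin to CPU 1, one of the CPUs isolated via isolcpus=1-7
    // (error checking omitted for brevity).
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(1, &set);
    sched_setaffinity(0, sizeof(set), &set);

    // Switch to a real-time scheduling class; the priority is illustrative.
    sched_param sp = {};
    sp.sched_priority = 80;
    sched_setscheduler(0, SCHED_FIFO, &sp);

    double min = 1e9, max = 0.0, sum = 0.0;
    const long count = 500000000L;   // matches the run length reported below
    timespec t1, t2;
    for (long i = 0; i < count; ++i) {
        // Back-to-back reads of the high-resolution timer; comment 2 below
        // notes the attached test uses CLOCK_REALTIME and suggests
        // CLOCK_MONOTONIC instead.
        clock_gettime(CLOCK_REALTIME, &t1);
        clock_gettime(CLOCK_REALTIME, &t2);
        double d = diff_us(t1, t2);
        if (d < min) min = d;
        if (d > max) max = d;
        sum += d;
    }
    std::printf("Min: %.3f uS  Avg: %.3f uS  Max: %.3f uS  Count: %ld\n",
                min, sum / count, max, count);
    return 0;
}

Built with g++ -O2 and run on one of the isolated CPUs, the Min/Avg/Max lines correspond to the figures in the result tables below.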

->> Comment from the customer on the test application, after reading the documentation at:

http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_MRG/1.0/html/Realtime_Tuning_Guide/sect-Realtime_Tuning_Guide-Realtime_Specific_Tuning-Non_Uniform_Memory_Access.html

The test program uses sched_setaffinity() to pin the thread to a single CPU.  After the thread has been pinned, any memory needed can be allocated.  On a NUMA configuration, the OS should then allocate memory as close as possible to the thread (assuming it is available).
The preferred way to do this is to start the thread and then change its scheduling class/priority and affinity from inside the program.
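
As an illustration of that ordering (the thread entry function, CPU number, priority and buffer size below are hypothetical, not taken from the customer's program), pinning first and allocating afterwards looks roughly like:

#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1
#endif
#include <pthread.h>
#include <sched.h>
#include <vector>
#include <cstring>

static void *rt_worker(void *)
{
    // 1. Pin this thread to a single (isolated) CPU
    //    (error handling omitted for brevity).
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(1, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    // 2. Switch to a real-time scheduling class.
    sched_param sp = {};
    sp.sched_priority = 80;
    pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);

    // 3. Allocate and touch memory only after pinning, so the first-touch
    //    policy places the pages on the NUMA node closest to this CPU.
    std::vector<char> buffer(64 * 1024 * 1024);
    std::memset(buffer.data(), 0, buffer.size());

    // ... real-time work using `buffer` goes here ...
    return nullptr;
}

int main()
{
    pthread_t tid;
    pthread_create(&tid, nullptr, rt_worker, nullptr);
    pthread_join(tid, nullptr);
    return 0;
}

Large working buffers allocated and touched before the affinity call could end up on a remote node, which is the situation the tuning guide warns about.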


Observation made by the customer:

->> This has something to do with the 2.6.24 kernel.  The issue disappears on real-time kernels 2.6.25 and later.  Results below are shown for the 2.6.31 kernel.

Results of test application

NUMA On
  2.6.24.7-149.el5rt  
  Min:    0.204 uS
  Avg:    0.210 uS
  Max:   21.731 uS
  Count: 500000000
  Test time: 215 seconds

  2.6.31.12-rt21-nousb  
  Min:    0.111 uS
  Avg:    0.117 uS
  Max:    2.360 uS
  Count: 500000000
  Test time: 122 seconds

NUMA Off
  2.6.24.7-149.el5rt  
  Min:    0.203 uS
  Avg:    0.211 uS
  Max:    5.176 uS
  Count: 500000000
  Test time: 215 seconds

  2.6.31.12-rt21-nousb  
  Min:    0.111 uS
  Avg:    0.117 uS
  Max:    2.739 uS
  Count: 500000000
  Test time: 121 seconds

->> Informed the customer that RT kernel 2.6.33 is going to be released with MRG 1.3.

->> We need to identify the change between kernels that removes the latency issue in the later kernels compared to 2.6.24.
->> Will this issue be fixed in the 2.6.24 RT kernel before the MRG 1.3 release?

The customer has changed kernel compile options, which seems to improve performance.

File attached: MRG-Config

->> Among the disabled options are the kernel debugging parameters and CONFIG_CPU_FREQ


How reproducible:

Always for the customer

Steps to Reproduce:

1. Run the attached test application

Actual results:

A periodic system preemption of 20 uS is noted when NUMA is enabled

Expected results:

Reduced preemption, i.e. better worst-case latency with NUMA enabled
Comment 1 Jon Thomas 2010-05-13 13:37:06 EDT
Created attachment 413832 [details]
testcase
Comment 2 Luis Claudio R. Goncalves 2010-05-13 17:57:35 EDT
I haven't compiled the testcase but noticed it uses CLOCK_REALTIME instead of CLOCK_MONOTONIC. I suggest using CLOCK_MONOTONIC and running the test again.

It would also be interesting to know which clocksource is in use on that system (the contents of /sys/devices/system/clocksource/clocksource0/*) and whether the VDSO gettimeofday extensions are enabled (/proc/sys/kernel/vsyscall64). These data points can heavily influence the results.

Regards,
Luis
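
(Those data points can simply be read from a shell; for completeness, a small self-contained helper reading the standard sysfs/procfs paths named above might look like the sketch below.)

#include <fstream>
#include <iostream>
#include <string>

// Print the first line of a sysfs/procfs file, or note that it is missing.
static void dump(const char *path)
{
    std::ifstream f(path);
    std::string line;
    std::cout << path << ": ";
    if (f && std::getline(f, line))
        std::cout << line << '\n';
    else
        std::cout << "(not available)\n";
}

int main()
{
    dump("/sys/devices/system/clocksource/clocksource0/current_clocksource");
    dump("/sys/devices/system/clocksource/clocksource0/available_clocksource");
    dump("/proc/sys/kernel/vsyscall64");
    return 0;
}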
Comment 3 Jon Thomas 2010-05-14 12:22:57 EDT
Luis,

thanks. I've asked that the customer follow up on your suggestions.
Comment 5 Issue Tracker 2010-05-15 05:47:18 EDT
Event posted on 05-15-2010 05:47am EDT by rrajaram

This event sent from IssueTracker by rrajaram
 issue 861323
 it_file 669733
