Bug 592024

Summary: Periodic system preemption on HP z800 when NUMA is enabled
Product: Red Hat Enterprise MRG
Component: realtime-kernel
Version: 1.2
Hardware: All
OS: Linux
Status: CLOSED NOTABUG
Severity: high
Priority: medium
Reporter: Jon Thomas <jthomas>
Assignee: Red Hat Real Time Maintenance <rt-maint>
QA Contact: David Sommerseth <davids>
CC: bhu, jwest, lgoncalv, ovasik, tao, williams
Doc Type: Bug Fix
Last Closed: 2011-08-08 15:14:58 UTC

Attachments: testcase

Description Jon Thomas 2010-05-13 17:35:19 UTC
A periodic system preemption of 20 uS is observed when NUMA is enabled on an HP z800.  If NUMA is disabled in the BIOS, the preemption is no longer observed.  However, that memory configuration has potential performance issues of its own.


->> The system BIOS does not seem to have an option to disable SMIs
->> Disabling memory interleaving removes the 20 uS preemption

The customer is concerned about the inefficient memory layout after disabling NUMA.

The customer has provided a test application that demonstrates the problem.

Test file: timerAccess.cpp

The boot option isolcpus=1-7 is used.

The test used for this is attached.  Basically, the high-resolution timer (HRT) is read back to back and the results are compared.  This operation should take only 100-200 nS to complete.  Periodically, the system preempts the process for a couple of microseconds.  On this HP workstation, when NUMA is enabled, the preemption time increases to over 20 uS.  The test application sets RT priority and processor affinity.  All CPUs except zero are isolated using isolcpus=1-7 on the kernel boot line.  Hardware interrupts and kernel interrupt threads are redirected to CPU 0.
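
For reference, a minimal sketch of the back-to-back timer-read approach described above (this is not the attached timerAccess.cpp; the CPU number, SCHED_FIFO priority, and clock choice are illustrative assumptions):

  // Minimal sketch, assuming CPU 1 (isolated via isolcpus=1-7) and RT priority 80.
  // Comment 2 below suggests CLOCK_MONOTONIC rather than CLOCK_REALTIME.
  #include <cstdio>
  #include <ctime>
  #include <sched.h>

  int main() {
      // Pin this process to an isolated CPU before doing anything else.
      cpu_set_t set;
      CPU_ZERO(&set);
      CPU_SET(1, &set);
      sched_setaffinity(0, sizeof(set), &set);

      // Run with real-time (SCHED_FIFO) priority.
      sched_param sp{};
      sp.sched_priority = 80;
      sched_setscheduler(0, SCHED_FIFO, &sp);

      const long iterations = 500000000L;   // matches the Count in the results below
      double min_us = 1e9, max_us = 0.0, sum_us = 0.0;

      timespec t1, t2;
      for (long i = 0; i < iterations; ++i) {
          // Read the timer twice back to back and record the difference.
          clock_gettime(CLOCK_MONOTONIC, &t1);
          clock_gettime(CLOCK_MONOTONIC, &t2);
          double us = (t2.tv_sec - t1.tv_sec) * 1e6 +
                      (t2.tv_nsec - t1.tv_nsec) / 1e3;
          if (us < min_us) min_us = us;
          if (us > max_us) max_us = us;
          sum_us += us;
      }
      printf("Min: %.3f uS  Avg: %.3f uS  Max: %.3f uS\n",
             min_us, sum_us / iterations, max_us);
      return 0;
  }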

-->> Comment from the customer on the test application, after reading the documentation at

http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_MRG/1.0/html/Realtime_Tuning_Guide/sect-Realtime_Tuning_Guide-Realtime_Specific_Tuning-Non_Uniform_Memory_Access.html

The test program uses sched_setaffinity() to pin the thread to a single CPU.  After the thread has been pinned, you can allocate any memory necessary.  If the system has a NUMA configuration, the OS should then allocate memory as close as possible (assuming it is available) to the thread.
The preferred way to do this is to start the thread and then change its scheduling class/priority and affinity from inside the program.
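
A small sketch of that "pin first, then allocate" ordering (illustrative only; the CPU number and buffer size are not from the bug report):

  #include <vector>
  #include <sched.h>

  void worker() {
      // 1. Pin the thread to its target CPU before touching any large buffers.
      cpu_set_t set;
      CPU_ZERO(&set);
      CPU_SET(2, &set);
      sched_setaffinity(0, sizeof(set), &set);

      // 2. Only now allocate and touch working memory; with the kernel's default
      //    first-touch policy the pages should land on the NUMA node local to CPU 2.
      std::vector<char> buffer(64 * 1024 * 1024, 0);

      // ... real-time work using buffer ...
  }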


Observation made by the customer:

->> This appears to be specific to the 2.6.24 kernel.  The issue disappears on real-time kernels 2.6.25 and later.  Results below are shown using the 2.6.31 kernel as well.

Results of test application

NUMA On
  2.6.24.7-149.el5rt  
  Min:    0.204 uS
  Avg:    0.210 uS
  Max:   21.731 uS
  Count: 500000000
  Test time: 215 seconds

  2.6.31.12-rt21-nousb  
  Min:    0.111 uS
  Avg:    0.117 uS
  Max:    2.360 uS
  Count: 500000000
  Test time: 122 seconds

NUMA Off
  2.6.24.7-149.el5rt  
  Min:    0.203 uS
  Avg:    0.211 uS
  Max:    5.176 uS
  Count: 500000000
  Test time: 215 seconds

  2.6.31.12-rt21-nousb  
  Min:    0.111 uS
  Avg:    0.117 uS
  Max:    2.739 uS
  Count: 500000000
  Test time: 121 seconds

->> Informed the customer about the RT kernel version 2.6.33 that is going to be released with MRG 1.3.  Customer questions:

-> We need to identify the kernel change that removes the latency issue in the later kernels compared to 2.6.24.
-> Will this issue be fixed in the 2.6.24 RT kernel before the MRG 1.3 release?

The customer has changed kernel compile options, which seems to improve performance.

File attached: MRG-Config

->> Among the options disabled are the kernel debugging options and CONFIG_CPU_FREQ


How reproducible:

Always for the customer

Steps to Reproduce:

1. Run the attached test application

Actual results:

A periodic system preemption of 20 uS is observed when NUMA is enabled

Expected results:

Better performance, without the periodic 20 uS preemption

Comment 1 Jon Thomas 2010-05-13 17:37:06 UTC
Created attachment 413832 [details]
testcase

Comment 2 Luis Claudio R. Goncalves 2010-05-13 21:57:35 UTC
I haven't compiled the testcase but noticed it uses CLOCK_REALTIME instead of CLOCK_MONOTONIC. I suggest using CLOCK_MONOTONIC and running the test again.

It would also be interesting to know which clocksource is in use on that system (the contents of /sys/devices/system/clocksource/clocksource0/*) and whether the VDSO gettimeofday extensions are enabled (/proc/sys/kernel/vsyscall64).  These data points can heavily influence the results.

Regards,
Luis
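
For convenience, a small helper that dumps the data points Luis asks about (paths are taken from the comment above; whether /proc/sys/kernel/vsyscall64 exists depends on the kernel):

  #include <fstream>
  #include <iostream>
  #include <string>

  // Print the contents of one sysfs/procfs file on a single line.
  static void dump(const std::string& path) {
      std::ifstream f(path);
      std::string line;
      std::cout << path << ": ";
      while (std::getline(f, line))
          std::cout << line << ' ';
      std::cout << '\n';
  }

  int main() {
      dump("/sys/devices/system/clocksource/clocksource0/current_clocksource");
      dump("/sys/devices/system/clocksource/clocksource0/available_clocksource");
      dump("/proc/sys/kernel/vsyscall64");
      return 0;
  }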

Comment 3 Jon Thomas 2010-05-14 16:22:57 UTC
Luis,

Thanks. I've asked the customer to follow up on your suggestions.

Comment 5 Issue Tracker 2010-05-15 09:47:18 UTC
Event posted on 05-15-2010 05:47am EDT by rrajaram

This event sent from IssueTracker by rrajaram 
 issue 861323
it_file 669733