Bug 592024 - Periodic system preemption on HP z800 when NUMA is enabled
Summary: Periodic system preemption on HP z800 when NUMA is enabled
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: realtime-kernel
Version: 1.2
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Red Hat Real Time Maintenance
QA Contact: David Sommerseth
Depends On:
Reported: 2010-05-13 17:35 UTC by Jon Thomas
Modified: 2018-10-27 13:38 UTC (History)
6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2011-08-08 15:14:58 UTC
Target Upstream Version:

Attachments (Terms of Use)
testcase (4.20 KB, text/plain)
2010-05-13 17:37 UTC, Jon Thomas
no flags Details

Description Jon Thomas 2010-05-13 17:35:19 UTC
A periodic system preemption of 20 uS is noticed when NUMA is enabled on an HP z800.  If NUMA is disabled in the BIOS, the preemption is no longer noticed.  However, that memory configuration has potential performance issues of its own.

->> The system BIOS does not appear to have an option to disable SMI
->> Disabling memory interleaving removes the 20 uS preemption

The customer is concerned about the inefficient memory layout that results from disabling NUMA.

He has provided a test application that demonstrates this problem.

Test file: timerAccess.cpp

The boot option isolcpus=1-7 is used.

The test used for this is attached.  Basically, the High Res Timer (HRT) is read back-to-back and the results compared.  This operation should only take 100-200 nS to complete.  Periodically, the system preempts the process for a couple of microseconds.  On this HP workstation, when NUMA is enabled, the preemption time increases to over 20 uS.  The test application sets RT priority and processor affinity.  All CPUs except zero are isolated using isolcpus=1-7 on the kernel boot line.  Hardware interrupts and kernel interrupt threads are redirected to CPU 0.

-->> Comment from the customer on the test application, after reading the documentation:


The test program uses sched_setaffinity() to pin the thread to a single CPU.  After the thread has been pinned, any necessary memory can then be allocated.  On a NUMA configuration, the OS should then allocate memory as close as possible (assuming it is available) to the thread.
The preferred way to do this is to start the thread and then change its scheduling class/priority and affinity from inside the program.

Observation made by the customer:

->> This has something to do with the 2.6.24 kernel.  The issue disappears on real-time kernels 2.6.25 and later.  Results are shown using the 2.6.31 kernel.

Results of test application

NUMA On

  Min:    0.204 uS
  Avg:    0.210 uS
  Max:   21.731 uS
  Count: 500000000
  Test time: 215 seconds  
  Min:    0.111 uS
  Avg:    0.117 uS
  Max:    2.360 uS
  Count: 500000000
  Test time: 122 seconds

NUMA Off  
  Min:    0.203 uS
  Avg:    0.211 uS
  Max:    5.176 uS
  Count: 500000000
  Test time: 215 seconds  
  Min:    0.111 uS
  Avg:    0.117 uS
  Max:    2.739 uS
  Count: 500000000
  Test time: 121 seconds

->> Informed the customer about RT kernel version 2.6.33, which is going to be released with MRG 1.3.

-> We need to identify the change between kernels that removes the latency issue in the later kernels compared to 2.6.24.
-> Will we fix this issue in the 2.6.24 RT kernel before the MRG 1.3 release?

The customer has changed kernel compile options, and this seems to improve performance.

File attached: MRG-Config

->> Among the parameters disabled are kernel debugging options and CONFIG_CPU_FREQ

How reproducible:

Always for the customer

Steps to Reproduce:

1. Run the attached test application

Actual results:

A periodic system preemption of 20 uS is noted when NUMA is enabled

Expected results:

NUMA can remain enabled without the periodic 20 uS preemption

Comment 1 Jon Thomas 2010-05-13 17:37:06 UTC
Created attachment 413832 [details]

Comment 2 Luis Claudio R. Goncalves 2010-05-13 21:57:35 UTC
I haven't compiled the testcase, but I noticed it uses CLOCK_REALTIME instead of CLOCK_MONOTONIC. I suggest switching to CLOCK_MONOTONIC and running the test again.

It would also be interesting to know which clocksource is in use on that system (the contents of /sys/devices/system/clocksource/clocksource0/*) and whether the VDSO gettimeofday extensions are enabled (/proc/sys/kernel/vsyscall64). These data points can heavily influence the results.


Comment 3 Jon Thomas 2010-05-14 16:22:57 UTC

thanks. I've asked that the customer follow up on your suggestions.

Comment 5 Issue Tracker 2010-05-15 09:47:18 UTC
Event posted on 05-15-2010 05:47am EDT by rrajaram

This event sent from IssueTracker by rrajaram 
 issue 861323
it_file 669733
