Note: This bug is displayed in read-only format because
the product is no longer active in Red Hat Bugzilla.
Created attachment 1480405: reproducer code
Description of problem:
ISSUE: numad miscalculates the CPU request for a process, which limits the process's CPU utilization.
Using SystemTap, I observed the following workload pattern in the application:
1. The finsim application first runs as a single process with a single thread; this phase is called the preparation process.
-> numad binds this single-threaded process to one NUMA node by calling sched_setaffinity() with mask 0x00ff.
2. The preparation phase takes about 10 to 15 minutes. Afterwards the process does not terminate;
instead, the same process expands to 16 threads.
-> numad keeps the workload calculation it made earlier for the 0x00ff mask, so the 16 threads are not distributed across NUMA nodes.
-> Therefore, the process remains bound to 0x00ff, which is the issue reported here.
Version-Release number of selected component (if applicable):
numad-0.5-12.20150602git.el6
kernel-2.6.32-696.el6
How reproducible:
Always
Steps to Reproduce:
1. Use a system with at least two NUMA nodes.
2. In the attached C code, set the following values according to the number of CPU cores and the amount of memory of the system from step 1.
- The number of threads set by #define THREADS must always be greater than the number of CPU cores in one NUMA node.
For example, if the system has 16 cores in total on 2 NUMA nodes, set THREADS to 10 or 11.
- The amount of memory set by #define MEM_TO_USE_MB must not exceed the memory of one NUMA node.
For example, if the system has 32GB of memory in total on 2 NUMA nodes, MEM_TO_USE_MB must not exceed 16GB.
3. Compile the reproducer.
- gcc -std=c99 -O3 -Wall -Wextra -pedantic sfdc02154588.c -o sfdc02154588 -lpthread
4. Start numad as follows.
- numad -C 0 -u 100 -H 100 -i 1:100 -l 7 -K 0 -t 100
5. Run the reproducer binary and check its CPU usage with top.
Actual results:
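numad has a bug or limitation in how it calculates the CPU request: the process stays bound to one NUMA node, and its CPU usage is capped below what its threads could use.
Expected results:
cpu_request should also be recalculated when cpu_request < thread_limit. The proposed change sets cpu_request to thread_limit unconditionally, instead of only clamping it when it exceeds thread_limit: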
@/root/numad-0.5git/numad.c
--- numad.c.orig 2018-08-30 16:33:27.367498743 +0900
+++ numad.c 2018-08-30 16:33:47.004500138 +0900
@@ -2180,9 +2180,7 @@
}
}
thread_limit *= ONE_HUNDRED;
- if (cpu_request > thread_limit) {
- cpu_request = thread_limit;
- }
+ cpu_request = thread_limit;
// If this process was recently bound, enforce a five-minute minimum
// delay between repeated attempts to potentially move the process.
#define MIN_DELAY_FOR_REEVALUATION (300 * ONE_HUNDRED)
Additional info:
The system status is shown below.
[LOG file]
test.stp
....
PID 4981 (numad) called sched_setaffinity() on PID 6059 (sfdc02154588-ne), mask = 0xffffc0000 <--- numad chose only one NUMA node.
PID 4981 (numad) called sched_setaffinity() on PID 6060 (sfdc02154588-ne), mask = 0xffffc0000
....
@numad.log
<After 100 sec>
....
479933: PID 6059: (sfdc02154588-ne), Threads 21, MBs_size 8654, MBs_used 7176, CPUs_used 1799, Magnitude 12909624, Nodes: 1 <----- Threads 21, but CPUs_used 1799, sched_setaffinity() is 0xffffc0000.
Tue Aug 28 18:42:09 2018: Skipping evaluation of PID 6059 because done too recently.
...
@TOP
6059 root 20 0 8654m 7.0g 452 S 1799.9 0.9 105:38.80 sfdc02154588-ne <----- CPU utilization is only about 1800%, not 2100%
Red Hat Enterprise Linux 6 is in the Maintenance Support 2 Phase. During the Maintenance Support 2 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.
The official life cycle policy can be reviewed here:
http://redhat.com/rhel/lifecycle
This issue does not meet the inclusion criteria for the Maintenance Support 2 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:
https://access.redhat.com