Bug 1242406
| Summary: | performance degradation in "ulimit -u 7810" environments (default on rhel6.3+ and rhel7 deployments) | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Christian Horn <chorn> |
| Component: | bash | Assignee: | Siteshwar Vashisht <svashisht> |
| Status: | CLOSED WONTFIX | QA Contact: | BaseOS QE - Apps <qe-baseos-apps> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 6.8 | CC: | ashankar, chorn, cww, fweimer, igreen, jmario, mnewsome, mrdest, ovasik, pfrankli, qguo, rfreire, thgardne, tmraz |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-08-15 18:16:45 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1269194 | | |
| Attachments: | | | |
|
Description
Christian Horn
2015-07-13 09:38:34 UTC
Comparing the time spent in syscalls, from
$ strace -fc ./new3.sh >& strace_fc_out.txt

RHEL5:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 50.61    0.211656           6     37845      9916 wait4
  9.97    0.041700           0    127122      7071 read
  6.83    0.028555           0    320031           mmap
  6.29    0.026293           1     20001           clone
[..]
------ ----------- ----------- --------- --------- ----------------
100.00    0.418186               2413840    104281 total

RHEL6:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 58.25    0.477852          12     39999      9999 wait4
 14.92    0.122369           1    130037      9998 read
  4.48    0.036780           0    310035           mmap
  3.34    0.027412           0    150041        12 open
[..]
------ ----------- ----------- --------- --------- ----------------
100.00    0.820333               2614030    100036 total

The runtimes in the description are just one-time runs. Also looking over multiple runs, the differences stay. This was taken on a different system, hp-dl580g5-01.rhts.eng.bos.redhat.com (rhel5.8 vs. rhel6.5):

# looking at "cputime" as shown by "top":
rhel5: min:06.05 max:06.51 samples:50 average: 6.25
rhel6: min:12.26 max:12.52 samples:23 average: 12.39

# the average runtime of the script
rhel5: min:29.73 max:30.16 samples:50 average: 29.94
rhel6: min:48.08 max:48.81 samples:23 average: 48.31

# average "sys" time from output of "/usr/bin/time ./new3.sh"
rhel5: min:34.40 max:35.12 samples:50 average: 34.82
rhel6: min:38.65 max:39.31 samples:23 average: 38.92

# average "usr" time from output of "/usr/bin/time ./new3.sh"
rhel5: min:11.80 max:12.24 samples:50 average: 12.04
rhel6: min:24.79 max:25.61 samples:23 average: 25.11

The numbers in #3 are from a physical system, "description" and #2 are from KVM.

(In reply to Christian Horn from comment #3)
> The runtimes in the description are just one-time runs. Also looking over
> multiple runs, the differences stay. This was taken on a different system,
> hp-dl580g5-01.rhts.eng.bos.redhat.com (rhel5.8 vs. rhel6.5):
>
> # looking at "cputime" as shown by "top":
> rhel5: min:06.05 max:06.51 samples:50 average: 6.25
> rhel6: min:12.26 max:12.52 samples:23 average: 12.39
>
> # the average runtime of the script
> rhel5: min:29.73 max:30.16 samples:50 average: 29.94
> rhel6: min:48.08 max:48.81 samples:23 average: 48.31
>
> # average "sys" time from output of "/usr/bin/time ./new3.sh"
> rhel5: min:34.40 max:35.12 samples:50 average: 34.82
> rhel6: min:38.65 max:39.31 samples:23 average: 38.92
>
> # average "usr" time from output of "/usr/bin/time ./new3.sh"
> rhel5: min:11.80 max:12.24 samples:50 average: 12.04
> rhel6: min:24.79 max:25.61 samples:23 average: 25.11

chorn, since this is a "new" issue and we're nearing the end of the RHEL6 lifecycle, can you check to see if older versions of RHEL6 also have this problem?

P.

Prarit, seems like this issue has appeared inside of the rhel6 line.

### results for KVM guests (host fedora22)
cputime = "cpu time the process did use up, according to 'top'"
time    = absolute time the script needed to complete

cputime min:02.20 max:02.59 samples:50 average: 2.39  # rhel5.6GA
time    min:13.77 max:14.91 samples:50 average: 14.02

cputime min:02.30 max:02.74 samples:50 average: 2.48  # rhel5.9GA
time    min:13.93 max:14.48 samples:50 average: 14.16

cputime min:02.29 max:02.90 samples:50 average: 2.52  # rhel6.0GA
time    min:10.01 max:11.03 samples:50 average: 10.31

cputime min:02.24 max:03.00 samples:50 average: 2.58  # rhel6.2GA
time    min:09.90 max:11.50 samples:50 average: 10.40

cputime min:02.52 max:03.25 samples:50 average: 2.66  # rhel6.2GA w/ 6.4GA kernel
time    min:09.91 max:12.82 samples:50 average: 10.44

cputime min:02.58 max:03.63 samples:50 average: 2.82  # rhel6.2GA w/ 6.6GA kernel
time    min:09.98 max:13.54 samples:50 average: 10.90

cputime min:04.91 max:06.24 samples:37 average: 5.45  # rhel6.2GA, upgraded to 6.6GA kernel + 6.6 glibc + reboot
time    min:20.88 max:24.04 samples:37 average: 22.06

### results for a physical system, hp-dl165g7-01.rhts.eng.bos.redhat.com
cputime min:03.51 max:03.93 samples:24 average: 3.76  # rhel5.6GA
time    min:21.91 max:22.32 samples:24 average: 22.13

cputime min:05.97 max:06.53 samples:15 average: 6.18  # rhel6.2GA
time    min:20.78 max:21.08 samples:15 average: 20.95

cputime min:13.86 max:14.12 samples:10 average: 13.96  # rhel6.6GA
time    min:43.67 max:43.95 samples:10 average: 43.83

>
> cputime min:02.24 max:03.00 samples:50 average: 2.58 # rhel6.2GA
> time min:09.90 max:11.50 samples:50 average: 10.40
>
> cputime min:02.52 max:03.25 samples:50 average: 2.66 # rhel6.2GA w/ 6.4GA
> kernel
> time min:09.91 max:12.82 samples:50 average: 10.44
>
> cputime min:02.58 max:03.63 samples:50 average: 2.82 # rhel6.2GA w/ 6.6GA
> kernel
> time min:09.98 max:13.54 samples:50 average: 10.90
>
> cputime min:04.91 max:06.24 samples:37 average: 5.45 # rhel6.2GA, upgraded
> to 6.6GA kernel + 6.6 glibc + reboot
> time min:20.88 max:24.04 samples:37 average: 22.06
>
>
So it has crept up a little bit due to the kernel, but the big problem here appears to be glibc. I'm reassigning this to the glibc component.
P.
Bisecting the RHEL6 minor versions and then the packages leading to the issue, it turns out that RHEL6.6 with the RHEL6.2 pam package executes the script as fast as pure RHEL6.2 does. Upgrading/downgrading the pam package, logging out/in via SSH, and then executing the testcode switches between the two behaviours. Although this is down to "something with pam", I am still unsure how it would influence the syscall time.

Comparing "strace -fc .." of running the testcode with the old vs. new pam package installed:

new/problematic pam package installed:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 66.07    0.491853          12     39999      9999 wait4
 12.13    0.090276           1    130037      9997 read
  4.72    0.035118           0    310035           mmap
  2.51    0.018678           0    230042         4 close

old/fast pam package installed:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 46.61    0.205347           5     39999      9998 wait4
 13.82    0.060866           0    130037      9997 read
  9.06    0.039923           0    310035           mmap
  8.70    0.038331           0    320031           rt_sigaction
  4.39    0.019324           0    230042         4 close
[...]

Thanks prarit. Pure glibc and/or kernel changes have now indeed turned out to not influence the issue.

Stripped-down reproducer:
1. Set up a rhel6.4 system or newer (can be a KVM guest)
2. $ cat >new3.sh<<EOT
#!/bin/sh
i=10000;
until test $i -eq 1; do
  MYVAR=$(date)
  /bin/sh &
  i=$((i-1));
done
MYPID=$$
top -p $MYPID -n1 |tail -2|head -1
EOT
3. chmod +x new3.sh
4. /usr/bin/time ./new3.sh
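The two behaviours can be compared in one go with a small wrapper (a sketch, my addition: it assumes ./new3.sh from step 2 exists and falls back to a no-op stub so the wrapper itself runs stand-alone; 1024 and 7810 are the nproc limits discussed in this bug):

```shell
#!/bin/sh
# Compare the reproducer under the old (1024) and the newer (7810) nproc limit.
# Stub out new3.sh if it is missing so this wrapper is runnable stand-alone.
[ -x ./new3.sh ] || { printf '#!/bin/sh\ntrue\n' > new3.sh; chmod +x new3.sh; }
TIMECMD=/usr/bin/time
[ -x "$TIMECMD" ] || TIMECMD=""          # fall back to untimed runs
for u in 1024 7810; do
  # subshell: the changed limit must not stick to the calling shell
  ( ulimit -u "$u" 2>/dev/null || echo "cannot set ulimit -u $u"
    echo "== ulimit -u $u =="
    $TIMECMD ./new3.sh )
done
```

Each limit's timing is printed after its "== ulimit -u N ==" header.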
Actual results:
11847 root 20 0 104m 2052 1052 S 0.0 0.4 0:04.23 new3.sh
7.41user 5.61system 0:16.39elapsed 79%CPU (0avgtext+0avgdata 8208maxresident)k
0inputs+0outputs (0major+8817338minor)pagefaults 0swaps
=> so "4.23" showed in tops cpu-column
=> overall runtime 16sec
This is with pam-1.1.1-13.el6.x86_64 or pam-1.1.1-20.el6.x86_64
Expected results:
31887 root 20 0 103m 1476 1052 S 0.0 0.3 0:02.68 new3.sh
1.24user 1.59system 0:09.98elapsed 28%CPU (0avgtext+0avgdata 5904maxresident)k
0inputs+0outputs (0major+7938250minor)pagefaults 0swaps
=> so "2.68" showed in tops cpu-column
=> overall runtime 9.98sec
This is with pam downgraded to pam-1.1.1-10.el6.x86_64 (and logout/login)
Additional info:
- The originally reported issue was with the OSWatcher script, used for monitoring purposes. The code above is a simplified version, reduced to the parts that seem performance-relevant.
- Besides looking at overall performance, it was noticed that the overall "cpu accounting" as shown in top appears higher on rhel6. We assumed this accounting to be a variant of bz812148, but none of the countermeasures reported to help in other cases helped here.
The issue is

root soft nproc unlimited

which we have in pam-1.1.1-12.el6 and later. From the changelog:

* Fri Oct 12 2012 Tomas Mraz <tmraz> 1.1.1-12
[..]
- relax soft nproc limit for root in 90-nproc.conf

So deploying a 6.6.z system, the testcode runs 19.00 sec on average (variations between 17.40 and 20.37 seconds). With the nproc line removed (as it is in pam-1.1.1-10.el6), it runs 12.21 sec on average (variations between 11.66 and 13.74 seconds).

This also affects rhel7 btw. I have no idea how to investigate this deeper.

(In reply to Christian Horn from comment #15)
> The issue is
> root soft nproc unlimited
> which we have in pam-1.1.1-12.el6 and later. From the changelog:
>
> * Fri Oct 12 2012 Tomas Mraz <tmraz> 1.1.1-12
> [..]
> - relax soft nproc limit for root in 90-nproc.conf
>
> So deploying a 6.6.z system, the testcode
> runs 19.00 sec on average (variations between 17.40 and 20.37 seconds).
> With the nproc line removed (as it is in pam-1.1.1-10.el6), it
> runs 12.21 sec on average (variations between 11.66 and 13.74 seconds).
>
> This also affects rhel7 btw.
> I have no idea how to investigate this deeper.

I suggest starting by asking the pam team to explain why they changed it and to review the performance issues.

We changed the limit because the previous limit was too strong for the normal use-cases. What happens if the file is removed altogether, i.e. there is no nproc setting in the pam configuration at all? It should probably be reassigned to kernel, as the source of the degradation is not pam (it only made a configuration change of process limits) and I have no idea how to investigate what happens in the kernel for various nproc settings.

Thanks for the comment. The slow/fast behaviour can be switched using the "root soft nproc unlimited" line, vs. commenting it out or deleting it.

But I would like to know what happens if you remove both nproc lines in the file. Is the performance the same as if only the root line is removed?
Complete removal of the file results in bad performance:

# Just "* soft nproc 1024" in the file:
cputime min:02.39 max:02.69 samples:8 average: 2.51
time    min:09.08 max:09.99 samples:8 average: 9.52

# having file /etc/security/limits.d/90-nproc.conf removed
cputime min:03.68 max:04.06 samples:11 average: 3.80
time    min:14.65 max:16.05 samples:11 average: 15.01

# having both lines, as shipped with pam-1.1.1-12.el6 and later
cputime min:03.68 max:04.13 samples:12 average: 3.91
time    min:14.60 max:16.05 samples:12 average: 15.30

Unfortunately that means we can hardly do anything about the regression in PAM. Limiting root again to nproc = 1024 is not an option.

I verified that I see the fast behaviour in a "ulimit -u 1024" environment and the slow behaviour in a "ulimit -u 7810" environment, and will run perf with both. Not sure what to do further with the issue, reassigning to kernel for consideration.

Created attachment 1054410 [details]
fast situation: perf record -o perf_ulimit_u_1024 ./new3.sh
# changing ulimit
ulimit -u 1024
# verifying we have the fast behaviour
/usr/bin/time ./new3.sh
# running perf
perf record -o perf_ulimit_u_1024 ./new3.sh
Created attachment 1054412 [details]
slow situation: perf record -o perf_ulimit_u_7810 ./new3.sh
# changing ulimit
ulimit -u 7810
# verifying we have the slow behaviour
/usr/bin/time ./new3.sh
# running perf
perf record -o perf_ulimit_u_7810 ./new3.sh
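As a possible follow-up (a sketch, my suggestion, not part of the original report): with both recordings on disk, perf can also compare them directly, printing the per-symbol overhead delta between the fast and the slow run:

```shell
#!/bin/sh
# Compare the two recordings symbol-by-symbol with 'perf diff'.
# Guarded so the snippet also runs where perf or the data files are absent.
if command -v perf >/dev/null 2>&1 \
   && [ -e perf_ulimit_u_1024 ] && [ -e perf_ulimit_u_7810 ]; then
  perf diff perf_ulimit_u_1024 perf_ulimit_u_7810 | tee perf_diff_1024_vs_7810.txt
else
  echo "perf or the recordings are missing; run the perf record steps first" \
    | tee perf_diff_1024_vs_7810.txt
fi
```

Either branch leaves the result in perf_diff_1024_vs_7810.txt for attaching to the bug.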
Attached perf outputs from "ulimit -u 1024" and "-u 7810" environments. Does this or something else help pin the issue down to kernel, bash, glibc or something else?

Hello kernel specialists, so we have the issue pinned down to the following.
[root@rhel6u5a test_shell_kvm]# cat new3.sh
#!/bin/sh
i=10000;
until test $i -eq 1; do
  MYVAR=$(date)
  /bin/sh &
  i=$((i-1));
done
MYPID=$$
top -p $MYPID -n1 |tail -2|head -1
[root@rhel6u5a test_shell_kvm]# ulimit -u 7810
[root@rhel6u5a test_shell_kvm]# ulimit -u
7810
[root@rhel6u5a test_shell_kvm]# /usr/bin/time ./new3.sh
29017 root 20 0 105m 2756 1068 S 0.0 0.3 0:04.85 new3.sh
9.35user 5.77system 0:19.51elapsed 77%CPU (0avgtext+0avgdata 11024maxresident)k
[root@rhel6u5a test_shell_kvm]# ulimit -u 1024
[root@rhel6u5a test_shell_kvm]# ulimit -u
1024
[root@rhel6u5a test_shell_kvm]# /usr/bin/time ./new3.sh
11801 root 20 0 103m 1480 1068 S 0.0 0.1 0:02.56 new3.sh
1.12user 1.59system 0:09.49elapsed 28%CPU (0avgtext+0avgdata 5920maxresident)k
0inputs+0outputs (0major+8064654minor)pagefaults 0swaps
- performance of the code is much worse with increased "max processes"
- the "-u 1024" was our default on rhel6.2 and earlier, the "7810" is default on i.e. rhel6.5
- #24 has a perf output of the "ulimit -u 1024" case
- #25 has a perf output of the "ulimit -u 7810" case
Could you help find out what is causing this?
perf outputs for the script from #27, to compare both environments:
$ ulimit -u 1024
$ perf record -a -o perf_a_1024 ./new3.sh
$ perf report -k /boot/vmlinuz-2.6.32-431.59.1.el6.x86_64.debug \
--show-total-period -i perf_a_1024
[..]
22.66% 19626 date [kernel.kallsyms] [k] 0xffffffff810a0440
20.77% 17992 sh [kernel.kallsyms] [k] 0xffffffff8104fc30
7.43% 6434 new3.sh [kernel.kallsyms] [k] 0xffffffff8129520f
6.70% 5806 init [kernel.kallsyms] [k] 0xffffffff8103eacb
6.50% 5635 swapper [kernel.kallsyms] [k] 0xffffffff8103eacb
3.33% 2882 new3.sh bash [.] 0x0000000000067959
2.43% 2104 sh libc-2.12.so [.] __mbrtowc
2.29% 1987 date bash [.] 0x000000000003e713
[..]
$ ulimit -u 7810
$ perf record -a -o perf_a_7810 ./new3.sh
$ perf report -k /boot/vmlinuz-2.6.32-431.59.1.el6.x86_64.debug \
--show-total-period -i perf_a_7810
[..]
19.15% 32403 date [kernel.kallsyms] [k] 0xffffffff8114da12
14.51% 24551 init [kernel.kallsyms] [k] 0xffffffff8103eacb
13.86% 23446 swapper [kernel.kallsyms] [k] 0xffffffff8103eacb
12.42% 21015 sh [kernel.kallsyms] [k] 0xffffffff8112d20a
6.14% 10396 new3.sh bash [.] 0x000000000003e660
4.64% 7848 date bash [.] 0x000000000003e713
4.26% 7201 new3.sh [kernel.kallsyms] [k] 0xffffffff81153a36
3.90% 6596 date libc-2.12.so [.] _int_free
2.00% 3378 date bash [.] delete_job
1.65% 2800 date libc-2.12.so [.] free
1.26% 2134 sh libc-2.12.so [.] __mbrtowc
Any ideas on the reason for this?
Performance is not getting worse abruptly but gradually. Setting "ulimit -u" to increasing values and measuring the code runtime:

$ for i in 16 32 64 128 256 512 768 1024 2048 4096 8192 16284; do
    ulimit -u $i
    echo -ne "$i : "
    /usr/bin/time ./new4.sh 2>&1 | grep elaps | sed -e 's,elaps.*,,' -e 's,.* ,,'
  done
16    : 0:08.11
32    : 0:06.80
64    : 0:07.52
128   : 0:07.11
256   : 0:08.52
512   : 0:08.83
768   : 0:08.89
1024  : 0:09.64   <-- this is the default we shipped rhel6.0 with
2048  : 0:12.25
4096  : 0:16.47
8192  : 0:20.32   <-- this is roughly what we have in rhel6.3+
16284 : 0:21.04

On rhel5, I have identical execution times for the code. For "ulimit -u" ranging from 16 to 16284, I get execution times between 12.36 and 12.62 seconds.

RHEL7 has the same issue as rhel6, output quite similar to #29.
rhel6 data
maxproc 1024
- 16.16%  0.54% new3.sh bash            [.] make_child
     make_child
- 12.28%  0.23% new3.sh libc-2.12.so    [.] __libc_fork
   - __libc_fork
        100.00% make_child
+  3.96%  3.89% sh      libc-2.12.so    [.] __gconv_transform_utf8_internal
+  3.65%  3.65% sh      [kernel.kallsyms] [k] __do_page_fault
+  3.42%  0.23% sh      ld-2.12.so      [.] _dl_sysdep_start
+  3.11%  3.11% date    [kernel.kallsyms] [k] __do_page_fault
+  2.95%  0.31% sh      ld-2.12.so      [.] dl_main
+  2.80%  2.80% new3.sh [kernel.kallsyms] [k] __do_page_fault
+  2.72%  2.72% new3.sh [kernel.kallsyms] [k] _spin_unlock_irqrestore
+  2.49%  0.16% date    ld-2.12.so      [.] _dl_sysdep_start
+  2.33%  2.33% sh      libc-2.12.so    [.] __mbrtowc
+  2.18%  0.08% new3.sh libc-2.12.so    [.] __xstat64
+  2.10%  0.00% new3.sh libc-2.12.so    [.] __execve
+  2.10%  0.70% sh      ld-2.12.so      [.] _dl_relocate_object
+  2.02%  2.02% new3.sh [kernel.kallsyms] [k] kmem_cache_alloc
+  1.63%  1.40% sh      libc-2.12.so    [.] _int_malloc
+  1.48%  0.08% date    ld-2.12.so      [.] dl_main
+  1.48%  1.48% new3.sh [kernel.kallsyms] [k] get_page_from_freelist
+  1.40%  0.47% new3.sh bash            [.] hash_search
+  1.40%  0.00% new3.sh libc-2.12.so    [.] __sigprocmask
+  1.40%  1.40% new3.sh [kernel.kallsyms] [k] flush_tlb_others_ipi
+  1.40%  0.23% new3.sh libc-2.12.so    [.] ptmalloc_unlock_all
+  1.32%  1.24% sh      ld-2.12.so      [.] do_lookup_x
+  1.32%  0.78% new3.sh libc-2.12.so    [.] _int_malloc
maxproc 7180
-  7.73%  0.06% new3.sh bash            [.] make_child
     make_child
-  5.95%  0.00% new3.sh libc-2.12.so    [.] __libc_fork
   - __libc_fork
        100.00% make_child
+  5.46%  5.46% sh      [kernel.kallsyms] [k] __do_page_fault
+  4.97%  4.90% sh      libc-2.12.so    [.] __gconv_transform_utf8_internal
+  4.90%  0.31% sh      ld-2.12.so      [.] _dl_sysdep_start
+  4.72%  4.41% sh      libc-2.12.so    [.] __mbrtowc
+  3.74%  0.06% sh      ld-2.12.so      [.] dl_main
+  3.13%  3.13% date    [kernel.kallsyms] [k] __do_page_fault
+  3.00%  0.12% date    ld-2.12.so      [.] _dl_sysdep_start
+  2.82%  2.82% new3.sh [kernel.kallsyms] [k] __do_page_fault
+  2.27%  2.15% sh      ld-2.12.so      [.] do_lookup_x
+  2.27%  0.86% sh      ld-2.12.so      [.] _dl_relocate_object
+  2.02%  0.00% new3.sh libc-2.12.so    [.] __xstat64
+  1.84%  0.12% date    ld-2.12.so      [.] dl_main
+  1.66%  0.00% sh      ld-2.12.so      [.] _dl_map_object
+  1.53%  0.12% date    ld-2.12.so      [.] _dl_map_object
+  1.41%  0.00% sh      libc-2.12.so    [.] __gconv_find_transform
+  1.41%  0.06% date    libc-2.12.so    [.] setlocale
+  1.29%  1.23% date    libc-2.12.so    [.] _dl_addr
+  1.29%  0.00% date    ld-2.12.so      [.] mmap64
+  1.29%  0.00% sh      ld-2.12.so      [.] mmap64
+  1.29%  0.43% date    ld-2.12.so      [.] _dl_relocate_object
+  1.10%  0.80% sh      ld-2.12.so      [.] check_match.12445
+  1.10%  0.00% date    libc-2.12.so    [.] _nl_find_locale
tcsh on both rhel6 and rhel7 does not show the same problem, so this may not be related to the kernel but rather to bash (just a theory)
[root@mv-rhel6 ~]# cat new4.sh
#!/bin/tcsh
foreach i (`seq 1 10000`)
  set myvar = `date`
  /bin/tcsh &
end
[root@mv-rhel6 ~]# cat run.sh
#!/bin/tcsh
set limit = 0
foreach u (`seq 1 7`)
  @ limit = ($u * 1024)
  limit maxproc $limit
  limit maxproc
  time ./new4.sh >&/dev/null
end
rhel7
[root@mv-rhel7 ~]# ./run.sh
maxproc 1024
173.945u 238.762s 2:10.34 316.6% 0+0k 0+239696io 0pf+0w
maxproc 2048
173.732u 238.793s 2:12.82 310.5% 0+0k 0+239624io 0pf+0w
maxproc 3072
174.404u 244.343s 2:16.82 306.0% 0+0k 0+239672io 0pf+0w
maxproc 4096
173.690u 241.411s 2:13.47 311.0% 0+0k 0+239720io 0pf+0w
maxproc 5120
173.450u 234.367s 2:15.32 301.3% 0+0k 0+239544io 0pf+0w
maxproc 6144
175.205u 236.809s 2:15.85 303.2% 0+0k 0+239720io 0pf+0w
maxproc 7168
173.465u 232.450s 2:11.56 308.5% 0+0k 0+239552io 0pf+0w
rhel6
[root@mv-rhel6 ~]# ./run.sh
maxproc 1024
183.297u 113.916s 1:24.53 351.5% 0+0k 0+0io 0pf+0w
maxproc 2048
187.895u 117.672s 1:26.84 351.8% 0+0k 0+0io 0pf+0w
maxproc 3072
171.392u 102.928s 1:18.47 349.5% 0+0k 0+0io 0pf+0w
maxproc 4096
173.254u 104.116s 1:19.34 349.5% 0+0k 0+0io 0pf+0w
maxproc 5120
171.319u 102.730s 1:18.53 348.9% 0+0k 0+0io 0pf+0w
maxproc 6144
179.279u 109.541s 1:22.44 350.3% 0+0k 0+0io 0pf+0w
maxproc 7168
167.265u 99.288s 1:16.38 348.9% 0+0k 0+0io 0pf+0w
Recompiling bash-3.2 on rhel6, I see the same behaviour as for the native rhel6 bash we ship:

$ for i in 16 32 64 128 256 512 768 1024 2048 4096 8192 16284; do
    ulimit -u $i
    echo -ne "$i : "
    /usr/bin/time ./new4_bash3.2.sh 2>&1 | grep elaps | sed -e 's,elaps.*,,' -e 's,.* ,,'
  done
16    : 0:07.35
32    : 0:09.41
64    : 0:09.49
128   : 0:07.33
256   : 0:09.20
512   : 0:08.65
768   : 0:09.54
1024  : 0:10.40
2048  : 0:16.25
4096  : 0:22.31
8192  : 0:28.86
16284 : 0:28.68

=> So it's rather not the bash, but kernel or glibc. Could someone please suggest how best to proceed here to pin this down? I think this is a degradation which might already hit many customers.

The reason why the performance is good on RHEL-6 with limited nproc is that what the script does is just run thousands of sh child processes in parallel. In case of the low nproc limit, it will limit the number of the child processes, which of course improves the run time of the script because the number of processes to run will be lower. But the real reason why the startup is slower on RHEL-6 than on RHEL-5 is unknown.

Do you mean processes placed in the background? That is not the case. To verify, I was running
$ while :; do ps ax|wc -l >>process_counter_1024.txt; sleep .1; done
while running the testcode. In both cases the code only runs a few additional processes, nowhere near the 'ulimit -u' value:
[root@rhel6u5a ~]# sort process_counter_1024.txt|uniq -c
51 106
62 108
26 109
4 111
[root@rhel6u5a ~]# sort process_counter_7810.txt|uniq -c
49 106
148 108
41 109
4 111
[root@rhel6u5a ~]#
I was discussing this with a Sr Engineer friend of mine, and the following topic was brought up: did we test how RHEL5 behaves with the higher limits? - RF.

rhel6.5  "ulimit -u 7810" average runtime over 4 runs:  21.4sec
rhel5.11 "ulimit -u 7810" average runtime over 9 runs:  12.7sec
rhel6.5  "ulimit -u 1024" average runtime over 12 runs: 10.2sec
rhel5.11 "ulimit -u 1024" average runtime over 5 runs:  12.6sec

To isolate whether the problem is in the kernel or in the shell or glibc, it would be really useful to have a reproducer that uses C code and not a shell.

(In reply to Tomas Mraz from comment #38)
> To isolate whether the problem is in the kernel or in the shell or glibc, it
> would be really useful to have a reproducer that uses C code and not a shell.

Thanks for pointing out a possible direction.. but at least I have no idea how to come up with C code that also shows the issue. As far as I understand, such C code, also showing the same behaviour, would probably do the same interactions with glibc and the kernel (or ideally a subset which also shows the performance degradation). The canonical C code one would think of is executed equally fast in both environments:
----
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int i, myvar;
    for (i = 1; i < 2000; i++) {
        myvar = system("date >/dev/null");  /* stand-in for MYVAR=$(date) */
        system("/bin/sh &");                /* stand-in for '/bin/sh &' */
    }
    (void)myvar;  /* the unused myvar2 of the original was dropped */
    return 0;
}
----
It then really looks like a bash issue and not kernel or glibc.

Christian,

Let's not give up quite so quickly. Have you tried your reproducer in ksh and ash and (PickYourFavorite)sh yet? That seems like the next most logical step. If any other shells have the same problem, maybe we should look at your C reproducer.

Also, have we profiled the shell with this running to see if there's one thing it's getting stuck in? This has been going on long enough, I forget what's been tried and what hasn't.

Thanks Tom.
Runtimes for the code with other shells:
# code runtimes w/ 'ulimit -u 1024' environment
bash: min:09.57 max:10.47 samples:20 average: 9.77
ksh: min:07.34 max:09.28 samples:20 average: 8.22
dash: min:07.47 max:09.24 samples:20 average: 8.21
ash: min:07.21 max:08.91 samples:20 average: 8.04
# code runtimes w/ 'ulimit -u 7810' environment
bash: min:19.59 max:22.22 samples:20 average: 20.10
ksh: min:09.02 max:11.43 samples:20 average: 10.01
dash: min:07.38 max:09.31 samples:20 average: 7.85
ash: min:07.15 max:08.99 samples:20 average: 7.83
=> bash is drastically slower in the "-u 7810" environment
=> ksh is a bit slower, dash and ash/busybox seem not affected
> Also, have we profiled the shell with this running to see
> if there's one thing it's getting stuck in?
I do not think so, and I am not able to do that / not sure what you refer to. Any bare metal system or virtual guest can be used to see the issue.
(In reply to Christian Horn from comment #43)
> Thanks Tom.
>
> Runtimes for the code with other shells:
>
> [...]

Now there's some good data right there, I tell you what.

Will the RHEL6 bash run on RHEL5 or vice versa? If so, I'd love to see what happens when you do the same stuff with the RHEL6 bash on a RHEL5 system (and to a lesser extent a RHEL5 bash on a RHEL6 system, but the other way around is probably more interesting). It's good to see that whatever the RHEL6 bash is doing will bring the problem out (hopefully that will help find the problem), but if the RHEL6 bash only does it on a RHEL6 system and not when placed on a RHEL5 system, it's probably not bash causing it. In that case, bash is probably just shedding light on whatever the real problem is in RHEL6 in ways that the other shells don't.

I'll try to fool with that if I can get to it, but I thought I'd put it out there in case someone else can get to it first.

> > Also, have we profiled the shell with this running to see
> > if there's one thing it's getting stuck in?
> I do not think so, and I am not able to do that/sure what you
> refer to. Any bare metal system or virtual guest can be used to
> see the issue.

OK, I will try to figure out how to do that. I remember it's not terribly difficult, but I don't remember how to do it off the top of my head.
Ran a couple more tests on my laptop, testing the reproducer on:

fedora 22; 4.3.0-0.rc3.git1.1.fc24.x86_64; bash-4.3.42-1.fc22.x86_64
rhel5;     2.6.18-164.9.1.el5;             bash-3.2-33.el5_11.4
rhel6;     2.6.32-573.3.1.el6.x86_64;      bash-4.1.2-33.el6.x86_64
rhel7;     3.10.0-302.el7.test.x86_64;     bash-4.2.46-19.el7.x86_64

which is the laptop (f22) and VMs with 4 vcpus / 4G each. The noticeable difference is in the number of pagefaults that occur during the run.

rhel5
1024
13907 root 23 0 64112 1352 932 S 0.0 0.0 0:04.65 new3.sh
13.91user 19.63system 0:21.91elapsed 153%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7478505minor)pagefaults 0swaps
2048
1459 root 25 0 64112 1352 932 S 0.0 0.0 0:04.45 new3.sh
13.83user 18.74system 0:21.03elapsed 154%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7476621minor)pagefaults 0swaps
3072
21489 root 25 0 64112 1352 932 S 0.0 0.0 0:04.48 new3.sh
13.93user 19.06system 0:21.31elapsed 154%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7494993minor)pagefaults 0swaps
4096
9068 root 25 0 64112 1356 932 S 0.0 0.0 0:04.57 new3.sh
14.08user 19.10system 0:21.61elapsed 153%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7467911minor)pagefaults 0swaps
5120
29071 root 20 0 64112 1356 932 S 0.0 0.0 0:04.28 new3.sh
13.73user 18.78system 0:21.13elapsed 153%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7496351minor)pagefaults 0swaps
6144
16650 root 25 0 64112 1356 932 S 0.0 0.0 0:04.37 new3.sh
13.63user 19.13system 0:21.23elapsed 154%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7468460minor)pagefaults 0swaps
7168
4229 root 24 0 64112 1356 932 S 0.0 0.0 0:04.39 new3.sh
13.65user 19.04system 0:21.13elapsed 154%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7492852minor)pagefaults 0swaps

rhel6
1024
1473 root 20 0 103m 1632 1168 S 0.0 0.2 0:07.93 new3.sh
17.14user 21.45system 0:26.06elapsed 148%CPU (0avgtext+0avgdata 1632maxresident)k
576inputs+0outputs (4major+8998255minor)pagefaults 0swaps
2048
21476 root 20 0 103m 1840 1168 S 0.0 0.2 0:08.64 new3.sh
20.49user 22.92system 0:29.68elapsed 146%CPU (0avgtext+0avgdata 1840maxresident)k
0inputs+0outputs (0major+9373339minor)pagefaults 0swaps
3072
9070 root 20 0 104m 2076 1168 S 0.0 0.2 0:10.59 new3.sh
25.16user 26.97system 0:35.08elapsed 148%CPU (0avgtext+0avgdata 2076maxresident)k
0inputs+0outputs (0major+9721730minor)pagefaults 0swaps
4096
29081 root 20 0 104m 2228 1168 S 0.0 0.3 0:10.90 new3.sh
26.09user 26.09system 0:36.83elapsed 141%CPU (0avgtext+0avgdata 2228maxresident)k
0inputs+0outputs (0major+9975488minor)pagefaults 0swaps
5120
16683 root 20 0 104m 2432 1168 S 0.0 0.3 0:11.22 new3.sh
27.40user 25.96system 0:38.13elapsed 139%CPU (0avgtext+0avgdata 2432maxresident)k
0inputs+0outputs (0major+10152760minor)pagefaults 0swaps
6144
4257 root 20 0 104m 2588 1168 S 0.0 0.3 0:11.49 new3.sh
28.04user 26.82system 0:39.00elapsed 140%CPU (0avgtext+0avgdata 2588maxresident)k
0inputs+0outputs (0major+10298784minor)pagefaults 0swaps
7168
24260 root 20 0 104m 2760 1168 S 0.0 0.3 0:13.54 new3.sh
31.27user 32.36system 0:44.71elapsed 142%CPU (0avgtext+0avgdata 2760maxresident)k
0inputs+0outputs (0major+10423502minor)pagefaults 0swaps

rhel7
1024
16.24user 42.79system 0:38.68elapsed 152%CPU (0avgtext+0avgdata 1904maxresident)k
0inputs+0outputs (0major+8906379minor)pagefaults 0swaps
2048
19.36user 34.74system 0:31.94elapsed 169%CPU (0avgtext+0avgdata 1916maxresident)k
0inputs+0outputs (0major+9319956minor)pagefaults 0swaps
3072
22.79user 35.96system 0:35.67elapsed 164%CPU (0avgtext+0avgdata 2132maxresident)k
0inputs+0outputs (0major+9683267minor)pagefaults 0swaps
4096
24.31user 35.85system 0:36.90elapsed 163%CPU (0avgtext+0avgdata 2300maxresident)k
0inputs+0outputs (0major+9835529minor)pagefaults 0swaps
5120
26.87user 40.50system 0:44.15elapsed 152%CPU (0avgtext+0avgdata 2492maxresident)k
0inputs+0outputs (0major+10087698minor)pagefaults 0swaps
6144
27.02user 37.45system 0:40.90elapsed 157%CPU (0avgtext+0avgdata 2684maxresident)k
0inputs+0outputs (0major+10227271minor)pagefaults 0swaps
7168
27.85user 38.15system 0:42.63elapsed 154%CPU (0avgtext+0avgdata 2844maxresident)k
0inputs+0outputs (0major+10327962minor)pagefaults 0swaps

f22
1024
13.00user 19.95system 0:16.55elapsed 199%CPU (0avgtext+0avgdata 4676maxresident)k
0inputs+0outputs (0major+3901010minor)pagefaults 0swaps
2048
15.61user 21.58system 0:20.48elapsed 181%CPU (0avgtext+0avgdata 4704maxresident)k
0inputs+0outputs (0major+4332896minor)pagefaults 0swaps
3072
18.24user 22.53system 0:23.79elapsed 171%CPU (0avgtext+0avgdata 4708maxresident)k
0inputs+0outputs (0major+4700756minor)pagefaults 0swaps
4096
19.02user 19.07system 0:22.31elapsed 170%CPU (0avgtext+0avgdata 4640maxresident)k
0inputs+0outputs (0major+4928832minor)pagefaults 0swaps
5120
20.71user 20.89system 0:25.10elapsed 165%CPU (0avgtext+0avgdata 4640maxresident)k
0inputs+0outputs (0major+5106694minor)pagefaults 0swaps
6144
21.10user 21.88system 0:25.41elapsed 169%CPU (0avgtext+0avgdata 4708maxresident)k
0inputs+0outputs (0major+5234781minor)pagefaults 0swaps
7168
22.42user 22.80system 0:27.29elapsed 165%CPU (0avgtext+0avgdata 4704maxresident)k
0inputs+0outputs (0major+5307505minor)pagefaults 0swaps

As seen above, f22 behaves slightly better than rhel6/7 but is not as consistent as rhel5. The number of interrupts is more or less the same on rhel5 but increases with the limit on rhel6+, with rhel7 having the least among the others, so I'd say some kernel optimization kicked in and helped a bit.
I'm building a 4.3.0-rc3 kernel for rhel7 to see if that improves things. While waiting for the kernel build I ran more tests on rhel7.

reproducer without MYVAR=$(date); consistent number of pagefaults and runtime:
1024
7.10user 18.20system 0:07.04elapsed 359%CPU (0avgtext+0avgdata 1680maxresident)k
0inputs+0outputs (0major+4690456minor)pagefaults 0swaps
2048
7.43user 17.99system 0:07.17elapsed 354%CPU (0avgtext+0avgdata 1884maxresident)k
0inputs+0outputs (0major+4627062minor)pagefaults 0swaps
3072
8.00user 17.89system 0:07.36elapsed 351%CPU (0avgtext+0avgdata 2100maxresident)k
0inputs+0outputs (0major+4631613minor)pagefaults 0swaps
4096
8.46user 18.00system 0:07.34elapsed 360%CPU (0avgtext+0avgdata 2348maxresident)k
0inputs+0outputs (0major+4605916minor)pagefaults 0swaps
5120
9.00user 17.94system 0:07.59elapsed 354%CPU (0avgtext+0avgdata 2552maxresident)k
0inputs+0outputs (0major+4621905minor)pagefaults 0swaps
6144
8.31user 18.82system 0:07.70elapsed 352%CPU (0avgtext+0avgdata 2740maxresident)k
0inputs+0outputs (0major+4593514minor)pagefaults 0swaps
7168
9.31user 18.22system 0:07.81elapsed 352%CPU (0avgtext+0avgdata 2888maxresident)k
0inputs+0outputs (0major+4590491minor)pagefaults 0swaps

reproducer without the bash fork; consistent number of pagefaults and runtime:
1024
2.87user 9.40system 0:11.89elapsed 103%CPU (0avgtext+0avgdata 1404maxresident)k
0inputs+0outputs (0major+3436580minor)pagefaults 0swaps
2048
2.79user 9.40system 0:11.86elapsed 102%CPU (0avgtext+0avgdata 1404maxresident)k
0inputs+0outputs (0major+3436529minor)pagefaults 0swaps
3072
2.91user 10.00system 0:12.61elapsed 102%CPU (0avgtext+0avgdata 1404maxresident)k
0inputs+0outputs (0major+3436563minor)pagefaults 0swaps
4096
2.87user 9.39system 0:11.80elapsed 103%CPU (0avgtext+0avgdata 1400maxresident)k
0inputs+0outputs (0major+3456548minor)pagefaults 0swaps
5120
2.80user 9.32system 0:11.71elapsed 103%CPU (0avgtext+0avgdata 1400maxresident)k
0inputs+0outputs (0major+3436635minor)pagefaults 0swaps
6144
2.92user 9.26system 0:11.65elapsed 104%CPU (0avgtext+0avgdata 1404maxresident)k
0inputs+0outputs (0major+3436567minor)pagefaults 0swaps
7168
2.91user 9.46system 0:12.05elapsed 102%CPU (0avgtext+0avgdata 1404maxresident)k
0inputs+0outputs (0major+3436560minor)pagefaults 0swaps

myvar=$(date); date &; pagefaults/runtime increasing:
1024
10.17user 25.23system 0:22.30elapsed 158%CPU (0avgtext+0avgdata 1684maxresident)k
0inputs+0outputs (0major+7727301minor)pagefaults 0swaps
2048
12.98user 27.80system 0:27.27elapsed 149%CPU (0avgtext+0avgdata 1912maxresident)k
0inputs+0outputs (0major+8118169minor)pagefaults 0swaps
3072
15.42user 29.09system 0:30.52elapsed 145%CPU (0avgtext+0avgdata 2136maxresident)k
0inputs+0outputs (0major+8449217minor)pagefaults 0swaps
4096
17.56user 30.50system 0:33.99elapsed 141%CPU (0avgtext+0avgdata 2296maxresident)k
0inputs+0outputs (0major+8595903minor)pagefaults 0swaps
5120
19.66user 30.85system 0:35.99elapsed 140%CPU (0avgtext+0avgdata 2492maxresident)k
0inputs+0outputs (0major+8839141minor)pagefaults 0swaps
6144
20.95user 32.14system 0:38.45elapsed 138%CPU (0avgtext+0avgdata 2688maxresident)k
0inputs+0outputs (0major+9001505minor)pagefaults 0swaps
7168
21.25user 32.49system 0:39.03elapsed 137%CPU (0avgtext+0avgdata 2844maxresident)k
0inputs+0outputs (0major+9090931minor)pagefaults 0swaps

date &; date &; consistent pagefaults; runtime increased:
1024
5.47user 26.63system 0:08.87elapsed 361%CPU (0avgtext+0avgdata 1668maxresident)k
0inputs+0outputs (0major+6931712minor)pagefaults 0swaps
7168
10.96user 27.08system 0:12.75elapsed 298%CPU (0avgtext+0avgdata 3028maxresident)k
0inputs+0outputs (0major+6735389minor)pagefaults 0swaps

Copying the 4.3 kernel over to my VM and will run some more tests.

rhel7 + 4.3 kernel behaves the same way as f22 + 4.3 kernel. I've tested rhel5 with a newer kernel and interestingly the pagefaults are still consistent and so is the runtime.
[root@mv-rhel5 ~]# uname -r 3.2.70-1.el5.elrepo 1024 11.58user 41.37system 0:38.85elapsed 136%CPU (0avgtext+0avgdata 5952maxresident)k 0inputs+0outputs (0major+7677270minor)pagefaults 0swaps 2048 hrtimer: interrupt took 26992516 ns 10.71user 38.15system 0:36.91elapsed 132%CPU (0avgtext+0avgdata 5936maxresident)k 0inputs+0outputs (0major+7669473minor)pagefaults 0swaps 3072 10.98user 38.06system 0:36.53elapsed 134%CPU (0avgtext+0avgdata 5952maxresident)k 0inputs+0outputs (0major+7672588minor)pagefaults 0swaps 4096 11.12user 38.94system 0:36.98elapsed 135%CPU (0avgtext+0avgdata 5952maxresident)k 0inputs+0outputs (0major+7668470minor)pagefaults 0swaps 5120 10.14user 35.57system 0:34.24elapsed 133%CPU (0avgtext+0avgdata 5936maxresident)k 0inputs+0outputs (0major+7722695minor)pagefaults 0swaps 6144 11.13user 39.05system 0:37.18elapsed 134%CPU (0avgtext+0avgdata 5952maxresident)k 0inputs+0outputs (0major+7660873minor)pagefaults 0swaps 7168 11.13user 40.48system 0:38.03elapsed 135%CPU (0avgtext+0avgdata 5936maxresident)k 0inputs+0outputs (0major+7709850minor)pagefaults 0swaps at this point I really don't think this is in kernel and neither in bash as I've tested older bash on newer rhel and vice versa and results are the same going to take a look at glibc now Narrowed it down bit more to some change between fc7 and fc8. This was the
easiest and fastest way to test different glibc version by simply booting live CD
and running the test
[root@ftest ~]# uname -r
2.6.25-14.fc9.x86_64
[root@ftest ~]# rpm -q bash glibc kernel
bash-3.2-22.fc9.x86_64
glibc-2.8-3.x86_64
kernel-2.6.25-14.fc9.x86_64
[root@ftest ~]# cat new3.sh
#!/bin/sh
i=10000;
until test $i -eq 1; do
MYVAR=$(date)
/bin/sh &
i=$((i-1));
done
[root@ftest ~]# for u in 1024 2048 3072 4096 5120 6144 7168; do ulimit -u $u; ulimit -u; /usr/bin/time ./new3.sh; done
1024
14.65user 15.96system 0:21.87elapsed 139%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+8392447minor)pagefaults 0swaps
2048
15.82user 16.87system 0:23.40elapsed 139%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+8840644minor)pagefaults 0swaps
3072
19.53user 16.79system 0:24.57elapsed 147%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+9192419minor)pagefaults 0swaps
4096
21.75user 17.77system 0:26.53elapsed 148%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+9427010minor)pagefaults 0swaps
5120
22.38user 18.12system 0:27.81elapsed 145%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+9652333minor)pagefaults 0swaps
6144
22.85user 20.13system 0:29.65elapsed 144%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+9819271minor)pagefaults 0swaps
7168
23.76user 21.28system 0:31.33elapsed 143%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+9944758minor)pagefaults 0swaps
[root@localhost ~]# uname -r
2.6.23.1-42.fc8
[root@localhost ~]# rpm -q bash glibc kernel
bash-3.2-18.fc8
glibc-2.7-2
glibc-2.7-2
kernel-2.6.23.1-42.fc8
[root@localhost ~]# cat new3.sh
#!/bin/sh
i=10000;
until test $i -eq 1; do
MYVAR=$(date)
/bin/sh &
i=$((i-1));
done
[root@localhost ~]# for u in 1024 2048 3072 4096 5120 6144 7168; do ulimit -u $u; ulimit -u; /usr/bin/time ./new3.sh; done
1024
14.56user 17.97system 0:19.89elapsed 163%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+8663077minor)pagefaults 0swaps
2048
18.72user 19.62system 0:26.97elapsed 142%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+9043363minor)pagefaults 0swaps
3072
21.68user 19.67system 0:29.56elapsed 139%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+9410641minor)pagefaults 0swaps
4096
22.90user 21.24system 0:32.09elapsed 137%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+9625221minor)pagefaults 0swaps
5120
23.68user 22.45system 0:34.44elapsed 133%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+9850296minor)pagefaults 0swaps
6144
23.77user 21.32system 0:33.16elapsed 136%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+10037024minor)pagefaults 0swaps
7168
24.41user 21.47system 0:33.76elapsed 135%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+10143371minor)pagefaults 0swaps
[root@localhost ~]# uname -r
2.6.21-1.3194.fc7
[root@localhost ~]# rpm -q bash glibc kernel
bash-3.2-9.fc7
glibc-2.6-3
glibc-2.6-3
kernel-2.6.21-1.3194.fc7
[root@localhost ~]# cat new3.sh
#!/bin/sh
i=10000;
until test $i -eq 1; do
MYVAR=$(date)
/bin/sh &
i=$((i-1));
done
[root@localhost ~]# for u in 1024 2048 3072 4096 5120 6144 7168; do ulimit -u $u; ulimit -u; /usr/bin/time ./new3.sh; done
1024
14.02user 16.25system 0:20.62elapsed 146%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+8421929minor)pagefaults 0swaps
2048
14.41user 16.46system 0:21.51elapsed 143%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+8448431minor)pagefaults 0swaps
3072
14.27user 16.62system 0:21.28elapsed 145%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+8438555minor)pagefaults 0swaps
4096
14.05user 16.17system 0:20.73elapsed 145%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+8440796minor)pagefaults 0swaps
5120
13.71user 15.69system 0:20.51elapsed 143%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+8426738minor)pagefaults 0swaps
6144
14.26user 16.42system 0:20.80elapsed 147%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+8434620minor)pagefaults 0swaps
7168
14.13user 16.49system 0:20.71elapsed 147%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+8458371minor)pagefaults 0swaps
This indeed seems to be a glibc problem. I've installed fc7 and run the test again after install and after a yum update to the latest fc7. At that point both kernel and bash were on newer versions than base fc8, and the only thing left at an older version was glibc. Performance was indeed the same. After upgrading to the fc8 glibc version we see the problem. Notes below.

fc7 fresh install
[root@ftest ~]# rpm -q bash kernel glibc
bash-3.2-9.fc7
kernel-2.6.21-1.3194.fc7
glibc-2.6-3
glibc-2.6-3
(1000 iterations because it's same as last test)
[root@ftest ~]# for u in 1024 2048 3072 4096 5120 6144 7168; do ulimit -u $u; ulimit -u; /usr/bin/time ./new3.sh; done
1024
1.09user 1.24system 0:01.79elapsed 130%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+792073minor)pagefaults 0swaps
2048
1.17user 1.44system 0:01.88elapsed 139%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+793115minor)pagefaults 0swaps
3072
1.07user 1.19system 0:01.75elapsed 129%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+791090minor)pagefaults 0swaps
4096
1.10user 1.31system 0:01.85elapsed 130%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+791077minor)pagefaults 0swaps
5120
1.06user 1.17system 0:01.75elapsed 127%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+790059minor)pagefaults 0swaps
6144
1.04user 1.17system 0:01.72elapsed 128%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+791083minor)pagefaults 0swaps
7168
1.06user 1.18system 0:01.73elapsed 129%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+791054minor)pagefaults 0swaps

yum update
[root@ftest ~]# rpm -q bash kernel glibc
bash-3.2-20.fc7
kernel-2.6.21-1.3194.fc7
kernel-2.6.23.17-88.fc7
glibc-2.6-4
glibc-2.6-4
[root@ftest ~]# for u in 1024 2048 3072 4096 5120 6144 7168; do ulimit -u $u; ulimit -u; /usr/bin/time ./new3.sh; done
1024
14.56user 15.65system 0:19.42elapsed 155%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+8546356minor)pagefaults 0swaps
2048
14.68user 16.00system 0:19.46elapsed 157%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+8544516minor)pagefaults 0swaps
3072
15.02user 15.38system 0:19.44elapsed 156%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+8543183minor)pagefaults 0swaps
4096
14.56user 15.61system 0:19.39elapsed 155%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+8538693minor)pagefaults 0swaps
5120
15.14user 17.14system 0:21.33elapsed 151%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+8537270minor)pagefaults 0swaps
6144
14.47user 16.78system 0:19.88elapsed 157%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+8536516minor)pagefaults 0swaps
7168
14.59user 17.18system 0:19.77elapsed 160%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+8541067minor)pagefaults 0swaps

yum update to fc8 glibc
Updating:
glibc x86_64 2.7-2 glibc-2.7-2.x86_64.rpm 12 M
glibc i686 2.7-2 glibc-2.7-2.i686.rpm 13 M
glibc-common x86_64 2.7-2 glibc-common-2.7-2.x86_64.rpm 86 M
[root@ftest ~]# rpm -q bash kernel glibc
bash-3.2-20.fc7
kernel-2.6.21-1.3194.fc7
kernel-2.6.23.17-88.fc7
glibc-2.7-2
glibc-2.7-2
[root@ftest ~]# for u in 1024 2048 3072 4096 5120 6144 7168; do ulimit -u $u; ulimit -u; /usr/bin/time ./new3.sh; done
1024
14.53user 16.75system 0:19.90elapsed 157%CPU (0avgtext+0avgdata 0maxresident)k
16inputs+0outputs (0major+8618544minor)pagefaults 0swaps
2048
14.77user 17.40system 0:22.21elapsed 144%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+8996006minor)pagefaults 0swaps
3072
19.52user 16.66system 0:25.91elapsed 139%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+9369530minor)pagefaults 0swaps
4096
19.82user 19.43system 0:28.55elapsed 137%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+9591477minor)pagefaults 0swaps
5120
21.35user 21.17system 0:31.10elapsed 136%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+9786261minor)pagefaults 0swaps
6144
20.94user 19.87system 0:30.58elapsed 133%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+9978023minor)pagefaults 0swaps
7168
22.24user 19.91system 0:31.51elapsed 133%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+10119779minor)pagefaults 0swaps

Looks like we went from glibc 2.6-4 straight to 2.6.90-1
* Tue Jul 31 2007 Jakub Jelinek <jakub> 2.6.90-1
- update to trunk
- private futex optimizations
- open{,at}{,64} argument checking
- ldconfig speedups
* Sun Jul 08 2007 Jakub Jelinek <jakub> 2.6-4
...
I'll poke around the git repo and see how many things changed tomorrow. My
guess is that quite a lot...
I did not have much luck trying to narrow this down. I've tried to bisect glibc directly, but it does not work very well since most versions between releases do not compile, and if they do they tend to crash.

What's even more weird is that while glibc-2.6-4.x86_64 gives me the expected performance, when I manually rebuild the srpm it does not work... I'm really not sure why.

(In reply to Milos Vyletel from comment #51)
> I did not have much luck trying to narrow this down. I've tried to bisect
> glibc directly but it does not work very well since most versions between
> releases do not compile and if they do they tend to crash.
>
> What's even more weird is that while glibc-2.6-4.x86_64 gives me the expected
> performance when I manually rebuild the srpm it does not work... I'm really
> not sure why.

I suggest:
(1) With the old glibc that runs fast:
strace -tt -fff -o fast.log ./reproducer.sh
(2) With the new glibc that runs slow:
strace -tt -fff -o slow.log ./reproducer.sh
Then look at the logs for any serious differences in the number of syscalls, syscall durations or syscall timings. You're trying to see a pattern that is causing the slowdown.

Thanks for the suggestion, Carlos. However, since the recreator spawns tens of
thousands of threads, it produces quite a few files to inspect:
[root@ftest ~]# ls -1 fast/ | wc -l
19999
(9999 iterations + the script itself).
Instead I ran strace with -c option to get overall look at syscall stats. To
simplify things I've unpacked both glibc 2.6-4 and 2.7-2 locally so I don't
have to update rpms and used LD_LIBRARY_PATH to point to the one I wanted and
ran it in simple loop
for g in good bad; do
echo $g;
for u in 1024 7168; do
ulimit -u $u;
ulimit -u;
LD_LIBRARY_PATH=/glibc/$g/lib64/ /usr/bin/time /root/new3.sh;
LD_LIBRARY_PATH=/glibc/$g/lib64/ strace -tt -c /root/new3.sh;
done;
done
Anyway, here are the results
glibc 2.6-3
good
1024
15.73user 21.45system 0:22.21elapsed 167%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+8384029minor)pagefaults 0swaps
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
35.32 0.639919 32 19998 clone
32.05 0.580559 15 39691 13706 wait4
16.02 0.290260 1 229293 rt_sigprocmask
12.43 0.225119 11 20576 571 read
1.63 0.029522 1 30178 173 rt_sigaction
1.01 0.018340 1 20007 close
0.80 0.014534 1 19693 609 rt_sigreturn
0.73 0.013135 1 9999 pipe
0.01 0.000236 236 1 execve
0.00 0.000018 3 6 3 stat
0.00 0.000000 0 13 4 open
0.00 0.000000 0 8 fstat
0.00 0.000000 0 3 lseek
0.00 0.000000 0 15 mmap
0.00 0.000000 0 6 mprotect
0.00 0.000000 0 2 munmap
0.00 0.000000 0 7 brk
0.00 0.000000 0 1 1 ioctl
0.00 0.000000 0 1 1 access
0.00 0.000000 0 1 dup2
0.00 0.000000 0 1 getpid
0.00 0.000000 0 1 uname
0.00 0.000000 0 2 fcntl
0.00 0.000000 0 1 getrlimit
0.00 0.000000 0 1 getuid
0.00 0.000000 0 1 getgid
0.00 0.000000 0 1 geteuid
0.00 0.000000 0 1 getegid
0.00 0.000000 0 1 getppid
0.00 0.000000 0 1 getpgrp
0.00 0.000000 0 1 arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00 1.811642 389511 15068 total
7168
15.99user 21.24system 0:22.17elapsed 167%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+8428969minor)pagefaults 0swaps
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
34.99 0.631086 32 19998 clone
32.27 0.582129 15 39628 13709 wait4
16.37 0.295356 1 229219 rt_sigprocmask
12.01 0.216624 11 20597 592 read
1.35 0.024409 1 30192 187 rt_sigaction
1.19 0.021405 1 19631 636 rt_sigreturn
1.01 0.018223 1 20007 close
0.79 0.014183 1 9999 pipe
0.01 0.000249 249 1 execve
0.00 0.000038 3 13 4 open
0.00 0.000000 0 6 3 stat
0.00 0.000000 0 8 fstat
0.00 0.000000 0 3 lseek
0.00 0.000000 0 15 mmap
0.00 0.000000 0 6 mprotect
0.00 0.000000 0 2 munmap
0.00 0.000000 0 7 brk
0.00 0.000000 0 1 1 ioctl
0.00 0.000000 0 1 1 access
0.00 0.000000 0 1 dup2
0.00 0.000000 0 1 getpid
0.00 0.000000 0 1 uname
0.00 0.000000 0 2 fcntl
0.00 0.000000 0 1 getrlimit
0.00 0.000000 0 1 getuid
0.00 0.000000 0 1 getgid
0.00 0.000000 0 1 geteuid
0.00 0.000000 0 1 getegid
0.00 0.000000 0 1 getppid
0.00 0.000000 0 1 getpgrp
0.00 0.000000 0 1 arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00 1.803702 389347 15133 total
and glibc 2.7-2
bad
1024
15.65user 20.61system 0:21.59elapsed 167%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+8476290minor)pagefaults 0swaps
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
34.54 0.618362 31 19998 clone
32.36 0.579311 15 39690 13627 wait4
16.73 0.299493 1 229311 rt_sigprocmask
11.85 0.212170 10 20507 502 read
1.40 0.025021 1 20007 close
1.15 0.020631 1 19692 528 rt_sigreturn
1.15 0.020502 1 30203 198 rt_sigaction
0.82 0.014665 1 9999 pipe
0.00 0.000042 3 13 4 open
0.00 0.000000 0 6 3 stat
0.00 0.000000 0 8 fstat
0.00 0.000000 0 3 lseek
0.00 0.000000 0 15 mmap
0.00 0.000000 0 6 mprotect
0.00 0.000000 0 2 munmap
0.00 0.000000 0 6 brk
0.00 0.000000 0 1 1 ioctl
0.00 0.000000 0 1 1 access
0.00 0.000000 0 1 dup2
0.00 0.000000 0 1 getpid
0.00 0.000000 0 1 execve
0.00 0.000000 0 1 uname
0.00 0.000000 0 2 fcntl
0.00 0.000000 0 2 getrlimit
0.00 0.000000 0 1 getuid
0.00 0.000000 0 1 getgid
0.00 0.000000 0 1 geteuid
0.00 0.000000 0 1 getegid
0.00 0.000000 0 1 getppid
0.00 0.000000 0 1 getpgrp
0.00 0.000000 0 1 arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00 1.790197 389483 14864 total
7168
25.41user 26.52system 0:37.14elapsed 139%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+10012168minor)pagefaults 0swaps
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
35.02 0.736181 37 19998 clone
29.23 0.614465 15 39805 14056 wait4
16.87 0.354757 17 20471 466 read
15.16 0.318714 1 232595 rt_sigprocmask
1.28 0.026908 1 30133 128 rt_sigaction
1.15 0.024134 1 20007 close
0.71 0.014986 1 19807 499 rt_sigreturn
0.58 0.012125 1 9999 pipe
0.00 0.000034 3 13 4 open
0.00 0.000000 0 6 3 stat
0.00 0.000000 0 8 fstat
0.00 0.000000 0 3 lseek
0.00 0.000000 0 15 mmap
0.00 0.000000 0 6 mprotect
0.00 0.000000 0 2 munmap
0.00 0.000000 0 31 brk
0.00 0.000000 0 1 1 ioctl
0.00 0.000000 0 1 1 access
0.00 0.000000 0 1 dup2
0.00 0.000000 0 1 getpid
0.00 0.000000 0 1 execve
0.00 0.000000 0 1 uname
0.00 0.000000 0 2 fcntl
0.00 0.000000 0 2 getrlimit
0.00 0.000000 0 1 getuid
0.00 0.000000 0 1 getgid
0.00 0.000000 0 1 geteuid
0.00 0.000000 0 1 getegid
0.00 0.000000 0 1 getppid
0.00 0.000000 0 1 getpgrp
0.00 0.000000 0 1 arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00 2.102304 392916 15158 total
POSIX requires that the shell keep the exit statuses of at least CHILD_MAX subshells. CHILD_MAX is the value returned by sysconf (_SC_CHILD_MAX), and glibc returns the resource limit configured with “ulimit -u”. (This means that the returned value can change due to a setrlimit call, which is a minor POSIX incompatibility; this is a POSIX defect, though, as there already is an exception for “ulimit -n”.)
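A quick way to see that coupling on a given system is to query both values from the shell (a sketch; `getconf CHILD_MAX` is the command-line front end to the same sysconf value, and the exact numbers will differ per host):

```shell
# Compare the per-user process limit with what glibc reports as CHILD_MAX.
# On RHEL 6 and later the two normally agree, since glibc derives
# sysconf(_SC_CHILD_MAX) from the RLIMIT_NPROC soft limit.
echo "ulimit -u:  $(ulimit -u)"
echo "CHILD_MAX:  $(getconf CHILD_MAX)"
```

On RHEL 5 the second line instead stays pinned at 999 regardless of the first, for the reason discussed further down in this report.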
Anyway, bash needs to keep track of CHILD_MAX processes, and if the script never collects the exit status of those subshells, it will keep accumulating them. After “debuginfo-install bash”, you can verify this with:
$ gdb -p $(pgrep -u 500 -f ^sh.*new3) -ex "print js" -ex quit -batch
(assuming that the reproducer is invoked as “sh new3.sh” or something similar). On my RHEL 6.7 test system, GDB prints this after the reproducer has been running for a while:
$1 = {c_childmax = 7810, c_living = 1, c_reaped = 6850, c_injobs = 6850, c_totforked = 13702, c_totreaped = 13701, j_jobslots = 6856, j_lastj = 6849, j_firstj = 0, j_njobs = 6850, j_ndead = 6850, j_current = 6849, j_previous = 6849, j_lastmade = 0x2071fe0, j_lastasync = 0x2071fe0}
The performance degradation happens because bash uses a linear array, and scans it for the child PID on each SIGCHLD. This suggests that the amount of work is roughly proportional to I * min(I, CHILD_MAX), where I is the initial value of i in the reproducer (10000 in the attached version).
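That scaling can also be probed without gdb by a rough timing sketch (the limits 1024 and 7168 mirror the runs earlier in this report; on an affected bash the second run should be measurably slower, while a fixed bash shows roughly equal times):

```shell
# Time an identical background-fork loop under a small and a large
# "ulimit -u"; only bash's saved-status bookkeeping differs between runs.
runs=0
for u in 1024 7168; do
  ( ulimit -u "$u" 2>/dev/null   # may fail if the hard limit is lower
    echo "limit: $(ulimit -u)"
    time bash -c 'i=3000; until [ $i -eq 1 ]; do /bin/true & i=$((i-1)); done' )
  runs=$((runs+1))
done
```

The iteration count is an arbitrary choice here; it just needs to exceed the smaller limit so that the bgpids list saturates in one run and not the other.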
To verify this theory, I applied this patch:
--- bash-4.1/lib/sh/oslib.c.maxchild 2015-10-14 11:08:54.584433383 +0200
+++ bash-4.1/lib/sh/oslib.c 2015-10-14 11:09:13.063521809 +0200
@@ -275,22 +275,5 @@ getmaxgroups ()
long
getmaxchild ()
{
- static long maxchild = -1L;
-
- if (maxchild > 0)
- return maxchild;
-
-#if defined (HAVE_SYSCONF) && defined (_SC_CHILD_MAX)
- maxchild = sysconf (_SC_CHILD_MAX);
-#else
-# if defined (CHILD_MAX)
- maxchild = CHILD_MAX;
-# else
-# if defined (MAXUPRC)
- maxchild = MAXUPRC;
-# endif /* MAXUPRC */
-# endif /* CHILD_MAX */
-#endif /* !HAVE_SYSCONF || !_SC_CHILD_MAX */
-
- return (maxchild);
+ return 500;
}
This artificially caps the number of tracked subshells at 500. As a result, the accumulation of subshells is limited:
$1 = {c_childmax = 500, c_living = 2, c_reaped = 501, c_injobs = 502, c_totforked = 2063, c_totreaped = 2061, j_jobslots = 1032, j_lastj = 1030, j_firstj = 529, j_njobs = 502, j_ndead = 501, j_current = 1030, j_previous = 1030, j_lastmade = 0x941380, j_lastasync = 0x941380}
and performance is independent of “ulimit -u”.
A proper fix would have to use a different data structure to perform PID lookups (a balanced tree or a hash table). Without PID randomization, it might be possible to exploit their non-random nature and keep using the array-based data structure, but this will obviously explode again if we ever turn on PID randomization. There will always be a slowdown due to the increased bookkeeping overhead, but as long as the number of PIDs that need to be tracked is not in the range of millions, I expect that it will be hardly measurable with the proper data structure.
We could add an override to glibc to change the value returned by sysconf (_SC_CHILD_MAX), but if you want to go that route (which I don't recommend because it might break other scripts), please put that into bash.
(In reply to Christian Horn from comment #30)
> On rhel5, I have identical execution times for the code. For "ulimit -u"
> ranging from 16 to 16284, I get execution times between 12.36 and 12.62
> seconds.

On RHEL 5, glibc is compiled against kernel headers, which contain this:

/usr/include/linux/limits.h:#define CHILD_MAX 999 /* no limit :-) */

The glibc sysconf implementation assumes this is an actual hard limit. As a result, sysconf (_SC_CHILD_MAX) will always return 999, and the “ulimit -u” value does not matter: bash will always track the most recent 999 subshells, and there is no performance impact of increasing “ulimit -u”.

Trying to sum up my thoughts about the options for going forward, with pluses/minuses as I see them (please correct!):
1) follow up with upstream glibc
pro: I think this should be brought up in upstream glibc in any case,
as an upstream change triggered this. I think also upstream
had "CHILD_MAX 999", so other software might also be affected.
pro: Maybe there is a chance on fixing this without touching
the scripts
con: Chance that reporting this is seen as noise
2) follow up with upstream bash
pro: Should be brought up in bash upstream too, as not only our
builds of bash had the issue. Others might still use affected
builds (like us, until today).
pro: Maybe there is a chance on fixing this without touching
the scripts
3) follow up with the code, I think this has 2 parts:
a) detecting affected code:
Describing which situation is triggering the issue.
Is this a code structure which can be detected with
egrep and a regex?
Or can it be seen in a good way when the script is running?
b) If script is affected, recommending a fix for the code.
And describing the workaround with "ulimit -u".
We are only aware of the OSWatcher script so far, but it's
widely spread.
With 1) and 2) we have a chance to solve the issue most conveniently for customers. Applying the workaround with "ulimit -u" still requires that one detects that there is an issue.
I do not think 1) is a real solution, and although I am not a glibc maintainer/developer I would reject any attempts to "fix" the problem this way. The truth is the script is simply broken - spawning thousands of background processes and not collecting the exit status is a bug.

2) is not a real fix either, but at least it can be seen as an optimization enhancement, and the patch could be acceptable to bash upstream.

I will go ahead and call this situation to Chet's attention, and see if maybe he wants to try to make the searches more efficient. In fact, I'll just point him to this bug report. Please don't anyone make this private or he won't be able to read it....

Thanks for bringing this to the attention of upstream glibc. Is someone doing the same for bash, if that makes sense?

Review of https://access.redhat.com/solutions/1444773 summing up the issue would be appreciated. Starting to discuss fixes of the code where this issue was observed.

OK, so I wrote to Chet. He very graciously wrote back. As I should
have expected, he was way ahead of us on this. He already has plans
to fix it, and is just deciding on the best solution. Here, I'll
just let him tell you in his own words (I've snipped out mine that
he quoted from the my original email to him, in case this seems a
little choppy to you):
> [...drivel from me snipped...]
> I will take a look at this. I have one proposed patch and an idea for
> a simpler one in mind.
>
> This was first discussed back in April:
>
> http://lists.gnu.org/archive/html/bug-bash/2015-04/msg00069.html
> http://lists.gnu.org/archive/html/bug-bash/2015-04/msg00075.html
>
> I applied the portion of the patch that only stores background pids
> in the bgpids list, though that won't help in the case where some
> process just forks off thousands of background pids and doesn't do
> anything with them.
>
> [...drivel from me snipped...]
>
> Sure. A script can run `wait' without arguments, though (or `disown -a'
> if it doesn't want to possibly wait for a process to terminate), and
> discard all of those saved statuses. Doing that seems just as reasonable,
> in the short term, as forking off and ignoring thousands of background
> processes, until a change gets into a bash release.
>
> Chet
I love that disown thing he mentions. Something I don't think any
of us noticed before (at least I don't recall seeing it mentioned
here, yet).
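A minimal sketch of that workaround applied to the reproducer (the periodic `wait` is the only addition; the iteration count and interval are arbitrary choices to keep the example quick, and `disown -a` would work as well under bash if blocking on children is undesirable):

```shell
#!/bin/sh
# Reproducer with the workaround Chet describes: a plain "wait" collects
# and discards the saved exit statuses, so bash's bgpids list can no
# longer accumulate entries up to CHILD_MAX.
i=2000
until test $i -eq 1; do
  /bin/true &
  # Every 200 iterations, discard the accumulated background-job statuses.
  if [ $((i % 200)) -eq 0 ]; then
    wait
  fi
  i=$((i-1))
done
wait
```

With this change the tracked-job count stays bounded by the wait interval instead of growing toward "ulimit -u".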
So, I haven't yet, but I will check those mailing list references
and see if there is a BZ or something in there that we can watch.
If there isn't one, I'll ask him so that we can have something to
watch. From there, once he has made his decision and his patch,
we can then just port that into our version. Does that seem like
a reasonable action plan to everyone?
BTW, should this now be considered an RFE, or still a bug?
(In reply to Christian Horn from comment #73)
> Thanks for bringing this to the attention of upstream glibc. Is someone
> doing the same for bash, if that makes sense?
>
> Review of https://access.redhat.com/solutions/1444773 summing up the issue
> would be appreciated.
> Starting to discuss fixes of the code where this issue was observed.

Sorry about that. I guess I wasn't clear. Chet Ramey is one of the original authors, and still the maintainer, of Bash. I don't know who to contact about glibc.

I'm not sure if this is something to be addressed in glibc or not, really. I have no idea what glibc _should_ be doing, or even if it is doing the right thing or not. One thing that is clear, though, is that the array that bash is maintaining _could_ be made to work more efficiently, and Chet is already looking into it, and seems to want to do it, so that will certainly fix all bash scripts. The way I see it, I think the glibc folks can (and probably should) hash out what, if anything, should be done with that independently of whatever the bash folks decide. I honestly have no opinion on that.

I'll look at your KCS as soon as I can, but I've already promised to do other stuff, so I'll get that done first.

Turns out there's a second problem, guys: One of the things Chet points out above (though it was a little subtle for me to pick up on at first) is that this array is not only an inefficient way to keep and search for data, but it's also keeping data that it doesn't need to keep (thus making the array bigger than it has to be, thus adding to the problem). Apparently, this array has been storing information not only on background processes, but on all foreground processes as well. The information being kept about foreground processes when they exit is completely unnecessary.

Just to make things delicious, apparently, the patch to fix it is a simple one-liner that he already intends to include in a future release, so it's all upstreamy and everything already.
Are we interested in at least putting that in? I've explained this to the customer of case #01443177, to see if they think it will help their specific oracle process. My guess (having not seen the script, but heard things about its characteristics in this BZ and the case) is that it just might actually fix their problem, and also fix other scripts that simply run for very long periods of time, spawning reasonable numbers of background processes (instead of the greatly exaggerated numbers that we've been looking at in here).

Lemme go dig up the patch, hang on.

Quick patch Chet sent to me which just stops saving irrelevant data (data about foreground processes) in that bgpid array:
***************
*** 1015,1022 ****
return;
! if ((dflags & DEL_NOBGPID) == 0)
{
proc = find_last_proc (job_index, 0);
- /* Could do this just for J_ASYNC jobs, but we save all. */
if (proc)
bgp_add (proc->pid, process_exit_status (proc->status));
--- 1066,1072 ----
return;
! if ((dflags & DEL_NOBGPID) == 0 && (temp->flags & (J_ASYNC|J_FOREGROUND)) == J_ASYNC)
{
proc = find_last_proc (job_index, 0);
if (proc)
bgp_add (proc->pid, process_exit_status (proc->status));
[actually 2 issues] I think this fits to what we have seen. We initially had issues to describe this here in the best way, actually seeing multiple things: - having the downstripped code which was slower on rhel5 than on rhel6.2+ - seing the script after some time occupying a cpu core by itself - seeing the script in "top" accumulating cpu cycles much faster than on rhel5 The customer probably agrees on giving a testpackage a try. Clear reesults would take 1 or 2 weeks after installing the testpackage, I think. Counting exec or clone calls in the straces made on site - sample 1 2860 exec (2802 clone) calls in 920 seconds - sample 2 1287 exec (898 clone) calls in 364 seconds Has this discussion been taken up in upstream, any news? (In reply to Christian Horn from comment #82) > Has this discussion been taken up in upstream, any news? I thought I reported back on that long ago: It has been at least partially fixed upstream (the bit about not saving unnecessary data in that structure). The last time I talked to him, he still wasn't sure about which way he was going to try to approach speeding up searches, but he certainly intended to do it. That's all I know. Discussed at http://lists.gnu.org/archive/html/bug-bash/2015-04/msg00069.html Fixed in upstream through http://git.savannah.gnu.org/cgit/bash.git/commit/?id=85ec0778f9d778e1820fb8c0e3e996f2d1103b45 jobs.c - delete_job: only add J_ASYNC (and not J_FOREGROUND) jobs to the bgpids array, since Posix says shells only need to save the status of async jobs. Report and fix from John Fremlin <john> Related change: 85ec0778 (Chet Ramey 2015-05-15 11:03:36 -0400 1155) if ((dflags & DEL_NOBGPID) == 0 && (temp->flags & (J_ASYNC|J_FOREGROUND)) == J_ASYNC) |