time-1.9-20.fc39 fails to build in Fedora 39 in 64-bit PowerPC because a test fails there: ==================================== GNU Time 1.9: ./test-suite.log ==================================== # TOTAL: 4 # PASS: 3 # SKIP: 0 # XFAIL: 0 # FAIL: 1 # XPASS: 0 # ERROR: 0 .. contents:: :depth: 2 FAIL: tests/time-max-rss ======================== time(1) failed to detect 5MB allcoation. mem-baseline(kb): 0 mem-5MB(kb): 4096 delta(kb): 4096 FAIL tests/time-max-rss.sh (exit status: 1) A difference between passing and failing build root is at <https://koschei.fedoraproject.org/build/15196718>. Reproducible: Always
This looks like a bug in kernel. Compare ru_maxrss on x86_64: $ strace -v -e wait4 -- time -f %M /usr/bin/true wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, {ru_utime={tv_sec=0, tv_usec=522}, ru_stime={tv_sec=0, tv_usec=0}, ru_maxrss=1152, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=66, ru_majflt=0, ru_nswap=0, ru_inblock=0, ru_oublock=0, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=1, ru_nivcsw=0}) = 12931 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=12931, si_uid=500, si_status=0, si_utime=0, si_stime=0} --- 1152 +++ exited with 0 +++ While on ppc64le: # strace -v -e wait4 -- ./time -f %M /usr/bin/true wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, {ru_utime={tv_sec=0, tv_usec=596}, ru_stime={tv_sec=0, tv_usec=0}, ru_maxrss=0, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=47, ru_majflt=0, ru_nswap=0, ru_inblock=0, ru_oublock=0, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=1, ru_nivcsw=1}) = 23827 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=23827, si_uid=0, si_status=0, si_utime=0, si_stime=0} --- 0 +++ exited with 0 +++ # uname -r 6.4.0-0.rc4.20230601git929ed21dfdb6.38.fc39.ppc64le
It looks like ru_maxrss value is rounded down to nearest 2048 KB: [root@ibm-p9z-27-lp10 time-1.9]# strace -v -e wait4 -- ./time -f %M ./tests/time-aux -m 1M wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, {ru_utime={tv_sec=0, tv_usec=745}, ru_stime={tv_sec=0, tv_usec=0}, ru_maxrss=0, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=70, ru_majflt=0, ru_nswap=0, ru_inblock=0, ru_oublock=0, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=1, ru_nivcsw=0}) = 24895 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=24895, si_uid=0, si_status=0, si_utime=0, si_stime=0} --- 0 +++ exited with 0 +++ [root@ibm-p9z-27-lp10 time-1.9]# strace -v -e wait4 -- ./time -f %M ./tests/time-aux -m 2M wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, {ru_utime={tv_sec=0, tv_usec=0}, ru_stime={tv_sec=0, tv_usec=2891}, ru_maxrss=2048, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=86, ru_majflt=0, ru_nswap=0, ru_inblock=0, ru_oublock=0, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=1, ru_nivcsw=1}) = 24900 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=24900, si_uid=0, si_status=0, si_utime=0, si_stime=0} --- 2048 +++ exited with 0 +++ [root@ibm-p9z-27-lp10 time-1.9]# strace -v -e wait4 -- ./time -f %M ./tests/time-aux -m 3M wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, {ru_utime={tv_sec=0, tv_usec=965}, ru_stime={tv_sec=0, tv_usec=0}, ru_maxrss=2048, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=102, ru_majflt=0, ru_nswap=0, ru_inblock=0, ru_oublock=0, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=1, ru_nivcsw=1}) = 24907 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=24907, si_uid=0, si_status=0, si_utime=0, si_stime=0} --- 2048 +++ exited with 0 +++ [root@ibm-p9z-27-lp10 time-1.9]# strace -v -e wait4 -- ./time -f %M ./tests/time-aux -m 4M wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, {ru_utime={tv_sec=0, tv_usec=0}, ru_stime={tv_sec=0, tv_usec=1995}, ru_maxrss=4096, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=118, ru_majflt=0, ru_nswap=0, ru_inblock=0, ru_oublock=0, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=1, ru_nivcsw=1}) = 24914 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=24914, si_uid=0, si_status=0, si_utime=0, si_stime=0} --- 4096 +++ exited with 0 +++ [root@ibm-p9z-27-lp10 time-1.9]# strace -v -e wait4 -- ./time -f %M ./tests/time-aux -m 5M wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, {ru_utime={tv_sec=0, tv_usec=1199}, ru_stime={tv_sec=0, tv_usec=0}, ru_maxrss=4096, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=135, ru_majflt=0, ru_nswap=0, ru_inblock=0, ru_oublock=0, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=1, ru_nivcsw=1}) = 24919 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=24919, si_uid=0, si_status=0, si_utime=0, si_stime=0} --- 4096 +++ exited with 0 +++ [root@ibm-p9z-27-lp10 time-1.9]# strace -v -e wait4 -- ./time -f %M ./tests/time-aux -m 6M wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, {ru_utime={tv_sec=0, tv_usec=0}, ru_stime={tv_sec=0, tv_usec=3769}, ru_maxrss=6144, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=150, ru_majflt=0, ru_nswap=0, ru_inblock=0, ru_oublock=0, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=1, ru_nivcsw=1}) = 24926 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=24926, si_uid=0, si_status=0, si_utime=0, si_stime=0} --- 6144 +++ exited with 0 +++
Hi Dan, Could you take a look at this for us (kernel-ark team) when you get a chance? Thank you. Scott
If it's a kernel issue, then it's more likely a runtime, not buildtime, kernel version that changed. The time test-suite broke on 03/31 if I see right, so we need to check what kernel was used on the builders at that time. It could be a 6.1 -> 6.2 migration ... FWIW the time-max-rss.sh test is skipped on my F-36 with kernel-6.4.0-0.rc3.28.fc39.ppc64le from the time-1.9/tests/time-max-rss.log time-max-rss.sh: skipped test: getrusage(2) returns zero in ru_maxrss SKIP tests/time-max-rss.sh (exit status: 77) So it might be more complicated ...
Created attachment 1972244 [details] Reproducer Instead of using time's package test suite, this short C program should demonstrate the problem. Though, I haven't run it on PowerPC yet.
[dan@talos tmp]$ gcc -O2 -Wall -o test test.c [dan@talos tmp]$ ./test Child mallocated 3145728 B. wait4() reported 0 B. Which is fewer than the allocated memory. This is on F-36 with kernel-6.4.0-0.rc3.28.fc39.ppc64le, I guess this means the reproducer works.
with kernel-6.0.7-301.fc37.ppc64le (P8 KVM guest) I am getting [root@ibm-p8-storage-02-guest-10 ~]# ./test Child mallocated 3145728 B. wait4() reported 2293760 B. Which is fewer than the allocated memory.
If I build it without optimizations, it prints nonzero, yet too small number: # gcc -O0 -Wall -o test test.c # ./test Child mallocated 3145728 B. wait4() reported 2097152 B. Which is fewer than the allocated memory. # uname -r 6.4.0-0.rc7.20230621gite660abd551f1.55.fc39.ppc64le While with -O2 it reports 0 on the same kernel. Do I have a bug in my reproducer? Or is the compiler too smart and optimizes out a for loop which forces mapping pages into resident set size? Or is there a bug in glibc wrapper for the syscall? wait4() is known to always return 0 on some platforms where process accounting is not fully implemented. That's why the test skips in case of 0. I don't know what whas the situation with Linux on PowerPC.
and with kernel-6.3.8-100.fc37.ppc64le on the same guest Child mallocated 3145728 B. wait4() reported 0 B. Which is fewer than the allocated memory.
Hi Michael, seems we have some problems with the maxrss value returned by wait4() on recent ppc64le kernels. Or perhaps on the glibc side of the syscall ... Could you or someone else on the IBM side take a look? Thanks, Dan
Thanks Michael for the link - https://lore.kernel.org/all/20230616180718.17725-1-mkoutny@suse.com/ should be the fix. If I see right, then 6.2 is the first kernel where things broke (started to behave differently) compared to previous kernels.
(In reply to Dan Horák from comment #11) > Thanks Michael for the link - > https://lore.kernel.org/all/20230616180718.17725-1-mkoutny@suse.com/ should > be the fix. If I see right, then 6.2 is the first kernel where things broke > (started to behave differently) compared to previous kernels. Hi Dan, Correct me if I'm missing something but it doesn't look like this patch was accepted. Was there a v2 somewhere that I'm not finding?
Hi Scott, (In reply to Scott Weaver from comment #12) > (In reply to Dan Horák from comment #11) > > Thanks Michael for the link - > > https://lore.kernel.org/all/20230616180718.17725-1-mkoutny@suse.com/ should > > be the fix. If I see right, then 6.2 is the first kernel where things broke > > (started to behave differently) compared to previous kernels. > > Hi Dan, > > Correct me if I'm missing something but it doesn't look like this patch was > accepted. Was there a v2 somewhere that I'm not finding? I suspect you are right that there hasn't been any new version posted yet ...
the issue is still present in kernel-6.7.0-0.rc1.16.fc40.ppc64le, but the failing test in time got worked around (disabled) in https://src.fedoraproject.org/rpms/time/c/03e009c1bd39515cc5882508871a7598c384b173?branch=rawhide
Dear Maintainer, your package has an open Fails To Build From Source bug for Fedora 39. Action is required from you. If you can fix your package to build, perform a build in koji, and either create an update in bodhi, or close this bug without creating an update, if updating is not appropriate [1]. If you are working on a fix, set the status to ASSIGNED to acknowledge this. If you have already fixed this issue, please close this Bugzilla report. Following the policy for such packages [2], your package will be orphaned if this bug remains in NEW state more than 8 weeks (not sooner than 2023-08-01). A week before the mass branching of Fedora 40 according to the schedule [3], any packages not successfully rebuilt at least on Fedora 38 will be retired regardless of the status of this bug. [1] https://docs.fedoraproject.org/en-US/fesco/Updates_Policy/ [2] https://docs.fedoraproject.org/en-US/fesco/Fails_to_build_from_source_Fails_to_install/ [3] https://fedorapeople.org/groups/schedule/f-40/f-40-key-tasks.html
I am going to remove the FTBFS blockers, as the package is building right now, but the bug needs to stay open until the kernel fix makes it upstream.