Bug 2212765

Summary: 6.4.0-0.rc4.20230601git929ed21dfdb6.38.fc39.ppc64le: wait4(): ru_maxrss is rounded down to multiplies of 2 MB
Product: [Fedora] Fedora Reporter: Petr Pisar <ppisar>
Component: kernelAssignee: Kernel Maintainer List <kernel-maint>
Status: NEW --- QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: unspecified    
Version: rawhideCC: acaringi, adscvr, airlied, alciregi, bskeggs, dhorak, ellerman, hdegoede, hpa, jarodwilson, josef, kasal, kernel-maint, lgoncalv, linville, masami256, mchehab, opohorel, ppisar, ptalbert, scweaver, steved
Target Milestone: ---Keywords: Regression
Target Release: ---   
Hardware: ppc64le   
OS: Linux   
URL: https://koschei.fedoraproject.org/package/time
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1071880, 2168842, 2231791    
Attachments:
Description Flags
Reproducer none

Description Petr Pisar 2023-06-06 09:48:06 UTC
time-1.9-20.fc39 fails to build in Fedora 39 in 64-bit PowerPC because a test fails there:

====================================
   GNU Time 1.9: ./test-suite.log
====================================

# TOTAL: 4
# PASS:  3
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: tests/time-max-rss
========================

time(1) failed to detect 5MB allcoation.
  mem-baseline(kb): 0
  mem-5MB(kb):      4096
  delta(kb):        4096
FAIL tests/time-max-rss.sh (exit status: 1)

A difference between passing and failing build root is at <https://koschei.fedoraproject.org/build/15196718>.


Reproducible: Always

Comment 1 Petr Pisar 2023-06-06 09:59:31 UTC
This looks like a bug in kernel. Compare ru_maxrss on x86_64:

$ strace -v -e wait4 -- time -f %M /usr/bin/true
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, {ru_utime={tv_sec=0, tv_usec=522}, ru_stime={tv_sec=0, tv_usec=0}, ru_maxrss=1152, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=66, ru_majflt=0, ru_nswap=0, ru_inblock=0, ru_oublock=0, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=1, ru_nivcsw=0}) = 12931
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=12931, si_uid=500, si_status=0, si_utime=0, si_stime=0} ---
1152
+++ exited with 0 +++

While on ppc64le:

# strace -v -e wait4 -- ./time -f %M /usr/bin/true
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, {ru_utime={tv_sec=0, tv_usec=596}, ru_stime={tv_sec=0, tv_usec=0}, ru_maxrss=0, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=47, ru_majflt=0, ru_nswap=0, ru_inblock=0, ru_oublock=0, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=1, ru_nivcsw=1}) = 23827
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=23827, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
0
+++ exited with 0 +++

# uname -r
6.4.0-0.rc4.20230601git929ed21dfdb6.38.fc39.ppc64le

Comment 2 Petr Pisar 2023-06-06 10:43:14 UTC
It looks like ru_maxrss value is rounded down to nearest 2048 KB:

[root@ibm-p9z-27-lp10 time-1.9]# strace -v -e wait4 -- ./time -f %M ./tests/time-aux -m 1M
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, {ru_utime={tv_sec=0, tv_usec=745}, ru_stime={tv_sec=0, tv_usec=0}, ru_maxrss=0, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=70, ru_majflt=0, ru_nswap=0, ru_inblock=0, ru_oublock=0, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=1, ru_nivcsw=0}) = 24895
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=24895, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
0
+++ exited with 0 +++
[root@ibm-p9z-27-lp10 time-1.9]# strace -v -e wait4 -- ./time -f %M ./tests/time-aux -m 2M
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, {ru_utime={tv_sec=0, tv_usec=0}, ru_stime={tv_sec=0, tv_usec=2891}, ru_maxrss=2048, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=86, ru_majflt=0, ru_nswap=0, ru_inblock=0, ru_oublock=0, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=1, ru_nivcsw=1}) = 24900
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=24900, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
2048
+++ exited with 0 +++
[root@ibm-p9z-27-lp10 time-1.9]# strace -v -e wait4 -- ./time -f %M ./tests/time-aux -m 3M
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, {ru_utime={tv_sec=0, tv_usec=965}, ru_stime={tv_sec=0, tv_usec=0}, ru_maxrss=2048, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=102, ru_majflt=0, ru_nswap=0, ru_inblock=0, ru_oublock=0, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=1, ru_nivcsw=1}) = 24907
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=24907, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
2048
+++ exited with 0 +++
[root@ibm-p9z-27-lp10 time-1.9]# strace -v -e wait4 -- ./time -f %M ./tests/time-aux -m 4M
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, {ru_utime={tv_sec=0, tv_usec=0}, ru_stime={tv_sec=0, tv_usec=1995}, ru_maxrss=4096, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=118, ru_majflt=0, ru_nswap=0, ru_inblock=0, ru_oublock=0, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=1, ru_nivcsw=1}) = 24914
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=24914, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
4096
+++ exited with 0 +++
[root@ibm-p9z-27-lp10 time-1.9]# strace -v -e wait4 -- ./time -f %M ./tests/time-aux -m 5M
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, {ru_utime={tv_sec=0, tv_usec=1199}, ru_stime={tv_sec=0, tv_usec=0}, ru_maxrss=4096, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=135, ru_majflt=0, ru_nswap=0, ru_inblock=0, ru_oublock=0, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=1, ru_nivcsw=1}) = 24919
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=24919, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
4096
+++ exited with 0 +++
[root@ibm-p9z-27-lp10 time-1.9]# strace -v -e wait4 -- ./time -f %M ./tests/time-aux -m 6M
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, {ru_utime={tv_sec=0, tv_usec=0}, ru_stime={tv_sec=0, tv_usec=3769}, ru_maxrss=6144, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=150, ru_majflt=0, ru_nswap=0, ru_inblock=0, ru_oublock=0, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=1, ru_nivcsw=1}) = 24926
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=24926, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
6144
+++ exited with 0 +++

Comment 3 Scott Weaver 2023-06-20 12:30:28 UTC
Hi Dan,

Could you take a look at this for us (kernel-ark team) when you get a chance?
Thank you.

Scott

Comment 4 Dan Horák 2023-06-20 17:07:14 UTC
If it's a kernel issue, then it's more likely a runtime, not buildtime, kernel version that changed. The time test-suite broke on 03/31 if I see right, so we need to check what kernel was used on the builders at that time. It could be a 6.1 -> 6.2 migration ...

FWIW the time-max-rss.sh test is skipped on my F-36 with kernel-6.4.0-0.rc3.28.fc39.ppc64le

from the time-1.9/tests/time-max-rss.log
time-max-rss.sh: skipped test: getrusage(2) returns zero in ru_maxrss
SKIP tests/time-max-rss.sh (exit status: 77)

So it might be more complicated ...

Comment 5 Petr Pisar 2023-06-23 13:38:21 UTC
Created attachment 1972244 [details]
Reproducer

Instead of using time's package test suite, this short C program should demonstrate the problem. Though, I haven't run it on PowerPC yet.

Comment 6 Dan Horák 2023-06-23 14:00:04 UTC
[dan@talos tmp]$ gcc -O2 -Wall -o test test.c                                                                                                                                                 
[dan@talos tmp]$ ./test                                                                                                                                                                       
Child mallocated 3145728 B.
wait4() reported 0 B.
Which is fewer than the allocated memory.

This is on F-36 with kernel-6.4.0-0.rc3.28.fc39.ppc64le, I guess this means the reproducer works.

Comment 7 Dan Horák 2023-06-23 14:22:30 UTC
with kernel-6.0.7-301.fc37.ppc64le (P8 KVM guest) I am getting

[root@ibm-p8-storage-02-guest-10 ~]# ./test
Child mallocated 3145728 B.
wait4() reported 2293760 B.
Which is fewer than the allocated memory.

Comment 8 Petr Pisar 2023-06-23 14:28:01 UTC
If I build it without optimizations, it prints nonzero, yet too small number:

# gcc -O0 -Wall -o test test.c
# ./test 
Child mallocated 3145728 B.
wait4() reported 2097152 B.
Which is fewer than the allocated memory.
# uname -r
6.4.0-0.rc7.20230621gite660abd551f1.55.fc39.ppc64le

While with -O2 it reports 0 on the same kernel.

Do I have a bug in my reproducer? Or is the compiler too smart and optimizes out a for loop which forces mapping pages into resident set size? Or is there a bug in glibc wrapper for the syscall?

wait4() is known to always return 0 on some platforms where process accounting is not fully implemented. That's why the test skips in case of 0. I don't know what whas the situation with Linux on PowerPC.

Comment 9 Dan Horák 2023-06-23 14:28:41 UTC
and with kernel-6.3.8-100.fc37.ppc64le on the same guest

Child mallocated 3145728 B.
wait4() reported 0 B.
Which is fewer than the allocated memory.

Comment 10 Dan Horák 2023-06-23 14:35:38 UTC
Hi Michael,

seems we have some problems with the maxrss value returned by wait4() on recent ppc64le kernels. Or perhaps on the glibc side of the syscall ... Could you or someone else on the IBM side take a look?

Thanks, Dan

Comment 11 Dan Horák 2023-06-28 16:46:22 UTC
Thanks Michael for the link - https://lore.kernel.org/all/20230616180718.17725-1-mkoutny@suse.com/ should be the fix. If I see right, then 6.2 is the first kernel where things broke (started to behave differently) compared to previous kernels.

Comment 12 Scott Weaver 2023-07-21 13:48:45 UTC
(In reply to Dan Horák from comment #11)
> Thanks Michael for the link -
> https://lore.kernel.org/all/20230616180718.17725-1-mkoutny@suse.com/ should
> be the fix. If I see right, then 6.2 is the first kernel where things broke
> (started to behave differently) compared to previous kernels.

Hi Dan,

Correct me if I'm missing something but it doesn't look like this patch was accepted. Was there a v2 somewhere that I'm not finding?

Comment 13 Dan Horák 2023-07-24 08:03:35 UTC
Hi Scott,

(In reply to Scott Weaver from comment #12)
> (In reply to Dan Horák from comment #11)
> > Thanks Michael for the link -
> > https://lore.kernel.org/all/20230616180718.17725-1-mkoutny@suse.com/ should
> > be the fix. If I see right, then 6.2 is the first kernel where things broke
> > (started to behave differently) compared to previous kernels.
> 
> Hi Dan,
> 
> Correct me if I'm missing something but it doesn't look like this patch was
> accepted. Was there a v2 somewhere that I'm not finding?

I suspect you are right that there hasn't been any new version posted yet ...