Bug 2212765 - 6.4.0-0.rc4.20230601git929ed21dfdb6.38.fc39.ppc64le: wait4(): ru_maxrss is rounded down to multiples of 2 MB
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: ppc64le
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL: https://koschei.fedoraproject.org/pac...
Whiteboard:
Depends On:
Blocks: PPCTracker
Reported: 2023-06-06 09:48 UTC by Petr Pisar
Modified: 2024-02-12 16:25 UTC
CC: 23 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: ---
Embargoed:


Attachments
Reproducer (1.19 KB, text/plain)
2023-06-23 13:38 UTC, Petr Pisar

Description Petr Pisar 2023-06-06 09:48:06 UTC
time-1.9-20.fc39 fails to build in Fedora 39 on 64-bit PowerPC because a test fails there:

====================================
   GNU Time 1.9: ./test-suite.log
====================================

# TOTAL: 4
# PASS:  3
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: tests/time-max-rss
========================

time(1) failed to detect 5MB allcoation.
  mem-baseline(kb): 0
  mem-5MB(kb):      4096
  delta(kb):        4096
FAIL tests/time-max-rss.sh (exit status: 1)

A difference between passing and failing build root is at <https://koschei.fedoraproject.org/build/15196718>.


Reproducible: Always

Comment 1 Petr Pisar 2023-06-06 09:59:31 UTC
This looks like a bug in the kernel. Compare ru_maxrss on x86_64:

$ strace -v -e wait4 -- time -f %M /usr/bin/true
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, {ru_utime={tv_sec=0, tv_usec=522}, ru_stime={tv_sec=0, tv_usec=0}, ru_maxrss=1152, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=66, ru_majflt=0, ru_nswap=0, ru_inblock=0, ru_oublock=0, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=1, ru_nivcsw=0}) = 12931
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=12931, si_uid=500, si_status=0, si_utime=0, si_stime=0} ---
1152
+++ exited with 0 +++

While on ppc64le:

# strace -v -e wait4 -- ./time -f %M /usr/bin/true
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, {ru_utime={tv_sec=0, tv_usec=596}, ru_stime={tv_sec=0, tv_usec=0}, ru_maxrss=0, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=47, ru_majflt=0, ru_nswap=0, ru_inblock=0, ru_oublock=0, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=1, ru_nivcsw=1}) = 23827
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=23827, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
0
+++ exited with 0 +++

# uname -r
6.4.0-0.rc4.20230601git929ed21dfdb6.38.fc39.ppc64le

Comment 2 Petr Pisar 2023-06-06 10:43:14 UTC
It looks like the ru_maxrss value is rounded down to the nearest multiple of 2048 KB:

[root@ibm-p9z-27-lp10 time-1.9]# strace -v -e wait4 -- ./time -f %M ./tests/time-aux -m 1M
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, {ru_utime={tv_sec=0, tv_usec=745}, ru_stime={tv_sec=0, tv_usec=0}, ru_maxrss=0, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=70, ru_majflt=0, ru_nswap=0, ru_inblock=0, ru_oublock=0, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=1, ru_nivcsw=0}) = 24895
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=24895, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
0
+++ exited with 0 +++
[root@ibm-p9z-27-lp10 time-1.9]# strace -v -e wait4 -- ./time -f %M ./tests/time-aux -m 2M
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, {ru_utime={tv_sec=0, tv_usec=0}, ru_stime={tv_sec=0, tv_usec=2891}, ru_maxrss=2048, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=86, ru_majflt=0, ru_nswap=0, ru_inblock=0, ru_oublock=0, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=1, ru_nivcsw=1}) = 24900
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=24900, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
2048
+++ exited with 0 +++
[root@ibm-p9z-27-lp10 time-1.9]# strace -v -e wait4 -- ./time -f %M ./tests/time-aux -m 3M
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, {ru_utime={tv_sec=0, tv_usec=965}, ru_stime={tv_sec=0, tv_usec=0}, ru_maxrss=2048, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=102, ru_majflt=0, ru_nswap=0, ru_inblock=0, ru_oublock=0, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=1, ru_nivcsw=1}) = 24907
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=24907, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
2048
+++ exited with 0 +++
[root@ibm-p9z-27-lp10 time-1.9]# strace -v -e wait4 -- ./time -f %M ./tests/time-aux -m 4M
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, {ru_utime={tv_sec=0, tv_usec=0}, ru_stime={tv_sec=0, tv_usec=1995}, ru_maxrss=4096, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=118, ru_majflt=0, ru_nswap=0, ru_inblock=0, ru_oublock=0, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=1, ru_nivcsw=1}) = 24914
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=24914, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
4096
+++ exited with 0 +++
[root@ibm-p9z-27-lp10 time-1.9]# strace -v -e wait4 -- ./time -f %M ./tests/time-aux -m 5M
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, {ru_utime={tv_sec=0, tv_usec=1199}, ru_stime={tv_sec=0, tv_usec=0}, ru_maxrss=4096, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=135, ru_majflt=0, ru_nswap=0, ru_inblock=0, ru_oublock=0, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=1, ru_nivcsw=1}) = 24919
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=24919, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
4096
+++ exited with 0 +++
[root@ibm-p9z-27-lp10 time-1.9]# strace -v -e wait4 -- ./time -f %M ./tests/time-aux -m 6M
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, {ru_utime={tv_sec=0, tv_usec=0}, ru_stime={tv_sec=0, tv_usec=3769}, ru_maxrss=6144, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=150, ru_majflt=0, ru_nswap=0, ru_inblock=0, ru_oublock=0, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=1, ru_nivcsw=1}) = 24926
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=24926, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
6144
+++ exited with 0 +++

Comment 3 Scott Weaver 2023-06-20 12:30:28 UTC
Hi Dan,

Could you take a look at this for us (kernel-ark team) when you get a chance?
Thank you.

Scott

Comment 4 Dan Horák 2023-06-20 17:07:14 UTC
If it's a kernel issue, then it's more likely the runtime, not the buildtime, kernel version that changed. The time test suite broke on 03/31 if I'm reading it right, so we need to check which kernel was used on the builders at that time. It could be a 6.1 -> 6.2 migration ...

FWIW the time-max-rss.sh test is skipped on my F-36 with kernel-6.4.0-0.rc3.28.fc39.ppc64le

from the time-1.9/tests/time-max-rss.log
time-max-rss.sh: skipped test: getrusage(2) returns zero in ru_maxrss
SKIP tests/time-max-rss.sh (exit status: 77)

So it might be more complicated ...

Comment 5 Petr Pisar 2023-06-23 13:38:21 UTC
Created attachment 1972244 [details]
Reproducer

Instead of using time's package test suite, this short C program should demonstrate the problem, though I haven't run it on PowerPC yet.

Comment 6 Dan Horák 2023-06-23 14:00:04 UTC
[dan@talos tmp]$ gcc -O2 -Wall -o test test.c
[dan@talos tmp]$ ./test
Child mallocated 3145728 B.
wait4() reported 0 B.
Which is fewer than the allocated memory.

This is on F-36 with kernel-6.4.0-0.rc3.28.fc39.ppc64le; I guess this means the reproducer works.

Comment 7 Dan Horák 2023-06-23 14:22:30 UTC
with kernel-6.0.7-301.fc37.ppc64le (P8 KVM guest) I am getting

[root@ibm-p8-storage-02-guest-10 ~]# ./test
Child mallocated 3145728 B.
wait4() reported 2293760 B.
Which is fewer than the allocated memory.

Comment 8 Petr Pisar 2023-06-23 14:28:01 UTC
If I build it without optimizations, it prints a nonzero, yet still too small, number:

# gcc -O0 -Wall -o test test.c
# ./test 
Child mallocated 3145728 B.
wait4() reported 2097152 B.
Which is fewer than the allocated memory.
# uname -r
6.4.0-0.rc7.20230621gite660abd551f1.55.fc39.ppc64le

While with -O2 it reports 0 on the same kernel.

Do I have a bug in my reproducer? Or is the compiler too smart, optimizing out the for loop that forces the pages into the resident set? Or is there a bug in the glibc wrapper for the syscall?

wait4() is known to always return 0 in ru_maxrss on some platforms where process accounting is not fully implemented. That's why the test skips in the case of 0. I don't know what the situation was with Linux on PowerPC.
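[Editor's note: one way to rule out the optimizer question raised above is to write through a volatile pointer, so the page-touching loop cannot be elided at -O2. The helper below is hypothetical, not code from the attached reproducer, and this is a debugging suggestion rather than a confirmed diagnosis.]

```c
#include <stddef.h>

/* Touch each page through a volatile pointer: a volatile store is an
 * observable side effect, so the optimizer must keep the loop even at
 * -O2. Returns the number of pages written. */
size_t touch_pages_volatile(char *buf, size_t size, size_t page)
{
    volatile char *p = buf;
    size_t touched = 0;
    for (size_t i = 0; i < size; i += page) {
        p[i] = 1;
        touched++;
    }
    /* An alternative is a compiler barrier after a plain loop:
     *   asm volatile("" : : "r"(buf) : "memory");
     */
    return touched;
}
```

If the -O2 build still reports 0 with this variant, the loop being optimized out can be excluded and the kernel (or glibc wrapper) remains the suspect.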

Comment 9 Dan Horák 2023-06-23 14:28:41 UTC
and with kernel-6.3.8-100.fc37.ppc64le on the same guest

Child mallocated 3145728 B.
wait4() reported 0 B.
Which is fewer than the allocated memory.

Comment 10 Dan Horák 2023-06-23 14:35:38 UTC
Hi Michael,

it seems we have some problems with the maxrss value returned by wait4() on recent ppc64le kernels, or perhaps on the glibc side of the syscall ... Could you or someone else on the IBM side take a look?

Thanks, Dan

Comment 11 Dan Horák 2023-06-28 16:46:22 UTC
Thanks Michael for the link - https://lore.kernel.org/all/20230616180718.17725-1-mkoutny@suse.com/ should be the fix. If I see right, then 6.2 is the first kernel where things broke (started to behave differently) compared to previous kernels.

Comment 12 Scott Weaver 2023-07-21 13:48:45 UTC
(In reply to Dan Horák from comment #11)
> Thanks Michael for the link -
> https://lore.kernel.org/all/20230616180718.17725-1-mkoutny@suse.com/ should
> be the fix. If I see right, then 6.2 is the first kernel where things broke
> (started to behave differently) compared to previous kernels.

Hi Dan,

Correct me if I'm missing something but it doesn't look like this patch was accepted. Was there a v2 somewhere that I'm not finding?

Comment 13 Dan Horák 2023-07-24 08:03:35 UTC
Hi Scott,

(In reply to Scott Weaver from comment #12)
> (In reply to Dan Horák from comment #11)
> > Thanks Michael for the link -
> > https://lore.kernel.org/all/20230616180718.17725-1-mkoutny@suse.com/ should
> > be the fix. If I see right, then 6.2 is the first kernel where things broke
> > (started to behave differently) compared to previous kernels.
> 
> Hi Dan,
> 
> Correct me if I'm missing something but it doesn't look like this patch was
> accepted. Was there a v2 somewhere that I'm not finding?

I suspect you are right that there hasn't been any new version posted yet ...

Comment 14 Dan Horák 2023-11-15 13:03:26 UTC
the issue is still present in kernel-6.7.0-0.rc1.16.fc40.ppc64le, but the failing test in time was worked around (disabled) in https://src.fedoraproject.org/rpms/time/c/03e009c1bd39515cc5882508871a7598c384b173?branch=rawhide

Comment 15 Fedora Release Engineering 2023-12-24 04:22:32 UTC
Dear Maintainer,

your package has an open Fails To Build From Source bug for Fedora 39.
Action is required from you.

If you can fix your package to build, perform a build in koji, and either create
an update in bodhi, or close this bug without creating an update, if updating is
not appropriate [1]. If you are working on a fix, set the status to ASSIGNED to
acknowledge this. If you have already fixed this issue, please close this Bugzilla report.

Following the policy for such packages [2], your package will be orphaned if
this bug remains in NEW state more than 8 weeks (not sooner than 2023-08-01).

A week before the mass branching of Fedora 40 according to the schedule [3],
any packages not successfully rebuilt at least on Fedora 38 will be
retired regardless of the status of this bug.

[1] https://docs.fedoraproject.org/en-US/fesco/Updates_Policy/
[2] https://docs.fedoraproject.org/en-US/fesco/Fails_to_build_from_source_Fails_to_install/
[3] https://fedorapeople.org/groups/schedule/f-40/f-40-key-tasks.html

Comment 16 Fedora Release Engineering 2024-01-21 04:22:26 UTC
[Same Fails To Build From Source notice as in comment 15.]

Comment 17 Fedora Release Engineering 2024-02-11 04:22:22 UTC
[Same Fails To Build From Source notice as in comment 15.]

Comment 18 Justin M. Forbes 2024-02-12 16:25:29 UTC
I am going to remove the FTBFS blockers, as the package is building right now, but the bug needs to stay open until the kernel fix makes it upstream.

