Bug 1568337

Summary: kernel: prlimit64 with RLIMIT_CPU ignored
Product: [Fedora] Fedora
Reporter: ksson <johannes.kanig>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 28
CC: airlied, aoliva, bskeggs, davejohansen, dmalcolm, ewk, fweimer, hdegoede, ichavero, itamar, jakub, jarodwilson, jglisse, johannes.kanig, john.j5live, jonathan, josef, jwakely, kernel-maint, labbott, law, linville, mchehab, mjg59, mpolacek, msebor, nickc, steved
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Fixed In Version: kernel-4.16.5-200.fc27 kernel-4.16.5-300.fc28
Last Closed: 2018-04-30 16:37:17 UTC
Type: Bug
Attachments:
  strace of the working machine (flags: none)
  strace of the non-working case (flags: none)

Description ksson 2018-04-17 09:07:44 UTC
Description of problem: 

The following C program should be killed after 2 seconds of CPU time because of its call to setrlimit. Instead, it is never killed and runs forever:

$ cat loop.c
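#include <sys/resource.h>

int main(void) {
  struct rlimit res;
  /* set the CPU time limit: soft and hard limit both 2 seconds */
  getrlimit(RLIMIT_CPU, &res);
  res.rlim_cur = 2;
  res.rlim_max = 2;
  setrlimit(RLIMIT_CPU, &res);

  /* spin forever; the kernel should kill the process at the limit */
  while (1);
}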

Version-Release number of selected component (if applicable):

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/8/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,objc,obj-c++,fortran,ada,go,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl --enable-libmpx --enable-offload-targets=nvptx-none --without-cuda-driver --enable-gnu-indirect-function --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 8.0.1 20180324 (Red Hat 8.0.1-0.20) (GCC)


How reproducible: 


Steps to Reproduce:
1. $ gcc -o loop loop.c
2. $ ./loop

Actual results:

The program loops forever.

Expected results:

The program should be killed after roughly two seconds. On another system:

$ time ./loop 
Killed

real	0m1.999s
user	0m2.000s
sys	0m0.000s

Additional info:

Comment 1 Jakub Jelinek 2018-04-17 09:12:28 UTC
It gets killed for me.  And I fail to see what this has to do with gcc.

Comment 2 Florian Weimer 2018-04-17 09:13:57 UTC
Please run both cases under strace and show us the output.

Please also list kernel versions (RPM and uname -a output) and glibc versions for both systems.

Comment 3 ksson 2018-04-17 09:23:47 UTC
Created attachment 1422963 [details]
strace of the working machine

Comment 4 ksson 2018-04-17 09:24:35 UTC
Created attachment 1422965 [details]
strace of the non-working case

Comment 5 ksson 2018-04-17 09:26:18 UTC
Sorry if I selected the wrong component. Is it kernel-related then?

If it helps, the system was upgraded recently from Fedora 27.

broken system extra info:

$ uname -a
Linux kosystem 4.16.2-300.fc28.x86_64 #1 SMP Thu Apr 12 14:58:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$ rpm -qa kernel
kernel-4.16.0-300.fc28.x86_64
kernel-4.15.15-300.fc27.x86_64
kernel-4.16.2-300.fc28.x86_64
$ rpm -qa glibc
glibc-2.27-8.fc28.i686
glibc-2.27-8.fc28.x86_64

The working system runs Debian:

$ uname -a
Linux kosystem 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt9-3~deb8u1 (2015-04-24) x86_64 GNU/Linux

Comment 6 Florian Weimer 2018-04-17 13:02:42 UTC
I can reproduce this with kernel-4.16.2-300.fc28.x86_64, but not with kernel-4.15.16-300.fc27.x86_64, using a Fedora 27 userland (i.e., the system call is prlimit64, not setrlimit).

Comment 7 Florian Weimer 2018-04-17 13:22:18 UTC
I downgraded to glibc 2.24, which still uses the setrlimit system call, and it also ignores the limit.
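
To exercise both kernel entry points without swapping glibc versions, something along these lines might work. This is only a sketch: the two function names are invented here, and it assumes 64-bit Linux, where struct rlimit has the same layout the kernel expects for both system calls.

```c
#include <stddef.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: 64-bit rlim_t, so struct rlimit matches the kernel's
 * rlimit64 layout used by prlimit64. */
_Static_assert(sizeof(rlim_t) == 8, "sketch assumes a 64-bit rlim_t");

/* Hypothetical helper: set RLIMIT_CPU through the prlimit64 system call. */
int set_cpu_limit_prlimit64(const struct rlimit *rl)
{
    /* pid 0 means the calling process; NULL means "do not return old value" */
    return (int)syscall(SYS_prlimit64, 0, RLIMIT_CPU, rl, NULL);
}

/* Hypothetical helper: set RLIMIT_CPU through the older setrlimit system call. */
int set_cpu_limit_setrlimit(const struct rlimit *rl)
{
#ifdef SYS_setrlimit
    return (int)syscall(SYS_setrlimit, RLIMIT_CPU, rl);
#else
    (void)rl;
    return -1;  /* this architecture has no separate setrlimit system call */
#endif
}
```

Either helper can replace the setrlimit() call in loop.c; running the result under strace should show which system call was actually issued.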

Comment 8 Florian Weimer 2018-04-17 13:28:54 UTC
Some light testing suggests that this does not apply to the hard limit, only to the soft limit, but I'm not sure how reliable those results are.
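
One possible way to tell the soft limit and the hard limit apart is sketched below. This is not the test that was actually run for this comment. probe_cpu_limit is a made-up name. The child sets distinct soft and hard limits and optionally ignores SIGXCPU, so it survives the soft limit and runs into the hard one. An alarm() acts as a wall-clock safety net, so an affected kernel shows up as SIGALRM instead of a hang.

```c
#include <signal.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run a CPU-bound child under the given soft and hard
 * RLIMIT_CPU values and return the signal that terminated it.
 * Returns 0 if the child exited without a signal (e.g. setrlimit failed),
 * -1 on fork/wait errors.
 */
int probe_cpu_limit(rlim_t soft, rlim_t hard, int ignore_sigxcpu,
                    unsigned wall_fallback_seconds)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        struct rlimit res = { .rlim_cur = soft, .rlim_max = hard };

        /* Ignoring SIGXCPU lets the child live on until the hard limit. */
        signal(SIGXCPU, ignore_sigxcpu ? SIG_IGN : SIG_DFL);
        if (setrlimit(RLIMIT_CPU, &res) != 0)
            _exit(1);

        /* Wall-clock fallback: SIGALRM ends the child if no limit fires. */
        alarm(wall_fallback_seconds);
        for (;;)
            ;  /* burn CPU time */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On a kernel that enforces both limits, probe_cpu_limit(1, 2, 0, 10) should report SIGXCPU and probe_cpu_limit(1, 2, 1, 10) should report SIGKILL. A result of SIGALRM means the corresponding limit never fired.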

Comment 9 Laura Abbott 2018-04-17 21:33:07 UTC
Bisect points at

commit a9445e47d897054876b8f43e46dc5a3eca2b844d
Author: Max R. P. Grossmann <m>
Date:   Mon Jan 8 20:01:57 2018 +0100

    posix-cpu-timers: Make set_process_cpu_timer() more robust
    
    Because the return value of cpu_timer_sample_group() is not checked,
    compilers and static checkers can legitimately warn about a potential use
    of the uninitialized variable 'now'. This is not a runtime issue as all call
    sites hand in valid clock ids.
    
    Also cpu_timer_sample_group() is invoked unconditionally even when the
    result is not used because *oldval is NULL.
    
    Make the invocation conditional and check the return value.
    
    [ tglx: Massage changelog ]
    
    Signed-off-by: Max R. P. Grossmann <m>
    Signed-off-by: Thomas Gleixner <tglx>
    Cc: john.stultz
    Link: https://lkml.kernel.org/r/20180108190157.10048-1-m@max.pm


I'll follow up upstream
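
As a general illustration of why making a call conditional can change behaviour, here is a toy model. It is not the kernel source and not a confirmed diagnosis from this report. All names are invented, and it assumes a sampling function that also has a side effect the caller depends on. If the call is moved to the right-hand side of a short-circuit &&, that side effect is skipped whenever oldval is NULL.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy state; stands in for "some timer has been started". */
static bool timer_running;

/* Toy sampler: returns a value via *now and, as a side effect,
 * marks the timer as running. */
static int sample_and_start(unsigned long *now)
{
    timer_running = true;
    *now = 42;
    return 0;
}

/* Variant 1: always sample, then decide whether the result is needed. */
bool arm_unconditional(const unsigned long *oldval)
{
    unsigned long now;

    timer_running = false;
    int ret = sample_and_start(&now);
    if (oldval && ret == 0) {
        /* a real implementation would adjust *oldval relative to now */
    }
    return timer_running;
}

/* Variant 2: sample only when oldval is non-NULL (short-circuit &&). */
bool arm_short_circuit(const unsigned long *oldval)
{
    unsigned long now;

    timer_running = false;
    if (oldval && sample_and_start(&now) == 0) {
        /* same adjustment as above */
    }
    return timer_running;
}
```

With a NULL oldval, arm_unconditional() leaves the toy timer running while arm_short_circuit() does not. A missed side effect of this kind is one thing to look for when reviewing the bisected change.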

Comment 10 Laura Abbott 2018-04-23 15:57:32 UTC
tglx applied the fix to his branch, so it should land in a 4.16.x stable release. I'll update this bug when it does.

Comment 11 Fedora Update System 2018-04-28 16:13:42 UTC
kernel-4.16.5-200.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-09afae3bb9

Comment 12 Fedora Update System 2018-04-28 16:14:33 UTC
kernel-4.16.5-300.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-a9d6bb6a8e

Comment 13 Fedora Update System 2018-04-29 09:42:04 UTC
kernel-4.16.5-300.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-a9d6bb6a8e

Comment 14 Fedora Update System 2018-04-29 14:29:54 UTC
kernel-4.16.5-200.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-09afae3bb9

Comment 15 ksson 2018-04-30 07:51:09 UTC
Thank you. I checked that on kernel 4.16.5-300.fc28.x86_64, things are working as expected.

Comment 16 Fedora Update System 2018-04-30 16:37:17 UTC
kernel-4.16.5-200.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.

Comment 17 Fedora Update System 2018-04-30 21:18:42 UTC
kernel-4.16.5-300.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.