Bug 1463241

Summary: rlimit_stack problems after update to 3.10.0-514.21.2.el7, and JVM Crash after updating to kernel-3.10.0-514.21.2.el7.x86_64

| Field | Value | Field | Value |
|---|---|---|---|
| Product | Red Hat Enterprise Linux 7 | Reporter | Markus Frosch <markus.frosch> |
| Component | kernel | Assignee | Larry Woodman <lwoodman> |
| Kernel sub component | Memory Management | QA Contact | Li Wang <liwan> |
| Status | CLOSED ERRATA | Keywords | Regression, ZStream |
| Severity | urgent | Priority | urgent |
| Version | 7.3 | Target Milestone | rc |
| Hardware | x86_64 | OS | Linux |
| Fixed In Version | kernel-3.10.0-690.el7 | Doc Type | Bug Fix |

CC: ajb, akkornel, Andreas.Lehr, anrussel, aogburn, avettath, azone, bhaubeck, bloch, brian.hoppus, carnil, ccheney, chorn, christian.dengler, chrlee, cisley, cye, deryni, dhoward, diego_lozano_a, egolov, fhirtz, fweimer, gangelop, gholms, hagberg, herrold, hmadhava, hmatsumo, hpham, jaroslaw.polok, jdatta, jos100, jscalf, jualvare, kbost, klaas, knweiss, kolshanov, kperrier, loberman, lwoodman, mdshaikh, michael.friedrich, mirco.santori, mkolbas, mmilgram, onatalen, onestero, pasik, pasteur, pbokoc, pchavan, phil, p.malishev, pmatouse, pragshar, qguo, rbeyel, rbost, rcyriac, rhbugs, rickatnight11, ripleymj, rratkiewicz, sfalzara, shuwang, smeyer, sreber, stanislav.moravec, taylor.gresser, thomas.oulevey, tlavigne, toracat, trond, vagrawal, vcojot, volodymyrgl, wmealing, Yannick.Charton
Doc Text:
Prior to this update, a bug in the kernel prevented executables from starting if the maximum process stack size (rlimit_stack) was set to a value below approximately 4 MB. This update fixes the search for unmapped address ranges (a suitable gap) in unmapped_area() and unmapped_area_topdown() by ensuring that gap_end is always larger than gap_start. As a result, executables can be started with a limited process stack size as expected.
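The gap_end > gap_start condition matters because these addresses are unsigned long values. The following is a simplified, hypothetical model in C, not the actual unmapped_area() code: the helper names gap_end_below_stack(), gap_fits_unchecked() and gap_fits_checked() and the 1 MiB guard size are assumptions for illustration. When a grows-down stack mapping starts less than one guard gap above the end of the previous mapping, subtracting the guard gap puts gap_end below gap_start, and gap_end - gap_start wraps around to a huge value that passes a plain length check.

```c
#include <stdbool.h>

/* Assumed guard gap: 256 pages of 4 KiB = 1 MiB, the default
 * stack_guard_gap size discussed in this bug. */
#define GUARD_GAP (256UL * 4096UL)

/* Hypothetical helper: highest usable address below a grows-down stack
 * mapping starting at vm_start, with the guard gap reserved directly
 * below it. Clamps to 0 instead of wrapping below address 0. */
unsigned long gap_end_below_stack(unsigned long vm_start)
{
    unsigned long end = vm_start - GUARD_GAP;
    return end > vm_start ? 0 : end;
}

/* Hypothetical check without the fix: only the unsigned difference is
 * tested, so a gap_end below gap_start wraps around to a huge "gap". */
bool gap_fits_unchecked(unsigned long gap_start, unsigned long gap_end,
                        unsigned long length)
{
    return gap_end - gap_start >= length;
}

/* Hypothetical check with the fix: the gap only counts if gap_end
 * really lies above gap_start. */
bool gap_fits_checked(unsigned long gap_start, unsigned long gap_end,
                      unsigned long length)
{
    return gap_end > gap_start && gap_end - gap_start >= length;
}
```

For example, suppose the previous mapping ends at 0x40000000 and the stack mapping starts 64 KiB higher. Then gap_end_below_stack(0x40010000) returns 0x3ff10000, which is below gap_start. gap_fits_unchecked(0x40000000, 0x3ff10000, 4096) returns true, while gap_fits_checked() with the same arguments returns false.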
| Field | Value |
|---|---|
| Clones | 1466138, 1466235 |
| Last Closed | 2017-08-02 07:43:33 UTC |
| Type | Bug |
| Bug Depends On | 1463491 |
| Bug Blocks | 1461333, 1463688, 1464290, 1466138, 1466235, 1466921, 1466923, 1466925, 1466927, 1504288 |
Description
Markus Frosch
2017-06-20 12:20:53 UTC
RHEL 6 seems to be fine:

    [root@rhel6-test ~]# uname -a
    Linux rhel6-test.localdomain 2.6.32-696.3.2.el6.x86_64 #1 SMP Wed Jun 7 11:51:39 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
    [root@rhel6-test ~]# bash -c "ulimit -s 256; /bin/true; echo 'Works.'"
    Works.

Hi,

I wasn't able to access this ticket yesterday, so I opened a CentOS ticket (https://bugs.centos.org/view.php?id=13453). I'll add my findings here too.

We have put a workaround in place for Icinga. An advisory for our users is here: https://www.icinga.com/2017/06/20/advisory-for-latest-security-updates-on-rhel-7/

Still, the kernel update also breaks icinga2, which then requires a manual workaround patch.

### Tests

4112 kbytes works:

    # uname -a ; ulimit -s 4112 && /bin/true && echo "works"
    Linux icinga2 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
    works

4111 kbytes does not (use a new shell):

    # uname -a ; ulimit -s 4111 && /bin/true && echo "works"
    Linux icinga2 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
    -bash: /bin/true: Argument list too long

### Code

I'm not a kernel developer and don't know the specifics of this code. I only know some C/C++. If my findings are wrong, just ignore them.

CentOS pushed their updates and kernel sources yesterday too, so I used those as a reference. Markus looked into the RHEL sources yesterday afternoon, but we don't really know whether the differences matter or how the code works exactly.

```
# Fetch the previous and the updated CentOS kernel source RPMs
wget http://vault.centos.org/7.3.1611/updates/Source/SPackages/kernel-3.10.0-514.21.1.el7.src.rpm
wget http://vault.centos.org/7.3.1611/updates/Source/SPackages/kernel-3.10.0-514.21.2.el7.src.rpm

# Unpack each SRPM and the kernel tarball it contains into its own directory
mkdir 1 2
cd 1 && rpm2cpio ../kernel-3.10.0-514.21.1.el7.src.rpm | cpio -ivd && tar xf linux-3.10.0-514.21.1.el7.tar.xz && cd ..
cd 2 && rpm2cpio ../kernel-3.10.0-514.21.2.el7.src.rpm | cpio -ivd && tar xf linux-3.10.0-514.21.2.el7.tar.xz && cd ..

# Write a recursive unified diff of the two source trees to the file "diff"
diff -ur 1/linux-3.10.0-514.21.1.el7/ 2/linux-3.10.0-514.21.2.el7/ > diff
```

Debian patches are located here: https://anonscm.debian.org/cgit/kernel/linux.git/log/?h=jessie-security

1) In the RHEL/CentOS patch, stack_guard_area() returns the result of a strict less-than comparison:

```
mm/mmap.c
+int stack_guard_area(struct vm_area_struct *vma, unsigned long address)
...
+	return vma->vm_end - address < stack_guard_gap;
```

The patch released in Debian jessie-security (https://anonscm.debian.org/cgit/kernel/linux.git/commit/?h=jessie-security&id=af5f37d1b8feebe4cf4976770a6c37f64de817c7) uses a less-than-or-equal comparison instead. The two only return different results when vma->vm_end - address is exactly stack_guard_gap:

```
++	return vma->vm_end - address <= stack_guard_gap;
```

2) task_mmu.c differs too. CentOS version:

```
diff -ur 1/linux-3.10.0-514.21.1.el7/fs/proc/task_mmu.c 2/linux-3.10.0-514.21.2.el7/fs/proc/task_mmu.c
--- 1/linux-3.10.0-514.21.1.el7/fs/proc/task_mmu.c	2017-04-22 06:17:16.000000000 +0000
+++ 2/linux-3.10.0-514.21.2.el7/fs/proc/task_mmu.c	2017-05-28 20:42:06.000000000 +0000
@@ -293,11 +293,13 @@
 	/* We don't show the stack guard page in /proc/maps */
 	start = vma->vm_start;
-	if (stack_guard_page_start(vma, start))
-		start += PAGE_SIZE;
 	end = vma->vm_end;
-	if (stack_guard_page_end(vma, end))
-		end -= PAGE_SIZE;
+	if (stack_guard_area(vma, start)) {
+		if (vma->vm_flags & VM_GROWSDOWN)
+			start += stack_guard_gap;
+		else
+			end -= stack_guard_gap;
+	}
```

There's no explicit check for VM_GROWSUP: any VMA for which stack_guard_area() returns true and that is not VM_GROWSDOWN takes the else branch.
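Regarding point 1) above, the two comparisons can only disagree at the boundary. The following is a minimal C sketch. It assumes a 1 MiB stack_guard_gap and reduces the quoted return expressions to standalone functions. guard_area_lt() and guard_area_le() are hypothetical names, not kernel functions, and take vm_end directly instead of a vma.

```c
#include <stdbool.h>

/* Assumed guard gap: 1 MiB (256 pages of 4 KiB). */
#define GUARD_GAP (256UL * 4096UL)

/* Return expression from the RHEL/CentOS stack_guard_area() quoted
 * above (strict less-than). */
bool guard_area_lt(unsigned long vm_end, unsigned long address)
{
    return vm_end - address < GUARD_GAP;
}

/* Return expression from the Debian jessie-security patch quoted above
 * (less-than-or-equal). */
bool guard_area_le(unsigned long vm_end, unsigned long address)
{
    return vm_end - address <= GUARD_GAP;
}
```

With vm_end = 0x40200000 and address = 0x40100000, the distance is exactly one guard gap. guard_area_lt() returns false and guard_area_le() returns true. One byte further in either direction, the two functions agree. On its own, this difference therefore only affects an address exactly one guard gap below vm_end.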
Debian version of the same task_mmu.c change:

```
+--- a/fs/proc/task_mmu.c
++++ b/fs/proc/task_mmu.c
+@@ -276,11 +276,14 @@ show_map_vma(struct seq_file *m, struct
+ 
+ 	/* We don't show the stack guard page in /proc/maps */
+ 	start = vma->vm_start;
+-	if (stack_guard_page_start(vma, start))
+-		start += PAGE_SIZE;
+ 	end = vma->vm_end;
+-	if (stack_guard_page_end(vma, end))
+-		end -= PAGE_SIZE;
++	if (vma->vm_flags & VM_GROWSDOWN) {
++		if (stack_guard_area(vma, start))
++			start += stack_guard_gap;
++	} else if (vma->vm_flags & VM_GROWSUP) {
++		if (stack_guard_area(vma, end))
++			end -= stack_guard_gap;
++	}
```

3) CentOS version:

```
@@ -2750,7 +2716,8 @@
 		return VM_FAULT_SIGBUS;
 
 	/* Check if we need to add a guard page to the stack */
-	if (check_stack_guard_page(vma, address) < 0)
+	if ((vma->vm_flags & (VM_GROWSDOWN|VM_GROWSUP)) &&
+			expand_stack(vma, address) < 0)
 		return VM_FAULT_SIGBUS;
```

Debian version:

```
+@@ -2642,8 +2610,10 @@ static int do_anonymous_page(struct mm_s
+ 		return VM_FAULT_SIGBUS;
+ 
+ 	/* Check if we need to add a guard page to the stack */
+-	if (check_stack_guard_page(vma, address) < 0)
+-		return VM_FAULT_SIGSEGV;
++	if (stack_guard_area(vma, address)) {
++		if (expand_stack(vma, address) < 0)
++			return VM_FAULT_SIGSEGV;
++	}
```

Debian returns VM_FAULT_SIGSEGV instead of VM_FAULT_SIGBUS. However, the removed Debian line already returned VM_FAULT_SIGSEGV, so that difference predates this patch. The behavior could also differ because CentOS only checks vm_flags before calling expand_stack(), whereas Debian calls stack_guard_area() first. Another main difference is that stack_guard_area() checks against VM_GROWSUP only. I'm not sure why the CentOS patch differs so much (perhaps the kernel base version behaves differently?).

Kind regards,
Michael

Making this public as per discussion with fweimer@.

Sorry Anthony, I think it did get purged. I just kicked off a new build that should be done in a couple of hours. I'll post it here once it's complete.
Larry

Yes, everything is complete now: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=13501436

Yes, please share this with anyone as necessary.
Larry

I tested the build in a test VM and it works well so far. Nothing out of the ordinary. Are we allowed to share this build with a mutual customer, or do they have to contact support for a build?

Looking at the differences between the src.rpms of the latest releases, the el6 update (696.3.2 vs 696.3.1) contains an additional change in the get_arg_page() function. This change is absent from the el7 update (514.21.2 vs 514.21.1):

> @@ -206,6 +206,12 @@ struct page *get_arg_page(struct linux_b
>  		unsigned long size = bprm->vma->vm_end - bprm->vma->vm_start;
>  		struct rlimit *rlim;
>  
> +		/*
> +		 * GROWSUP doesn't really have any gap at this stage because we grow
> +		 * the stack down now. See the expand_downwards above.
> +		 */
> +		if (!IS_ENABLED(CONFIG_STACK_GROWSUP))
> +			size -= stack_guard_gap;
>  		acct_arg_size(bprm, size / PAGE_SIZE);
>  
>  		/*

This may explain why executing binaries with a smaller stack size limit works on the latest el6 kernel. Further down in the function, this size variable is compared to one fourth of the current RLIMIT_STACK:

> 		/*
> 		 * Limit to 1/4-th the stack size for the argv+env strings.
> 		 * This ensures that:
> 		 *  - the remaining binfmt code will not run out of stack space,
> 		 *  - the program will have a reasonable amount of stack left
> 		 *    to work from.
> 		 */
> 		rlim = current->signal->rlim;
> 		if (size > ACCESS_ONCE(rlim[RLIMIT_STACK].rlim_cur) / 4) {
> 			put_page(page);
> 			return NULL;
> 		}

This may also explain why the problem appears with a stack limit of about 4 MiB and below. A quarter of the limit (1 MiB for a 4 MiB limit) is compared against the uncompensated size, which should be a bit larger than stack_guard_gap (1 MiB).

I've tested the same test build as Markus, and the issue seems to be gone. Thanks a lot for your efforts and openness to our questions!
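The arithmetic behind the get_arg_page() explanation can be checked with a small model. The sketch below is hypothetical C, not the kernel source. It models only the quarter-of-the-limit check quoted above and makes these assumptions:

- pages are 4 KiB and stack_guard_gap is 1 MiB;
- for a small program such as /bin/true, the new stack VMA spans the guard gap plus one page of arguments.

Under those assumptions, the uncompensated check reproduces the 4111/4112 KiB boundary from the description. The function name arg_pages_allowed() is made up for illustration.

```c
#include <stdbool.h>

/* Assumed values: 4 KiB pages and a 1 MiB stack_guard_gap. */
#define PAGE_SIZE_BYTES 4096UL
#define GUARD_GAP (256UL * PAGE_SIZE_BYTES)

/* Simplified, hypothetical model of the argv+env limit in get_arg_page().
 * size:        vm_end - vm_start of the new stack VMA in bytes; assumed to
 *              include the guard gap, so it is never smaller than GUARD_GAP.
 * stack_limit: soft RLIMIT_STACK in bytes (ulimit -s value * 1024).
 * compensate:  subtract the guard gap first, as the el6 hunk quoted above does.
 * Returns true if the arguments are accepted; false corresponds to the
 * kernel returning NULL, which the exec reports as "Argument list too long". */
bool arg_pages_allowed(unsigned long size, unsigned long stack_limit,
                       bool compensate)
{
    if (compensate)
        size -= GUARD_GAP;
    /* Accept only if size does not exceed a quarter of the stack limit. */
    return size <= stack_limit / 4;
}
```

With size = GUARD_GAP + PAGE_SIZE_BYTES = 1052672 bytes:

- A 4112 KiB limit gives 4210688 / 4 = 1052672 bytes, so the check passes.
- At 4111 KiB, the quarter is 1052416 bytes and the exec is refused.
- With compensate set, size drops to 4096 bytes, and even the 256 KiB limit from the RHEL 6 test passes.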
I'm hoping that an upstream release for all affected users (including the CentOS team) finds its way to the official channels soon :)

### Icinga 2 specific test

#### Problem

    [root@icinga2 ~]# uname -a
    Linux icinga2 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
    [root@icinga2 ~]# icinga2 daemon -C
    execvp: Argument list too long
    [root@icinga2 ~]# echo $?
    1

#### Fixed

    [root@icinga2 ~]# uname -a
    Linux icinga2 3.10.0-514.el7.CVE7.3.z.x86_64 #1 SMP Wed Jun 21 20:13:13 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
    [root@icinga2 ~]# icinga2 daemon -C
    information/cli: Icinga application loader (version: v2.6.3-378-g55a057c)
    information/cli: Loading configuration file(s).
    information/ConfigItem: Committing config item(s).
    information/ApiListener: My API identity: icinga2
    warning/ApplyRule: Apply rule 'satellite-host' (in /etc/icinga2/conf.d/satellite.conf: 29:1-29:41) for type 'Dependency' does not match anywhere!
    information/ConfigItem: Instantiated 4 ApiUsers.
    information/ConfigItem: Instantiated 1 ApiListener.
    information/ConfigItem: Instantiated 3 Zones.
    information/ConfigItem: Instantiated 1 FileLogger.
    information/ConfigItem: Instantiated 1 Endpoint.
    information/ConfigItem: Instantiated 1 UserGroup.
    information/ConfigItem: Instantiated 28 Notifications.
    information/ConfigItem: Instantiated 2 NotificationCommands.
    information/ConfigItem: Instantiated 177 CheckCommands.
    information/ConfigItem: Instantiated 1 Downtime.
    information/ConfigItem: Instantiated 4 HostGroups.
    information/ConfigItem: Instantiated 1 IcingaApplication.
    information/ConfigItem: Instantiated 157 Hosts.
    information/ConfigItem: Instantiated 318 Comments.
    information/ConfigItem: Instantiated 1 User.
    information/ConfigItem: Instantiated 3 TimePeriods.
    information/ConfigItem: Instantiated 161 Services.
    information/ConfigItem: Instantiated 3 ServiceGroups.
    information/ConfigItem: Instantiated 1 ScheduledDowntime.
    information/ConfigItem: Instantiated 1 IdoMysqlConnection.
    information/ConfigItem: Instantiated 1 NotificationComponent.
    information/ConfigItem: Instantiated 1 GraphiteWriter.
    information/ConfigItem: Instantiated 1 CheckerComponent.
    information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
    information/cli: Finished validating the configuration file(s).
    [root@icinga2 ~]# echo $?
    0

Kurt pointed me to an ongoing discussion on oss-sec, thanks.

Possible problems: http://seclists.org/oss-sec/2017/q2/562

SUSE seems to be affected too: http://seclists.org/oss-sec/2017/q2/563 and http://seclists.org/oss-sec/2017/q2/567

Could be related: http://seclists.org/oss-sec/2017/q2/566 -> https://patchwork.kernel.org/patch/9802797/

On Tuesday I read a German article about the issue (https://www.heise.de/security/meldung/Stack-Clash-Schwachstelle-fuehrt-zu-Rechteausweitung-auf-Linux-und-BSD-Systemen-3748070.html). It links to a host that is currently offline: https://lkml.org/lkml/2017/6/19/1515. IIRC it was a discussion between Linus and Hugh about the cleanup and possible issues with the current patch.

Reference to this issue: http://seclists.org/oss-sec/2017/q2/542

*** Bug 1465111 has been marked as a duplicate of this bug. ***

Note to SRT, QE and z-stream maintainers: Larry has already posted the revert of v6 of his patchset from bug 1452733 (CVE-2017-1000364) and backported the upstream fix for the security issue. We therefore plan to use this BZ for the revert and for including the new upstream patch in the 7.4 RC build. Moving the BZ to POST. Thanks!

Patch(es) committed on kernel repository and an interim kernel build is undergoing testing

Patch(es) available on kernel-3.10.0-690.el7

(In reply to Rafael Aquini from comment #33)
> Patch(es) committed on kernel repository and an interim kernel build is
> undergoing testing

Hi Rafael, could you please tell me where I can download this kernel version?
I looked for it in the RHEL downloads section and in several other repositories, without success. Thanks in advance.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:1842