Bug 1653942
Summary: | glibc: Test suite failure: failure in nptl/tst-pthread-getattr on s390x | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Carlos O'Donell <codonell> |
Component: | glibc | Assignee: | DJ Delorie <dj> |
Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 31 | CC: | arjun.is, codonell, dj, fweimer, law, mfabian, pfrankli, rth, siddhesh |
Target Milestone: | --- | Keywords: | Triaged |
Target Release: | --- | ||
Hardware: | s390x | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-02-12 18:38:19 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Carlos O'Donell
2018-11-27 19:48:57 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle. Changing version to '31'.

Reproduces on RHEL 7.7 (kernel 3.10); just keep looping until failure and the failing case looks like this (map dump added just before line that coredumps). Note that the failing case is the only one that lists [heap] before [stack]; all other cases put the heap after the stack, so yeah, it sounds like the same kind of kernel ASLR bug we've seen before. Does not reproduce on F32 (kernel 5.3) where the [heap] is in a completely different segment of the memory map.

Verifying that stack top is accessible
current rlimit_stack is 8388608
Adjusting RLIMIT_STACK to 7684097
Adjusted rlimit: stacksize=7680000, stackaddr=0x3ffff406000
01000000-01004000 r-xp 00000000 00:28 109078265 /home/nfs/dj/glibc.build/nptl/tst-pthread-getattr
01004000-01005000 r--p 00003000 00:28 109078265 /home/nfs/dj/glibc.build/nptl/tst-pthread-getattr
01005000-01006000 rw-p 00004000 00:28 109078265 /home/nfs/dj/glibc.build/nptl/tst-pthread-getattr
3fffd533000-3fffd535000 rw-p 00000000 00:00 0
3fffd535000-3fffd6c6000 r-xp 00000000 00:28 109067969 /home/nfs/dj/glibc.build/libc.so
3fffd6c6000-3fffd6c7000 ---p 00191000 00:28 109067969 /home/nfs/dj/glibc.build/libc.so
3fffd6c7000-3fffd6ca000 r--p 00191000 00:28 109067969 /home/nfs/dj/glibc.build/libc.so
3fffd6ca000-3fffd6cd000 rw-p 00194000 00:28 109067969 /home/nfs/dj/glibc.build/libc.so
3fffd6cd000-3fffd6d1000 rw-p 00000000 00:00 0
3fffd6d1000-3fffd6ee000 r-xp 00000000 00:28 109071924 /home/nfs/dj/glibc.build/nptl/libpthread.so
3fffd6ee000-3fffd6ef000 r--p 0001c000 00:28 109071924 /home/nfs/dj/glibc.build/nptl/libpthread.so
3fffd6ef000-3fffd6f0000 rw-p 0001d000 00:28 109071924 /home/nfs/dj/glibc.build/nptl/libpthread.so
3fffd6f0000-3fffd6f6000 rw-p 00000000 00:00 0
3fffd6f6000-3fffd71a000 r-xp 00000000 00:28 109067968 /home/nfs/dj/glibc.build/elf/ld.so
3fffd71b000-3fffd71c000 r--p 00024000 00:28 109067968 /home/nfs/dj/glibc.build/elf/ld.so
3fffd71c000-3fffd71d000 rw-p 00025000 00:28 109067968 /home/nfs/dj/glibc.build/elf/ld.so
3fffd71d000-3fffd71e000 rw-p 00000000 00:00 0
3ffff3e3000-3ffff404000 rw-p 00000000 00:00 0 [heap]
3ffffb39000-3ffffb5a000 rw-p 00000000 00:00 0 [stack]
Segmentation fault (core dumped)
$
$ while env GCONV_PATH=/home/nfs/dj/glibc.build/iconvdata \
LOCPATH=/home/nfs/dj/glibc.build/localedata LC_ALL=C \
/home/nfs/dj/glibc.build/elf/ld64.so.1 --library-path \
/home/nfs/dj/glibc.build:/home/nfs/dj/glibc.build/math:/home/nfs/dj/glibc.build/elf:/home/nfs/dj/glibc.build/dlfcn:/home/nfs/dj/glibc.build/nss:/home/nfs/dj/glibc.build/nis:/home/nfs/dj/glibc.build/rt:/home/nfs/dj/glibc.build/resolv:/home/nfs/dj/glibc.build/mathvec:/home/nfs/dj/glibc.build/support:/home/nfs/dj/glibc.build/crypt:/home/nfs/dj/glibc.build/nptl /home/nfs/dj/glibc.build/nptl/tst-pthread-getattr --direct ; do echo;echo;echo; done

(In reply to DJ Delorie from comment #3)
> Reproduces on RHEL 7.7 (kernel 3.10); just keep looping until failure
> and the failing case looks like this (map dump added just before line
> that coredumps). Note that the failing case is the only one that lists
> [heap] before [stack]; all other cases put the heap after the stack, so
> yeah, it sounds like the same kind of kernel ASLR bug we've seen before.
> Does not
> reproduce on F32 (kernel 5.3) where the [heap] is in a completely
> different segment of the memory map.

Thank you for digging into this. Yes, it looks like our suspicions were correct. Can you please file a kernel bug for this then? It should be a clean bug with a clean reproducer for the kernel team. Then we can close this as a duplicate of the clean kernel bug (links the two bugs). So if we run into this again we should be able to find it given the description here.

I'm not 100% sure the root cause is the kernel.
With more research I've come up with a standalone version of the test with additional debug info:

$ while /lib64/ld-2.17.so ./1653942.x ; do true; done

You eventually get this:

current rlimit_stack is 8388608
Adjusting RLIMIT_STACK to 7806977
Adjusted rlimit: stacksize=7802880, stackaddr=0x3ffff5d4000
80000000-80002000 r-xp 00000000 00:28 107611085 /home/nfs/dj/1653942.x
80002000-80003000 r--p 00001000 00:28 107611085 /home/nfs/dj/1653942.x
80003000-80004000 rw-p 00002000 00:28 107611085 /home/nfs/dj/1653942.x
3fffcf50000-3fffcf52000 rw-p 00000000 00:00 0
3fffcf52000-3fffd101000 r-xp 00000000 fd:01 67235395 /usr/lib64/libc-2.17.so
3fffd101000-3fffd105000 r--p 001ae000 fd:01 67235395 /usr/lib64/libc-2.17.so
3fffd105000-3fffd107000 rw-p 001b2000 fd:01 67235395 /usr/lib64/libc-2.17.so
3fffd107000-3fffd10b000 rw-p 00000000 00:00 0
3fffd10b000-3fffd123000 r-xp 00000000 fd:01 67289710 /usr/lib64/libpthread-2.17.so
3fffd123000-3fffd124000 r--p 00017000 fd:01 67289710 /usr/lib64/libpthread-2.17.so
3fffd124000-3fffd125000 rw-p 00018000 fd:01 67289710 /usr/lib64/libpthread-2.17.so
3fffd125000-3fffd129000 rw-p 00000000 00:00 0
3fffd142000-3fffd145000 rw-p 00000000 00:00 0
3fffd145000-3fffd168000 r-xp 00000000 fd:01 67494691 /usr/lib64/ld-2.17.so
3fffd168000-3fffd169000 rw-p 00000000 00:00 0
3fffd169000-3fffd16a000 r--p 00023000 fd:01 67494691 /usr/lib64/ld-2.17.so
3fffd16a000-3fffd16b000 rw-p 00024000 fd:01 67494691 /usr/lib64/ld-2.17.so
3fffd16b000-3fffd16c000 rw-p 00000000 00:00 0
3ffff5b1000-3ffff5d2000 rw-p 00000000 00:00 0 [heap]
3ffffd25000-3ffffd46000 rw-p 00000000 00:00 0 [stack]
sbrk() = 0x3ffff5d2000, testing 0x3ffff5d4800
getpagesize = 0x1000
Segmentation fault

Note that the sbrk() value corresponds to the top of [heap], and that the testing addr is only 2.5 pages (0x2800 bytes) away from it. Florian notes that on older kernels, single-threaded apps have a larger stack guard, perhaps 1 MB instead of 4 KB.

Running the test *without* ld-2.17.so (i.e. directly) always puts the heap in the 0x80000000 range with the rest of the program, so this conflict never happens for normal programs.

Conclusions so far:

1. The failure is dependent on our test harness (using ld.so directly).
2. The testsuite assumes a one-page stack guard, which is not always a valid assumption.
3. The kernel puts [heap] and [stack] closer together than RLIMIT_STACK plus guard space:
   [hc] 0x3ffffd46000-0x3ffff5b1000
   [1] 7,950,336 0x79_5000 036250000 0111.1001.0101.0000.0000.0000
4. pthread_attr_getstack() does not take into account the larger stack guard.

So I can't say for sure that the kernel is at fault. The test, glibc, and the kernel simply disagree on their assumptions.

We committed a change to upstream master to move this test into a container to avoid running it directly underneath the dynamic loader.

commit 279c68ce1336d84d82ce491a4b77086e574ba380
Author: DJ Delorie <dj>
Date: Mon Feb 3 14:57:23 2020 -0500

    Run nptl/tst-pthread-getattr in a container

    See https://bugzilla.redhat.com/show_bug.cgi?id=1653942

    This test depends on the kernel's assignment of memory regions, but
    running under ld.so explicitly changes those assignments, sometimes
    sufficiently to cause the test to fail (esp with address space
    randomization).

    The easiest way to "fix" the test, is to run it the way the user
    would - without ld.so. Running it in a container does that.

    Reviewed-by: Carlos O'Donell <carlos>

This ensures the kernel gives the process the correct VMA layout for a dynamically linked process. We are going to sync Fedora Rawhide to master and that closes this bug.

I'm going to mark this CLOSED/RAWHIDE since this is effectively done.
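
For anyone who hits this again and wants to check a map dump quickly, here is a rough sketch that redoes the address arithmetic from the standalone reproducer above. It is not part of the glibc test; the function names (parse_maps, heap_stack_report) and the dictionary keys are made up for illustration, and it assumes the dump contains exactly one [heap] and one [stack] line in the usual /proc/<pid>/maps layout.

```python
import re

# One /proc/<pid>/maps entry: start-end perms offset dev inode [name]
MAPS_LINE = re.compile(
    r"^([0-9a-f]+)-([0-9a-f]+)\s+(\S+)\s+\S+\s+\S+\s+\S+\s*(.*)$"
)


def parse_maps(text):
    """Return (start, end, perms, name) tuples for each mapping line."""
    regions = []
    for line in text.splitlines():
        m = MAPS_LINE.match(line.strip())
        if m:
            start = int(m.group(1), 16)
            end = int(m.group(2), 16)
            regions.append((start, end, m.group(3), m.group(4).strip()))
    return regions


def heap_stack_report(text, probe_addr, page_size=0x1000):
    """Summarize how close the probed address is to the top of [heap].

    Hypothetical helper: probe_addr is the 'testing' address printed by
    the reproducer, page_size is its reported getpagesize value.
    """
    named = {
        name: (start, end)
        for start, end, _perms, name in parse_maps(text)
        if name in ("[heap]", "[stack]")
    }
    heap_start, heap_end = named["[heap]"]
    stack_start, stack_end = named["[stack]"]
    return {
        "heap_below_stack": heap_end <= stack_start,
        "stack_top_minus_heap_base": stack_end - heap_start,
        "probe_minus_heap_top": probe_addr - heap_end,
        "probe_pages_above_heap": (probe_addr - heap_end) / page_size,
    }


# The two relevant lines from the failing run of the standalone test.
SAMPLE = """\
3ffff5b1000-3ffff5d2000 rw-p 00000000 00:00 0 [heap]
3ffffd25000-3ffffd46000 rw-p 00000000 00:00 0 [stack]
"""

report = heap_stack_report(SAMPLE, probe_addr=0x3ffff5d4800)
for key, value in report.items():
    print(key, hex(value) if isinstance(value, int) and not isinstance(value, bool) else value)
```

With the sample above, the stack top minus heap base comes out to 0x795000 (7,950,336), matching the hc output in conclusion 3, and the probed address is 0x2800 bytes above the top of [heap], i.e. 2.5 pages at a page size of 0x1000. That is far less than a guard on the order of 1 MB would need.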