Bug 2066147 - brk/sbrk regression in Linux kernel 5.17.0 on AArch64: static pie binary (ex: ldconfig) segfault randomly in glibc __libc_setup_tls()
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: ARMTracker
Reported: 2022-03-21 04:27 UTC by Victor Stinner
Modified: 2022-04-29 07:45 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Linux Kernel 215720 0 None None None 2022-03-22 02:27:12 UTC
Python 47078 0 None None None 2022-03-21 04:55:12 UTC

Description Victor Stinner 2022-03-21 04:27:23 UTC
Since March 18, the test_ctypes test of the Python buildbot "aarch64 Fedora Rawhide 3.9" has been failing randomly. The test leaves a core dump behind when it runs the "ldconfig -p" command. On that day, dnf upgraded glibc to glibc-2.35.9000-11.fc37.aarch64.

Running "ldconfig -p" in a loop eventually crashes. Example shell command:

$ while true; do LC_ALL=C LANG=C /sbin/ldconfig -p > /dev/null; rc=$?; echo "$(date): $rc"; if [ $rc -ne 0 ]; then break; fi; done

Output:
---
lun. 21 mars 2022 05:18:50 CET: 0
lun. 21 mars 2022 05:18:50 CET: 0
lun. 21 mars 2022 05:18:50 CET: 0
lun. 21 mars 2022 05:18:50 CET: 0
(...)
lun. 21 mars 2022 05:18:51 CET: 0
lun. 21 mars 2022 05:18:51 CET: 0
lun. 21 mars 2022 05:18:51 CET: 0
lun. 21 mars 2022 05:18:51 CET: 0
Erreur de segmentation (core dumped)
lun. 21 mars 2022 05:18:51 CET: 139
---

gdb backtrace:
---
$ gdb /sbin/ldconfig -c .2896221
Core was generated by `/sbin/ldconfig -p'.

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000aaaadfed99cc in __brk (addr=<optimized out>) at ../sysdeps/unix/sysv/linux/brk.c:39
39	      __set_errno (ENOMEM);

(gdb) where
#0  0x0000aaaadfed99cc in __brk (addr=<optimized out>) at ../sysdeps/unix/sysv/linux/brk.c:39
#1  0x0000aaaadfed9a5c in __sbrk (increment=2968) at sbrk.c:74
#2  0x0000aaaadfeb069c in __libc_setup_tls () at ../csu/libc-tls.c:151
#3  0x0000aaaadfeb02f4 in __libc_start_main_impl (main=0xaaaadfea9cb4 <_start+52>, argc=2, argv=0xfffff6badbb8, init=<optimized out>, fini=<optimized out>, rtld_fini=0x0, stack_end=<optimized out>) at ../csu/libc-start.c:304
#4  0x0000aaaadfea9cb0 in _start () at ../sysdeps/aarch64/start.S:81
---

errno is a TLS variable, but glibc failed to allocate memory for the TLS block, so it attempts to write to an address near NULL (thread pointer 0 plus the errno offset), if I understand correctly.

---
(gdb) disassemble
Dump of assembler code for function __brk:
   0x0000aaaadfed9990 <+0>:	bti	c
   0x0000aaaadfed9994 <+4>:	mov	x1, x0
   0x0000aaaadfed9998 <+8>:	mov	x8, #0xd6                  	// #214
   0x0000aaaadfed999c <+12>:	svc	#0x0
   0x0000aaaadfed99a0 <+16>:	adrp	x2, 0xaaaadff87000 <__pthread_keys+16016>
   0x0000aaaadfed99a4 <+20>:	str	x0, [x2, #488]
   0x0000aaaadfed99a8 <+24>:	cmp	x0, x1
   0x0000aaaadfed99ac <+28>:	b.cc	0xaaaadfed99b8 <__brk+40>  // b.lo, b.ul, b.last
   0x0000aaaadfed99b0 <+32>:	mov	w0, #0x0                   	// #0
   0x0000aaaadfed99b4 <+36>:	ret
   0x0000aaaadfed99b8 <+40>:	adrp	x1, 0xaaaadff7f000 <tunable_list+1400>
   0x0000aaaadfed99bc <+44>:	ldr	x1, [x1, #3512]
   0x0000aaaadfed99c0 <+48>:	mrs	x2, tpidr_el0
   0x0000aaaadfed99c4 <+52>:	mov	w3, #0xc                   	// #12
   0x0000aaaadfed99c8 <+56>:	mov	w0, #0xffffffff            	// #-1
=> 0x0000aaaadfed99cc <+60>:	str	w3, [x2, x1]
   0x0000aaaadfed99d0 <+64>:	ret
End of assembler dump.

(gdb) frame
#0  0x0000aaaadfed99cc in __brk (addr=<optimized out>) at ../sysdeps/unix/sysv/linux/brk.c:39
39	      __set_errno (ENOMEM);

(gdb) l
34	__brk (void *addr)
35	{
36	  __curbrk = (void *) INTERNAL_SYSCALL_CALL (brk, addr);
37	  if (__curbrk < addr)
38	    {
39	      __set_errno (ENOMEM);
40	      return -1;
41	    }
42
43	  return 0;

(gdb) p $x1
$1 = 64

(gdb) p $x2
$2 = 0
---
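
The register values above explain the fault address: the store at <+60> writes to x2 + x1, where x2 holds the thread pointer read from tpidr_el0 and x1 holds the TLS offset of errno. A minimal C sketch of that arithmetic, using the values printed by gdb (the function names are illustrative and are not part of glibc):

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of "str w3, [x2, x1]": base register plus offset register.
   Illustrative helper, not part of glibc. */
uint64_t errno_store_address(uint64_t thread_pointer, uint64_t errno_tls_offset)
{
    return thread_pointer + errno_tls_offset;
}

/* Self-check with the values from the gdb session: x2 = 0, x1 = 64. */
void check_reported_fault_address(void)
{
    assert(errno_store_address(0, 64) == 0x40);
}
```

The result, 0x40, is the same address that strace reports as si_addr for the SIGSEGV.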

Example of strace output when the bug triggers:
---
$ cat trace 
execve("/sbin/ldconfig", ["/sbin/ldconfig", "-p"], 0xfffff4e155a8 /* 34 vars */) = 0
geteuid()                               = 1002
getuid()                                = 1002
getegid()                               = 1002
getgid()                                = 1002
brk(NULL)                               = 0xaaaac182d000
brk(0xaaaac182db98)                     = 0xaaaac182d000
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x40} ---
+++ killed by SIGSEGV (core dumped) +++
---

The last brk() call returns 0xaaaac182d000, which is unchanged from the initial break and smaller than the requested 0xaaaac182db98, so the glibc __brk() considers that the memory allocation failed.
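
The comparison can be replayed with the traced values. The following is a minimal sketch, not glibc code; the helper names are hypothetical and only mirror the "__curbrk < addr" test shown in the brk.c listing:

```c
#include <assert.h>
#include <stdint.h>

/* Same comparison as "if (__curbrk < addr)" in glibc's __brk().
   Illustrative helper, not glibc code. */
int brk_request_failed(uint64_t returned_break, uint64_t requested_break)
{
    return returned_break < requested_break;
}

/* Number of bytes requested beyond the initial break. */
uint64_t requested_increment(uint64_t initial_break, uint64_t requested_break)
{
    assert(requested_break >= initial_break);
    return requested_break - initial_break;
}
```

With the strace values, requested_increment(0xaaaac182d000, 0xaaaac182db98) is 0xb98 = 2968 bytes, which matches increment=2968 in frame #1 of the backtrace, and brk_request_failed(0xaaaac182d000, 0xaaaac182db98) is true.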


Versions:
---
$ rpm -q glibc
glibc-2.35.9000-11.fc37.aarch64

$ uname -a
Linux python-builder-fedora-rawhide-aarch64 5.17.0-0.rc8.123.fc37.aarch64 #1 SMP Mon Mar 14 17:54:40 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux
---

Comment 1 Victor Stinner 2022-03-21 04:55:12 UTC
Issue tracked in Python as: https://bugs.python.org/issue47078

Comment 2 Victor Stinner 2022-03-21 05:01:21 UTC
Oh, the kernel was updated the same day. Maybe it's a kernel regression:

* old kernel (ok): 5.17.0-0.rc0.20220112gitdaadb3bd0e8d.63.fc36.aarc
* new kernel (bug): 5.17.0-0.rc8.123.fc37.aarch64

Comment 3 Victor Stinner 2022-03-21 05:09:40 UTC
This bug reminds me of an old kernel brk issue on AArch64: https://bugzilla.redhat.com/show_bug.cgi?id=1797052

Comment 4 Florian Weimer 2022-03-21 09:48:13 UTC
(In reply to Victor Stinner from comment #2)
> Oh, the kernel was updated the same day. Maybe it's a kernel regression:
> 
> * old kernel (ok): 5.17.0-0.rc0.20220112gitdaadb3bd0e8d.63.fc36.aarc
> * new kernel (bug): 5.17.0-0.rc8.123.fc37.aarch64

This looks like a duplicate of kernel bug 1749633 for aarch64, but perhaps for static PIE binaries only. ASLR setup is tricky, and the kernel gets this wrong from time to time.

Comment 5 Victor Stinner 2022-03-21 12:31:49 UTC
With ASLR enabled (/proc/sys/kernel/randomize_va_space = 2), "ldconfig -p" crashes after somewhere between 300 and 1,300 runs.

With ASLR disabled (/proc/sys/kernel/randomize_va_space = 0), I cannot reproduce the "ldconfig -p" crash: I stopped my test after 20,000 iterations.

Comment 6 Victor Stinner 2022-03-21 12:50:42 UTC
Created attachment 1867144 [details]
empty.c reproducer

Reproducer: download the attached empty.c and run:

$ gcc -std=c11 -static-pie -g empty.c -o empty -O2
$ i=0; while true; do ./empty; rc=$?; i=$(($i + 1)); echo "$i: $(date): $rc"; if [ $rc -ne 0 ]; then break; fi; done
(...)
1: lun. 21 mars 2022 13:48:09 CET: 0
2: lun. 21 mars 2022 13:48:09 CET: 0
3: lun. 21 mars 2022 13:48:09 CET: 0
4: lun. 21 mars 2022 13:48:09 CET: 0
5: lun. 21 mars 2022 13:48:09 CET: 0
6: lun. 21 mars 2022 13:48:09 CET: 0
Erreur de segmentation (core dumped)
7: lun. 21 mars 2022 13:48:09 CET: 139

Comment 7 Victor Stinner 2022-03-21 15:55:23 UTC
Sadly, the final Linux 5.17 release is also affected.

$ uname -r
5.17.0-128.fc37.aarch64
$ i=0; while true; do ./empty; rc=$?; i=$(($i + 1)); echo "$i: $(date): $rc"; if [ $rc -ne 0 ]; then break; fi; done
(...)
252: lun. 21 mars 2022 16:55:11 CET: 0
253: lun. 21 mars 2022 16:55:11 CET: 0
254: lun. 21 mars 2022 16:55:11 CET: 0
Erreur de segmentation (core dumped)
255: lun. 21 mars 2022 16:55:11 CET: 139

Comment 8 Victor Stinner 2022-03-21 16:43:08 UTC
I tested different kernel versions to narrow down the issue. The regression was introduced between builds 63 (2022-01-12, git daadb3bd0e8d) and 83 (5.17-rc2):

* ok: 5.17.0-0.rc0.20220112gitdaadb3bd0e8d.63.fc36.aarch64 (last built kernel without the bug)
* BUG: 5.17.0-0.rc2.83.fc36.aarch64 (first built kernel with the bug)
* BUG: 5.17.0-0.rc2.20220202git9f7fb8de5d9b.84.fc36.aarch64

Sadly, every build between build 63 and build 83 failed, so no kernel in that range is available for testing.

Just to be sure, I also tested the kernel 5.16.0-60.fc36.aarch64: ok.

Comment 9 Victor Stinner 2022-03-22 00:50:10 UTC
According to git bisect, the bug was introduced by this change:

https://github.com/torvalds/linux/commit/9630f0d60fec5fbcaa4435a66f75df1dc9704b66

commit 9630f0d60fec5fbcaa4435a66f75df1dc9704b66
Author: H.J. Lu <hjl.tools>
Date:   Wed Jan 19 18:09:40 2022 -0800

    fs/binfmt_elf: use PT_LOAD p_align values for static PIE

    Extend commit ce81bb256a22 ("fs/binfmt_elf: use PT_LOAD p_align values
    for suitable start address") which fixed PIE binaries built with
    -Wl,-z,max-page-size=0x200000, to cover static PIE binaries.  This
    fixes:

        https://bugzilla.kernel.org/show_bug.cgi?id=215275

    Tested by verifying static PIE binaries with -Wl,-z,max-page-size=0x200000 loading.

    Link: https://lkml.kernel.org/r/20211209174052.370537-1-hjl.tools@gmail.com
    Signed-off-by: H.J. Lu <hjl.tools>
    Cc: Chris Kennelly <ckennelly>
    Cc: Al Viro <viro.org.uk>
    Cc: Alexey Dobriyan <adobriyan>
    Cc: Song Liu <songliubraving>
    Cc: David Rientjes <rientjes>
    Cc: Ian Rogers <irogers>
    Cc: Hugh Dickins <hughd>
    Cc: Suren Baghdasaryan <surenb>
    Cc: Sandeep Patil <sspatil>
    Cc: Fangrui Song <maskray>
    Cc: Nick Desaulniers <ndesaulniers>
    Cc: Kirill A. Shutemov <kirill.shutemov.com>
    Cc: Mike Kravetz <mike.kravetz>
    Cc: Shuah Khan <shuah>
    Signed-off-by: Andrew Morton <akpm>
    Signed-off-by: Linus Torvalds <torvalds>

 fs/binfmt_elf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
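
The commit message says the kernel now uses the PT_LOAD p_align values when choosing the start address of a static PIE. A simplified user-space model of that idea is sketched below. It is not the code from fs/binfmt_elf.c: the struct, the function names, and the page-size parameter are hypothetical and only illustrate rounding a candidate load address down to the largest valid segment alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced view of an ELF program header. */
struct segment {
    int is_load;        /* non-zero for a PT_LOAD segment */
    uint64_t p_align;   /* alignment requested by the segment */
};

static int is_power_of_two(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Largest valid p_align among the PT_LOAD segments, and at least one page.
   Alignments that are not powers of two are ignored. */
uint64_t max_load_alignment(const struct segment *segs, size_t count,
                            uint64_t page_size)
{
    uint64_t alignment = page_size;

    assert(is_power_of_two(page_size));
    for (size_t i = 0; i < count; i++) {
        if (segs[i].is_load && is_power_of_two(segs[i].p_align)
            && segs[i].p_align > alignment)
            alignment = segs[i].p_align;
    }
    return alignment;
}

/* Round a candidate load address down to the chosen alignment. */
uint64_t align_load_address(uint64_t address, uint64_t alignment)
{
    assert(is_power_of_two(alignment));
    return address & ~(alignment - 1);
}
```

For a binary linked with -Wl,-z,max-page-size=0x200000, a PT_LOAD segment carries p_align = 0x200000, so in this model a candidate address such as 0x12345678 is rounded down to 0x12200000.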

Comment 10 Victor Stinner 2022-03-22 01:56:32 UTC
This change was followed by the fix below, but that fix does not affect static-pie programs (these programs have interpreter=NULL):

https://github.com/torvalds/linux/commit/925346c129da1171222a9cdb11fa2b734d9955da

commit 925346c129da1171222a9cdb11fa2b734d9955da
Author: Mike Rapoport <rppt>
Date:   Fri Feb 11 16:32:22 2022 -0800

    fs/binfmt_elf: fix PT_LOAD p_align values for loaders
    
    Rui Salvaterra reported that Aisleroit solitaire crashes with "Wrong
    __data_start/_end pair" assertion from libgc after update to v5.17-rc1.
    
    Bisection pointed to commit 9630f0d60fec ("fs/binfmt_elf: use PT_LOAD
    p_align values for static PIE") that fixed handling of static PIEs, but
    made the condition that guards load_bias calculation to exclude loader
    binaries.
    
    Restoring the check for presence of interpreter fixes the problem.
    
    Link: https://lkml.kernel.org/r/20220202121433.3697146-1-rppt@kernel.org
    Fixes: 9630f0d60fec ("fs/binfmt_elf: use PT_LOAD p_align values for static PIE")
    Signed-off-by: Mike Rapoport <rppt.com>
    Reported-by: Rui Salvaterra <rsalvaterra>
    Tested-by: Rui Salvaterra <rsalvaterra>
    Cc: Alexander Viro <viro.org.uk>
    Cc: Eric Biederman <ebiederm>
    Cc: "H.J. Lu" <hjl.tools>
    Cc: Kees Cook <keescook>
    Signed-off-by: Andrew Morton <akpm>
    Signed-off-by: Linus Torvalds <torvalds>

 fs/binfmt_elf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comment 11 Victor Stinner 2022-03-22 02:10:58 UTC
In the brk() syscall, the following check fails when the bug occurs:
---
	/* Check against existing mmap mappings. */
	next = find_vma(mm, oldbrk);
	if (next && newbrk + PAGE_SIZE > vm_start_gap(next))
		goto out;
---
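
The check can be modeled in user space to see why a mapping placed right after the heap makes brk() return the old break. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the page size is passed in as a parameter, newbrk is taken to be the page-aligned requested break, and vm_start_gap(next) is reduced to a plain start address.

```c
#include <assert.h>
#include <stdint.h>

/* Round an address up to the next page boundary; page_size is a power of two. */
uint64_t page_align_up(uint64_t address, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (address + page_size - 1) & ~(page_size - 1);
}

/* Model of "newbrk + PAGE_SIZE > vm_start_gap(next)": a non-zero result means
   the break is not moved. next_start stands in for vm_start_gap(next). */
int brk_blocked_by_next_mapping(uint64_t newbrk, uint64_t next_start,
                                uint64_t page_size)
{
    return newbrk + page_size > next_start;
}
```

Assuming 4 KiB pages, the requested break 0xaaaac182db98 from the strace output rounds up to 0xaaaac182e000. If the next mapping started at that same address (a hypothetical value, the report does not show the memory map), the check would be true and brk() would leave the break unchanged. With the next mapping at 0xaaaac1830000 or higher, the check would pass.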

Comment 12 Victor Stinner 2022-03-22 02:27:13 UTC
I reported the issue to the kernel upstream bug tracker:
https://bugzilla.kernel.org/show_bug.cgi?id=215720

Comment 13 Florian Weimer 2022-04-27 15:14:29 UTC
Apparently the revert made it into v5.18-rc3:

commit 354e923df042a11d1ab8ca06b3ebfab3a018a4ec
Author: Andrew Morton <akpm>
Date:   Thu Apr 14 19:13:55 2022 -0700

    revert "fs/binfmt_elf: fix PT_LOAD p_align values for loaders"
    
    Commit 925346c129da11 ("fs/binfmt_elf: fix PT_LOAD p_align values for
    loaders") was an attempt to fix regressions due to 9630f0d60fec5f
    ("fs/binfmt_elf: use PT_LOAD p_align values for static PIE").

commit aeb7923733d100b86c6bc68e7ae32913b0cec9d8
Author: Andrew Morton <akpm>
Date:   Thu Apr 14 19:13:58 2022 -0700

    revert "fs/binfmt_elf: use PT_LOAD p_align values for static PIE"

It was Cc:ed to <stable.org>, so hopefully it will make it into a 5.17.z kernel, too.

Comment 14 Justin M. Forbes 2022-04-28 12:27:51 UTC
It has been in since 5.17.4, which was the first 5.17 rebase to stable Fedora releases.

Comment 15 Victor Stinner 2022-04-29 07:45:05 UTC
I confirm that the bug is fixed in the Fedora package kernel-5.18.0-0.rc4.20220427git46cf2c613f4b10e.35.fc37.aarch64. I tested the following shell command:
---
i=0; while true; do ldconfig -V >/dev/null; rc=$?; i=$(($i + 1)); echo "$i: $(date): $rc"; if [ $rc -ne 0 ]; then break; fi; done
---

* With the old kernel 5.17.0-128.fc37.aarch64: the command crashes in fewer than 1,000 iterations
* With the new kernel 5.18.0-0.rc4.20220427git46cf2c613f4b10e.35.fc37.aarch64: there is no crash after 14,000 iterations (I stopped the test)

Since the change introducing the regression has been reverted, can this issue be closed? Or do you want to keep it open until the upstream issue is closed?
https://bugzilla.kernel.org/show_bug.cgi?id=215720

