Bug 1817106

Summary: glibc: ld.so appears to segfault when failing to load very large PT_LOAD segment.
Product: Red Hat Enterprise Linux 8 Reporter: Jeff Bastian <jbastian>
Component: glibc Assignee: glibc team <glibc-bugzilla>
Status: CLOSED CANTFIX QA Contact: qe-baseos-tools-bugs
Severity: low Docs Contact:
Priority: low    
Version: 8.2 CC: ashankar, codonell, dj, efuller, fweimer, jhladky, mnewsome, pfrankli, sipoyare
Target Milestone: rc   
Target Release: 8.3   
Hardware: All   
OS: Linux   
Last Closed: 2020-04-02 13:41:32 UTC Type: Bug
Attachments:
Description Flags
valgrind log file none

Description Jeff Bastian 2020-03-25 15:19:23 UTC
Description of problem:
ELF binaries with very large LOAD segments (larger than system RAM) crash with a segmentation fault in the loader.

For example, the NAS Parallel Benchmark 'ft' has an 85 GB LOAD segment when compiled as a "class D" benchmark (*):


~]# readelf -l bin/ft.D.x | grep -A1 -e Type -e LOAD
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
--
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x0000000000005df8 0x0000000000005df8  R E    0x200000
  LOAD           0x0000000000006dc0 0x0000000000606dc0 0x0000000000606dc0
                 0x00000000000003cc 0x0000001402a18a80  RW     0x200000
                                    ^^^^^^^^^^^^^^^^^^
                                          |
                                          +-- = ~85 GB

~]# bin/ft.D.x 
Segmentation fault

~]# strace bin/ft.D.x
execve("bin/ft.D.x", ["bin/ft.D.x"], 0x7ffd9fb520a0 /* 29 vars */) = -1 ENOMEM (Cannot allocate memory)
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=NULL} ---
+++ killed by SIGSEGV +++
Segmentation fault (core dumped)

~]# strace -f /lib64/ld-linux-x86-64.so.2 bin/ft.D.x
execve("/lib64/ld-linux-x86-64.so.2", ["/lib64/ld-linux-x86-64.so.2", "bin/ft.D.x"], 0x7ffe74a79a50 /* 29 vars */) = 0
brk(NULL)                               = 0x555556921000
arch_prctl(0x3001 /* ARCH_??? */, 0x7ffc99c255b0) = -1 EINVAL (Invalid argument)
openat(AT_FDCWD, "bin/ft.D.x", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\2\0>\0\1\0\0\0\260\n@\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=40128, ...}) = 0
getcwd("/root/NPB3.3.1/NPB3.3-OMP", 128) = 26
mmap(0x400000, 24576, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0) = 0x400000
mmap(0x606000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0x606000
mmap(0x608000, 85943482432, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
close(3)                                = 0
writev(2, [{iov_base="bin/ft.D.x", iov_len=10}, {iov_base=": ", iov_len=2}, {iov_base="error while loading shared libra"..., iov_len=36}, {iov_base=": ", iov_len=2}, {iov_base="bin/ft.D.x", iov_len=10}, {iov_base=": ", iov_len=2}, {iov_base="cannot map zero-fill pages", iov_len=26}, {iov_base="", iov_len=0}, {iov_base="", iov_len=0}, {iov_base="\n", iov_len=1}], 10bin/ft.D.x: error while loading shared libraries: bin/ft.D.x: cannot map zero-fill pages
) = 89
exit_group(127)                         = ?
+++ exited with 127 +++
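
The sizes in the readelf and strace output above can be cross-checked with a little arithmetic. The following is a minimal Python sketch, not glibc's actual code: it assumes 4 KiB pages, and the helper names (page_down, page_up, segment_mappings) are made up for illustration. It splits the RW LOAD entry into the file-backed part and the zero-fill part that the explicitly invoked loader tries to map.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, the x86-64 default


def page_down(addr, page=PAGE_SIZE):
    """Round addr down to a page boundary."""
    return addr & ~(page - 1)


def page_up(addr, page=PAGE_SIZE):
    """Round addr up to a page boundary."""
    return (addr + page - 1) & ~(page - 1)


def segment_mappings(offset, vaddr, filesz, memsz, page=PAGE_SIZE):
    """Split one PT_LOAD entry into a file-backed part and a zero-fill part.

    Returns ((start, length, file_offset), (start, length)).
    """
    file_start = page_down(vaddr)
    file_end = page_up(vaddr + filesz)   # file-backed pages stop here
    zero_end = vaddr + memsz             # end of the segment in memory
    file_part = (file_start, file_end - file_start, page_down(offset))
    zero_part = (file_end, zero_end - file_end)
    return file_part, zero_part


# Offset, VirtAddr, FileSiz and MemSiz of the RW LOAD entry shown by readelf.
file_part, zero_part = segment_mappings(0x6dc0, 0x606dc0, 0x3cc, 0x1402a18a80)
print("file-backed: addr=%#x len=%d offset=%#x" % file_part)
print("zero-fill:   addr=%#x len=%d" % zero_part)
print("MemSiz: %.1f GB (%.1f GiB)" % (0x1402a18a80 / 1e9, 0x1402a18a80 / 2**30))
```

With the values from this report, the file-backed part comes out as address 0x606000, length 8192, offset 0x6000, and the zero-fill part as address 0x608000, length 85943482432, which matches the second and third mmap calls in the strace output. The MemSiz itself is roughly 85.9 GB, or about 80 GiB.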


Version-Release number of selected component (if applicable):
glibc-2.28-101.el8.x86_64
gcc-gfortran-8.3.1-5.el8.x86_64

How reproducible:
always

Steps to Reproduce:
0. Find a system with less than 85 GB RAM
1. yum -y install gcc-gfortran openmpi openmpi-devel
2. wget https://www.nas.nasa.gov/assets/npb/NPB3.3.1.tar.gz
3. tar xf NPB3.3.1.tar.gz
4. cd NPB3.3.1/NPB3.3-OMP
5. cp config/make.def.template config/make.def
6. sed -i -e 's/f77/gfortran/' \
       -e '/^[CF]FLAGS/ s/$/ -mcmodel=medium/' config/make.def
7. make ft CLASS=D
8. bin/ft.D.x

Actual results:
Segmentation fault from trying to load the binary

Expected results:
An "out of memory" error message or some other indicator of a problem besides a seg fault
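
As an illustration of the kind of indicator meant here, a wrapper script could inspect the program headers before running the binary. The following Python sketch is only a rough heuristic under stated assumptions: it handles 64-bit little-endian ELF files only, the function names are hypothetical, and comparing against physical RAM ignores swap and the kernel's overcommit settings, which are what actually decide whether the mapping succeeds.

```python
import os
import struct

PT_LOAD = 1  # program header type for loadable segments


def load_segment_sizes(path):
    """Return p_memsz of every PT_LOAD entry in a 64-bit little-endian ELF file."""
    with open(path, "rb") as f:
        ehdr = f.read(64)
        # e_ident: magic, EI_CLASS == 2 (64-bit), EI_DATA == 1 (little-endian)
        if len(ehdr) < 64 or ehdr[:4] != b"\x7fELF" or ehdr[4] != 2 or ehdr[5] != 1:
            raise ValueError("not a 64-bit little-endian ELF file")
        (e_phoff,) = struct.unpack_from("<Q", ehdr, 0x20)
        e_phentsize, e_phnum = struct.unpack_from("<HH", ehdr, 0x36)
        sizes = []
        for i in range(e_phnum):
            f.seek(e_phoff + i * e_phentsize)
            phdr = f.read(56)  # sizeof(Elf64_Phdr)
            (p_type,) = struct.unpack_from("<I", phdr, 0)
            (p_memsz,) = struct.unpack_from("<Q", phdr, 0x28)
            if p_type == PT_LOAD:
                sizes.append(p_memsz)
    return sizes


def physical_ram_bytes():
    """Total physical memory as reported by sysconf."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def exceeds_physical_ram(path):
    """True if the largest PT_LOAD segment is bigger than physical RAM."""
    return max(load_segment_sizes(path), default=0) > physical_ram_bytes()


# Demo on the running interpreter; a real check would pass bin/ft.D.x instead.
print(exceeds_physical_ram("/proc/self/exe"))
```

A wrapper could call exceeds_physical_ram("bin/ft.D.x") and print its own warning before attempting to run the benchmark.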

Additional info:
(*) NPB classes: https://www.nas.nasa.gov/publications/npb_problem_sizes.html

Comment 1 Jiri Hladky 2020-03-25 15:34:15 UTC
Output when using valgrind (run on https://beaker.engineering.redhat.com/view/gold-1s.tpb.lab.eng.brq.redhat.com#details with 48 GiB RAM)

$ valgrind --log-file=ft.D.x.valgrind --tool=memcheck --leak-check=yes -v --leak-check=full --show-reachable=yes NPB_sources/bin/ft.D.x 

 NAS Parallel Benchmarks (NPB3.3-OMP) - FT Benchmark

Size                : 2048x1024x1024
Iterations                  :     25
Number of available threads :     24

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
Segmentation fault (core dumped)

I'm attaching the valgrind log file as well.

gdb on valgrind.core
$ gdb NPB_sources/bin/ft.D.x ft.D.x.valgrind.core.4724
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000005ce0c4d in bigarrays_ () from /lib64/libpthread.so.0
[Current thread is 1 (Thread 0x8ace700 (LWP 4728))]

Comment 2 Jiri Hladky 2020-03-25 15:38:50 UTC
Created attachment 1673559 [details]
valgrind log file

$ valgrind --log-file=ft.D.x.valgrind --tool=memcheck --leak-check=yes -v --leak-check=full --show-reachable=yes NPB_sources/bin/ft.D.x 

Server: gold-1s.tpb.lab.eng.brq.redhat.com

kernel 4.18.0-187.el8.x86_64
glibc-2.28-101.el8.x86_64
libgomp-8.3.1-5.el8.x86_64

Comment 3 Jeff Bastian 2020-03-25 15:53:22 UTC
See also sibling bug 1817111 about ldd misbehaving on this binary.

Comment 4 Carlos O'Donell 2020-03-28 03:00:40 UTC
Jeff,

Thanks for submitting this issue. We should not segfault here; we should exit gracefully.

~~~
writev(2, [{iov_base="bin/ft.D.x", iov_len=10}, {iov_base=": ", iov_len=2}, {iov_base="error while loading shared libra"..., iov_len=36}, {iov_base=": ", iov_len=2}, {iov_base="bin/ft.D.x", iov_len=10}, {iov_base=": ", iov_len=2}, {iov_base="cannot map zero-fill pages", iov_len=26}, {iov_base="", iov_len=0}, {iov_base="", iov_len=0}, {iov_base="\n", iov_len=1}], 10bin/ft.D.x: error while loading shared libraries: bin/ft.D.x: cannot map zero-fill pages
) = 89
exit_group(127)                         = ?
+++ exited with 127 +++
~~~

We'll look into this.

Is this blocking a customer issue?

Comment 5 Jiri Hladky 2020-03-28 09:03:06 UTC
Hi Carlos,

no, this is not blocking any customer issue. 

Thanks
Jirka

Comment 9 Florian Weimer 2020-04-01 08:12:51 UTC
Would you please clarify why you think that the SIGSEGV is generated by the glibc dynamic loader?

C reproducer:

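/* Uninitialized global: it ends up in .bss, so the executable gets a
   128 GiB zero-fill PT_LOAD segment without being large on disk.  */
char large_data[128 * 1024LL * 1024 * 1024];

int
main (void)
{
  return 0;
}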

(Adjust the array size if necessary.)

We can build a pseudo-dynamic-linker from nortld.S:

	.text
	.globl _start
_start:
	ud2

Like this:

gcc -shared -nostdlib -nostartfiles -o nortld.so nortld.S

This will crash with SIGILL when executed. We can link the C reproducer against that:

gcc -Wl,--dynamic-linker=./nortld.so reproducer.c

It still crashes with SIGSEGV, not SIGILL:

./a.out 
Segmentation fault

This suggests to me that the SIGSEGV is synthesized by the kernel, and the dynamic loader never starts running.
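
The distinction this experiment relies on, a process killed by a signal versus a process that exits with a status, can also be checked from a script. The following is a minimal Python sketch; describe_exit is a hypothetical helper, and it only reports what the parent observes through the wait status, not where the signal originated.

```python
import signal
import subprocess


def describe_exit(argv):
    """Run argv and report whether it exited normally or was killed by a signal."""
    proc = subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    if proc.returncode < 0:
        # subprocess encodes "killed by signal N" as a return code of -N
        return "killed by " + signal.Signals(-proc.returncode).name
    return "exited with %d" % proc.returncode


# Stand-ins for the two behaviours seen in this bug.
print(describe_exit(["/bin/sh", "-c", "exit 127"]))
print(describe_exit(["/bin/sh", "-c", "kill -s SEGV $$"]))
```

Calling describe_exit(["./a.out"]) and describe_exit(["/lib64/ld-linux-x86-64.so.2", "./a.out"]) on an affected system would be expected to show the two different outcomes reported in this bug.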

Comment 11 Jeff Bastian 2020-04-01 14:03:25 UTC
My understanding of how a binary gets loaded and starts running is a bit fuzzy, so it may not be the dynamic loader that's crashing, but I thought that was one of the first steps.  If I'm wrong -- which is very likely given your example in comment 9 -- feel free to change the BZ component and $subject.

Comment 12 Jiri Hladky 2020-04-01 14:10:58 UTC
Please check also the valgrind output in comment #1 and #2. 


Perhaps the problem is in libpthread?

gdb on valgrind.core
$ gdb NPB_sources/bin/ft.D.x ft.D.x.valgrind.core.4724
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000005ce0c4d in bigarrays_ () from /lib64/libpthread.so.0
[Current thread is 1 (Thread 0x8ace700 (LWP 4728))]

Comment 13 Florian Weimer 2020-04-01 15:44:15 UTC
(In reply to Jiri Hladky from comment #12)
> Please check also the valgrind output in comment #1 and #2. 

I think this could be a different issue.

> Perhaps the problem is in libpthread?
> 
> gdb on valgrind.core
> $ gdb NPB_sources/bin/ft.D.x ft.D.x.valgrind.core.4724
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x0000000005ce0c4d in bigarrays_ () from /lib64/libpthread.so.0
> [Current thread is 1 (Thread 0x8ace700 (LWP 4728))]

There is no bigarrays_ function in libpthread, so this looks rather iffy.

Comment 14 Jiri Hladky 2020-04-01 16:03:02 UTC
I think we will need your help to identify the right component. 

valgrind shows the following (the full valgrind output is attached)[1]. Is it of any help? 

Florian, could you please advise what is the correct component or how should we find it out? 

[1] 
==4724== ERROR SUMMARY: 25 errors from 3 contexts (suppressed: 0 from 0)
==4724== 
==4724== 1 errors in context 1 of 3:
==4724== Invalid write of size 8
==4724==    at 0x401EA8: init_ui_._omp_fn.7 (ft.f:193)
==4724==    by 0x564E6A5: GOMP_parallel (parallel.c:171)
==4724==    by 0x40529C: init_ui_ (ft.f:190)
==4724==    by 0x400EB2: ft (ft.f:107)
==4724==    by 0x400EB2: main (ft.f:167)
==4724==  Address 0x801610450 is not stack'd, malloc'd or (recently) free'd
==4724== 
==4724== 
==4724== 23 errors in context 2 of 3:
==4724== Thread 12:
==4724== Invalid write of size 8
==4724==    at 0x401E98: init_ui_._omp_fn.7 (ft.f:192)
==4724==    by 0x565843D: gomp_thread_start (team.c:123)
==4724==    by 0x5CD62DD: start_thread (pthread_create.c:486)
==4724==    by 0x5FE9E82: clone (clone.S:95)
==4724==  Address 0x3b2d74420 is not stack'd, malloc'd or (recently) free'd
==4724== 
==4724== ERROR SUMMARY: 25 errors from 3 contexts (suppressed: 0 from 0)

Comment 15 Jeff Bastian 2020-04-01 18:03:08 UTC
Some more info from Keith Seitz in gdb bug 1819001 comment 1:

A data point:

$ /usr/bin/gdb -q -ex r --args /lib64/ld-linux-x86-64.so.2 bin/ft.D.x
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug/usr/lib64/ld-2.28.so.debug...done.
done.
Starting program: /usr/lib64/ld-linux-x86-64.so.2 bin/ft.D.x
bin/ft.D.x: error while loading shared libraries: bin/ft.D.x: cannot map zero-fill pages
[Inferior 1 (process 7427) exited with code 0177]

If we attempt to catch mmap system call, we discover (or at least I do!) that we're exiting with
the syscall exit_group. Catching that shows us where the problem is:

(gdb) catch syscall exit_group
Catchpoint 1 (syscall 'exit_group' [231])
(gdb) r
Starting program: /usr/lib64/ld-linux-x86-64.so.2 bin/ft.D.x
bin/ft.D.x: error while loading shared libraries: bin/ft.D.x: cannot map zero-fill pages

Catchpoint 1 (call to syscall exit_group), __GI__exit (status=status@entry=127)
    at ../sysdeps/unix/sysv/linux/_exit.c:31
31	      INLINE_SYSCALL (exit_group, 1, status);
(gdb) bt
#0  __GI__exit (status=status@entry=127)
    at ../sysdeps/unix/sysv/linux/_exit.c:31
#1  0x00007ffff7deee57 in fatal_error (errcode=<optimized out>, 
    objname=<optimized out>, occasion=<optimized out>, 
    errstring=0x7ffff7df5b00 "cannot map zero-fill pages")
    at dl-error-skeleton.c:78
#2  0x00007ffff7deef1a in _dl_signal_error (errcode=errcode@entry=0, 
    objname=objname@entry=0x7fffffffda91 "bin/ft.D.x", 
    occation=occation@entry=0x0, 
    errstring=errstring@entry=0x7ffff7df5b00 "cannot map zero-fill pages")
    at dl-error-skeleton.c:124
#3  0x00007ffff7dda257 in lose (code=code@entry=0, fd=fd@entry=3, 
    name=name@entry=0x7fffffffda91 "bin/ft.D.x", 
    realname=realname@entry=0x7ffff7ffe150 "bin/ft.D.x", 
    l=l@entry=0x7ffff7ffe160, msg=0x7ffff7df5b00 "cannot map zero-fill pages", 
    r=0x7ffff7ffe120 <_r_debug>, nsid=0) at dl-load.c:851
#4  0x00007ffff7ddabb2 in _dl_map_object_from_fd (
    name=name@entry=0x7fffffffda91 "bin/ft.D.x", origname=origname@entry=0x0, 
    fd=<optimized out>, fbp=fbp@entry=0x7fffffffd090, 
    realname=<optimized out>, loader=loader@entry=0x0, l_type=<optimized out>, 
    mode=<optimized out>, stack_endp=<optimized out>, nsid=<optimized out>)
    at dl-load.c:888
#5  0x00007ffff7ddd34a in _dl_map_object (loader=loader@entry=0x0, 
    name=0x7fffffffda91 "bin/ft.D.x", type=type@entry=0, 
    trace_mode=trace_mode@entry=0, mode=mode@entry=536870912, 
    nsid=nsid@entry=0) at dl-load.c:2251
#6  0x00007ffff7dd882c in dl_main (phdr=<optimized out>, phnum=8, 
    user_entry=0x7fffffffd658, auxv=<optimized out>) at rtld.c:1061
#7  0x00007ffff7dee11f in _dl_sysdep_start (
    start_argptr=start_argptr@entry=0x7fffffffd730, 
    dl_main=dl_main@entry=0x7ffff7dd6560 <dl_main>) at ../elf/dl-sysdep.c:253
#8  0x00007ffff7dd6118 in _dl_start_final (arg=0x7fffffffd730) at rtld.c:413
#9  _dl_start (arg=0x7fffffffd730) at rtld.c:520
#10 0x00007ffff7dd5058 in _start ()

Comment 16 Carlos O'Donell 2020-04-01 18:32:05 UTC
(In reply to Jeff Bastian from comment #15)
> Some more info from Keith Seitz in gdb bug 1819001 comment 1:
> 
> A data point:
> 
> $ /usr/bin/gdb -q -ex r --args /lib64/ld-linux-x86-64.so.2 bin/ft.D.x
> Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from
> /usr/lib/debug/usr/lib64/ld-2.28.so.debug...done.
> done.
> Starting program: /usr/lib64/ld-linux-x86-64.so.2 bin/ft.D.x
> bin/ft.D.x: error while loading shared libraries: bin/ft.D.x: cannot map
> zero-fill pages
> [Inferior 1 (process 7427) exited with code 0177]

This is not a SIGSEGV. The loader exited correctly and gave you an appropriate error message. The exit code was 127, which conventionally means "command not found" (not quite accurate here).

The question is: What is happening during the SIGSEGV cases?

Florian's comment here:
https://bugzilla.redhat.com/show_bug.cgi?id=1817106#c9

indicates that we never get to userspace (the illegal instruction is never run), so the SIGSEGV probably comes from the kernel's binfmt_elf support: it tries to map the object, fails on the large PT_LOAD segment, and delivers the signal. Therefore there is no way for us to recover from that.
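
For comparison, the userspace case is recoverable because the failing call simply returns an error. The following Python sketch imitates the zero-fill request seen in the strace output (a private anonymous read/write mapping); try_anonymous_map is a hypothetical helper. Unlike the loader it deliberately omits MAP_FIXED, since a fixed mapping could overwrite the interpreter's own memory. Whether a given size fails depends on RAM, swap and the kernel's overcommit settings.

```python
import errno
import mmap


def try_anonymous_map(length):
    """Try a private anonymous read/write mapping.

    Returns None on success, or the errno name (for example "ENOMEM") on failure.
    """
    try:
        m = mmap.mmap(-1, length,
                      flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                      prot=mmap.PROT_READ | mmap.PROT_WRITE)
    except OSError as e:
        return errno.errorcode.get(e.errno, str(e.errno))
    m.close()
    return None


# One page should always work; the size from this report may or may not.
print(try_anonymous_map(4096))
print(try_anonymous_map(85943482432))
```

A caller that gets an errno name back can print a message and exit with a status, which is what the explicitly invoked loader does with "cannot map zero-fill pages".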

Comment 17 Florian Weimer 2020-04-02 09:10:18 UTC
(In reply to Jiri Hladky from comment #14)
> I think we will need your help to identify the right component. 
> 
> valgrind shows the following (the full valgrind output is attached)[1]. Is
> it of any help? 

I think all the valgrind issues reported here so far are different bugs (if they are bugs at all). The kernel does not actually run any userspace code in this case, so there can't be anything for valgrind to report.

> Florian, could you please advise what is the correct component or how should
> we find it out? 

I filed kernel bug 1820095 for the confusing segfault (mentioned in the summary of this bug). Beyond that, it's not clear to me what else we can do. With an explicit loader invocation, glibc already prints a fairly accurate error message (“cannot map zero-fill pages”).

And as I said, the valgrind issues discussed here are something else and do not really point to glibc problems either (the one trace which mentions libpthread seems to have hit some missing/incorrect debuginfo).

I suggest we close this bug as CANTFIX. If you want to track down the valgrind issues, you should file valgrind bugs. Sorry.

Comment 18 Jiri Hladky 2020-04-02 09:32:33 UTC
Thank you, Florian! 

> I suggest we close this bug as CANTFIX. If you want to track down the valgrind issues, you should file valgrind bugs. Sorry.
OK, I understand. 

@Jeff - I will leave the final decision to you.

Jirka

Comment 19 Jeff Bastian 2020-04-02 13:41:32 UTC
I expected this might be a CANTFIX bug since it's a rather odd corner case.  But I appreciate everyone taking time to dig into this, and I learned a few new debugging tricks in the process.