Bug 1819001

Summary:	gdb cannot debug elf binary with large PT_LOAD segment
Product:	Red Hat Enterprise Linux 8	Reporter:	Jeff Bastian <jbastian>
Component:	gdb	Assignee:	Keith Seitz <keiths>
gdb sub component:	system-version	QA Contact:	qe-baseos-tools-bugs
Status:	CLOSED UPSTREAM	Docs Contact:
Severity:	low
Priority:	low	CC:	codonell, dsmith, efuller, gdb-bugs, jhladky, keiths, ohudlick
Version:	8.2	Keywords:	Triaged
Target Milestone:	rc
Target Release:	8.3
Hardware:	All
OS:	Linux
Last Closed:	2021-04-15 19:01:39 UTC	Type:	Bug

Description Jeff Bastian 2020-03-30 22:20:49 UTC

Description of problem:
ELF binaries with large (larger than RAM + swap) PT_LOAD segments are difficult to debug with gdb since they crash early in startup.

For example, the NAS Parallel Benchmark 'ft' has an 85 GB PT_LOAD segment when compiled as a "class D" benchmark (*):


~]# readelf -l bin/ft.D.x | grep -A1 -e Type -e LOAD
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
--
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x0000000000005df8 0x0000000000005df8  R E    0x200000
  LOAD           0x0000000000006dc0 0x0000000000606dc0 0x0000000000606dc0
                 0x00000000000003cc 0x0000001402a18a80  RW     0x200000
                                    ^^^^^^^^^^^^^^^^^^
                                          |
                                          +-- = ~85 GB
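
The same program-header information can be pulled out programmatically. Below is a minimal Python sketch, assuming a little-endian ELF64 file (as on x86_64); `load_segments` is a hypothetical helper, not part of any existing tool. It reads the ELF header fields `e_phoff`, `e_phentsize` and `e_phnum`, decodes each `Elf64_Phdr`, and keeps the `PT_LOAD` entries, which makes the gap between `FileSiz` and `MemSiz` easy to spot.

```python
import struct

PT_LOAD = 1  # p_type value for loadable segments


def load_segments(data: bytes):
    """Return (p_vaddr, p_filesz, p_memsz) for each PT_LOAD segment.

    Assumes a little-endian ELF64 image (EI_CLASS == 2, EI_DATA == 1).
    """
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    # e_phoff is a 64-bit field at offset 0x20; e_phentsize and e_phnum
    # are 16-bit fields at offsets 0x36 and 0x38 of the ELF header.
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    segments = []
    for i in range(e_phnum):
        # Elf64_Phdr: p_type, p_flags (32-bit), then p_offset, p_vaddr,
        # p_paddr, p_filesz, p_memsz, p_align (64-bit each).
        (p_type, _flags, _offset, p_vaddr, _paddr,
         p_filesz, p_memsz, _align) = struct.unpack_from(
            "<IIQQQQQQ", data, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            segments.append((p_vaddr, p_filesz, p_memsz))
    return segments
```

The binary itself is small (40128 bytes per the strace output below), so reading it whole with `load_segments(open("bin/ft.D.x", "rb").read())` is fine; the second entry should match the readelf output above, with a `p_memsz` of 0x1402a18a80 (85,943,487,104 bytes, about 85.9 GB).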

~]# bin/ft.D.x 
Segmentation fault

~]# gdb -q bin/ft.D.x
Reading symbols from bin/ft.D.x...done.
(gdb) b main
Breakpoint 1 at 0x4051b5: file ft.f, line 167.
(gdb) run
Starting program: /root/NPB3.3.1/NPB3.3-OMP/bin/ft.D.x 
During startup program terminated with signal SIGSEGV, Segmentation fault.
(gdb) bt
No stack.
(gdb) set startup-with-shell off
(gdb) run
Starting program: /root/NPB3.3.1/NPB3.3-OMP/bin/ft.D.x 
During startup program terminated with signal SIGSEGV, Segmentation fault.
(gdb) bt
No stack.
(gdb) info locals
No frame selected.
(gdb) frame
No stack.


strace on the binary gave a few more clues, which led to bug 1817106 about ld.so crashing with a segfault. See also bug 1817111 (now a duplicate of bug 1085549) about ldd failing on this same binary.

~]# strace bin/ft.D.x
execve("bin/ft.D.x", ["bin/ft.D.x"], 0x7ffd9fb520a0 /* 29 vars */) = -1 ENOMEM (Cannot allocate memory)
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=NULL} ---
+++ killed by SIGSEGV +++
Segmentation fault (core dumped)

~]# strace -f /lib64/ld-linux-x86-64.so.2 bin/ft.D.x 
execve("/lib64/ld-linux-x86-64.so.2", ["/lib64/ld-linux-x86-64.so.2", "bin/ft.D.x"], 0x7ffe74a79a50 /* 29 vars */) = 0
brk(NULL)                               = 0x555556921000
arch_prctl(0x3001 /* ARCH_??? */, 0x7ffc99c255b0) = -1 EINVAL (Invalid argument)
openat(AT_FDCWD, "bin/ft.D.x", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\2\0>\0\1\0\0\0\260\n@\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=40128, ...}) = 0
getcwd("/root/NPB3.3.1/NPB3.3-OMP", 128) = 26
mmap(0x400000, 24576, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0) = 0x400000
mmap(0x606000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0x606000
mmap(0x608000, 85943482432, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
close(3)                                = 0
writev(2, [{iov_base="bin/ft.D.x", iov_len=10}, {iov_base=": ", iov_len=2}, {iov_base="error while loading shared libra"..., iov_len=36}, {iov_base=": ", iov_len=2}, {iov_base="bin/ft.D.x", iov_len=10}, {iov_base=": ", iov_len=2}, {iov_base="cannot map zero-fill pages", iov_len=26}, {iov_base="", iov_len=0}, {iov_base="", iov_len=0}, {iov_base="\n", iov_len=1}], 10bin/ft.D.x: error while loading shared libraries: bin/ft.D.x: cannot map zero-fill pages
) = 89
exit_group(127)                         = ?
+++ exited with 127 +++
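
The length in the failing `mmap` call can be reconstructed from the second `PT_LOAD` header. This is a minimal Python sketch, assuming the loader maps the zero-fill (`.bss`) tail the way glibc's `_dl_map_segments` appears to: the file-backed mapping ends at the page boundary above `p_vaddr + p_filesz`, and an anonymous mapping covers everything from there up to `p_vaddr + p_memsz`. The 4 KiB page size is also an assumption (the x86_64 default), and `zero_fill_mapping` is a hypothetical helper.

```python
PAGE_SIZE = 4096  # assumption: default x86_64 page size


def zero_fill_mapping(p_vaddr: int, p_filesz: int, p_memsz: int):
    """Return (address, length) of the anonymous zero-fill mapping that
    covers the part of a PT_LOAD segment beyond its file contents."""
    data_end = p_vaddr + p_filesz   # end of the bytes backed by the file
    alloc_end = p_vaddr + p_memsz   # end of the whole segment in memory
    # The file-backed mapping already extends to the next page boundary,
    # so anonymous memory is only needed from there on.
    zero_page = (data_end + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)
    return zero_page, max(alloc_end - zero_page, 0)


addr, length = zero_fill_mapping(0x606dc0, 0x3cc, 0x1402a18a80)
print(hex(addr), length)  # 0x608000 85943482432
```

Both values match the `mmap(0x608000, 85943482432, ...)` line in the strace output, so the ENOMEM is the kernel refusing that single ~86 GB anonymous mapping.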




Version-Release number of selected component (if applicable):
gdb-8.2-11.el8.x86_64
gcc-gfortran-8.3.1-5.el8.x86_64


How reproducible:
always

Steps to Reproduce:
0. Find a system with less than 85 GB of RAM + swap
1. yum -y install gcc-gfortran openmpi openmpi-devel
2. wget https://www.nas.nasa.gov/assets/npb/NPB3.3.1.tar.gz
3. tar xf NPB3.3.1.tar.gz
4. cd NPB3.3.1/NPB3.3-OMP
5. cp config/make.def.template config/make.def
6. sed -i -e 's/f77/gfortran/' \
          -e 's/^FFLAGS.*/FFLAGS = -O0 -g -mcmodel=medium/' \
          -e 's/^CFLAGS.*/CFLAGS = -O0 -g -mcmodel=medium/' \
       config/make.def
7. make ft CLASS=D
8. gdb bin/ft.D.x
9. b main
10. run
11. bt
12. set startup-with-shell off
13. run
14. bt
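
Whether step 0 is satisfied depends on RAM plus swap, not RAM alone. A minimal Python sketch for checking a candidate machine, assuming the default heuristic overcommit mode (`vm.overcommit_memory = 0`), in which a single request larger than total RAM + swap is roughly what the kernel treats as an obvious overcommit. `commit_budget` and `reproduces` are hypothetical helpers that parse `/proc/meminfo`-formatted text.

```python
MEMSZ = 0x1402a18a80  # MemSiz of the large PT_LOAD segment (from readelf)


def commit_budget(meminfo_text: str) -> int:
    """Return MemTotal + SwapTotal in bytes from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0]) * 1024  # values are in kB
    return fields["MemTotal"] + fields["SwapTotal"]


def reproduces(meminfo_text: str, memsz: int = MEMSZ) -> bool:
    """True if the segment is larger than RAM + swap (rough heuristic)."""
    return memsz > commit_budget(meminfo_text)
```

On the machine itself, `reproduces(open("/proc/meminfo").read())` gives a quick yes/no before building the benchmark.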


Actual results:
gdb does not help to explain why the binary is crashing so early

Expected results:
gdb gives some helpful clues?

Additional info:
(*) NPB classes: https://www.nas.nasa.gov/publications/npb_problem_sizes.html

Comment 1 Keith Seitz 2020-04-01 17:44:50 UTC

(In reply to Jeff Bastian from comment #0)
> Description of problem:

*Thank you* for the excellent reproducer!

> Actual results:
> gdb does not help to explain why the binary is crashing so early
> 
> Expected results:
> gdb gives some helpful clues?

A data point:

$ /usr/bin/gdb -q -ex r --args /lib64/ld-linux-x86-64.so.2 bin/ft.D.x
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug/usr/lib64/ld-2.28.so.debug...done.
done.
Starting program: /usr/lib64/ld-linux-x86-64.so.2 bin/ft.D.x
bin/ft.D.x: error while loading shared libraries: bin/ft.D.x: cannot map zero-fill pages
[Inferior 1 (process 7427) exited with code 0177]

If we attempt to catch the mmap system call, we discover (or at least I do!) that we're exiting via
the exit_group syscall. Catching that shows us where the problem is:

(gdb) catch syscall exit_group
Catchpoint 1 (syscall 'exit_group' [231])
(gdb) r
Starting program: /usr/lib64/ld-linux-x86-64.so.2 bin/ft.D.x
bin/ft.D.x: error while loading shared libraries: bin/ft.D.x: cannot map zero-fill pages

Catchpoint 1 (call to syscall exit_group), __GI__exit (status=status@entry=127)
    at ../sysdeps/unix/sysv/linux/_exit.c:31
31	      INLINE_SYSCALL (exit_group, 1, status);
(gdb) bt
#0  __GI__exit (status=status@entry=127)
    at ../sysdeps/unix/sysv/linux/_exit.c:31
#1  0x00007ffff7deee57 in fatal_error (errcode=<optimized out>, 
    objname=<optimized out>, occasion=<optimized out>, 
    errstring=0x7ffff7df5b00 "cannot map zero-fill pages")
    at dl-error-skeleton.c:78
#2  0x00007ffff7deef1a in _dl_signal_error (errcode=errcode@entry=0, 
    objname=objname@entry=0x7fffffffda91 "bin/ft.D.x", 
    occation=occation@entry=0x0, 
    errstring=errstring@entry=0x7ffff7df5b00 "cannot map zero-fill pages")
    at dl-error-skeleton.c:124
#3  0x00007ffff7dda257 in lose (code=code@entry=0, fd=fd@entry=3, 
    name=name@entry=0x7fffffffda91 "bin/ft.D.x", 
    realname=realname@entry=0x7ffff7ffe150 "bin/ft.D.x", 
    l=l@entry=0x7ffff7ffe160, msg=0x7ffff7df5b00 "cannot map zero-fill pages", 
    r=0x7ffff7ffe120 <_r_debug>, nsid=0) at dl-load.c:851
#4  0x00007ffff7ddabb2 in _dl_map_object_from_fd (
    name=name@entry=0x7fffffffda91 "bin/ft.D.x", origname=origname@entry=0x0, 
    fd=<optimized out>, fbp=fbp@entry=0x7fffffffd090, 
    realname=<optimized out>, loader=loader@entry=0x0, l_type=<optimized out>, 
    mode=<optimized out>, stack_endp=<optimized out>, nsid=<optimized out>)
    at dl-load.c:888
#5  0x00007ffff7ddd34a in _dl_map_object (loader=loader@entry=0x0, 
    name=0x7fffffffda91 "bin/ft.D.x", type=type@entry=0, 
    trace_mode=trace_mode@entry=0, mode=mode@entry=536870912, 
    nsid=nsid@entry=0) at dl-load.c:2251
#6  0x00007ffff7dd882c in dl_main (phdr=<optimized out>, phnum=8, 
    user_entry=0x7fffffffd658, auxv=<optimized out>) at rtld.c:1061
#7  0x00007ffff7dee11f in _dl_sysdep_start (
    start_argptr=start_argptr@entry=0x7fffffffd730, 
    dl_main=dl_main@entry=0x7ffff7dd6560 <dl_main>) at ../elf/dl-sysdep.c:253
#8  0x00007ffff7dd6118 in _dl_start_final (arg=0x7fffffffd730) at rtld.c:413
#9  _dl_start (arg=0x7fffffffd730) at rtld.c:520
#10 0x00007ffff7dd5058 in _start ()

I don't think there is much gdb can do about this...

Comment 2 Jeff Bastian 2020-04-01 18:06:10 UTC

I suspected there wasn't much gdb could do, but I actually learned gdb was more capable than I realized.  How did I never know gdb could catch syscalls?  Thank you for teaching me a new trick!  This is useful info for bug 1817106.

Comment 3 Carlos O'Donell 2020-04-01 18:39:01 UTC

(In reply to Keith Seitz from comment #1)
> $ /usr/bin/gdb -q -ex r --args /lib64/ld-linux-x86-64.so.2 bin/ft.D.x

When you run under the loader directly the loader is responsible for the mappings (not the kernel) and so we get a graceful exit.

> I don't think there is much gdb can do about this...

What happens when you debug the process directly?

Comment 4 Carlos O'Donell 2020-04-01 18:40:14 UTC

(In reply to Carlos O'Donell from comment #3)
> What happens when you debug the process directly?

To be clear, the supposition right now is that the kernel is artificially delivering the SIGSEGV at exec time.

Comment 5 Carlos O'Donell 2020-04-01 18:41:01 UTC

(In reply to Carlos O'Donell from comment #4)
> To be clear, the supposition right now is that the kernel is artificially
> delivering the SIGSEGV at exec time.

And if so, how do you improve this use case?

Comment 6 Keith Seitz 2020-04-01 20:40:54 UTC

(In reply to Carlos O'Donell from comment #3)
> (In reply to Keith Seitz from comment #1)
> > $ /usr/bin/gdb -q -ex r --args /lib64/ld-linux-x86-64.so.2 bin/ft.D.x
> 
> When you run under the loader directly the loader is responsible for the
> mappings (not the kernel) and so we get a graceful exit.
> 
> > I don't think there is much gdb can do about this...
> 
> What happens when you debug the process directly?

Normally, we (as in "gdb") should be able to stop in the startup code,
but not in this case:

(gdb) b *_start
Breakpoint 1 at 0x400af0
(gdb) r
Starting program: /home/rhel8/rhbz/1819001/NPB3.3.1/NPB3.3-OMP/bin/ft.D.x 
During startup program terminated with signal SIGSEGV, Segmentation fault.

That's a pretty serious indicator that something is amiss.

If we enable some debugging:

$ /usr/bin/gdb -q bin/ft.D.x
Reading symbols from bin/ft.D.x...done.
(gdb) set startup-with-shell 0
(gdb) set debug lin-lwp 1
(gdb) r
Starting program: /home/rhel8/rhbz/1819001/NPB3.3.1/NPB3.3-OMP/bin/ft.D.x 
sigchld
linux_nat_wait: [process 14314], []
LLW: enter
LNW: waitpid(-1, ...) returned 14314, ERRNO-OK
LLW: waitpid 14314 received Segmentation fault (stopped)
LNW: waitpid(-1, ...) returned 0, ERRNO-OK
RSRL: NOT resuming LWP process 14314, has pending status
LLW: exit
LLR: Preparing to resume process 14314, Segmentation fault, inferior_ptid process 14314
LLR: PTRACE_CONT process 14314, Segmentation fault (resume event thread)
linux_nat_wait: [process 14314], []
RSRL: NOT resuming LWP process 14314, not stopped
LLW: enter
LNW: waitpid(-1, ...) returned 0, ERRNO-OK
RSRL: NOT resuming LWP process 14314, not stopped
linux-nat: about to sigsuspend
sigchld
LNW: waitpid(-1, ...) returned 14314, ERRNO-OK
LLW: waitpid 14314 received Segmentation fault (terminated)
LWP 14314 exited (resumed=1)
LNW: waitpid(-1, ...) returned -1, No child processes
RSRL: NOT resuming LWP process 14314, has pending status
LLW: exit
During startup program terminated with signal SIGSEGV, Segmentation fault.

The very first call to waitpid already reports the SIGSEGV: the returned
wstatus is 0xb7f, i.e. the process is stopped by signal 11 (SIGSEGV), and it
terminates as soon as gdb resumes it with that signal. That really doesn't tell
us much beyond the segmentation fault that is already reported.
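
For reference, the raw wait statuses in this log can be decoded with the standard `WIF*` macros, which Python exposes in its `os` module. A minimal sketch; `describe_wait_status` is a hypothetical helper. It shows that `0xb7f` is a stop by signal 11 (SIGSEGV on Linux), and that the `0177` exit code in comment 1 is simply octal for 127.

```python
import os
import signal


def describe_wait_status(status: int) -> str:
    """Decode a raw waitpid() status into a short human-readable string."""
    if os.WIFSTOPPED(status):
        return f"stopped by {signal.Signals(os.WSTOPSIG(status)).name}"
    if os.WIFSIGNALED(status):
        return f"killed by {signal.Signals(os.WTERMSIG(status)).name}"
    if os.WIFEXITED(status):
        return f"exited with code {os.WEXITSTATUS(status)}"
    return f"unknown status {status:#x}"


print(describe_wait_status(0xb7f))   # stopped by SIGSEGV
print(describe_wait_status(0x7f00))  # exited with code 127
```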

Unless there is more info to be gleaned from somewhere, I am not entirely
sure what we can do other than make the error message even more verbose.

dmesg reports a little bit more about the underlying problem:

[Wed Apr  1 15:06:29 2020] ft.D.x[14038]: segfault at 7ffff4a7353b ip 00007ffff>
[Wed Apr  1 15:06:29 2020] Code: Bad RIP value.

Is that information any more useful, though, even to an above-average
user?

Comment 7 Carlos O'Donell 2020-04-01 21:33:05 UTC

(In reply to Keith Seitz from comment #6)
> During startup program terminated with signal SIGSEGV, Segmentation fault.

This isn't quite true because "startup" never happened.

The message that would be more accurate is:

"The operating system kernel has terminated the process *before* startup with signal SIGSEGV, segmentation fault."
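
One way a debugger might recognize this case, sketched in Python under stated assumptions rather than as gdb's actual logic: per ptrace(2), when a traced child's execve succeeds (and PTRACE_O_TRACEEXEC is not set), the child is stopped with SIGTRAP before the new program runs. Assuming the program is exec'd directly, as with `set startup-with-shell off`, a first stop with any other signal, such as the SIGSEGV here, means the new image never started. `first_stop_message` is a hypothetical helper that only inspects the raw wait status.

```python
import os
import signal


def first_stop_message(status: int):
    """Classify the first wait status seen after exec'ing the inferior.

    Returns None when the exec looks successful (a SIGTRAP stop);
    otherwise returns a message in the spirit of the wording proposed above.
    """
    if os.WIFSTOPPED(status):
        sig = os.WSTOPSIG(status)
        if sig == signal.SIGTRAP:
            return None  # normal post-exec trap: the new image is in place
        name = signal.Signals(sig).name
        return (f"The operating system kernel delivered {name} to the "
                f"process *before* startup.")
    return f"Unexpected wait status {status:#x} before startup."
```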

Could we reliably print something like that?

Comment 11 Keith Seitz 2021-04-15 19:01:39 UTC

Pedro, David, and I have discussed this, and we are not certain there is anything that can be done to improve the user experience here in the short term. Moving upstream.