Bug 2074993 - gdb: Single-stepping in ld.so does not work
Summary: gdb: Single-stepping in ld.so does not work
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: gdb
Version: 35
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Kevin Buettner
QA Contact: Fedora Extras Quality Assurance
Reported: 2022-04-13 11:43 UTC by Florian Weimer
Modified: 2022-10-20 02:46 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2022-10-20 02:46:42 UTC
Type: Bug

Description Florian Weimer 2022-04-13 11:43:05 UTC
Consider this session transcript:

$ gdb --args /usr/bin/python -c 'import ctypes; ctypes.CDLL("X")'
[…]
Enable debuginfod for this session? (y or [n]) y
Debuginfod has been enabled.
To make this setting permanent, add 'set debuginfod enabled on' to .gdbinit.
Reading symbols from /home/fweimer/.cache/debuginfod_client/b2db3eef93d0dc9ab831d9cdbbd379f12f1d19e7/debuginfo...
(gdb) break dlopen
Function "dlopen" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (dlopen) pending.
(gdb) c
The program is not being run.
(gdb) r
Starting program: /usr/bin/python -c import\ ctypes\;\ ctypes.CDLL\(\"X\"\)
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 1, ___dlopen (file=file@entry=0x7ffff77cd0f0 "/usr/lib64/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so", mode=2) at dlopen.c:77
77	{
(gdb) break dl_open_worker_begin
Breakpoint 2 at 0x7ffff7fd3780: file dl-open.c, line 489.
(gdb) cont
Continuing.

Breakpoint 2, dl_open_worker_begin (a=a@entry=0x7fffffffb370) at dl-open.c:489
489	{
(gdb) next
491	  const char *file = args->file;
(gdb) next
492	  int mode = args->mode;
(gdb) next
499	  const char *dst = strchr (file, '$');
(gdb) next
500	  if (dst != NULL || args->nsid == __LM_ID_CALLER
(gdb) next
507	      call_map = GL(dl_ns)[LM_ID_BASE]._ns_loaded;
(gdb) next
509	      struct link_map *l = _dl_find_dso_for_object ((ElfW(Addr)) caller_dlopen);
(gdb) next
511	      if (l)
(gdb) next
514	      if (args->nsid == __LM_ID_CALLER)
(gdb) next
515		args->nsid = call_map->l_ns;
(gdb) next
521	  args->libc_already_loaded = GL(dl_ns)[args->nsid].libc_map != NULL;
(gdb) next
524	  args->original_global_scope_pending_adds
(gdb) next
529	  _dl_debug_initialize (0, args->nsid);
(gdb) next
533	  args->map = new = _dl_map_object (call_map, file, lt_loaded, 0,
(gdb) step
__GI___libc_malloc (bytes=74) at malloc.c:3175
3175	{
(gdb) 

The jump into malloc is unexpected; execution should stop at _dl_map_object. _dl_map_object is in a different translation unit and we do not use LTO, so there should be no obstacle to stopping at this function.

Seen with:

gdb-11.2-2.fc35.x86_64
glibc-2.34-29.fc35.x86_64
python3-3.10.4-1.fc35.x86_64

Comment 1 Kevin Buettner 2022-04-21 16:10:33 UTC
Agreed.  Upon receiving the step, execution should stop in _dl_map_object.

A backtrace shows _dl_map_object on the stack:

(gdb) bt 4
#0  __GI___libc_malloc (bytes=74) at malloc.c:3175
#1  0x00007ffff7fea2d0 in malloc (size=74) at ../include/rtld-malloc.h:56
#2  __strdup (
    s=0x7fffea2edb70 "/usr/lib64/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so") at strdup.c:42
#3  0x00007ffff7fcfb88 in _dl_map_object (loader=loader@entry=0x7ffff7fa0000, 
    name=name@entry=0x7fffea2edb70 "/usr/lib64/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so", type=type@entry=2, 
    trace_mode=trace_mode@entry=0, mode=mode@entry=-1879048190, 
    nsid=<optimized out>) at dl-load.c:2276
(More stack frames follow...)

Also, doing 'si' repeatedly in place of the 'step' shows that execution does enter _dl_map_object:

529	  _dl_debug_initialize (0, args->nsid);
1: x/i $pc
=> 0x7ffff7fd3807 <dl_open_worker_begin+135>:	call   0x7ffff7fca380 <_dl_debug_initialize>
(gdb) next
533	  args->map = new = _dl_map_object (call_map, file, lt_loaded, 0,
1: x/i $pc
=> 0x7ffff7fd380c <dl_open_worker_begin+140>:	mov    0x20(%rbx),%r9
(gdb) si
0x00007ffff7fd3810	533	  args->map = new = _dl_map_object (call_map, file, lt_loaded, 0,
1: x/i $pc
=> 0x7ffff7fd3810 <dl_open_worker_begin+144>:	mov    %r12d,%r8d
(gdb) si
0x00007ffff7fd3813	533	  args->map = new = _dl_map_object (call_map, file, lt_loaded, 0,
1: x/i $pc
=> 0x7ffff7fd3813 <dl_open_worker_begin+147>:	xor    %ecx,%ecx
(gdb) 
0x00007ffff7fd3815	533	  args->map = new = _dl_map_object (call_map, file, lt_loaded, 0,
1: x/i $pc
=> 0x7ffff7fd3815 <dl_open_worker_begin+149>:	or     $0x10000000,%r8d
(gdb) 
0x00007ffff7fd381c	533	  args->map = new = _dl_map_object (call_map, file, lt_loaded, 0,
1: x/i $pc
=> 0x7ffff7fd381c <dl_open_worker_begin+156>:	mov    $0x2,%edx
(gdb) 
0x00007ffff7fd3821	533	  args->map = new = _dl_map_object (call_map, file, lt_loaded, 0,
1: x/i $pc
=> 0x7ffff7fd3821 <dl_open_worker_begin+161>:	mov    %r14,%rsi
(gdb) 
0x00007ffff7fd3824	533	  args->map = new = _dl_map_object (call_map, file, lt_loaded, 0,
1: x/i $pc
=> 0x7ffff7fd3824 <dl_open_worker_begin+164>:	mov    %rbp,%rdi
(gdb) 
0x00007ffff7fd3827	533	  args->map = new = _dl_map_object (call_map, file, lt_loaded, 0,
1: x/i $pc
=> 0x7ffff7fd3827 <dl_open_worker_begin+167>:	call   0x7ffff7fcfa10 <_dl_map_object>
(gdb) 
_dl_map_object (loader=loader@entry=0x7ffff7fa0000, name=name@entry=0x7fffea2edb70 "/usr/lib64/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so", type=type@entry=2, trace_mode=trace_mode@entry=0, mode=mode@entry=-1879048190, nsid=0) at dl-load.c:2038
2038	{
1: x/i $pc
=> 0x7ffff7fcfa10 <_dl_map_object>:	endbr64 
(gdb) 
2046	  assert (nsid >= 0);
1: x/i $pc
=> 0x7ffff7fcfa14 <_dl_map_object+4>:	push   %r15
(gdb) 
0x00007ffff7fcfa16	2046	  assert (nsid >= 0);
1: x/i $pc
=> 0x7ffff7fcfa16 <_dl_map_object+6>:	push   %r14
(gdb) 
0x00007ffff7fcfa18	2046	  assert (nsid >= 0);
1: x/i $pc
=> 0x7ffff7fcfa18 <_dl_map_object+8>:	push   %r13
(gdb) 
0x00007ffff7fcfa1a	2046	  assert (nsid >= 0);
1: x/i $pc
=> 0x7ffff7fcfa1a <_dl_map_object+10>:	push   %r12
(gdb) 
0x00007ffff7fcfa1c	2046	  assert (nsid >= 0);
1: x/i $pc
=> 0x7ffff7fcfa1c <_dl_map_object+12>:	push   %rbp
(gdb) 
0x00007ffff7fcfa1d	2046	  assert (nsid >= 0);
1: x/i $pc
=> 0x7ffff7fcfa1d <_dl_map_object+13>:	push   %rbx
(gdb) 
0x00007ffff7fcfa1e	2046	  assert (nsid >= 0);
1: x/i $pc
=> 0x7ffff7fcfa1e <_dl_map_object+14>:	sub    $0x3b8,%rsp
(gdb)

Comment 2 Kevin Buettner 2022-04-21 16:26:10 UTC
I've reproduced this problem in a GDB build of current upstream master.

It also exists in F36 and rawhide (F37).

Comment 3 Kevin Buettner 2022-04-27 23:32:28 UTC
It turns out that this is expected and, most times, desirable behavior from GDB.

When stepping into a function for which the PLT hasn't been resolved yet, most users will want to see GDB step into the function in question, not into either the PLT or the dynamic linker's resolution code.

I have an experimental patch in my local tree which can be used to change this behavior.  Here's an example session:

[kev@f35-1 gdb]$ ./gdb -q --args /usr/bin/python -c 'import ctypes; ctypes.CDLL("X")'
Reading symbols from /usr/bin/python...

This GDB supports auto-downloading debuginfo from the following URLs:
  <https://debuginfod.fedoraproject.org/>
Enable debuginfod for this session? (y or [n]) y
Debuginfod has been enabled.
To make this setting permanent, add 'set debuginfod enabled on' to .gdbinit.
Reading symbols from /home/kev/.cache/debuginfod_client/b2db3eef93d0dc9ab831d9cdbbd379f12f1d19e7/debuginfo...
(gdb) help set solib-debug-resolver 
Set debugging of the dynamic linker's resolver.
If "on", stepping into the dynamic linker's resolver and PLT code is
enabled.  The default is "off" which causes such code to be skipped
during debugging.

(gdb) show solib-debug-resolver
Debugging of the dynamic linker's resolver is off.
(gdb) break dlopen
Function "dlopen" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (dlopen) pending.
(gdb) run
Starting program: /usr/bin/python -c import\ ctypes\;\ ctypes.CDLL\(\"X\"\)
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 1, ___dlopen (
    file=file@entry=0x7fffea2edcc0 "/usr/lib64/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so", mode=2)
    at dlopen.c:77
77	{
(gdb) break dl_open_worker_begin
Breakpoint 2 at 0x7ffff7fd8260: file dl-open.c, line 489.
(gdb) c
Continuing.

Breakpoint 2, dl_open_worker_begin (a=a@entry=0x7fffffffaee0) at dl-open.c:489
489	{
(gdb) next
491	  const char *file = args->file;
(gdb) next
492	  int mode = args->mode;
(gdb) next
499	  const char *dst = strchr (file, '$');
(gdb) next
500	  if (dst != NULL || args->nsid == __LM_ID_CALLER
(gdb) next
507	      call_map = GL(dl_ns)[LM_ID_BASE]._ns_loaded;
(gdb) next
509	      struct link_map *l = _dl_find_dso_for_object ((ElfW(Addr)) caller_dlopen);
(gdb) next
511	      if (l)
(gdb) next
514	      if (args->nsid == __LM_ID_CALLER)
(gdb) next
515		args->nsid = call_map->l_ns;
(gdb) next
quit
521	  args->libc_already_loaded = GL(dl_ns)[args->nsid].libc_map != NULL;
(gdb) next
quit
524	  args->original_global_scope_pending_adds
(gdb) next
quit
529	  _dl_debug_initialize (0, args->nsid);
(gdb) next
quit
533	  args->map = new = _dl_map_object (call_map, file, lt_loaded, 0,
(gdb) set solib-debug-resolver on
(gdb) s
_dl_map_object (loader=loader@entry=0x7ffff7fbf110, 
    name=name@entry=0x7fffea2edcc0 "/usr/lib64/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so", 
    type=type@entry=2, trace_mode=trace_mode@entry=0, mode=mode@entry=-1879048190, nsid=0) at dl-load.c:1999
1999	{
(gdb) step
2007	  assert (nsid >= 0);
(gdb) next
2008	  assert (nsid < GL(dl_nns));
(gdb) next
2011	  for (l = GL(dl_ns)[nsid]._ns_loaded; l; l = l->l_next)

Do you think that a patch which allows stepping into the dynamic linker / PLT is useful enough to submit for upstream consideration?

Any quibbles with the name of this new setting (solib-debug-resolver) or its help text?

Comment 4 Florian Weimer 2022-05-02 06:36:14 UTC
(In reply to Kevin Buettner from comment #3)
> It turns out that this is expected and, most times, desirable behavior from
> GDB.
> 
> When stepping into a function for which the PLT hasn't been resolved yet,
> most users will want to see GDB step into the function in question, not into
> either the PLT or the dynamic linker's resolution code.

This is a non-PLT call, though.

But I see that all the dynamic linker code is ignored for single-stepping, not just the trampoline and fixup function:

int
svr4_in_dynsym_resolve_code (CORE_ADDR pc)
{
  struct svr4_info *info = get_svr4_info (current_program_space);

  return ((pc >= info->interp_text_sect_low
           && pc < info->interp_text_sect_high)
          || (pc >= info->interp_plt_sect_low
              && pc < info->interp_plt_sect_high)
          || in_plt_section (pc)
          || in_gnu_ifunc_stub (pc));
}

Would it be possible to call in_solib_dynsym_resolve_code *before* single-stepping and, if it already returns true, disable the special behavior? The argument is that if you have already stopped in the dynamic loader, you likely want to debug it, too.
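
To make the suggestion concrete, here is a rough standalone sketch of the decision in plain C. It is not GDB code and not the eventual upstream fix: struct interp_range, in_resolve_code, and should_step_over are hypothetical names, the model keeps only the ld.so text-range test from the excerpt above (the PLT and ifunc checks are left out), and the addresses in the usage note are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the dynamic linker's text-section bounds
   (the interp_text_sect_low/high fields in the excerpt above).  */
struct interp_range
{
  uint64_t low;   /* inclusive */
  uint64_t high;  /* exclusive */
};

/* Simplified model of the range test: true if PC lies inside the
   dynamic linker's text section.  */
static bool
in_resolve_code (const struct interp_range *r, uint64_t pc)
{
  assert (r->low <= r->high);
  return pc >= r->low && pc < r->high;
}

/* Proposed policy (an assumption, not GDB's implementation): skip
   over resolver code only when the step started outside of it.  */
static bool
should_step_over (const struct interp_range *r,
                  uint64_t start_pc, uint64_t callee_pc)
{
  if (in_resolve_code (r, start_pc))
    return false;  /* Already in ld.so: step into the callee.  */
  return in_resolve_code (r, callee_pc);
}
```

With a made-up range of 0x1000 to 0x2000, a step that starts at 0x1100 and calls 0x1800 (both inside ld.so) would stop in the callee, while the same call made from 0x5000 (outside ld.so) would still be skipped as it is today.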

Comment 5 Kevin Buettner 2022-10-20 02:46:42 UTC
This has been fixed in upstream GDB.

See:

https://sourceware.org/pipermail/gdb-patches/2022-October/192477.html

I'm closing this bug.  The fix will appear in Fedora GDB after a future rebase.