Bug 2046276 - GDB crashes on 'finish' from inline function
Summary: GDB crashes on 'finish' from inline function
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: gdb
Version: 35
Hardware: All
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kevin Buettner
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-01-26 13:57 UTC by Jonathan Wakely
Modified: 2022-11-29 18:55 UTC
CC List: 9 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2022-11-29 18:55:09 UTC
Type: Bug
Embargoed:


Attachments
C++ source to reproduce crash (54.66 KB, text/x-csrc)
2022-01-26 13:57 UTC, Jonathan Wakely
no flags
Preprocessed C++ source to reproduce crash (a-fs_ops.ii) (1.97 MB, text/plain)
2022-01-26 13:58 UTC, Jonathan Wakely
no flags


Links
System: Sourceware
ID: 28856
Private: 0
Priority: P2
Status: UNCONFIRMED
Summary: Python pretty printer causes stack overflow when printing frame arguments
Last Updated: 2022-02-03 11:47:52 UTC

Description Jonathan Wakely 2022-01-26 13:57:28 UTC
Created attachment 1855514
C++ source to reproduce crash

Description of problem:

GDB crashes while stepping through a valid program.


Version-Release number of selected component (if applicable):

gdb-11.1-5.fc34.x86_64
gdb-11.1-5.fc35.x86_64

How reproducible:

Always.


Steps to Reproduce:

Using the attached fs_ops.cc file on F35:

$ rpm -q gcc-c++ gdb
gcc-c++-11.2.1-7.fc35.ppc64le
gdb-11.1-5.fc35.ppc64le
$ g++ -g fs_ops.cc
$ gdb -q -ex start -ex n -ex n -ex step -ex n -ex step -ex finish -ex cont -ex 'print "finished"' a.out
Reading symbols from a.out...
Temporary breakpoint 1 at 0x1000f050: file fs_ops.cc, line 2350.
Starting program: /tmp/a.out 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Temporary breakpoint 1, main () at fs_ops.cc:2350
2350      std::error_code ec = std::make_error_code(std::errc::invalid_argument);
2351      std::filesystem::path p = "foo/a";
2352      std::filesystem::remove_all(p, ec);
std::filesystem::remove_all (p=filesystem::path "foo/a" = {...}, ec=std::error_code = {"generic": EINVAL}) at fs_ops.cc:2032
2032      ec.clear();
2033      return fs::do_remove_all(p, ErrorReporter{ec});
std::filesystem::(anonymous namespace)::ErrorReporter::ErrorReporter (this=0x7fffffffe880, ec=std::error_code = { }) at fs_ops.cc:1942
1942        ErrorReporter(error_code& ec) : code(&ec)
Run till exit from #0  std::filesystem::(anonymous namespace)::ErrorReporter::ErrorReporter (this=0x7fffffffe880, ec=std::error_code = { })
    at fs_ops.cc:1942
Aborted (core dumped)



Or use the attached a-fs_ops.ii file (for x86_64) and run these commands:


mock -q -r fedora-35-x86_64 --install gdb gcc-c++

mock -q -r fedora-35-x86_64 --dnf-cmd debuginfo-install glibc-2.34-11.fc35.x86_64 libgcc-11.2.1-7.fc35.x86_64 libstdc++-11.2.1-7.fc35.x86_64

# Copy the attached file into the mock root:
cp a-fs_ops.ii /var/lib/mock/fedora-35-x86_64/root/tmp/

mock -q -r fedora-35-x86_64 --chroot "cd /tmp && g++ -g a-fs_ops.ii"

# Test non-interactively:
mock -q -r fedora-35-x86_64 --chroot "cd /tmp && gdb -q -ex start -ex n -ex n -ex step -ex n -ex step -ex finish -ex cont -ex 'print \"finished\"' a.out ; echo $?"

mock -q -r fedora-35-x86_64 --shell

# These commands should be run in the mock shell:
cd /tmp
gdb -q -ex start -ex n -ex n -ex step -ex n -ex step -ex finish a.out




Actual results:

Aborted (core dumped)




Expected results:

GDB's 'finish' command runs the current function to completion and stops in the caller, without crashing.


Additional info:

When trying to run the mock commands above entirely non-interactively it *sometimes* prints '0' after the GDB command, but this seems to be a mock bug. The GDB process aborts before running the 'cont' and 'print "finished"' commands:

$ mock -q  -r fedora-35-x86_64 --chroot "cd /tmp && gdb -q -ex start -ex n -ex n -ex step -ex n -ex step -ex finish -ex cont -ex 'print \"finished\"' a.out ; echo $?"
Reading symbols from a.out...
Temporary breakpoint 1 at 0x40cdae: file fs_ops.cc, line 2350.
Starting program: /tmp/a.out 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Temporary breakpoint 1, main () at fs_ops.cc:2350
2350    fs_ops.cc: No such file or directory.
2351    in fs_ops.cc
2352    in fs_ops.cc
std::filesystem::remove_all (p=filesystem::path "foo/a" = {...}, ec=std::error_code = {"generic": EINVAL}) at fs_ops.cc:2032
2032    in fs_ops.cc
2033    in fs_ops.cc
std::filesystem::(anonymous namespace)::ErrorReporter::ErrorReporter (this=0x7fffffffea70, ec=std::error_code = { }) at fs_ops.cc:1942
1942    in fs_ops.cc
Run till exit from #0  std::filesystem::(anonymous namespace)::ErrorReporter::ErrorReporter (this=0x7fffffffea70, ec=std::error_code = { }) at fs_ops.cc:1942
0

This *should* be printing 134, the shell exit code for SIGABRT.


When run interactively with mock --shell the abort is always seen:

$ mock -q -r fedora-35-x86_64 --shell
<mock-chroot> sh-5.1# cd /tmp
<mock-chroot> sh-5.1# gdb -q -ex start -ex n -ex n -ex step -ex n -ex step -ex finish a.out
Reading symbols from a.out...
Download failed: No route to host.  Continuing without source file /tmp/fs_ops.cc.
Temporary breakpoint 1 at 0x40cdae: file fs_ops.cc, line 2350.
Starting program: /tmp/a.out 
Download failed: No route to host.  Continuing without debug info for /tmp/system-supplied DSO at 0x7ffff7fc8000.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Temporary breakpoint 1, main () at fs_ops.cc:2350
Download failed: No route to host.  Continuing without source file /tmp/fs_ops.cc.
2350    fs_ops.cc: No such file or directory.
2351    in fs_ops.cc
2352    in fs_ops.cc
std::filesystem::remove_all (p=filesystem::path "foo/a" = {...}, ec=std::error_code = {"generic": EINVAL}) at fs_ops.cc:2032
2032    in fs_ops.cc
2033    in fs_ops.cc
std::filesystem::(anonymous namespace)::ErrorReporter::ErrorReporter (this=0x7fffffffe2f0, ec=std::error_code = { }) at fs_ops.cc:1942
1942    in fs_ops.cc
Run till exit from #0  std::filesystem::(anonymous namespace)::ErrorReporter::ErrorReporter (this=0x7fffffffe2f0, ec=std::error_code = { }) at fs_ops.cc:1942
Aborted (core dumped)
<mock-chroot> sh-5.1# echo $?
134


It's 100% reproducible on a real F35 system without mock anyway.


The same crash happens on F34, but not with the system g++; it only happens when the code is compiled with a self-built GCC (11.2.1 or 12.0.1).

This suggests that something changed in the debug info GCC generates between F34's:
gcc version 11.2.1 20210728 (Red Hat 11.2.1-1) (GCC) 
and F35's:
gcc version 11.2.1 20211203 (Red Hat 11.2.1-7) (GCC) 

Either way, GDB should not abort.

Comment 1 Jonathan Wakely 2022-01-26 13:58:35 UTC
Created attachment 1855515
Preprocessed C++ source to reproduce crash (a-fs_ops.ii)

Comment 2 Jonathan Wakely 2022-01-26 14:18:32 UTC
I can't reproduce it on rawhide because of:

../../gdb/objfiles.h:510: internal-error: sect_index_data not initialized
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) n

But I think a fix for that is on the way.

Comment 3 Jonathan Wakely 2022-01-28 23:14:28 UTC
Running "up" also crashes it. I think it's something to do with printing the context of the previous stack frame.

Comment 4 Jonathan Wakely 2022-01-29 01:01:40 UTC
All I get when running gdb under gdb is:


"gdb" received signal SIGSEGV, Segmentation fault.
0x00005555557a98d2 in scoped_debug_start_end::scoped_debug_start_end (this=0x7fffff7ff0c0, debug_enabled=@0x55555626ffc0: false, module=0x555555e59f87 "frame", func=0x555555e0510b "frame_unwind_find_by_frame", start_prefix=0x555555ec09c9 "enter", end_prefix=0x555555e4279c "exit", fmt=0x0) at ../../gdb/../gdbsupport/common-debug.h:108
108       scoped_debug_start_end (bool &debug_enabled, const char *module,

Comment 5 Jonathan Wakely 2022-01-29 01:04:01 UTC
Looks like a stack overflow; I get tens of thousands of frames like this:

#0  0x00005555557a98d2 in scoped_debug_start_end::scoped_debug_start_end (this=0x7fffff7ff0d0, debug_enabled=@0x55555626ffc0: false, module=0x555555e59f87 "frame", 
    func=0x555555e0510b "frame_unwind_find_by_frame", start_prefix=0x555555ec09c9 "enter", end_prefix=0x555555e4279c "exit", fmt=0x0)
    at ../../gdb/../gdbsupport/common-debug.h:108
#1  0x00005555558d9012 in frame_unwind_find_by_frame (this_frame=this_frame@entry=0x555557b78010, this_cache=this_cache@entry=0x555557b78028) at ../../gdb/frame-unwind.c:181
#2  0x00005555558da0a9 in frame_unwind_arch (next_frame=0x555557b78010) at ../../gdb/frame.c:2863
#3  0x00005555558da6b0 in get_frame_arch (this_frame=this_frame@entry=0x555557b78010) at ../../gdb/frame.c:2852
#4  0x00005555558d9044 in frame_unwind_find_by_frame (this_frame=this_frame@entry=0x555557b78010, this_cache=this_cache@entry=0x555557b78028) at ../../gdb/frame-unwind.c:184
#5  0x00005555558da0a9 in frame_unwind_arch (next_frame=0x555557b78010) at ../../gdb/frame.c:2863
#6  0x00005555558da6b0 in get_frame_arch (this_frame=this_frame@entry=0x555557b78010) at ../../gdb/frame.c:2852
#7  0x00005555558d9044 in frame_unwind_find_by_frame (this_frame=this_frame@entry=0x555557b78010, this_cache=this_cache@entry=0x555557b78028) at ../../gdb/frame-unwind.c:184
#8  0x00005555558da0a9 in frame_unwind_arch (next_frame=0x555557b78010) at ../../gdb/frame.c:2863
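A toy C++ model may make the loop in this backtrace easier to follow. Every frame above works on the same frame pointer (0x555557b78010): looking up the unwinder needs the frame's architecture, and getting the architecture leads back to the unwinder lookup for that same frame, so the recursion never bottoms out. In this sketch, `toy_find_unwinder` and `toy_frame_arch` are hypothetical stand-ins for `frame_unwind_find_by_frame` and `get_frame_arch`/`frame_unwind_arch`. They are not GDB's code, and the re-entry check is only one way such a cycle could be turned into an error, not how upstream GDB handles it.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Toy model only: hypothetical types and functions, NOT GDB's implementation.
struct Frame {
  bool finding_unwinder = false;  // true while an unwinder lookup is running
};

// RAII guard: marks the lookup as in progress and always clears the mark,
// even when an exception propagates out of the lookup.
struct LookupGuard {
  bool& flag;
  explicit LookupGuard(bool& f) : flag(f) { flag = true; }
  ~LookupGuard() { flag = false; }
};

std::string toy_frame_arch(Frame& frame);  // the two functions recurse

// Choosing an unwinder for a frame asks for that frame's architecture...
std::string toy_find_unwinder(Frame& frame) {
  if (frame.finding_unwinder)
    throw std::runtime_error("recursive unwinder lookup for the same frame");
  LookupGuard guard(frame.finding_unwinder);
  return "unwinder for " + toy_frame_arch(frame);
}

// ...and, in this toy model, the architecture is derived from the unwinder,
// so without the check above the two calls would recurse on the same frame
// until the stack overflows, as in the backtrace.
std::string toy_frame_arch(Frame& frame) {
  return "arch of " + toy_find_unwinder(frame);
}
```

Calling `toy_find_unwinder` on a fresh `Frame` throws on the second, re-entrant lookup instead of recursing until the stack is exhausted, and the guard leaves the flag cleared afterwards.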

Comment 6 Jonathan Wakely 2022-01-29 01:06:22 UTC
If I compile my code with -fno-omit-frame-pointer gdb crashes sooner and differently (which is probably a separate bug):

Thread 1 "gdb" received signal SIGSEGV, Segmentation fault.
0x00005555558eab1a in gdbarch_num_regs (gdbarch=0x7000600070007) at ../../gdb/gdbarch.c:2120
2120      gdb_assert (gdbarch->num_regs != -1);
(gdb) bt
#0  0x00005555558eab1a in gdbarch_num_regs (gdbarch=0x7000600070007) at ../../gdb/gdbarch.c:2120
#1  0x0000555555a75938 in reg_buffer::num_raw_registers (this=0x5555572153f0) at ../../gdb/regcache.c:225
#2  readable_regcache::cooked_read (this=0x5555572153f0, regnum=16, buf=0x5555571379c0 "") at ../../gdb/regcache.c:692
#3  0x000055555584fd5d in dummy_frame_prev_register (this_frame=<optimized out>, this_prologue_cache=<optimized out>, regnum=16) at ../../gdb/dummy-frame.c:356
#4  0x00005555558da8a1 in frame_unwind_register_value (next_frame=0x5555564364c0, regnum=16) at ../../gdb/frame.c:1233
#5  0x00005555558dacd3 in frame_register_unwind (next_frame=next_frame@entry=0x5555564364c0, regnum=regnum@entry=16, optimizedp=optimizedp@entry=0x7fffffffcd20, 
    unavailablep=unavailablep@entry=0x7fffffffcd24, lvalp=lvalp@entry=0x7fffffffcd2c, addrp=addrp@entry=0x7fffffffcd30, realnump=0x7fffffffcd28, 
    bufferp=0x7fffffffcd50 "\340AcVUU") at ../../gdb/frame.c:1143
#6  0x00005555558db11f in frame_unwind_register (next_frame=next_frame@entry=0x5555564364c0, regnum=16, buf=buf@entry=0x7fffffffcd50 "\340AcVUU") at ../../gdb/frame.c:1199
#7  0x0000555555927a48 in i386_unwind_pc (gdbarch=0x5555566341e0, next_frame=0x5555564364c0) at ../../gdb/i386-tdep.c:1970
#8  0x00005555558da108 in frame_unwind_pc (this_frame=0x5555564364c0) at ../../gdb/frame.c:948
#9  0x00005555558da1ee in get_frame_pc_if_available (frame=frame@entry=0x55555782ac90, pc=pc@entry=0x7fffffffcee0) at ../../gdb/frame.c:2549
#10 0x0000555555b16981 in print_frame_info (fp_opts=..., frame=0x55555782ac90, print_level=<optimized out>, print_what=SRC_AND_LOC, print_args=<optimized out>, 
    set_current_sal=1) at ../../gdb/stack.c:1188
#11 0x0000555555b17355 in print_stack_frame (frame=0x55555782ac90, print_level=1, print_what=SRC_AND_LOC, set_current_sal=1) at ../../gdb/stack.c:366
#12 0x0000555555b17407 in print_stack_frame_to_uiout (uiout=0x555556537650, frame=0x55555782ac90, print_level=1, print_what=SRC_AND_LOC, set_current_sal=1)
    at ../../gdb/stack.c:345
#13 0x0000555555b97dff in tui_on_user_selected_context_changed (selection=...) at ../../gdb/tui/tui-interp.c:231
#14 0x0000555555b1425e in std::function<void (enum_flags<user_selected_what_flag>)>::operator()(enum_flags<user_selected_what_flag>) const (__args#0=..., this=0x555556499c60)
    at /usr/include/c++/11/bits/std_function.h:560
#15 gdb::observers::observable<enum_flags<user_selected_what_flag> >::notify (args#0=..., this=<optimized out>) at ../../gdb/../gdbsupport/observable.h:150
#16 up_command (count_exp=<optimized out>, from_tty=<optimized out>) at ../../gdb/stack.c:2693
#17 0x00005555557e6f2a in cmd_func (cmd=<optimized out>, args=<optimized out>, from_tty=<optimized out>) at ../../gdb/cli/cli-decode.c:2160
#18 0x0000555555b7c747 in execute_command (p=<optimized out>, p@entry=<error reading variable: value has been optimized out>, from_tty=1, 
    from_tty@entry=<error reading variable: value has been optimized out>) at ../../gdb/top.c:674
#19 0x000055555599fff3 in catch_command_errors (command=<optimized out>, arg=<optimized out>, from_tty=<optimized out>, do_bp_actions=<optimized out>) at ../../gdb/main.c:523
#20 0x00005555559a00c2 in execute_cmdargs (cmdarg_vec=cmdarg_vec@entry=0x7fffffffd4a0, file_type=file_type@entry=CMDARG_FILE, cmd_type=cmd_type@entry=CMDARG_COMMAND, 
    ret=ret@entry=0x7fffffffd49c) at ../../gdb/main.c:618
#21 0x00005555559a1a36 in captured_main_1 (context=<optimized out>) at ../../gdb/main.c:1322
#22 0x00005555559a25cf in captured_main (data=<optimized out>) at ../../gdb/main.c:1343
#23 gdb_main (args=<optimized out>) at ../../gdb/main.c:1368
#24 0x00005555556ec9b0 in main (argc=<optimized out>, argv=<optimized out>) at ../../gdb/gdb.c:40

Comment 7 Jonathan Wakely 2022-02-02 23:11:11 UTC
The crash is caused by a libstdc++ pretty printer. Probably the one for std::error_code.

I don't know how that causes GDB to go into a loop though.

Comment 8 Jonathan Wakely 2022-02-02 23:50:42 UTC
It's this part of the std::error_code printer:

    @staticmethod
    def _category_name(cat):
        "Call the virtual function that overrides std::error_category::name()"
        gdb.set_convenience_variable('__cat', cat)
        return gdb.parse_and_eval('$__cat->name()').string()

If I replace that with just return "" then GDB shows:

Run till exit from #0  std::filesystem::(anonymous namespace)::ErrorReporter::ErrorReporter (this=0x7fffffffd720, ec=std::error_code = {"": 0}) at fs_ops.cc:1942
std::filesystem::remove_all (p=filesystem::path "foo/a"<error reading variable: Cannot access memory at address 0x430eb000>, ec=std::error_code = {"": 0}) at fs_ops.cc:2033
2033      return fs::do_remove_all(p, ErrorReporter{ec});

With the printer it shows:

Run till exit from #0  std::filesystem::(anonymous namespace)::ErrorReporter::ErrorReporter (this=0x7fffffffd720, ec=std::error_code = { }) at fs_ops.cc:1942
Aborted (core dumped)
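For reference, the `$__cat->name()` expression that `_category_name` evaluates in the inferior corresponds to an ordinary C++ virtual call through the category pointer held by `std::error_code`. A minimal sketch, where `category_name_of` is a hypothetical helper name used only for illustration:

```cpp
#include <cassert>
#include <string>
#include <system_error>

// Hypothetical helper: the in-program equivalent of the printer's
// gdb.parse_and_eval('$__cat->name()'). std::error_code keeps a pointer to a
// std::error_category; category() returns that object and name() is a
// virtual function, so the call is dispatched through the object's vtable.
std::string category_name_of(const std::error_code& ec) {
  return ec.category().name();
}
```

For a valid object this yields the names seen in the transcripts in this report ("generic" for `std::make_error_code(std::errc::invalid_argument)`, "system" for a value assigned with `std::system_category()`). Inside the debugger, `$__cat` is the stored pointer read from the inferior's memory; if the `std::error_code&` parameter does not yet refer to a valid object (as Comment 9 suspects), that pointer is garbage and the virtual call has no valid vtable to go through.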

Comment 9 Jonathan Wakely 2022-02-03 10:14:11 UTC
The crash seems to happen when trying to print the details of a stack frame, which happens when running 'up' or 'finish' or 'bt'.

That I only saw it with inline functions was a red herring; what matters is that the function has a std::error_code& parameter.

I think the value of the std::error_code& parameter is garbage while entering or leaving the function, so when the printer tries to call a virtual function it goes off into the weeds.

This reproduces it:

#include <system_error>

int f(std::error_code& ec)
{
  ec.assign(1, std::system_category());
  return ec.value();
}

int g(std::error_code& ec)
{
  return f(ec);
}

int main()
{
  std::error_code ec;
  return g(ec);
}


g++ ec.C -g
gdb -q -ex start -ex n -ex step -ex step -ex n -ex up -ex up a.out

Reading symbols from a.out...
Temporary breakpoint 1 at 0x40117b: file ec.C, line 16.
Starting program: /tmp/a.out 

Temporary breakpoint 1, main () at ec.C:16
16        std::error_code ec;
17        return g(ec);
g (ec=std::error_code = { }) at ec.C:11
11        return f(ec);
f (ec=std::error_code = { }) at ec.C:5
5         ec.assign(1, std::system_category());
6         return ec.value();
#1  0x0000000000401171 in g (ec=std::error_code = {"system": EPERM}) at ec.C:11
11        return f(ec);
Aborted (core dumped)

Comment 10 Jonathan Wakely 2022-02-03 11:47:53 UTC
Reported upstream with a minimal reproducer that doesn't depend on libstdc++ printers.
https://sourceware.org/bugzilla/show_bug.cgi?id=28856

Comment 11 Ben Cotton 2022-11-29 17:44:51 UTC
This message is a reminder that Fedora Linux 35 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 35 on 2022-12-13.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '35'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 35 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 12 Jonathan Wakely 2022-11-29 18:55:09 UTC
Works in F36

