Description of problem:
When you try to tab complete in fish, it crashes with SIGABRT and then gives you the options.

Version-Release number of selected component (if applicable):
fish-3.1.2-1.fc33.x86_64

How reproducible:
Every time

Steps to Reproduce:
In fish shell, try any command with tab completion. For example:

$ refish: Process 128402, “apropos” from job 1, “apropos $argv 2>/dev/null | awk…” terminated by signal SIGABRT (Abort)
bfish: Process 128458, “apropos” from job 1, “apropos $argv 2>/dev/null | awk…” terminated by signal SIGABRT (Abort)
oot reboot

After typing 're' I pressed Tab, which produced the error. I then added 'b', which produced the same error and then gave me the command I was looking for.

Additional info:
I think this is a problem with man-db; please reassign if that's the case. Thanks.
Stack trace of thread 105614:
#0 0x00007fc505738bc5 raise (libc.so.6 + 0x3dbc5)
#1 0x00007fc5057218a4 abort (libc.so.6 + 0x268a4)
#2 0x00007fc50577b127 __libc_message (libc.so.6 + 0x80127)
#3 0x00007fc505782e1c malloc_printerr (libc.so.6 + 0x87e1c)
#4 0x00007fc5057841fc _int_free (libc.so.6 + 0x891fc)
#5 0x0000564995c0c940 rpl_regfree (whatis.man-db + 0x1a940)
#6 0x0000564995bf786f main (whatis.man-db + 0x586f)
#7 0x00007fc5057231a2 __libc_start_main (libc.so.6 + 0x281a2)
#8 0x0000564995bf7cbe _start (whatis.man-db + 0x5cbe)
*** Bug 1872458 has been marked as a duplicate of this bug. ***
FYI: Bug is gone after rebuilding package without LTO (once again).
Right, it looks like another linker bug. libman.so seems to be linked to whatis binary both statically and dynamically, if that makes any sense :)
I've disabled LTO (in man-db-2.9.2-6.fc34) for now. However, I'm not sure if the problem is really LTO related, or if it's an unrelated linker bug.

Here is what I'm seeing. With LTO enabled, rpl_regfree symbol is present in both libman library and whatis binary:

$ nm -D /usr/lib64/man-db/libman-2.9.2.so | grep rpl_regfree
000000000002c081 T rpl_regfree
$ nm -D /usr/bin/whatis.man-db | grep rpl_regfree
000000000001aa65 T rpl_regfree

This subsequently causes the crash, as the symbol from the binary is preferred.

With LTO disabled, the symbol is undefined in whatis binary, as expected:

$ nm -D /usr/lib64/man-db/libman-2.9.2.so | grep rpl_regfree
000000000002d140 T rpl_regfree
$ nm -D /usr/bin/whatis.man-db | grep rpl_regfree
U rpl_regfree

Jeff, could you take a look at this? Thanks.
@Nikola, thanks, at least it works fine without LTO for now. I believe that's okay as a temporary solution.
Thanks @Nikola, I had to build a version with LTO disabled locally and have been using it; now I can update to your latest version.
So a bit more context for c#5:

What's happening is we're calling free with a pointer that didn't come from malloc, naturally glibc catches this and triggers a fatal error. This is happening from free_dfa_content, which you're picking up from gl/libgnu.a, which I'm guessing is a copy of gnulib. The relevant lines:

618 if (dfa->sb_char != utf8_sb_map)
619 re_free (dfa->sb_char);

So we check that sb_char != utf8_sb_map, and if it isn't, then we call re_free which is defined to just "free". That's quite sensible if you look at the code where this stuff is initialized/allocated. dfa->sb_char will either be utf8_sb_map which is local readonly data or dfa->sb_char will point into the heap.

If we look at things under the debugger we have:

(gdb) p dfa->sb_char
$5 = (re_bitset_ptr_t) 0x7ffff7fb49c0 <utf8_sb_map>

Huh?!? it's utf8_sb_map, but we're still passing it to free. That must mean we have two instances of utf8_sb_map. Sure enough we do. One is in the main executable, the other is in a DSO.

Here's the one from the main executable:

000000000001d8c0 r utf8_sb_map

And the one from libman.so:

00000000000309c0 r utf8_sb_map

If we break on the actual comparison within rpl_regfree we have:

1: x/i $pc
=> 0x55555556e006 <rpl_regfree+454>: cmp %rax,%rdi
(gdb) p/x $rax
$7 = 0x5555555718c0
(gdb) p/x $rdi
$8 = 0x7ffff7fb49c0

dfa->sb_char is pointing to the one from the DSO while the test wants to compare it to the one defined in the main executable. Naturally with the mismatch everything blows up.

As c#5 mentions, we're inside rpl_regfree in the main executable. This is what I would expect given that we link against gl/lib/libgnu.a and there's a reference to rpl_regfree in the .o's for the main executable. So we'll get a copy of rpl_regfree in the main executable.

Note that dfa->sb_char is initialized by init_dfa. There is no reference to init_dfa in the main executable, so that gets satisfied by the DSO. And in that context we used the DSO's copy of utf8_sb_map when initializing dfa->sb_char (the DSO also links against libgnu.a and thus gets its own copies).

I'm still digging around, but wanted to get these findings recorded as I need to step away.
*** Bug 1878386 has been marked as a duplicate of this bug. ***
(In reply to Nikola Forró from comment #5)
> I've disabled LTO (in man-db-2.9.2-6.fc34) for now. However, I'm not sure if
> the problem is really LTO related, or if it's an unrelated linker bug.

This problem is still present on F33; would you mind disabling LTO there also until this is resolved?
FEDORA-2020-d94d71109a has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-d94d71109a
So following up on my c#8. I'm discussing the situation with upstream GCC.

One could argue this is poor composability design on the part of gnulib and man-db, as we're creating a DSO which statically links against gnulib as well as a main executable that statically links against gnulib, and yet there are internals within gnulib that require there be one and only one copy of utf8_sb_map.

One could also argue that the problem is LTO. LTO will discard unused functions. So rather than pulling in the entire TU from the static library, it's just pulling in the components that are strictly needed in the DSO, then again a different set into the main executable. If we pulled in the entire TU, then all the symbols from the TU would be resolved by the main program and we'd never call the copy in the DSO and everything would just work. Of course that defeats one of the significant benefits of LTO.

Finally one could argue that the static data object should have been marked as STB_GNU_UNIQUE when making the main executable and the DSO, which would have forced the runtime linker to choose a single version for all uses. This is what we do for static data members in inlined functions and template instantiations for C++. But AFAICT nobody's ever really looked at this issue in the C world.

While we sort this out upstream, I would recommend disabling LTO on man-db. I'm going to do a bit of digging to see if there are other instances of this issue floating around.

jeff
FEDORA-2020-d94d71109a has been pushed to the Fedora 33 testing repository. In short time you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-d94d71109a` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-d94d71109a See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
Thanks guys for working on this and the details exchange on this BZ!
The latest update in the testing repo has solved the problem for me (man-db 2.9.2-6.fc33).
FEDORA-2020-d94d71109a has been pushed to the Fedora 33 stable repository. If problem still persists, please make note of it in this bug report.
Just a follow-up here on the underlying LTO issue. I finally got an uninterrupted day to pull together everything I knew about this problem and construct a testcase for upstream GCC. It raises some really interesting semantic issues with LTO that we need to make some decisions around in GCC. For F33 the right thing to do is leave the opt-out in place for man-db. man-db is flagged by my tester as opting out of LTO which will trigger a re-evaluation by the GCC team (along with ~200 other packages) as we continue to fix/analyze various issues in the LTO space. Thanks for your patience.
Thank you for working on this, Jeff.
Final conclusion is this is/was a bug in the linker. The binutils/ld fix has been backported into rawhide and f33, though I'm not sure the latter update is actually in the buildroots yet. I've re-enabled LTO for man-db in rawhide. I don't think it makes sense to issue a man-db update for f33.