Bug 1201897

Summary: SIGSEGV in libedit call in installer
Product: [Fedora] Fedora Reporter: David Shea <dshea>
Component: libedit Assignee: Boris Ranto <branto>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rawhide CC: ajax, awilliam, branto, jeff, joachim.backes, jreznik, jsilhan, lnie, loganjerry, robatino, satellitgo, zbyszek
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: AcceptedBlocker
Fixed In Version: libedit-3.1-12.20150325cvs.fc22 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-03-30 07:02:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1043125    
Attachments:
Description Flags
backtrace none

Description David Shea 2015-03-13 18:07:51 UTC
Description of problem:
SIGSEGV in terminal_bind_arrow, called via a hawkey query

Maybe this is hawkey's fault, I can't tell. The py3 in one of the source paths looks mighty suspicious since this is being run from a python2 application but I don't really know.

Version-Release number of selected component (if applicable):
libedit-3.1-9.20141030cvs.fc22.x86_64.rpm, hawkey-0.5.3-2.fc23.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Boot a rawhide boot.iso, continue past welcome screen, crashes once packaging gets going.

Additional info:

Attaching core file and some gdb output.

Comment 1 David Shea 2015-03-13 18:14:38 UTC
Created attachment 1001470 [details]
backtrace

Comment 2 David Shea 2015-03-13 19:02:30 UTC
Dang it, missed that the core file was rejected by bugzilla, on account of hugeness. Here it is: https://dshea.fedorapeople.org/1393.core.gz

Comment 3 Boris Ranto 2015-03-17 15:22:25 UTC
Hi David,

I can't currently access the core file (hitting 403 Forbidden). Could you change the permissions of the core file so that it is world-readable?

Comment 4 David Shea 2015-03-17 15:32:08 UTC
Oops. Fixed.

Comment 5 David Shea 2015-03-20 20:23:40 UTC
*** Bug 1204294 has been marked as a duplicate of this bug. ***

Comment 6 Adam Williamson 2015-03-22 18:04:26 UTC
I'm seeing this with F22 Beta TC4 - at least when I do a server netinst in openQA or a virt-manager VM, the screen goes black shortly after the hub comes up, and on ctrl-alt-f1 I see a traceback running through libpython, libhawkey, libedit and libpthread.

Two other testers are reporting something that sounds 99% likely to be the same bug (black screen shortly after reaching install hub), so proposing this as a Beta blocker as it seems to commonly violate:

"When using a dedicated installer image, the installer must be able to complete an installation using the text, graphical and VNC installation interfaces. "

https://fedoraproject.org/wiki/Fedora_22_Alpha_Release_Criteria#Installation_interfaces

Comment 8 Boris Ranto 2015-03-23 10:54:38 UTC
This is an incredibly weird issue. Hawkey is not linked against libedit, it does not #include it and libedit does not even export the map_init function as part of its API. A bit of source code reading uncovers that the map_init function is supposed to be included and run from libsolv. This suggests that the symbol resolution has gone wrong for some reason. It also explains why the app got SIGBUS (misaligned memory access) at the place it did.

As I'm still not sure what caused the symbol resolution problems, I can't say for sure what the solution is but one of the suspects here is the new glibc 5.0 version which requires app rebuild in some cases (the new libedit package rebuilt for fc23 should already be available, maybe that will help).

Comment 9 Boris Ranto 2015-03-23 12:51:17 UTC
Ah, sorry for the typo, my comment was supposed to refer to the new gcc 5.0 version, not the glibc.

Comment 10 David Shea 2015-03-23 14:04:26 UTC
*** Bug 1204507 has been marked as a duplicate of this bug. ***

Comment 11 Adam Williamson 2015-03-23 19:43:39 UTC
Discussed at 2015-03-23 blocker review meeting: http://meetbot.fedoraproject.org/fedora-blocker-review/2015-03-23/f22-blocker-review.2015-03-23-16.02.log.txt . Accepted as a blocker per criterion cited in #c6.

Boris, if you think rebuilding libedit (and anything else?) might help, can you do so for F22? Thanks!

Comment 12 Boris Ranto 2015-03-23 21:06:40 UTC
@awilliam: I already did that for libedit a few days back (Mar 18th). I do not have commit rights for other repos. I've checked and it is indeed compiled against gcc-5.0. It is the libedit-3.1-10.20141030cvs.fc22 version. I have not tested it yet though.

If that does not help then I'm starting to run out of ideas of what could have caused the libedit call. I've checked the hawkey sources and compiled libraries for the presence of libedit bits but there are none. It would be nice if we could get hawkey and probably even libsolv maintainer(s) involved to take a look then.

Comment 13 Honza Silhan 2015-03-24 13:30:50 UTC
As Boris found, the bug is probably due to a symbol collision. AFAIK hawkey does not have any libedit dependency.

# repoquery --tree-requires hawkey --releasever=22 --disablerepo=* --enablerepo=fedora | grep libedit
# <nothing>

Libsolv's map_init is called from hawkey too. The newest release of hawkey in f22 was built with gcc-5.0.0-0.15.fc22.x86_64, libsolv and libedit with gcc-5.0.0-0.17.fc22.x86_64. I will do the hawkey rebuild.

If that doesn't help, please attach a small single-threaded reproducer of the hawkey/dnf crash.

Comment 14 Adam Williamson 2015-03-24 17:14:54 UTC
well, no joy...I tried building a boot.iso with:

libedit-3.1-10.20141030cvs.fc22
librepo-1.7.13-1.fc22
libsolv-0.6.8-3.fc22
hawkey-0.5.3-2.1.fc22 (that's 0.5.3-2, rebuilt with newer gcc)

and anaconda crashes identically. I'm not capable of producing a small single-threaded reproducer, I'm afraid, I'm just the test monkey...

Comment 15 David Shea 2015-03-25 13:32:00 UTC
*** Bug 1205071 has been marked as a duplicate of this bug. ***

Comment 16 Honza Silhan 2015-03-25 16:12:21 UTC
Boris, would it be possible to define "protected" as static? AFAIK these functions are not declared in the headers shipped in the devel package. Or at least make map_init static, please.

Functions in libsolv also wrongly lack any prefix, but they are exported in the devel package and used by other projects. Renaming these would be more difficult.

Comment 17 Boris Ranto 2015-03-25 17:51:37 UTC
Jan, the protected map_init function is shared between the separate libedit source files so I can't define it as static. I might be able to rename it (probably with all the other protected map_* functions) but only as a temporary workaround until the real issue is fixed -- why is the function even being called? Maybe we should involve the gcc people here?

Comment 18 David Shea 2015-03-25 18:01:46 UTC
The problem is that both libsolv (via dnf and hawkey) and libedit (via the readline module maybe? We're not 100% sure) are being loaded into the same program. That detail is not likely to change. The only means that C has of differentiating between functions is the function name, and these two functions have the same name, and it just happened to work out from the order of things being loaded and called that in this case the wrong function was chosen for this name. It's kind of a crappy name.

The gcc visibility attributes might be able to make this function usable across modules but stay local to the shared object.
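
To make the collision described above concrete, here is a toy model in Python. It is only a sketch: the SharedObject class and the resolve function are hypothetical names, the function bodies are placeholders, and the real dynamic linker's lookup scopes are more involved than a single ordered list. It shows just the two points made in this comment: the first object in the search order that exports a name wins, and a symbol with hidden visibility never takes part in the lookup.

```python
from dataclasses import dataclass, field


@dataclass
class SharedObject:
    """Toy stand-in for a loaded shared library (not a real ELF model)."""
    name: str
    exported: dict = field(default_factory=dict)  # visible to other objects
    hidden: dict = field(default_factory=dict)    # usable only inside this object


def resolve(symbol, load_order):
    """Return (library name, function) for the first object exporting `symbol`."""
    for obj in load_order:
        if symbol in obj.exported:
            return obj.name, obj.exported[symbol]
    raise LookupError(f"undefined symbol: {symbol}")


# Both libraries define a function called map_init; the bodies are placeholders.
libsolv = SharedObject("libsolv", exported={"map_init": lambda: "libsolv's map_init"})
libedit = SharedObject("libedit", exported={"map_init": lambda: "libedit's map_init"})

# If libedit comes first in the search order, a call meant for libsolv lands in it.
owner, _ = resolve("map_init", [libedit, libsolv])
print(owner)  # libedit

# With the symbol hidden, libedit no longer answers for that name.
libedit_fixed = SharedObject("libedit", hidden={"map_init": lambda: "libedit's map_init"})
owner_fixed, _ = resolve("map_init", [libedit_fixed, libsolv])
print(owner_fixed)  # libsolv
```

In this model the caller never changes; only the contents of libedit's exported table decide which map_init it reaches.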

Comment 19 Adam Williamson 2015-03-25 19:02:04 UTC
So, we have a fairly solid theory here now - it involves llvm-libs .

llvm-libs has a file /usr/lib64/llvm/readline.so , and a /etc/ld.so.conf.d/llvm-x86_64.conf which reads:

/usr/lib64/llvm

Between TC3 and TC4, we restored the file /etc/ld.so.conf to the anaconda environment, which causes those /etc/ld.so.conf.d/*.conf files to actually be loaded. We did that to fix https://bugzilla.redhat.com/show_bug.cgi?id=1204031 .

So in TC4, /usr/lib64/llvm/readline.so will be in the linker cache; up to and including TC3 it was not. It seems pretty reasonable to suspect that suspiciously-named lib is gumming up the works here, and indeed davidshea says if he takes away /etc/ld.so.conf.d/llvm-x86_64.conf and re-runs ldconfig, the crash stops happening.

Comment 20 Adam Williamson 2015-03-25 19:11:03 UTC
ajax, you're the llvm maintainer - any thoughts on why it has this readline.so lib at all? it seems like anaconda isn't doing anything wrong here, and this all seems to be caused by some unfortunate naming in other libs.

Comment 21 David Shea 2015-03-25 19:32:29 UTC
The readline.so thing may have been a red herring. Moving readline.so out of the way but keeping $libdir/llvm configured still crashes. I have no clue how any of this works anymore.

Comment 22 Adam Williamson 2015-03-25 19:40:38 UTC
Turns out readline.so isn't the issue, but *something* in llvm seems to be involved:

<davidshea> if I remove the llvm config file from ld.so.conf.d and rerun ldconfig, no crash
<davidshea> hm. maybe it's not that readline.so that's the problem. if I remove that particular file, but leave the rest of them, and rerun ldconfig, it still crashes

Also possibly of use in triaging whatever the crap's going on here: the bug affects only non-live GUI installs. It doesn't affect non-live TUI installs or live installs.

Comment 23 Adam Williamson 2015-03-25 20:07:44 UTC
So I think what's going on is the stuff discussed from #c13 to #c18 is the real bug here. llvm's involvement is simply that it's one of very few things in the anaconda env that are linked against libedit. The only other thing I can find is libxatracker.so.2 , which probably isn't actually loaded with anaconda.

So when the llvm libs aren't in the linker cache, libedit never actually gets loaded with anaconda, and the collision between hawkey and libedit doesn't happen.

When the llvm libs *are* in the linker cache, libedit gets loaded with anaconda (probably through the graphics layers; mesa-dri-drivers depends on llvm-libs), and the collision happens.

Comment 24 Boris Ranto 2015-03-25 20:23:07 UTC
I was looking at the contents of /usr/lib64/llvm and actually all the libraries there are linked against libedit, so that explains why removing the llvm ld.so.conf.d file fixes the issue.

I will try to use some gcc magic to not export all the protected symbols outside the libedit.so shared library to fix this.

Comment 25 Boris Ranto 2015-03-25 20:50:01 UTC
I've performed a scratch build [1] where I set the visibility of all the libedit protected function symbols to hidden (i.e. they are not placed in the dynamic symbol table). The nm -D libedit.so* command does indeed show no map_* symbols. Could anyone please test the build and let me know if it helps so that I can push the patch to the fedora dist-git?

[1] http://koji.fedoraproject.org/koji/taskinfo?taskID=9325045
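
The nm -D check mentioned above can also be scripted. The following is a minimal sketch, assuming the usual three-column "address type name" layout of nm -D output. The function name exported_symbols is hypothetical, and the two sample listings are made up for illustration rather than taken from the real builds.

```python
def exported_symbols(nm_output, prefix):
    """Return defined symbol names starting with `prefix` from `nm -D` style text."""
    found = []
    for line in nm_output.splitlines():
        parts = line.split()
        # Defined symbols have three fields: address, type letter, name.
        # Undefined ones ("U name") have only two fields and are skipped.
        if len(parts) == 3 and parts[1] != "U" and parts[2].startswith(prefix):
            found.append(parts[2])
    return found


# Hypothetical listings, invented for illustration only.
before = """\
0000000000012340 T el_init
0000000000015670 T map_init
                 U malloc
"""
after = """\
0000000000012340 T el_init
                 U malloc
"""

print(exported_symbols(before, "map_"))  # ['map_init']
print(exported_symbols(after, "map_"))   # []
```

To run it against an installed library, the text could come from subprocess.run(["nm", "-D", path], capture_output=True, text=True).stdout, where path is wherever libedit.so lives on the system being checked.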

Comment 26 Adam Williamson 2015-03-25 22:34:06 UTC
will do. ajax also suggested a possible fix, with a scratch build of libsolv:

http://koji.fedoraproject.org/koji/taskinfo?taskID=9325563

what he changed was to add this to %make:

export LDFLAGS="-Wl,-Bsymbolic %{?__global_ldflags}"

He also posted a scratch build of llvm, http://koji.fedoraproject.org/koji/taskinfo?taskID=9325345 - though I'm not sure what that does, I think the only practical help llvm could be here is if it stopped linking against libedit entirely.

I'll test building boot.isos with each of the scratch builds, anyway, and see what happens.

Comment 27 Adam Williamson 2015-03-25 23:26:29 UTC
ajax's libsolv build does not work for me; a boot.iso built with that still crashes.

However, Boris' libedit build does work! A boot.iso built with libedit-3.1-11.20141030cvs.fc23.x86_64.rpm does not crash for me.

Could you do an official build and update, Boris? Thanks a lot!

Comment 28 Boris Ranto 2015-03-26 00:01:01 UTC
Done, built for f22 as well as rawhide, I'll push the fedora update for f22 right away.

Comment 29 Fedora Update System 2015-03-26 00:09:24 UTC
libedit-3.1-11.20141030cvs.fc22 has been submitted as an update for Fedora 22.
https://admin.fedoraproject.org/updates/libedit-3.1-11.20141030cvs.fc22

Comment 30 Fedora Update System 2015-03-29 04:51:18 UTC
Package libedit-3.1-12.20150325cvs.fc22:
* should fix your issue,
* was pushed to the Fedora 22 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing libedit-3.1-12.20150325cvs.fc22'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2015-4925/libedit-3.1-12.20150325cvs.fc22
then log in and leave karma (feedback).

Comment 31 Fedora Update System 2015-03-30 07:02:48 UTC
libedit-3.1-12.20150325cvs.fc22 has been pushed to the Fedora 22 stable repository.  If problems still persist, please make note of it in this bug report.