Bug 110665
Summary: | Prelinking appears to cause segmentation faults | ||||||
---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Need Real Name <spu> | ||||
Component: | prelink | Assignee: | Jakub Jelinek <jakub> | ||||
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | |||||
Severity: | high | Docs Contact: | |||||
Priority: | medium | ||||||
Version: | 1 | CC: | djuran, jhmail, jim, mattdm | ||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | athlon | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2006-10-28 16:36:31 UTC | Type: | --- | ||||
Description
Need Real Name
2003-11-22 15:22:12 UTC
Given that I cannot reproduce anything like this (FC1, prelinked for several weeks, works just fine for me), I'll need more details. Which exact applications segfault, and what are the exact steps to reproduce it? Can you pick one such segfaulting application where the segfaults are reproducible (ideally as small as possible and easily reproducible) and pack it up with its dependencies while prelinked:

tar chjf prelinked.tar.bz2 `LD_WARN= LD_TRACE_PRELINKING=1 /the/program/in/question | awk '{print $3}'`

then run prelink -ua, verify the segfault is gone, and pack it up unprelinked:

tar chjf unprelinked.tar.bz2 `LD_WARN= LD_TRACE_PRELINKING=1 /the/program/in/question | awk '{print $3}'`

Also, can you get a backtrace of the segfaulting program you pick? Without any further information there is nothing that can be done about this.

Hello, Jakub :) My sincerest apologies for not following up on this sooner, but I have been having some serious issues with our computer that will result in me getting a new motherboard soon. As a result, our computer has been in the shop for a week. I'll get back on this bug this weekend, and I'll also give you any new information I have when I install the new motherboard. Be forewarned: any future delays on my part concerning this issue are because my computer is down :)

I've also been getting some strange prelink errors. My system frequently crashes around 0400 (just when my daily cron jobs run). Some of the messages I've received include:

/etc/cron.daily/prelink: line 36: 28382 Segmentation fault /usr/sbin/prelink -av $PRELINK_OPTS >>/var/log/prelink.log 2>&1

and

/etc/cron.daily/prelink: line 36: 1382 Illegal instruction /usr/sbin/prelink -av $PRELINK_OPTS >>/var/log/prelink.log 2>&1

Running the prelink cron script manually restarts my X session. I have had library-related crashes of OpenOffice.org and Galeon (from fedora.us). I also have a fully updated FC1 installation with all packages, on i386/athlon hardware.
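The awk step in the packing commands above keeps only the third whitespace-separated field of each line that `LD_TRACE_PRELINKING=1` makes the dynamic linker print, and the recipe relies on that field being the resolved path of each object. As a minimal sketch, assuming a `name => /resolved/path (...)` line layout, the same filtering and the `tar chjf` argument list look like this in Python; `trace_paths` and `tar_command` are hypothetical helper names, not part of prelink or glibc.

```python
from __future__ import annotations


def trace_paths(trace_output: str) -> list[str]:
    """Return the third whitespace-separated field of every line that has
    one, like `awk '{print $3}'`. Lines with fewer fields are skipped; awk
    would print an empty line there, which the shell's word splitting drops."""
    paths = []
    for line in trace_output.splitlines():
        fields = line.split()  # default whitespace splitting, as in awk
        if len(fields) >= 3:
            # Assumed layout: "name => /resolved/path (0x..., 0x...)"
            paths.append(fields[2])
    return paths


def tar_command(archive: str, paths: list[str]) -> list[str]:
    """Build the argv for `tar chjf ARCHIVE PATH...`:
    c = create, h = follow symlinks, j = bzip2, f = archive file name."""
    return ["tar", "chjf", archive, *paths]
```

Passing `tar_command("prelinked.tar.bz2", trace_paths(output))` to `subprocess.run` would mirror the backtick one-liner; the `h` flag matters because it archives the real library files rather than the symlinks pointing at them.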
OpenOffice.org ran fine for a few months, then would not start at all, segfaulting after emitting many library errors. This happened the morning after the recent glibc update and lasted until recently (I did not try every day). Today I tried again, and it works again. Meanwhile, it worked fine after the glibc update on a laptop I have (IBM T40, Pentium M processor) *until* today, when it stopped (it worked yesterday). On the laptop, I can get a window up with no file, and it crashes on reading the file. When it didn't work on the athlon, it crashed on startup, file or no file. Here are the ooffice errors from the desktop (good thing I save 5000 lines in my gterms!):

% ooffice &
[9] 19763
% Starting OpenOffice.org ...
Fatal exception: Signal 11
Stack:
/usr/lib/openoffice/program/libsal.so.3[0x2b2a78]
/usr/lib/openoffice/program/libsal.so.3[0x2b2c05]
/usr/lib/openoffice/program/libsal.so.3[0x2b2cce]
/lib/tls/libpthread.so.0[0xcc30b8]
/usr/lib/openoffice/program/libvcl645li.so(_ZNK4Menu9ImplPaintEP6WindowtlP12MenuItemDatahb+0x887)[0x7cf9d9f]
/usr/lib/openoffice/program/libvcl645li.so(_ZN13MenuBarWindow5PaintERK9Rectangle+0x54)[0x7d01724]
/usr/lib/openoffice/program/libvcl645li.so(_ZN6Window13ImplCallPaintEPK6Regiont+0x3a3)[0x7d2b975]
/usr/lib/openoffice/program/libvcl645li.so(_ZN6Window13ImplCallPaintEPK6Regiont+0x465)[0x7d2ba37]
/usr/lib/openoffice/program/libvcl645li.so(_ZN6Window20ImplCallOverlapPaintEv+0x5f)[0x7d2bb51]
/usr/lib/openoffice/program/libvcl645li.so(_ZN6Window18ImplHandlePaintHdlEPv+0x2c)[0x7d2bbe6]
/usr/lib/openoffice/program/libvcl645li.so(_ZN6Window26LinkStubImplHandlePaintHdlEPvS0_+0x26)[0x7d2bbb2]
/usr/lib/openoffice/program/libvcl645li.so(_ZN5Timer7TimeoutEv+0x1f)[0x7c00a03]
/usr/lib/openoffice/program/libvcl645li.so(_Z21ImplTimerCallbackProcv+0x82)[0x7c0075a]
/usr/lib/openoffice/program/libvcl645li.so(_ZNK7SalData7TimeoutEv+0x12)[0x7dca242]
/usr/lib/openoffice/program/libvcl645li.so(_ZN7SalXLib12CheckTimeoutEb+0xd6)[0x7dc9d96]
/usr/lib/openoffice/program/libvcl645li.so(_ZN7SalXLib5YieldEh+0x2e4)[0x7dca088]
/usr/lib/openoffice/program/libvcl645li.so(_ZN11SalInstance5YieldEh+0x34)[0x7dd3256]
/usr/lib/openoffice/program/libvcl645li.so(_ZN11Application5YieldEv+0x61)[0x7bfb0e7]
/usr/lib/openoffice/program/libvcl645li.so(_ZN11Application7ExecuteEv+0x35)[0x7bfaff9]
/usr/lib/openoffice/program/soffice.bin(_ZN7desktop7Desktop4MainEv+0x1ad1)[0x8065065]
/usr/lib/openoffice/program/libvcl645li.so(_Z6SVMainv+0x49)[0x7bffbf3]
/usr/lib/openoffice/program/libvcl645li.so(main+0x4c)[0x7dc8b58]
/lib/tls/libc.so.6(__libc_start_main+0xf0)[0xa66770]
/usr/lib/openoffice/program/soffice.bin(_ZN6Window11RequestHelpERK9HelpEvent+0x31)[0x805e42d]
[9] Abort ooffice

Galeon prints so many mplayer error messages that I have all its messages sent to /dev/null, so I don't have crash messages for it. It takes work to crash Galeon, unlike ooffice. I've attached prelink.log, which has some Galeon-related errors and a haunting emacs-related error at the end. It used to have ooffice-related errors in it, but it appears this valuable log file is overwritten each night, so those are gone. Regardless of what it says, the file /usr/lib/mozilla-1.4.1/libgtkembedmoz.so exists:

-rwxr-xr-x 1 root root 119124 Nov 14 11:03 /usr/lib/mozilla-1.4.1/libgtkembedmoz.so*

--jh--

Created attachment 97409
prelink log file
I was about to send off tarred versions of ooffice, but it worked as root. Then I moved my .openoffice directory aside and tried it fresh as myself, and it worked on both machines. So I guess this is an OpenOffice.org problem, not a prelink problem. Given that, I'll bet Galeon is having its own problem as well. I'm still worried by the log messages, though. --jh--

A year and a half after last posting to this bug, I think I have some understanding of what's going on. It represents an inherent risk in prelinking. Prior to my last posts, but unknown to me at the time, I had a slightly flaky stick of memory. Whenever I read and wrote files, such as when copying or prelinking, it introduced a one-bit error for about every 3 GB copied. After I RMAed my memory and completely reinstalled Fedora, all ran well until just recently. Again, I had some hardware problems, and again I get crashes, only this time with the entire OS, not with applications. The system just freezes, about once a day. If I move the disk to another machine and boot off it, the other machine gets the freezes. My other FC3 machines run fine. All are updated, etc.

I think that prelinking is amplifying hardware problems. It reads the entire OS once a night, and if there is any hardware issue at all that can result in a very occasional flipped bit, it will eventually manifest itself in some critical binary. There really isn't any good way to find the affected files. I've tried rpm -Va, but the output is so voluminous, and there are so many circumstances where non-config files can change, that it's easier just to completely reinstall the machine.

An easy solution would be to turn off prelinking as a precaution against a potential future hardware issue. But I wonder whether there might be some way for prelink, perhaps optionally, to check its work. Perhaps it could prelink each binary more than once and compare the output.
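The idea of prelinking each binary twice and comparing the results amounts to running a deterministic rewrite twice and refusing the output if the two runs disagree. The sketch below is hypothetical: `checked_transform` and the `transform` callable stand in for prelink's per-file rewrite, and nothing in this report says prelink offers such a hook. It only catches corruption that happens during one of the runs; a bit already flipped in the shared input would affect both runs equally and go unnoticed, and the check doubles the nightly work.

```python
import hashlib
from typing import Callable


class OutputMismatch(Exception):
    """Raised when two runs of the same deterministic rewrite disagree."""


def checked_transform(data: bytes, transform: Callable[[bytes], bytes]) -> bytes:
    # Hypothetical stand-in for "prelink this binary": transform must be
    # deterministic, so two healthy runs over the same input agree exactly.
    first = transform(data)
    second = transform(data)
    # Compare digests, as one would when checking two output files on disk.
    if hashlib.sha256(first).digest() != hashlib.sha256(second).digest():
        raise OutputMismatch("runs disagree; refusing to write the result back")
    return first
```

A nightly job built this way would write the result back only when no exception is raised, and otherwise leave the original file untouched and notify root.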
Or it might do a memory integrity test before the start of each run, turning itself off permanently and emailing root if the test failed. Or such a system integrity test might be done nightly as a separate job, similar to SMART testing. Clearly, healthy systems have little to fear and much to gain from prelink. But given how easy it is for intermittent memory problems to hide, sometimes for years, some precaution against prelink's nightly copying and potential contamination of the whole OS is in order. For simplicity's sake, I think a test like memtest86+ that runs on unoccupied memory, during idle moments, while the system is booted, makes a lot of sense. This isn't an enhancement request; it's a request for a completely new component. How do I enter that into Bugzilla? --jh--

Fedora Core 1 is maintained by the Fedora Legacy project for security updates only. If this problem is a security issue, please reopen and reassign to the Fedora Legacy product. If it is not a security issue and hasn't been resolved in the current FC5 updates or in the FC6 test release, reopen and change the version to match. Thanks!

NOTE: Fedora Core 1 is reaching the final end of support even by the Legacy project. After Fedora Core 6 Test 2 is released (currently scheduled for July 26th), there will be no more security updates for FC1. Please use these next two weeks to upgrade any remaining FC1 systems to a current release.

Closing per lack of response to previous comment. Note that FC1 and FC2 are no longer supported even by Fedora Legacy. If this still occurs on FC3 or FC4, please assign to that version and Fedora Legacy. If it still occurs on FC5 or FC6, please reopen and assign to the correct version.