Bug 110665 - Prelinking appears to cause segmentation faults
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: prelink
Version: 1
Hardware: athlon
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Jakub Jelinek
Reported: 2003-11-22 15:22 UTC by Need Real Name
Modified: 2008-08-02 23:40 UTC

Doc Type: Bug Fix
Last Closed: 2006-10-28 16:36:31 UTC


Attachments
prelink log file (53.71 KB, text/plain), 2004-02-02 15:58 UTC, Joe Harrington

Description Need Real Name 2003-11-22 15:22:12 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.1) Gecko/20031030 Epiphany/1.0.4

Description of problem:
I had suspected that prelinking was the culprit in the rash of
segmentation faults that I had been getting starting the morning after
I did a fresh, everything, install of Fedora Core 1.  So about a week
ago, I removed "prelink" from /etc/cron.daily.  The result: no more
segmentation faults.  This morning, I put "prelink" back in
/etc/cron.daily and went to bed.  When I got up, prelinking had run,
and I started to get the segmentation faults again.  I then read "man
prelink" and proceeded to run "/usr/sbin/prelink -au", with the result
that the apps that had been giving segmentation faults now worked
correctly :)
I also noticed that after prelink ran, apps that had formerly worked
perfectly started producing segmentation faults.  If I reinstalled an
app immediately after it segfaulted (from RPM or from source, it didn't
matter), the app promptly stopped crashing.

Version-Release number of selected component (if applicable):
prelink-0.3.0-13

How reproducible:
Always

Steps to Reproduce:
1. let /etc/cron.daily/prelink run like it is supposed to
2. A few, not all, previously stable apps will begin to crash, with
the only error being, "segmentation fault"
    

Actual Results:  Previously stable apps would begin to produce
segmentation faults.

Expected Results:  I expected a prelinked app/library to manifest the
benefits of prelinking, one of which I understand has to do with apps
starting faster(?)

Additional info:

Full installation of Fedora Core 1
Package selection: Everything
Official Updates: All applied

I must add that even though I had this problem this morning with the
official updates installed, I had no official updates installed when I
first became aware of this bug.

Comment 1 Jakub Jelinek 2003-11-22 22:15:07 UTC
I cannot reproduce anything like this; FC1 has been running prelinked for
several weeks here and works just fine, so I'll need more details.
Exactly which applications segfault, and what are the exact steps to
reproduce it?
Can you pick one such segfaulting application where the segfaults
are reproducible (ideally as small as possible and easily reproducible),
pack it up with its dependencies when prelinked:
tar chjf prelinked.tar.bz2 `LD_WARN= LD_TRACE_PRELINKING=1 /the/program/in/question | awk '{print $3}'`
then prelink -ua, verify the segfault is gone and pack it up unprelinked:
tar chjf unprelinked.tar.bz2 `LD_WARN= LD_TRACE_PRELINKING=1 /the/program/in/question | awk '{print $3}'`
?
Also, can you get a backtrace of the segfaulted program you pick up?
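
A minimal Python sketch of what the awk '{print $3}' step in the two tar commands above does: it keeps the third whitespace-separated field of every line of the loader's trace output, which is the list of files handed to tar. The sample lines are made up and assume a "name => path (addresses)" layout; the function name third_fields is hypothetical.

```python
def third_fields(trace_output: str) -> list[str]:
    """Return the third whitespace-separated field of each line."""
    paths = []
    for line in trace_output.splitlines():
        fields = line.split()
        # awk would print an empty string for a line with fewer than three
        # fields; such lines are skipped here so no empty names are collected.
        if len(fields) >= 3:
            paths.append(fields[2])
    return paths


# Hypothetical sample input, assuming a "name => path (addresses)" layout.
sample = """\
libc.so.6 => /lib/tls/libc.so.6 (0x00a51000, 0x00a51000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x00a38000, 0x00a38000)
"""
print(third_fields(sample))
```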

Comment 2 Jakub Jelinek 2003-12-02 16:34:18 UTC
Without any further information there is nothing that can be done
about this.

Comment 3 Need Real Name 2003-12-04 19:10:02 UTC
Hello, Jakub :)
My sincerest apologies for not following up on this sooner, but I have
been having some serious issues with our computer that will result in
me getting a new motherboard soon.  As a result, our computer has been
in the shop for a week.  I'll get back on this bug this weekend, and
then I'll also give you any new information I have when I install the
new motherboard.  Be forewarned: any future delays on my part
concerning this issue are because my computer is down :)

Comment 4 Jim Wiedman 2004-01-05 13:50:47 UTC
I've also been getting some strange prelink errors.  My system
frequently crashes around 0400 (just when my daily crons are run). 
Some of the messages I've received include:

/etc/cron.daily/prelink: line 36: 28382 Segmentation fault      /usr/sbin/prelink -av $PRELINK_OPTS >>/var/log/prelink.log 2>&1

and

/etc/cron.daily/prelink: line 36:  1382 Illegal instruction     /usr/sbin/prelink -av $PRELINK_OPTS >>/var/log/prelink.log 2>&1

Running the prelink cron script manually restarts my X session.

Comment 5 Joe Harrington 2004-02-02 15:57:03 UTC
I have had library-related crashes of openoffice.org and galeon (from
fedora.us).  I also have a fully-updated FC1 dist with all packages,
on i386/athlon hardware.

Openoffice.org ran fine for a few months, then would not start at all,
segfaulting after emitting many library errors.  This happened the
morning after the recent glibc update, and lasted until recently (I
did not try every day).  Today I tried again, and it works again. 
Meanwhile, it worked fine after the glibc update on a laptop I have
(IBM T40, Pentium M processor) *until* today, when it stopped (it
worked yesterday).  On the laptop, I can get a window up with no file,
and it crashes on reading the file.  When it didn't work on the
athlon, it crashed on startup, file or no. Here are the ooffice errors
from the desktop (good thing I save 5000 lines in my gterms!):

% ooffice &
[9] 19763
% Starting OpenOffice.org ...
 
 
Fatal exception: Signal 11
Stack:
/usr/lib/openoffice/program/libsal.so.3[0x2b2a78]
/usr/lib/openoffice/program/libsal.so.3[0x2b2c05]
/usr/lib/openoffice/program/libsal.so.3[0x2b2cce]
/lib/tls/libpthread.so.0[0xcc30b8]
/usr/lib/openoffice/program/libvcl645li.so(_ZNK4Menu9ImplPaintEP6WindowtlP12MenuItemDatahb+0x887)[0x7cf9d9f]
/usr/lib/openoffice/program/libvcl645li.so(_ZN13MenuBarWindow5PaintERK9Rectangle+0x54)[0x7d01724]
/usr/lib/openoffice/program/libvcl645li.so(_ZN6Window13ImplCallPaintEPK6Regiont+0x3a3)[0x7d2b975]
/usr/lib/openoffice/program/libvcl645li.so(_ZN6Window13ImplCallPaintEPK6Regiont+0x465)[0x7d2ba37]
/usr/lib/openoffice/program/libvcl645li.so(_ZN6Window20ImplCallOverlapPaintEv+0x5f)[0x7d2bb51]
/usr/lib/openoffice/program/libvcl645li.so(_ZN6Window18ImplHandlePaintHdlEPv+0x2c)[0x7d2bbe6]
/usr/lib/openoffice/program/libvcl645li.so(_ZN6Window26LinkStubImplHandlePaintHdlEPvS0_+0x26)[0x7d2bbb2]
/usr/lib/openoffice/program/libvcl645li.so(_ZN5Timer7TimeoutEv+0x1f)[0x7c00a03]
/usr/lib/openoffice/program/libvcl645li.so(_Z21ImplTimerCallbackProcv+0x82)[0x7c0075a]
/usr/lib/openoffice/program/libvcl645li.so(_ZNK7SalData7TimeoutEv+0x12)[0x7dca242]
/usr/lib/openoffice/program/libvcl645li.so(_ZN7SalXLib12CheckTimeoutEb+0xd6)[0x7dc9d96]
/usr/lib/openoffice/program/libvcl645li.so(_ZN7SalXLib5YieldEh+0x2e4)[0x7dca088]
/usr/lib/openoffice/program/libvcl645li.so(_ZN11SalInstance5YieldEh+0x34)[0x7dd3256]
/usr/lib/openoffice/program/libvcl645li.so(_ZN11Application5YieldEv+0x61)[0x7bfb0e7]
/usr/lib/openoffice/program/libvcl645li.so(_ZN11Application7ExecuteEv+0x35)[0x7bfaff9]
/usr/lib/openoffice/program/soffice.bin(_ZN7desktop7Desktop4MainEv+0x1ad1)[0x8065065]
/usr/lib/openoffice/program/libvcl645li.so(_Z6SVMainv+0x49)[0x7bffbf3]
/usr/lib/openoffice/program/libvcl645li.so(main+0x4c)[0x7dc8b58]
/lib/tls/libc.so.6(__libc_start_main+0xf0)[0xa66770]
/usr/lib/openoffice/program/soffice.bin(_ZN6Window11RequestHelpERK9HelpEvent+0x31)[0x805e42d]
 
[9]    Abort                         ooffice

Galeon prints so many mplayer error messages that I have all its
messages sent to /dev/null, so I don't have crash messages for it.  It
takes work to crash galeon, unlike ooffice.

I've attached prelink.log, which has some galeon-related errors and a
haunting emacs-related error at the end.  It used to have
ooffice-related errors in it, but it appears this valuable log file is
overwritten each night, so those are gone.

Regardless of what it says, the file 
/usr/lib/mozilla-1.4.1/libgtkembedmoz.so
exists:

-rwxr-xr-x    1 root     root       119124 Nov 14 11:03 /usr/lib/mozilla-1.4.1/libgtkembedmoz.so*

--jh--

Comment 6 Joe Harrington 2004-02-02 15:58:12 UTC
Created attachment 97409
prelink log file

Comment 7 Joe Harrington 2004-02-02 18:30:10 UTC
I was about to send off tarred versions of ooffice, but it worked as
root.  Then I moved my .openoffice directory and tried it fresh as me,
and it worked on both machines.  So, I guess this is an openoffice
problem, not a prelink problem.  Given that, I'll bet galeon is having
its own problem as well.  I'm still worried by the log messages, though.

--jh--


Comment 8 Joe Harrington 2005-09-29 17:24:11 UTC
A year and a half after last posting to this bug, I think I have some
understanding of what's going on.  It represents an inherent risk in prelinking.

Prior to my last posts, but unknown to me at the time, I had a slightly flaky
stick of memory.  When I read and wrote files, such as in copying or prelinking,
it produced roughly one single-bit error for every 3 GB copied.  After I RMAd my memory and
completely reinstalled Fedora, all ran well until just recently.  Again, I had
some hardware problems, and again I get crashes, only this time with the entire
OS, not with applications.  The system just freezes, about once a day.  If I
move the disk to another machine and boot off it, the other machine gets the
freezes.  My other FC3 machines run fine.  All are updated, etc. etc.
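
A back-of-the-envelope sketch of what an error rate of one flipped bit per 3 GB implies over repeated nightly runs. Everything except the 3 GB figure is an assumption: "GB" is read as 10**9 bytes, errors are treated as independent (a Poisson model), and the 1 GB nightly volume is a made-up placeholder, not a measurement of how much prelink actually rewrites.

```python
import math

GB = 10**9  # assumption: "3 GB" read as decimal gigabytes


def expected_errors(bytes_moved: float, bytes_per_error: float = 3 * GB) -> float:
    """Expected number of flipped bits for a given volume of data moved."""
    return bytes_moved / bytes_per_error


def chance_of_any_error(bytes_per_night: float, nights: int) -> float:
    """Probability of at least one flipped bit, assuming independent errors."""
    return 1.0 - math.exp(-expected_errors(bytes_per_night) * nights)


nightly = 1 * GB  # hypothetical nightly volume
for nights in (1, 7, 30):
    print(nights, round(chance_of_any_error(nightly, nights), 3))
```

Under these assumptions the chance of at least one corrupted bit grows quickly with the number of nights, which fits the pattern of a machine that degrades over days rather than failing at once.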

I think that prelinking is amplifying hardware problems.  It reads the entire OS
once a night, and if there is any hardware issue at all that can result in a
very occasional flipped bit, it will eventually manifest itself in some critical
binary.  There really isn't any good way to find the affected files.  I've tried
rpm -Va, but the output is so voluminous and there are so many circumstances
where non-config files can change that it's easier just to completely reinstall
the machine.
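
One way to narrow down affected files, sketched here as an illustration only, is to compare each file against a digest recorded while the system was known to be good. The manifest (a mapping of path to SHA-256 hex digest) and the function names are hypothetical. Because prelink rewrites binaries, such a manifest would only be meaningful if it was recorded from, and checked against, the same state (for example, always unprelinked).

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large binaries are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(manifest: dict[str, str]) -> list[str]:
    """Return paths that are missing or whose digest differs from the manifest."""
    bad = []
    for name, recorded in manifest.items():
        path = Path(name)
        if not path.is_file() or sha256_of(path) != recorded:
            bad.append(name)
    return bad
```

A single flipped bit changes the digest, so the output is a short list of suspect files rather than a full verification report.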

An easy solution would be to turn off prelinking, as a precaution against a
potential future hardware issue.  But, I wonder whether there might be some way
for prelink, perhaps optionally, to check its work.  Perhaps it could try
prelinking each binary more than once, and compare the output.  Or, it might do
a memory integrity test before the start of each run, turning itself off
permanently and emailing root if it failed.  Or, such a system integrity test
might be done nightly as a separate job, similar to SMART testing.
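
The "do it more than once and compare" idea can be sketched as follows. This is not how prelink works; transform is a hypothetical stand-in for whatever produces the rewritten binary, and the check only catches errors that differ between the two runs. If the input was already corrupted before both runs read it, the outputs would still agree.

```python
from typing import Callable


def run_twice_and_compare(transform: Callable[[bytes], bytes], data: bytes) -> bytes:
    """Run transform twice on the same input and accept the result only if both runs agree."""
    first = transform(data)
    second = transform(data)
    if first != second:
        raise RuntimeError("two runs disagree; refusing to use the output")
    return first


# Usage with a deterministic stand-in transform.
result = run_twice_and_compare(lambda blob: blob[::-1], b"example input")
print(result)
```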

Clearly, healthy systems have little to fear and much to gain from prelink. 
But, given how easy it is for intermittent memory problems to hide, sometimes
for years, some precaution against prelink's nightly copying and potential
contamination of the whole OS is in order.  I think that for simplicity's sake,
a test like memtest86+, but one that runs on unoccupied memory, during idle moments,
and while the system is booted, makes a lot of sense.  This isn't an enhancement request; it's
a request for a completely new component.  How do I enter that into bugzilla?
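
A toy illustration of the write-and-read-back principle behind such a test. It is a sketch only: a user-space Python program cannot choose which physical memory it touches, CPU caches may mask faults, and this is no substitute for memtest86+. The function name and the pattern set are assumptions.

```python
def pattern_test(size_bytes: int, patterns=(0x00, 0xFF, 0xAA, 0x55)):
    """Fill a buffer with each pattern, read it back, and report mismatches.

    Returns a list of (offset, expected_byte, observed_byte) tuples.
    """
    buffer = bytearray(size_bytes)
    failures = []
    for pattern in patterns:
        buffer[:] = bytes([pattern]) * size_bytes
        # Fast path: count matching bytes; scan for offsets only on a mismatch.
        if buffer.count(pattern) != size_bytes:
            for offset, value in enumerate(buffer):
                if value != pattern:
                    failures.append((offset, pattern, value))
    return failures


failures = pattern_test(1 << 20)  # test 1 MiB
print("mismatches:", len(failures))
```

An empty result only says that this buffer behaved during this run; intermittent faults of the kind described above can easily pass a short test.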

--jh--


Comment 9 Matthew Miller 2006-07-11 17:50:28 UTC
Fedora Core 1 is maintained by the Fedora Legacy project for security updates
only. If this problem is a security issue, please reopen and reassign to the
Fedora Legacy product. If it is not a security issue and hasn't been resolved in
the current FC5 updates or in the FC6 test release, reopen and change the
version to match.

Thanks!

NOTE: Fedora Core 1 is reaching the final end of support even by the Legacy
project. After Fedora Core 6 Test 2 is released (currently scheduled for July
26th), there will be no more security updates for FC1. Please use these next two
weeks to upgrade any remaining FC1 systems to a current release.



Comment 10 John Thacker 2006-10-28 16:36:31 UTC
Closing per lack of response to previous comment.  Note that FC1 and FC2 are no
longer supported even by Fedora Legacy.  If this still occurs on FC3 or FC4, please
assign to that version and Fedora Legacy.  If it still occurs on FC5 or FC6,
please reopen and assign to the correct version.

