196941 – prelink aborts

Summary: prelink aborts

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	prelink
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Jakub Jelinek
QA Contact:
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	198093
Depends On:
Blocks:

Reported:	2006-06-27 18:58 UTC by Michal Jaegermann
Modified:	2012-03-08 07:44 UTC
CC List:	2 users
Fixed In Version:	0.3.9-1
Clone Of:
Environment:
Last Closed:	2006-07-25 20:27:47 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
cache.dump from prelink (506.64 KB, text/plain) 2006-06-30 16:21 UTC, Michal Jaegermann

Description Michal Jaegermann 2006-06-27 18:58:37 UTC

Description of problem:

When I run /etc/cron.daily/prelink from the command line, I see this:

/etc/cron.daily/prelink: line 47:  3672 Aborted (core dumped) \
  /usr/sbin/prelink -av $PRELINK_OPTS >>/var/log/prelink/prelink.log 2>&1

/var/log/prelink/prelink.log ends with lines like these:

/usr/lib64/libexslt.so.0                0000003334100000-0000003334311708
Prelink failed with return value 134

although the library in question varies from run to run.
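
A return value of 134 is what bash reports in $? when a child is killed by signal 6 (SIGABRT on Linux): 128 plus the signal number. That matches the "Aborted (core dumped)" message above. A minimal sketch of the arithmetic follows; the helper names are hypothetical and not part of prelink or the cron script:

```c
#include <signal.h>

/* Hypothetical helper (not part of prelink): given an exit status as
   bash reports it in $?, return the number of the signal that killed
   the process, or 0 if the status looks like an ordinary exit code.
   bash encodes death-by-signal as 128 + signal number. */
int signal_from_shell_status(int status)
{
    if (status > 128 && status < 256)
        return status - 128;
    return 0;
}

/* Nonzero if the shell status corresponds to abort(), i.e. SIGABRT. */
int status_is_abort(int status)
{
    return signal_from_shell_status(status) == SIGABRT;
}
```

With these helpers, status_is_abort(134) is true on Linux, while a plain exit code such as 1 maps to no signal.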

After installing prelink-debuginfo-0.3.8-1 and examining the core file
(core dumps are enabled), I see the following from gdb:

Core was generated by `/usr/sbin/prelink -av -mR -f'.
Program terminated with signal 6, Aborted.
#0  0x0000000000466b25 in raise ()
(gdb) bt
#0  0x0000000000466b25 in raise ()
#1  0x000000000043d0a0 in abort ()
#2  0x000000000040d3a2 in layout_libs () at layout.c:634
#3  0x000000000040e079 in main (argc=4, argv=0x7fffcbaf1418) at main.c:408
#4  0x0000000000436bb0 in __libc_start_main ()
#5  0x00000000004001b9 in _start ()
#6  0x00007fffcbaf1408 in ?? ()
#7  0x0000000000000000 in ?? ()
(gdb) list layout.c:634
629               for (j = 1; j < l.binlibs[i]->ndepends; ++j)
630                 if (deps[j]->base
631                     < ((deps[j - 1]->end + max_page_size - 1)
632                        & ~(max_page_size - 1))
633                     && (deps[j]->type == ET_DYN || deps[j - 1]->type == ET_DYN))
634                   abort ();
635             }
636     #endif
637         }
638

I am afraid that most variables are unavailable (although 'j' prints as 582
and 'l.nbinlibs' as 2911).  Looking a bit further at the listing, the code
in question is inside '#ifdef DEBUG_LAYOUT ... #endif'.
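
For readers following along, the condition at lines 630-632 rounds the previous dependency's end address up to a page boundary and aborts if the next dependency starts below it. A minimal sketch of that test is below. The struct, the function names and the 4 KiB page size used in the usage note are assumptions for illustration only; they are not prelink's real types or its actual max_page_size, and the ET_DYN part of the condition is left out:

```c
#include <stdint.h>

/* Hypothetical stand-in for the two fields the check reads. */
struct lib_range {
    uint64_t base;  /* address where the library starts */
    uint64_t end;   /* address where the library ends */
};

/* Round addr up to the next multiple of page_size.
   page_size must be a power of two for the mask trick to work. */
uint64_t page_round_up(uint64_t addr, uint64_t page_size)
{
    return (addr + page_size - 1) & ~(page_size - 1);
}

/* Nonzero if cur starts below the page-aligned end of prev, which is
   the address half of the condition guarding abort() at layout.c:634. */
int ranges_collide(const struct lib_range *prev,
                   const struct lib_range *cur,
                   uint64_t page_size)
{
    return cur->base < page_round_up(prev->end, page_size);
}
```

Taking the range from the log above and assuming 4 KiB pages, page_round_up(0x3334311708, 0x1000) gives 0x3334312000, so a following library based at or above that address would pass the check and one based below it would trip it.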

'package-cleanup --problems' from 'yum-utils' prints
"No problems found".


Version-Release number of selected component (if applicable):
prelink-0.3.8-1

How reproducible:
with my current set of libraries - always

Comment 1 Lars G 2006-06-28 15:52:34 UTC

same here

Comment 2 Michal Jaegermann 2006-06-28 21:23:48 UTC

"same here" from comment #1 turns out to be i386 (athlon xp).

Comment 3 Jakub Jelinek 2006-06-30 12:39:23 UTC

Can you please (with prelink-debuginfo installed):
gdb --args /usr/sbin/prelink -avmRf
break layout_libs
run
call prelink_entry_dump (prelink_filename_htab, "/tmp/cache.dump")
quit
and attach /tmp/cache.dump here?
Thanks.

Comment 4 Michal Jaegermann 2006-06-30 16:21:40 UTC

Created attachment 131819
cache.dump from prelink

Hm, after 'run' in gdb I am flooded with line after line of this sort:
.....
Detaching after fork from child process 8808.
Detaching after fork from child process 8809.
Detaching after fork from child process 8810.
Detaching after fork from child process 8811.
Detaching after fork from child process 8812.
.....

Is that normal? gdb pauses after every screenful, and I started to wonder
whether I was really running in a loop.

In any case, the requested file is attached.

Comment 5 Jakub Jelinek 2006-07-12 13:34:17 UTC

Should be fixed in prelink-0.3.9-1 in rawhide.

Comment 6 Jakub Jelinek 2006-07-12 13:37:49 UTC

*** Bug 198093 has been marked as a duplicate of this bug. ***

Comment 7 Michal Jaegermann 2006-07-14 16:43:08 UTC

> Should be fixed in prelink-0.3.9-1 in rawhide.

WORKSFORME with this version installed.

Comment 8 Michal Jaegermann 2006-07-18 19:14:01 UTC

> WORKSFORME

It looks like I was too quick.  This is what I found the second
time, with prelink-0.3.9-1, in cron output:

/etc/cron.daily/prelink: line 47: 28803 Segmentation fault
 /usr/sbin/prelink -av $PRELINK_OPTS >>/var/log/prelink/prelink.log 2>&1

The trouble is that repeatedly running the following (the first line
forces prelink to really run):

touch /var/lib/misc/prelink.force
/etc/cron.daily/prelink

succeeds every time without any visible trouble, yet running it from cron
somehow makes it unhappy.

Comment 9 Michal Jaegermann 2006-07-18 20:17:54 UTC

Bother!  I ran /etc/cron.daily/prelink six times in a row from a root
crontab, making sure that only one prelink process ran at a time and
forcing actual runs, and no failures happened.  OTOH what I quoted in
comment #8 is, unfortunately, real.

Comment 10 Michal Jaegermann 2006-07-21 15:40:21 UTC

Recent log entries:

prelink[28803]: segfault at 0000000000000058 rip 0000000000400310 rsp 00007fff1cb05fc8 error 4
prelink[29948]: segfault at 0000000000000058 rip 0000000000400310 rsp 00007fff7349f768 error 4
prelink[14444]: segfault at 0000000000000058 rip 0000000000400310 rsp 00007fff0827b758 error 4
prelink[3370]: segfault at 0000000000000058 rip 0000000000400310 rsp 00007ffffa0669e8 error 4

This seems to happen only once a day (and possibly only on the first
run after boot).  The first three entries were triggered by anacron.
The last one is from a "manual" run before anacron got to it.
Any ideas?
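
For reference, the number after "error" in these kernel messages is the x86 page-fault error code, and its low three bits can be decoded as in the sketch below. Error 4 means a user-mode read of a page that is not mapped, and a fault address as small as 0x58 is consistent with reading a structure field through a NULL pointer. The struct and function names are hypothetical, and the 4096-byte threshold is only a rule of thumb, not something taken from the kernel:

```c
/* Low bits of the x86 page-fault error code, as printed after "error"
   in the kernel's segfault log lines. */
enum {
    PF_PROT  = 1 << 0,  /* 0: page not present, 1: protection violation */
    PF_WRITE = 1 << 1,  /* 0: read access,      1: write access */
    PF_USER  = 1 << 2   /* 0: kernel mode,      1: user mode */
};

/* Hypothetical decoded form of the three bits above. */
struct pf_info {
    int page_present;  /* fault hit a mapped page (protection violation) */
    int is_write;      /* faulting access was a write */
    int user_mode;     /* fault happened in user mode */
};

struct pf_info decode_pf_error(unsigned long code)
{
    struct pf_info info;
    info.page_present = (code & PF_PROT) != 0;
    info.is_write = (code & PF_WRITE) != 0;
    info.user_mode = (code & PF_USER) != 0;
    return info;
}

/* Rule of thumb (an assumption, not a kernel definition): an address in
   the first page usually means base pointer NULL plus a field offset. */
int looks_like_null_deref(unsigned long fault_addr)
{
    return fault_addr < 4096;
}
```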

Comment 11 Michal Jaegermann 2006-07-21 19:40:59 UTC

So far I have failed to reproduce the error when running under gdb.  OTOH
gdb points here for the rip value from the log:

(gdb) l *0x0000000000400310
0x400310 is in deps_cmp (cache.c:344).
339       if (a == NULL && b != NULL)
340         return 1;
341       if (a != NULL && b == NULL)
342         return -1;
343
344       if (a->type == ET_NONE && b->type != ET_NONE)
345         return 1;
346       if (a->type != ET_NONE && b->type == ET_NONE)
347         return -1;
348

Maybe a situation is indeed possible in which both a and b are
NULL?  That case is not checked in the lines shown, and line 344 would
then crash.
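
To make the concern concrete, here is a minimal sketch of a qsort comparator that also covers the both-NULL case. It is not prelink's deps_cmp: the struct, the TYPE_NONE constant (standing in for ET_NONE) and the function name are hypothetical, the array is assumed to hold pointers to entries, and the listing above starts at line 339, so earlier lines of the real function may already handle this case:

```c
#include <stddef.h>
#include <stdlib.h>

enum { TYPE_NONE = 0 };  /* stands in for ET_NONE in this sketch */

/* Hypothetical entry; only the field the comparison reads. */
struct entry {
    int type;
};

/* Comparator for an array of 'struct entry *' that never dereferences
   a NULL element. NULL entries sort last and compare equal to each
   other; among non-NULL entries, TYPE_NONE sorts after everything else. */
int entry_ptr_cmp(const void *A, const void *B)
{
    const struct entry *a = *(const struct entry *const *)A;
    const struct entry *b = *(const struct entry *const *)B;

    if (a == NULL || b == NULL)
        return (a == NULL) - (b == NULL);  /* 1, -1, or 0 if both NULL */

    if (a->type == TYPE_NONE && b->type != TYPE_NONE)
        return 1;
    if (a->type != TYPE_NONE && b->type == TYPE_NONE)
        return -1;
    return 0;
}
```

The single NULL test returns the same values as lines 339-342 for the one-sided cases and adds 0 when both pointers are NULL, so the type comparison is only reached with two valid pointers.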

Comment 12 Michal Jaegermann 2006-07-22 18:08:53 UTC

I managed to get the following backtrace from a hacked-up
/etc/cron.daily/prelink which runs prelink via gdb:

Program received signal SIGTSTP, Stopped (user).
0x0000000000456a53 in __pread_nocancel ()
#0  0x0000000000456a53 in __pread_nocancel ()
#1  0x0000000000431d16 in elf64_getshdr ()
#2  0x0000000000431f0a in gelf_getshdr ()
#3  0x000000000041ba15 in fdopen_dso (fd=9,
    name=0x7049c0 "/usr/libexec/wnck-applet") at dso.c:355
#4  0x000000000040b8cc in gather_func (
    name=0x7049c0 "/usr/libexec/wnck-applet", st=0x7fff27f7c080,
    type=<value optimized out>, ftwp=<value optimized out>) at gather.c:830
#5  0x00000000004573aa in process_entry ()
#6  0x000000000045788e in ftw_dir ()
#7  0x0000000000457fbd in ftw_startup ()
#8  0x000000000040bea0 in gather_object (name=0x6f5083 "/usr/libexec",
    deref=<value optimized out>, onefs=1) at gather.c:1005
#9  0x000000000040c290 in gather_config (config=<value optimized out>)
#10 0x000000000040df75 in main (argc=4, argv=0x7fff27f7c868) at main.c:392
#11 0x0000000000436cb0 in __libc_start_main ()
#12 0x00000000004001b9 in _start ()

It does not seem to make much sense, and I am not aware of anything which
would send SIGTSTP to that process, but that is all I have at this
moment beyond a bunch of these segfault log entries.

Comment 13 Michal Jaegermann 2006-07-24 18:26:29 UTC

This is a backtrace I got from a modified cron job and this is clearly
the same problem as in comment #10:

Program received signal SIGSEGV, Segmentation fault.
deps_cmp (A=0x7fffc500e458, B=0x7fffc500ebc8) at cache.c:344
344       if (a->type == ET_NONE && b->type != ET_NONE)
#0  deps_cmp (A=0x7fffc500e458, B=0x7fffc500ebc8) at cache.c:344
#1  0x000000000043d58b in msort_with_tmp ()
#2  0x000000000043d4eb in msort_with_tmp ()
#3  0x000000000043d4eb in msort_with_tmp ()
#4  0x000000000043d4d5 in msort_with_tmp ()
#5  0x000000000043d702 in qsort ()
#6  0x0000000000400fbd in prelink_load_cache () at cache.c:465
#7  0x000000000040e0c0 in main (argc=4, argv=0x7fffc5012cf8) at main.c:390
#8  0x0000000000436cb0 in __libc_start_main ()
#9  0x00000000004001b9 in _start ()

It is rather elusive, though.  When I tried to repeat running the same
command under gdb from a command line it terminated normally.  Yes, I
see that prelink needs to be in a "quick" mode for the above trace to
make sense.

I still do not see what prevents 'a' and 'b' from both being NULL at the
same time.

It is clear by now that this is a different problem from what was originally
reported.  Should I open another bugzilla entry and close this one?

Comment 14 Michal Jaegermann 2006-07-25 20:27:47 UTC

This really differs from the bug in the original report.  Resubmitted as
bug #200160, with a better description of how to reproduce it, and closing
this bug again.
