Description of problem:
After I run /etc/cron.daily/prelink from the command line, I see this:

  /etc/cron.daily/prelink: line 47: 3672 Aborted (core dumped) \
  /usr/sbin/prelink -av $PRELINK_OPTS >>/var/log/prelink/prelink.log 2>&1

/var/log/prelink/prelink.log ends with lines like these:

  /usr/lib64/libexslt.so.0 0000003334100000-0000003334311708
  Prelink failed with return value 134

although the library in question varies from run to run.

After installing prelink-debuginfo-0.3.8-1 and examining the core (core dumps are turned on), I see the following in gdb:

  Core was generated by `/usr/sbin/prelink -av -mR -f'.
  Program terminated with signal 6, Aborted.
  #0  0x0000000000466b25 in raise ()
  (gdb) bt
  #0  0x0000000000466b25 in raise ()
  #1  0x000000000043d0a0 in abort ()
  #2  0x000000000040d3a2 in layout_libs () at layout.c:634
  #3  0x000000000040e079 in main (argc=4, argv=0x7fffcbaf1418) at main.c:408
  #4  0x0000000000436bb0 in __libc_start_main ()
  #5  0x00000000004001b9 in _start ()
  #6  0x00007fffcbaf1408 in ?? ()
  #7  0x0000000000000000 in ?? ()
  (gdb) list layout.c:634
  629        for (j = 1; j < l.binlibs[i]->ndepends; ++j)
  630          if (deps[j]->base
  631              < ((deps[j - 1]->end + max_page_size - 1)
  632                 & ~(max_page_size - 1))
  633              && (deps[j]->type == ET_DYN || deps[j - 1]->type == ET_DYN))
  634            abort ();
  635      }
  636  #endif
  637  }
  638

Unfortunately, most variables are unavailable ('j' prints as 582 and 'l.nbinlibs' as 2911). Looking more closely at the listing, the code in question is inside '#ifdef DEBUG_LAYOUT ... #endif'.

'package-cleanup --problems' from 'yum-utils' prints "No problems found".

Version-Release number of selected component (if applicable):
prelink-0.3.8-1

How reproducible:
Always, with my current set of libraries.
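The check that fires at layout.c:634 requires each dependency's base address to lie at or above the previous dependency's end address rounded up to max_page_size. The C sketch below reproduces only that arithmetic, under simplifying assumptions: the struct lib_range type and the round_up_page and first_overlap helpers are hypothetical stand-ins for prelink's own data structures, and the ET_DYN type test from the original condition is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a prelinked library's address range. */
struct lib_range {
  uint64_t base;  /* first address of the mapping */
  uint64_t end;   /* one past the last address of the mapping */
};

/* Round addr up to the next multiple of page; page must be a power of two,
   which is what makes the "& ~(page - 1)" mask trick valid. */
static uint64_t round_up_page (uint64_t addr, uint64_t page)
{
  assert (page != 0 && (page & (page - 1)) == 0);
  return (addr + page - 1) & ~(page - 1);
}

/* Return the index of the first range whose base lies below the page-rounded
   end of its predecessor (the condition that triggers abort () in the
   DEBUG_LAYOUT block), or -1 if consecutive ranges are laid out cleanly. */
static ptrdiff_t first_overlap (const struct lib_range *libs, size_t n,
                                uint64_t page)
{
  for (size_t j = 1; j < n; ++j)
    if (libs[j].base < round_up_page (libs[j - 1].end, page))
      return (ptrdiff_t) j;
  return -1;
}
```

As an illustration with an assumed 0x1000 page size (not necessarily prelink's max_page_size on x86-64), a library ending at 0x3334311708, the end address from the log above, rounds up to 0x3334312000; a following library based at 0x3334311000 would trip the check, while one based at 0x3334312000 would not.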
same here
The "same here" in comment #1 turns out to be on i386 (Athlon XP).
Can you please (with prelink-debuginfo installed) run:

  gdb --args /usr/sbin/prelink -avmRf
  break layout_libs
  run
  call prelink_entry_dump (prelink_filename_htab, "/tmp/cache.dump")
  quit

and attach /tmp/cache.dump here? Thanks.
Created attachment 131819: cache.dump from prelink

Hm, after 'run' in gdb I am flooded with line after line like these:

  .....
  Detaching after fork from child process 8808.
  Detaching after fork from child process 8809.
  Detaching after fork from child process 8810.
  Detaching after fork from child process 8811.
  Detaching after fork from child process 8812.
  .....

Is that normal? gdb stops after every screenful, and I started to wonder whether I was stuck in a loop. In any case, the requested file is attached.
Should be fixed in prelink-0.3.9-1 in rawhide.
*** Bug 198093 has been marked as a duplicate of this bug. ***
> Should be fixed in prelink-0.3.9-1 in rawhide.

WORKSFORME with this version installed.
> WORKSFORME

It looks like I was too quick. This is what I found the second time, with prelink-0.3.9-1, in the cron output:

  /etc/cron.daily/prelink: line 47: 28803 Segmentation fault /usr/sbin/prelink -av $PRELINK_OPTS >>/var/log/prelink/prelink.log 2>&1

The trouble is that repeatedly running the following (the first line forces prelink to actually do its work):

  touch /var/lib/misc/prelink.force
  /etc/cron.daily/prelink

completes every time without any visible trouble, yet running it from cron somehow makes it unhappy.
Bother! I ran /etc/cron.daily/prelink six times in a row from a root crontab, making sure that only one prelink process ran at a time and forcing actual runs, and no failures happened. OTOH, what I quoted in comment #8 is, unfortunately, real.
Recent log entries:

  prelink[28803]: segfault at 0000000000000058 rip 0000000000400310 rsp 00007fff1cb05fc8 error 4
  prelink[29948]: segfault at 0000000000000058 rip 0000000000400310 rsp 00007fff7349f768 error 4
  prelink[14444]: segfault at 0000000000000058 rip 0000000000400310 rsp 00007fff0827b758 error 4
  prelink[3370]: segfault at 0000000000000058 rip 0000000000400310 rsp 00007ffffa0669e8 error 4

This seems to happen only once a day (and possibly only on the first run after boot). The first three entries were triggered by anacron. The last one is from a "manual" run before anacron got to it. Any ideas?
So far I have failed to reproduce the error when running under gdb. OTOH, gdb seems to point here:

  (gdb) l *0x0000000000400310
  0x400310 is in deps_cmp (cache.c:344).
  339       if (a == NULL && b != NULL)
  340         return 1;
  341       if (a != NULL && b == NULL)
  342         return -1;
  343
  344       if (a->type == ET_NONE && b->type != ET_NONE)
  345         return 1;
  346       if (a->type != ET_NONE && b->type == ET_NONE)
  347         return -1;
  348

Maybe there is indeed a situation in which both a and b are NULL? That case is not checked, and line 344 would then dereference a NULL pointer.
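The quoted comparator handles the case where exactly one of a and b is NULL, but falls through to line 344 when both are. A NULL-safe variant for qsort over an array of pointers might look like the C sketch below. It only illustrates the idea raised above and is not the fix that went into prelink: the struct entry type and the entry_cmp and sort_entries names are hypothetical, and the ordering (entries of known type first, ET_NONE entries next, NULL slots last) mirrors the lines quoted from cache.c. ET_NONE and ET_DYN come from <elf.h> as shipped with glibc.

```c
#include <assert.h>
#include <elf.h>     /* ET_NONE, ET_DYN */
#include <stddef.h>
#include <stdlib.h>  /* qsort */

/* Hypothetical, simplified stand-in for a prelink cache entry. */
struct entry {
  const char *name;
  int type;          /* ELF e_type, e.g. ET_DYN; ET_NONE if unknown */
};

/* qsort comparator over an array of struct entry pointers.  NULL pointers
   sort last; two NULL pointers compare equal instead of being dereferenced. */
static int entry_cmp (const void *A, const void *B)
{
  const struct entry *a = *(const struct entry *const *) A;
  const struct entry *b = *(const struct entry *const *) B;

  if (a == NULL || b == NULL)
    {
      if (a == b)              /* both NULL: nothing to dereference */
        return 0;
      return a == NULL ? 1 : -1;
    }

  /* Entries of unknown type (ET_NONE) go after entries of known type. */
  if (a->type == ET_NONE && b->type != ET_NONE)
    return 1;
  if (a->type != ET_NONE && b->type == ET_NONE)
    return -1;
  return 0;
}

/* Sort n entry pointers in place; NULL slots end up at the tail. */
static void sort_entries (struct entry **v, size_t n)
{
  assert (v != NULL || n == 0);
  qsort (v, n, sizeof v[0], entry_cmp);
}
```

Because every pair of pointers falls into one of three ordered classes (known type, ET_NONE, NULL), the comparator stays a consistent total order, which qsort requires.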
I managed to get the following backtrace from a hacked-up /etc/cron.daily/prelink that runs things via gdb:

  Program received signal SIGTSTP, Stopped (user).
  0x0000000000456a53 in __pread_nocancel ()
  #0  0x0000000000456a53 in __pread_nocancel ()
  #1  0x0000000000431d16 in elf64_getshdr ()
  #2  0x0000000000431f0a in gelf_getshdr ()
  #3  0x000000000041ba15 in fdopen_dso (fd=9, name=0x7049c0 "/usr/libexec/wnck-applet") at dso.c:355
  #4  0x000000000040b8cc in gather_func (name=0x7049c0 "/usr/libexec/wnck-applet", st=0x7fff27f7c080, type=<value optimized out>, ftwp=<value optimized out>) at gather.c:830
  #5  0x00000000004573aa in process_entry ()
  #6  0x000000000045788e in ftw_dir ()
  #7  0x0000000000457fbd in ftw_startup ()
  #8  0x000000000040bea0 in gather_object (name=0x6f5083 "/usr/libexec", deref=<value optimized out>, onefs=1) at gather.c:1005
  #9  0x000000000040c290 in gather_config (config=<value optimized out>)
  #10 0x000000000040df75 in main (argc=4, argv=0x7fff27f7c868) at main.c:392
  #11 0x0000000000436cb0 in __libc_start_main ()
  #12 0x00000000004001b9 in _start ()

This does not seem to make much sense, and I am not aware of anything that would send SIGTSTP to that process, but that is all I have at the moment besides a bunch of these segfault log entries.
This is a backtrace I got from a modified cron job, and it is clearly the same problem as in comment #10:

  Program received signal SIGSEGV, Segmentation fault.
  deps_cmp (A=0x7fffc500e458, B=0x7fffc500ebc8) at cache.c:344
  344       if (a->type == ET_NONE && b->type != ET_NONE)
  #0  deps_cmp (A=0x7fffc500e458, B=0x7fffc500ebc8) at cache.c:344
  #1  0x000000000043d58b in msort_with_tmp ()
  #2  0x000000000043d4eb in msort_with_tmp ()
  #3  0x000000000043d4eb in msort_with_tmp ()
  #4  0x000000000043d4d5 in msort_with_tmp ()
  #5  0x000000000043d702 in qsort ()
  #6  0x0000000000400fbd in prelink_load_cache () at cache.c:465
  #7  0x000000000040e0c0 in main (argc=4, argv=0x7fffc5012cf8) at main.c:390
  #8  0x0000000000436cb0 in __libc_start_main ()
  #9  0x00000000004001b9 in _start ()

It is rather elusive, though. When I tried running the same command under gdb from the command line, it terminated normally. Yes, I see that prelink needs to be in "quick" mode for the above trace to make sense. I still do not see what prevents 'a' and 'b' from both being NULL at the same time.

It is clear by now that this is a different problem from the one originally reported. Should I open another bugzilla entry and close this one?
This really differs from the bug in the original report. Resubmitted as bug #200160, with a better description of how to reproduce it, and closing this bug again.