Bug 196941
Summary: prelink aborts
Product: [Fedora] Fedora
Component: prelink
Version: rawhide
Hardware: All
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Reporter: Michal Jaegermann <michal>
Assignee: Jakub Jelinek <jakub>
CC: nicolas.mailhot, reuben-redhatbugzilla
Keywords: Reopened
Fixed In Version: 0.3.9-1
Doc Type: Bug Fix
Last Closed: 2006-07-25 20:27:47 UTC

Description
Michal Jaegermann
2006-06-27 18:58:37 UTC
same here

"same here" from comment #1 turns out to be i386 (athlon xp).

Can you please (with prelink-debuginfo installed):
gdb --args /usr/sbin/prelink -avmRf
break layout_libs
run
call prelink_entry_dump (prelink_filename_htab, "/tmp/cache.dump")
quit
and attach /tmp/cache.dump here? Thanks.

Created attachment 131819
cache.dump from prelink

Hm, after 'run' in gdb I am flooded with lines and lines of that sort:
.....
Detaching after fork from child process 8808.
Detaching after fork from child process 8809.
Detaching after fork from child process 8810.
Detaching after fork from child process 8811.
Detaching after fork from child process 8812.
.....
Normal? gdb stops after every screenful and I started to wonder whether
I am actually running in a loop.
In any case the requested file is attached.

Should be fixed in prelink-0.3.9-1 in rawhide.

*** Bug 198093 has been marked as a duplicate of this bug. ***

> Should be fixed in prelink-0.3.9-1 in rawhide.
WORKSFORME with this version installed.

> WORKSFORME
It looks like I was too quick. This is what I found the second
time, with prelink-0.3.9-1, in cron output:
/etc/cron.daily/prelink: line 47: 28803 Segmentation fault
/usr/sbin/prelink -av $PRELINK_OPTS >>/var/log/prelink/prelink.log 2>&1
The trouble is that repeating the following (the first line forces prelink to really run):
touch /var/lib/misc/prelink.force
/etc/cron.daily/prelink
runs every time without any visible trouble, but cron somehow makes it
unhappy.

Bother! I ran /etc/cron.daily/prelink six times in a row from a root crontab, making sure that I ran only one prelink process at a time and forcing actual runs, and no failures happened. OTOH what I quoted in comment #8 is, unfortunately, real. Recent log entries:

prelink[28803]: segfault at 0000000000000058 rip 0000000000400310 rsp 00007fff1cb05fc8 error 4
prelink[29948]: segfault at 0000000000000058 rip 0000000000400310 rsp 00007fff7349f768 error 4
prelink[14444]: segfault at 0000000000000058 rip 0000000000400310 rsp 00007fff0827b758 error 4
prelink[3370]: segfault at 0000000000000058 rip 0000000000400310 rsp 00007ffffa0669e8 error 4

This seems to happen only once a day (and possibly only on the first run after boot). The first three entries were triggered by anacron. The last one is from a "manual" run before anacron got to it. Any ideas?

So far I failed to repeat the error when running under gdb. OTOH gdb seems to be pointing here:

(gdb) l *0x0000000000400310
0x400310 is in deps_cmp (cache.c:344).
339       if (a == NULL && b != NULL)
340         return 1;
341       if (a != NULL && b == NULL)
342         return -1;
343
344       if (a->type == ET_NONE && b->type != ET_NONE)
345         return 1;
346       if (a->type != ET_NONE && b->type == ET_NONE)
347         return -1;
348

Maybe there is indeed a situation possible when both a and b are NULL? This is not checked and line 344 will then bomb.

I managed to get the following backtrace from a hacked up /etc/cron.daily/prelink which runs things via gdb:

Program received signal SIGTSTP, Stopped (user).
0x0000000000456a53 in __pread_nocancel ()
#0  0x0000000000456a53 in __pread_nocancel ()
#1  0x0000000000431d16 in elf64_getshdr ()
#2  0x0000000000431f0a in gelf_getshdr ()
#3  0x000000000041ba15 in fdopen_dso (fd=9, name=0x7049c0 "/usr/libexec/wnck-applet") at dso.c:355
#4  0x000000000040b8cc in gather_func (name=0x7049c0 "/usr/libexec/wnck-applet", st=0x7fff27f7c080, type=<value optimized out>, ftwp=<value optimized out>) at gather.c:830
#5  0x00000000004573aa in process_entry ()
#6  0x000000000045788e in ftw_dir ()
#7  0x0000000000457fbd in ftw_startup ()
#8  0x000000000040bea0 in gather_object (name=0x6f5083 "/usr/libexec", deref=<value optimized out>, onefs=1) at gather.c:1005
#9  0x000000000040c290 in gather_config (config=<value optimized out>)
#10 0x000000000040df75 in main (argc=4, argv=0x7fff27f7c868) at main.c:392
#11 0x0000000000436cb0 in __libc_start_main ()
#12 0x00000000004001b9 in _start ()

Does not seem to make much sense, and I am not aware of anything which would send SIGTSTP to that process, but that is all I have at this moment beyond a bunch of these segfault log entries.

This is a backtrace I got from a modified cron job and this is clearly the same problem as in comment #10:

Program received signal SIGSEGV, Segmentation fault.
deps_cmp (A=0x7fffc500e458, B=0x7fffc500ebc8) at cache.c:344
344       if (a->type == ET_NONE && b->type != ET_NONE)
#0  deps_cmp (A=0x7fffc500e458, B=0x7fffc500ebc8) at cache.c:344
#1  0x000000000043d58b in msort_with_tmp ()
#2  0x000000000043d4eb in msort_with_tmp ()
#3  0x000000000043d4eb in msort_with_tmp ()
#4  0x000000000043d4d5 in msort_with_tmp ()
#5  0x000000000043d702 in qsort ()
#6  0x0000000000400fbd in prelink_load_cache () at cache.c:465
#7  0x000000000040e0c0 in main (argc=4, argv=0x7fffc5012cf8) at main.c:390
#8  0x0000000000436cb0 in __libc_start_main ()
#9  0x00000000004001b9 in _start ()

It is rather elusive, though. When I tried to repeat running the same command under gdb from a command line it terminated normally.

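The both-NULL case raised above can be illustrated with a minimal C sketch. This is not prelink's code: struct entry, ENTRY_NONE, ENTRY_DYN and entry_cmp are hypothetical stand-ins, and the sketch assumes the sorted array holds pointers that may be NULL. It only shows a qsort comparator that tests both sides for NULL before dereferencing either one.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not prelink's real definitions. */
enum { ENTRY_NONE = 0, ENTRY_DYN = 3 };

struct entry {
    int type;
};

/*
 * Comparator for an array of `struct entry *` that may contain NULLs.
 * Ordering: entries with a real type first, ENTRY_NONE entries next,
 * NULL pointers last. Every NULL case is handled before any dereference.
 */
static int entry_cmp(const void *A, const void *B)
{
    const struct entry *a = *(const struct entry *const *)A;
    const struct entry *b = *(const struct entry *const *)B;

    if (a == NULL && b == NULL)
        return 0;               /* the case the quoted listing does not cover */
    if (a == NULL)
        return 1;
    if (b == NULL)
        return -1;

    if (a->type == ENTRY_NONE && b->type != ENTRY_NONE)
        return 1;
    if (a->type != ENTRY_NONE && b->type == ENTRY_NONE)
        return -1;
    return 0;
}

/* Sort an array of possibly-NULL entry pointers in place. */
static void sort_entries(struct entry **v, size_t n)
{
    qsort(v, n, sizeof v[0], entry_cmp);
}
```

Calling sort_entries on an array such as { NULL, &none, NULL, &dyn } moves the typed entry to the front and both NULL pointers to the end without faulting. Whether prelink's array can actually contain two NULL pointers is exactly the open question in this report.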
Yes, I see that prelink needs to be in a "quick" mode for the above trace to make sense. I still do not see what prevents 'a' and 'b' from both being NULL at the same time.

It is clear by now that this is a different problem from what was originally reported. Should I open another bugzilla entry and close this one?

This really differs from the bug in the original report. Resubmitted as bug #200160, with a better description of how to see it, and closing this bug again.