Bug 983319
Summary: | Install gets stuck returning from Installation Destination spoke | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Adam Williamson <awilliam> |
Component: | anaconda | Assignee: | David Shea <dshea> |
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | urgent | Docs Contact: | |
Priority: | unspecified | ||
Version: | rawhide | CC: | akozumpl, anaconda-maint-list, dshea, g.kaviyarasu, jonathan, kparal, mclasen, mkolman, pschindl, robatino, sbueno, stephent98, tflink, vanmeeuwen+fedora |
Target Milestone: | --- | Keywords: | Reopened |
Target Release: | --- | ||
Hardware: | All | ||
OS: | All | ||
Whiteboard: | AcceptedBlocker | ||
Fixed In Version: | anaconda-20.8-1 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2013-08-29 12:38:44 UTC | Type: | Bug |
Bug Blocks: | 980649 | ||
Attachments: |
Description
Adam Williamson
2013-07-11 00:26:35 UTC
Created attachment 771920 [details]
the 'dialog on black background' effect
Created attachment 771921 [details]
the buggy state: note all controls are inactive and the warning for Installation Destination is still displayed
Actually, doesn't look like I need to upload my live image: the bug is reproducible with the current nightly, which you can find at http://koji.fedoraproject.org/koji/taskinfo?taskID=5592193 . CCing mclasen for any possible GNOME/GTK+ angle on this (as mentioned above, I'll check with a KDE live later to see if that reproduces the issue or not).

What info are you waiting for, exactly?

I couldn't check KDE last week as KDE live image composes were failing; I'll check if that's still the case currently.

Just for the record, KDE is now building again, but I can't investigate this further until https://bugzilla.redhat.com/show_bug.cgi?id=986069 is fixed.

I believe the anaconda team has reproduced this and is working on it, so un-setting needinfo.

This is confirmed not to be live-specific and I know the team has reproduced it. Not sure why needinfo seems to keep being set (or if I just forgot to unset it with c#6).

*** This bug has been marked as a duplicate of bug 997149 ***

Re-opening, as testing indicates that fixing 997149 has not fixed this problem. :(

Discussed at 2013-08-21 blocker review meeting [1]. This is accepted as an Alpha blocker, because it violates the following F20 alpha release criterion: "The installer must be able to complete an installation using any supported locally connected storage interface." [2]

[1] http://meetbot.fedoraproject.org/fedora-blocker-review/2013-08-21/
[2] https://fedoraproject.org/wiki/Fedora_20_Alpha_Release_Criteria#Storage_interfaces

Just to confirm, as viking-ice asked in the blocker meeting, this bug affects bare metal too. Just tested on my old laptop and it's stuck at the hub screen right now.

Created attachment 789668 [details]
strace before hitting 'reclaim space'
Created attachment 789669 [details]
strace after hitting 'reclaim space'
Note the end, where it does this forever:
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0} ---
rt_sigreturn() = 20651408
Created attachment 789671 [details]
lsof of the anaconda process after it gets stuck.
fd 25 seems to be eventfd, maybe related? Maybe not.
I've got no idea what's going on here -- except that there is a good probability that it isn't anaconda. We typically don't see SIGSEGV.
Created attachment 789696 [details]
backtrace
The versions of the packages and debuginfos don't necessarily correspond to anything in any install image, so I make no promises about the validity of those line numbers. The crash is happening in anaconda_lb_move_window_to_parent on this line:
if (!GTK_IS_WIDGET(parent) || !GTK_IS_WINDOW(window))
the parent looks fine, but the window pointer is pointing to garbage
*** Bug 998687 has been marked as a duplicate of this bug. ***

You Python programmers are spoiled ... :-)

anaconda_lb_move_window_to_parent() is in anaconda C code:
https://git.fedorahosted.org/cgit/anaconda.git/tree/widgets/src/lightbox.c

My money is on a tardy callback:

$ less -N anaconda-20.6-1/widgets/src/lightbox.c
...
125 /* make the shade move with the parent window */
126 g_signal_connect(window, "configure-event",
127 G_CALLBACK (anaconda_lb_move_window_to_parent), lightbox);
...

[PATCH] Fix a SIGSEGV when returning from storage spoke (#983319)
https://lists.fedorahosted.org/pipermail/anaconda-patches/2013-August/005533.html

" ... then the signal handler (that takes the (destroyed) lightbox as a parameter) was called. - Kaboom."

Thanks for the detailed explanation. That is what I guessed, but I never would have been able to figure out a fix ... :-)

What puzzles me is why don't we see this in F19? lightbox.c is unchanged and utils.py is changed in places that don't seem to be related:

$ diff -qs anaconda-19.30.13-1/widgets/src/lightbox.c anaconda-20.6-1/widgets/src/lightbox.c
Files anaconda-19.30.13-1/widgets/src/lightbox.c and anaconda-20.6-1/widgets/src/lightbox.c are identical
$ diff -qs anaconda-19.30.13-1/pyanaconda/ui/gui/utils.py anaconda-20.6-1/pyanaconda/ui/gui/utils.py
Files anaconda-19.30.13-1/pyanaconda/ui/gui/utils.py and anaconda-20.6-1/pyanaconda/ui/gui/utils.py differ

Created attachment 789743 [details]
diff between 19.30.13 and 20.6 pyanaconda/ui/gui/utils.py
$ diff -u anaconda-19.30.13-1/pyanaconda/ui/gui/utils.py anaconda-20.6-1/pyanaconda/ui/gui/utils.py
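
The "tardy callback" failure discussed above (a handler connected with some object as its user data, the object destroyed, then the handler invoked anyway) can be modelled without GTK. The following is only a toy sketch in plain C: `connection`, `shade`, `shade_new`, `shade_destroy`, `on_configure` and `emit_configure` are hypothetical names invented for illustration, and this is not the code from the anaconda patch. It shows one general way to avoid the dangling pointer: disconnect the handler before freeing the object it points at.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a signal connection: one handler plus its user data. */
typedef void (*handler_fn)(void *user_data);

typedef struct {
    handler_fn handler;   /* NULL means "disconnected" */
    void *user_data;
} connection;

/* Hypothetical lightbox-like object that the handler receives as user data. */
typedef struct {
    int moves;            /* how many times the handler has run */
    connection *conn;     /* the connection that points back at this object */
} shade;

/* The callback: touching user_data after it was freed would be a use-after-free. */
void on_configure(void *user_data) {
    shade *s = user_data;
    s->moves++;
}

/* Create the object and connect the handler with the object as user data. */
shade *shade_new(connection *conn) {
    shade *s = calloc(1, sizeof *s);
    if (s == NULL)
        return NULL;
    s->conn = conn;
    conn->handler = on_configure;
    conn->user_data = s;
    return s;
}

/* Disconnect BEFORE freeing, so a late emission finds no handler to call. */
void shade_destroy(shade *s) {
    if (s == NULL)
        return;
    s->conn->handler = NULL;
    s->conn->user_data = NULL;
    free(s);
}

/* Emit the event; returns 1 if a handler ran, 0 if none was connected. */
int emit_configure(connection *conn) {
    if (conn->handler == NULL)
        return 0;
    conn->handler(conn->user_data);
    return 1;
}
```

In this model, an emission after `shade_destroy()` simply does nothing. In real GLib code the analogous step would be something like `g_signal_handler_disconnect()` with the id returned by `g_signal_connect()`; whether the actual fix took that route is not stated in this report, so see the linked patch for what was really changed.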
20.8 fixes this in my testing. Thanks!

(In reply to Brian C. Lane from comment #14)
> Created attachment 789669 [details]
> strace after hitting 'reclaim space'
>
> Note the end, where it does this forever:
>
> --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0} ---
> rt_sigreturn() = 20651408

this could be caused by Anaconda trying to catch the sigsegv and write out some useful information about it via isys.handleSegv. That's a naive thing to try.

Thanks for pointing that out, Ales. I was wondering about that SIGSEGV handler too. Do you think it could be removed?

While investigating Bug 998687, I found that:
1. No message was written, because a SIGSEGV storm occurs.
2. Disabling the SIGSEGV handler was required to get a core dump. (Bug 998687, Comment 7, Step 3)

Bug 998687 - SIGSEGV storm instead of Manual Partitioning
(Marked as a dupe of this one ...)

$ less -N anaconda-20.8-1/pyanaconda/isys/isys.c
...
219 static PyObject * doSegvHandler(PyObject *s, PyObject *args) {
220 void *array[20];
221 size_t size;
222 char **strings;
223 size_t i;
224
225 signal(SIGSEGV, SIG_DFL); /* back to default */
226
227 size = backtrace (array, 20);
228 strings = backtrace_symbols (array, size);
229
230 printf ("Anaconda received SIGSEGV!. Backtrace:\n");
231 for (i = 0; i < size; i++)
232 printf ("%s\n", strings[i]);
233
234 free (strings);
235 exit(1);
236 }
...
doSegvHandler() calls printf(), free(), and exit(), none of which are on the list of signal-safe functions:

POSIX: 2.4 Signal Concepts
2.4.3 Signal Actions
http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_04.html#tag_02_04_03

SIGNAL(7)
http://man7.org/linux/man-pages/man7/signal.7.html

A SIGSEGV handler was installed in anaconda on 2006-02-23:

- install SIGSEGV handler
author Peter Jones <pjones> 2006-02-23 20:00:49 (GMT)
https://git.fedorahosted.org/cgit/anaconda.git/commit/?id=a3f4e015863d8cff89844bb28e17fb52546e526e

doSegvHandler() in that commit is not signal-safe, and it appears to have been unchanged since then:
https://git.fedorahosted.org/cgit/anaconda.git/tree/isys/isys.c?id=a3f4e015863d8cff89844bb28e17fb52546e526e#n1439

Search results for SIGSEGV in commit messages:
https://git.fedorahosted.org/cgit/anaconda.git/log/?qt=grep&q=SIGSEGV

(In reply to Ales Kozumplik from comment #24)
> (In reply to Brian C. Lane from comment #14)
> > Created attachment 789669 [details]
> > strace after hitting 'reclaim space'
> >
> > Note the end, where it does this forever:
> >
> > --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0} ---
> > rt_sigreturn() = 20651408
>
> this could be caused by Anaconda trying to catch the sigsegv and write out
> some useful information about it via isys.handleSegv. That's a naive thing
> to try.

Bug 1001187 - SIGSEGV signal handler calls functions that are not signal-safe

(In reply to Steve Tyler from comment #25)
> Thanks for pointing that out, Ales. I was wondering about that SIGSEGV
> handler too. Do you think it could be removed?
>

It should, definitely. It was written for a different Anaconda:)

(In reply to Ales Kozumplik from comment #29)
> (In reply to Steve Tyler from comment #25)
> > Thanks for pointing that out, Ales. I was wondering about that SIGSEGV
> > handler too. Do you think it could be removed?
> >
>
> It should, definitely. It was written for a different Anaconda:)

Thanks, Ales.
I've copied both your comments to Bug 1001187.

I'm closing this bug as it seems to be fixed in TC2 (anaconda-20.9-1)
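
For reference on the signal-safety point raised in the comments above, here is a minimal sketch of a SIGSEGV handler restricted to functions that POSIX lists as async-signal-safe (write() and _exit()). It is an illustration only, not a proposed anaconda change: the names `safe_segv_handler` and `install_safe_segv_handler` and the message text are assumptions made up for this example, and it deliberately prints no backtrace.

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Fixed message, so nothing has to be formatted or allocated in the handler. */
static const char segv_msg[] = "received SIGSEGV\n";

/* Uses only write() and _exit(), both on the POSIX async-signal-safe list. */
void safe_segv_handler(int signo) {
    (void)signo;
    ssize_t ignored = write(STDERR_FILENO, segv_msg, sizeof segv_msg - 1);
    (void)ignored;   /* nothing useful can be done if the write fails */
    _exit(1);        /* terminate immediately, without running atexit handlers */
}

/* Install the handler; returns 0 on success and -1 on failure, like sigaction(). */
int install_safe_segv_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = safe_segv_handler;
    sigemptyset(&sa.sa_mask);
    /* SA_RESETHAND restores the default action on entry to the handler,
       so a second fault kills the process instead of re-entering it. */
    sa.sa_flags = SA_RESETHAND;
    return sigaction(SIGSEGV, &sa, NULL);
}
```

A caller would invoke `install_safe_segv_handler()` once at startup. As noted in the comments above, simply leaving SIGSEGV at its default action is also a reasonable choice when a core dump is wanted.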