Doing a "console=ttyS0 vnc" install using netinst.iso. x86_64 KVM guest on a recent rawhide x86_64 host.
Click Next on the hostname screen and it hangs.
The log just says "moving (1) to step timezone".
strace shows the parent process doing this over and over:
  --- SIGSEGV (Segmentation fault) @ 0 (0) ---
  rt_sigaction(SIGSEGV, {0x711f70, [], SA_RESTORER, 0x11f690}, {0x711f70, [], SA_RESTORER, 0x11f690}, 8) = 0
  rt_sigreturn(0xb) = 140591843010560
and a child process doing this over and over:
  ppoll([{fd=0, events=POLLIN|POLLPRI|POLLERR|POLLHUP|0x400}], 1, {18446744073709551615, 18446744073709551615}, ~[HUP INT TERM RTMIN], 8) = -1 EINVAL (Invalid argument)
  recvfrom(0, 0x7fff32b4e960, 8988, 64, 0x7fff32b4e7c0, 0x7fff32b4e7cc) = -1 EAGAIN (Resource temporarily unavailable)
*** Bug 482950 has been marked as a duplicate of this bug. ***
KVM is the keyword here. There are known issues with KVM and Fedora >= 10; try adding "clocksource=acpi_pm" to the kernel cmdline (pass -x "clocksource=acpi_pm" to virt-install).
Given that the known problems are clocksource related (and include hard hangs of the virtual machine), I think this has the same cause. Therefore I suggest closing this as a duplicate of the KVM bug tracking this.
(In reply to comment #2)
> kvm is the keyword here, there are known issues with kvm and Fedora >= 10, try
> adding "clocksource=acpi_pm" to the kernel cmdline (pass -x
> "clocksource=acpi_pm" to virt-install)
>
> Given that the known problems are clocksource related (and include hard hangs
> of the virtual machine), I think this has the same cause.
I have just reproduced the issue with clocksource=acpi_pm, so it's not the same as bug #475598.
Created attachment 330370 [details]
1411.log
Okay, here's an strace log of the parent process, starting from before the welcome screen.
The log of the child process is just 8 MB of the poll() = EINVAL, recvfrom() = EAGAIN loop, so I'm not bothering to attach that.
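For a log that is dominated by one repeating loop, a per-syscall tally is often easier to share than the raw file. The following is a minimal sketch, assuming strace output shaped like the lines quoted in this bug (optional timestamp, syscall name, arguments, `= result`, optional errno name). The function name `summarize_strace` is hypothetical and not part of any tool mentioned here.

```python
import re
from collections import Counter

# Matches e.g. "1233244607.898335 select(19, ...) = 1 (out [18])" or
# "recvfrom(0, ...) = -1 EAGAIN (Resource temporarily unavailable)".
LINE_RE = re.compile(
    r"^(?:\d+\.\d+\s+)?(\w+)\(.*\)\s+=\s+(-?\d+)(?:\s+(E[A-Z]+))?"
)

def summarize_strace(lines):
    """Count (syscall, errno-or-'ok') pairs and delivered signals."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        m = LINE_RE.match(line)
        if m:
            name, _ret, errno_name = m.groups()
            counts[(name, errno_name or "ok")] += 1
        elif "--- SIG" in line:
            # Signal delivery lines look like "--- SIGSEGV (...) @ 0 (0) ---".
            sig = line.split("--- ")[1].split()[0]
            counts[("signal", sig)] += 1
    return counts
```

Running it over the child's log would be expected to show only two dominant entries, one for the poll call failing with EINVAL and one for recvfrom failing with EAGAIN.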
I've reproduced this on two bare-metal machines (i386 and ppc64), with local graphics and VNC. It's not KVM.
FWIW, the log shows anaconda talking to dbus, then trying to load anaconda.mo (it wants to print an error maybe?), and then it segvs:
  1233244607.898335 select(19, [18], [18], NULL, NULL) = 1 (out [18])
  1233244607.898500 writev(18, [{"%\4\1\0"..., 4}, {NULL, 0}, {""..., 0}], 3) = 4
  1233244607.898882 read(18, 0x2570cc4, 4096) = -1 EAGAIN (Resource temporarily unavailable)
  1233244607.899114 gettimeofday({1233244607, 899150}, NULL) = 0
  1233244607.900480 stat("/usr/share/locale/en_US.UTF8/LC_MESSAGES/anaconda.mo", 0x7fff3b5d52a0) = -1 ENOENT (No such file or directory)
  1233244607.900738 stat("/usr/share/locale/en_US/LC_MESSAGES/anaconda.mo", 0x7fff3b5d52a0) = -1 ENOENT (No such file or directory)
  1233244607.900962 stat("/usr/share/locale/en.UTF8/LC_MESSAGES/anaconda.mo", 0x7fff3b5d52a0) = -1 ENOENT (No such file or directory)
  1233244607.901270 stat("/usr/share/locale/en/LC_MESSAGES/anaconda.mo", 0x7fff3b5d52a0) = -1 ENOENT (No such file or directory)
  1233244607.903095 --- SIGSEGV (Segmentation fault) @ 0 (0) ---
There's no such file as /usr/share/locale/en*/LC_MESSAGES/anaconda.mo. I wonder what happens if you try the installation in another language. The closest I can think of would be to boot up with lang=en_GB, unless you can speak another language.
Reading through TimezoneWindow.__init__() and comparing that to the strace, I believe those lines searching for anaconda.mo correspond to:
  ics.setTitle(_("Time Zone Selection"))
which has been there, unchanged, since 2002. So that's probably not the problem.
My guess would be that we're dying in whatever's supposed to happen after iw/timezone_gui.py's TimezoneWindow() object is instantiated. I'm having trouble tracing how the dispatcher and things work, so I'm not sure where that's happening.
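To illustrate why a missing English catalog is normally harmless, here is a minimal sketch using Python's standard gettext module. It assumes the lookup behaves like the stdlib implementation and is not a copy of anaconda's own translation setup; the helper name `lookup_title` and the empty temporary locale directory are made up for the example.

```python
import gettext
import tempfile

def lookup_title(localedir):
    """Look up the window title in a locale dir that has no anaconda.mo."""
    # find() probes the locale fallbacks for the requested language and
    # returns None when no catalog file exists for the domain.
    catalog = gettext.find("anaconda", localedir, languages=["en_US.UTF-8"])
    # fallback=True yields a NullTranslations object instead of raising,
    # so untranslated strings are simply passed through unchanged.
    trans = gettext.translation("anaconda", localedir,
                                languages=["en_US.UTF-8"], fallback=True)
    return catalog, trans.gettext("Time Zone Selection")

with tempfile.TemporaryDirectory() as empty_localedir:
    catalog, title = lookup_title(empty_localedir)
    print(catalog, title)
```

With no catalog present the original string comes back untouched, which is consistent with the ENOENT results being routine rather than the cause of the crash.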
i686 install. I have 2 different encrypted partitions, both with the same password. The installer asks me for the password for my first encrypted partition. If I give it the password and hit enter, it asks for the password for the next encrypted partition (like it should). If I give it the password and check the "This is a global password" check box, it hangs. The screen stops redrawing, and if I drop out to a shell, top tells me there are two anaconda python processes, each spinning as much CPU time as it can. With what little visibility I have, it appears to be the same result I saw on the tz issue (which I got around on another machine by using kickstart).
Adding in a pdb.set_trace() and stepping through, I have found the problem block of code:
  self.utcCheckbox.set_active(asUTC)

  if not anaconda.isKickstart:
      self.utcCheckbox.set_active(not hasWindows(anaconda.id.bootloader))
The first set_active succeeds in the default interactive case, since asUTC is set to 0. On the second set_active call, I have no Windows partitions so we try to set the checkbox to active. That call then never returns.
In addition, I have verified that hasWindows(anaconda.id.bootloader) returns in a timely fashion. So it seems that self.utcCheckbox.set_active(True) is leading to an endless stream of segfaults.
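The control flow described above can be sketched without GTK to show which set_active calls are issued in each case. This is only an illustration: `FakeCheckbox` and `apply_utc_default` are hypothetical stand-ins, and the real code operates on a GTK check button inside TimezoneWindow.

```python
class FakeCheckbox:
    """Hypothetical stand-in for the GTK check button; records calls only."""

    def __init__(self):
        self.calls = []

    def set_active(self, value):
        self.calls.append(bool(value))


def apply_utc_default(checkbox, as_utc, is_kickstart, has_windows):
    """Mirror the two set_active calls from the quoted block."""
    checkbox.set_active(as_utc)
    if not is_kickstart:
        # Interactive installs default to UTC unless Windows is present.
        checkbox.set_active(not has_windows)
    return checkbox.calls


# Interactive install, asUTC == 0, no Windows partitions:
print(apply_utc_default(FakeCheckbox(), 0, False, False))  # [False, True]
```

In the interactive case with no Windows partitions, the second call is the first time the checkbox is set to True, which matches the point where the hang was observed.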
(In reply to comment #10)
> Adding in a pdb.set_trace() and stepping through, I have found the problem
> block of code:
>
>   self.utcCheckbox.set_active(asUTC)
>
>   if not anaconda.isKickstart:
>       self.utcCheckbox.set_active(not hasWindows(anaconda.id.bootloader))
>
> The first set_active succeeds in the default interactive case, since asUTC is
> set to 0. On the second set_active call, I have no Windows partitions so we
> try to set the checkbox to active. That call then never returns.
>
> In addition, I have verified that hasWindows(anaconda.id.bootloader) returns in
> a timely fashion. So it seems that self.utcCheckbox.set_active(True) is
> leading to an endless stream of segfaults.
Interesting. In comment #9, eparis claims a similar hang when trying to click a checkbox in the encrypted partition password dialog, so it seems we are hitting a pygtk or gtk bug when a checkbox gets set (be it from code or with the mouse).
Hey guys, I think you missed a good theory that Mark pointed out in comment 6. Note the crash happens after sending a message to the bus. There are dbus APIs for constructing messages that are variadic and easy to screw up. If you don't terminate them properly, for instance, you could hose your stack and end up with a random crash shortly afterwards, triggered by some unrelated action that uses the stack. Is there any C glue code in anaconda that uses dbus?
Actually, looking closer, Mark, are you sure fd 18 is dbus? Looking at the trace, 7 seems to be the bus connection. The stuff going over the wire for 18 doesn't match the bus wire format, and libdbus doesn't ever pass 4 io_vec's to its writev calls. It would be useful to see what fd 18 is.
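One way to check what a descriptor refers to on a live system is to read its symlink under /proc. A minimal sketch, assuming Linux with /proc mounted and permission to inspect the target process; the helper name `describe_fd` is hypothetical.

```python
import os

def describe_fd(pid, fd):
    """Return the target of /proc/<pid>/fd/<fd> (Linux only)."""
    # Regular files resolve to a path; sockets and pipes resolve to
    # pseudo-names such as "socket:[12345]" or "pipe:[12345]".
    return os.readlink(f"/proc/{pid}/fd/{fd}")
```

For example, `describe_fd(anaconda_pid, 18)` (with `anaconda_pid` standing for the parent's PID) would return a `socket:[inode]` name if fd 18 is a socket, and that inode can then be matched against /proc/net/unix or /proc/net/tcp to see what kind of connection it is.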
It seems unlikely to me that this is dbus related, given comment #11 (which references comment #9). Also, there's nothing going on when moving to the timezone screen that involves dbus. The only dbus-aware parts of anaconda are the networking and disk discovery code.
Still, much more likely some memory corruption or something than bug-in-GtkCheckButton
Strangely, we are not seeing this bug in Rawhide trees composed after our initial Alpha freeze. Whether this is due to the package set, the composition location, or some transient problem I do not know. However, since it looks like we are going to move the alpha on to a tree without this bug, I'm going to go ahead and close it as RAWHIDE. If we do continue to see it in more nightly trees, I guess we can go back to panicking.