From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7) Gecko/20040625

Description of problem:
FC-devel on 20040714 has firstboot freezing at a gray screen for some reason. 20040715 has the same problem. 20040713 does not have the problem. Right now I'm filing against distribution because I'm not sure which package is guilty. However, I intend to find out within the next few hours.

Version-Release number of selected component (if applicable):
fc-devel snapshots on 20040714 and 20040715

How reproducible:
Always

Steps to Reproduce:
Procedure A:
1. Install fc-devel from 20040714 or 20040715, with a Personal Desktop install and the standard package selection.
2. After installation finishes, reboot.
3. Wait for firstboot to come up.

Procedure B:
1. Repeat procedure A, except using FC 3 test1 or FC-devel from the last several days (but *before* 20040714).
2. At firstboot, press Control-Alt-Backspace to kill the X server.
3. Log in as root. (There are no unprivileged user accounts at this point.)
4. Do any other system setup stuff you feel like doing.
5. Edit /etc/yum.conf so it points to a 20040714 snapshot of fc-devel (I imagine 20040715 would work too but I only tested 20040715 with fresh installs, not upgrades via yum).
6. Use "yum upgrade" to bring the system up to 20040714.
7. Restart.

Actual Results:
After RHGB finishes its business, the X server stops and starts again. There is a gray screen, and perhaps a little disk activity, but nothing else happens even after waiting an hour or more on 1.3-1.4GHz machines.

Expected Results:
Firstboot should not stop at a gray screen and should proceed as normal.

Additional info:
I'm reproducing this 100% of the time on both of the test machines I've tried so far. Also, adding "selinux=0" to the kernel command line does not make the problem go away.
If you kill the frozen firstboot, log in as root, and examine /var/log/Xorg.0.log.old, then there are messages at the end of the log that range from somewhat odd to totally bizarre, depending on the test machine's hardware. I erased the log messages from one of my test machines, but if it doesn't take too long I may try to reproduce it again and I'll post the messages to this bug. On the other test machine (using the "vesa" driver on a VIA C3M266 motherboard's onboard video), there's this one line at the end of the log:

AUDIT: Fri Jul 16 02:46:16 2004: 3645 X: client 4 rejected from local host

Right now I'm using ext2 filesystems on my test boxes, FWIW. Yesterday I also reproduced this bug on xfs, so the type of filesystem isn't making a difference.

I just tested again with kernel 2.6.7-1.478 rather than 1.486. The weird Xorg.0.log.old message is gone now, but the problem remains. So maybe the messages are harmless and completely unrelated.
Actually, AFAICT sometimes rhgb freezes with a gray screen when it should quit, and sometimes it's firstboot that freezes right after it starts up. (I could be misperceiving the whole situation, however!) Anyway, it turns out that downgrading rhgb from 0.12.2-1 to 0.11.2-7 makes the problem go away...
Still happening with 2004-08-22 rawhide snapshot, although there's a traceback first (I'll attach that at some point in the next 48 hours).
The traceback I'm getting is probably the same as bug 130567.
*** This bug has been marked as a duplicate of 129532 ***
My workaround of downgrading rhgb still works, but that doesn't affect the traceback. So, the traceback I mentioned in comment #3 appears to be unrelated to this bug.
Do we have a bracket of which versions of rhgb do and do not work?
No idea, I have snapshot of rawhide from mid-july and so far I have never been able to reproduce that problem. I have seen an X server without any output on x86_64, but it wasn't specific to the one running rhgb, and it happened after a bunch of nightly update failures. A diff of the source between the version from 5 months ago and last week's shows only:
- a change in the gtk widget code to accommodate xinerama, which I doubt can produce that effect
- a patch for close-on-exec of a socket
- correctly unmounting the ramfs, needed if exec'ing the X server failed
- and a 0 -> NULL pointer cleanup fix.
I don't see how any of those can generate the stated result. And I can't reproduce it to chase where this may occur.
Daniel
err ... I have no snapshot of rawhide from mid-july ... Will try to reproduce this tomorrow and check
Quoting from (my) comment #1:
> Anyway, it turns out that downgrading rhgb from 0.12.2-1 to 0.11.2-7
> makes the problem go away...
0.12.2-1 breaks, 0.11.2-7 works. Does that help?
BTW, 0.11.2-7 also happens to be the rhgb version from FC3 test1.
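Given one known-good and one known-bad version (0.11.2-7 works, 0.12.2-1 breaks), the change that introduced the hang could in principle be narrowed down by bisecting the builds in between. The following is only a minimal sketch of that idea: the intermediate build names are hypothetical placeholders, not real rhgb releases, and `is_bad` stands in for a manual "install this build, reboot, see whether firstboot hangs" check.

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest version for which is_bad() is true.

    Assumptions: versions are ordered oldest to newest, versions[0] is
    known good, versions[-1] is known bad, and once a version is bad
    every later version is bad too.
    """
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # the breakage is at mid or earlier
        else:
            lo = mid  # the breakage is after mid
    return versions[hi]


# Only the first and last entries come from this bug report; the middle
# entries are hypothetical placeholders for intermediate builds.
versions = ["0.11.2-7", "hypothetical-a", "hypothetical-b",
            "hypothetical-c", "0.12.2-1"]

# Simulated test results; in practice this would be a manual reboot test.
known_bad = {"hypothetical-b", "hypothetical-c", "0.12.2-1"}

culprit = first_bad(versions, lambda v: v in known_bad)
```

With N candidate builds this needs roughly log2(N) reboot tests rather than N, which matters when each test is a full install and restart.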
Okay, problem reproduced. From here it is now possible to detect which change messed things up and get a proper fix.
Daniel
The grey screen appears for 2 reasons in rawhide:
- firstboot can simply crash, see #129532
- the patch supplied to fix xinerama handling, #115209 (or more precisely the part of the patch affecting splash.h and splash.c), makes firstboot hang

In the latter case it's not obvious to find why; the stack trace is not very clear:

[Switching to Thread -151071648 (LWP 3006)]
0x00d99782 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) where
#0  0x00d99782 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x0068f23d in poll () from /lib/tls/libc.so.6
#2  0x0018ff13 in g_main_context_acquire () from /usr/lib/libglib-2.0.so.0
#3  0x0019022f in g_main_loop_run () from /usr/lib/libglib-2.0.so.0
#4  0x00eba1de in gtk_main () from /usr/lib/libgtk-x11-2.0.so.0
#5  0x00282c16 in init_gtk () from /usr/lib/python2.3/site-packages/gtk-2.0/gtk/_gtk.so
#6  0x00d1fcb4 in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.1.0
#7  0x00d210ae in PyEval_EvalCodeEx () from /usr/lib/libpython2.3.so.1.0
#8  0x00cdce6e in PyFunction_SetClosure () from /usr/lib/libpython2.3.so.1.0
#9  0x00cc9617 in PyObject_Call () from /usr/lib/libpython2.3.so.1.0
#10 0x00cd0dac in PyMethod_New () from /usr/lib/libpython2.3.so.1.0
#11 0x00cc9617 in PyObject_Call () from /usr/lib/libpython2.3.so.1.0
#12 0x00d1b2a0 in PyEval_CallObjectWithKeywords () from /usr/lib/libpython2.3.so.1.0
#13 0x00cccaa9 in PyInstance_New () from /usr/lib/libpython2.3.so.1.0
#14 0x00cc9617 in PyObject_Call () from /usr/lib/libpython2.3.so.1.0
#15 0x00d1eccf in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.1.0
#16 0x00d210ae in PyEval_EvalCodeEx () from /usr/lib/libpython2.3.so.1.0
#17 0x00d21372 in PyEval_EvalCode () from /usr/lib/libpython2.3.so.1.0
#18 0x00d3a8b7 in PyErr_Display () from /usr/lib/libpython2.3.so.1.0
#19 0x00d3b9e2 in PyRun_SimpleFileExFlags () from /usr/lib/libpython2.3.so.1.0
#20 0x00d3ca34 in PyRun_AnyFileExFlags () from /usr/lib/libpython2.3.so.1.0
---Type <return> to continue, or q <return> to quit---
#21 0x00d4172e in Py_Main () from /usr/lib/libpython2.3.so.1.0
#22 0x080485b2 in main ()
(gdb)

One simple fix is to just revert that patch. A better way would be to find exactly what in the patch makes the whole thing hang, probably the window manager code of the patch. rhgb runs without a window manager ... except when firstboot starts, since firstboot itself starts metacity.
Daniel
Okay, we now have a fix for this. It's in http://people.redhat.com/veillard/testing/SRPMS/rhgb-0.12.5-1.src.rpm
With that and a quick fix for #129532 in firstboot (removing mouse configuration), I have firstboot back on today's rawhide. I need to push this into rawhide, maybe over the week-end.
Daniel
This has been pushed to Rawhide and should be fixed there. Thanks,
Daniel