Red Hat Bugzilla – Bug 128003
firstboot or rhgb hangs at gray screen
Last modified: 2007-11-30 17:10:46 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7) Gecko/20040625
Description of problem:
FC-devel on 20040714 has firstboot freezing at a gray screen for some
reason. 20040715 has the same problem. 20040713 does not have the problem.
Right now I'm filing against distribution because I'm not sure which
package is guilty. However, I intend to find out within the next few
Version-Release number of selected component (if applicable):
fc-devel snapshots on 20040714 and 20040715
Steps to Reproduce:
Procedure A (fresh install):
1. Install fc-devel from 20040714 or 20040715, with a Personal Desktop
install and the standard package selection.
2. After installation finishes, reboot.
3. Wait for firstboot to come up.
Procedure B (upgrade via yum):
1. Repeat procedure A, except using FC 3 test1 or FC-devel from the
last several days (but *before* 20040714).
2. At firstboot, press Control-Alt-Backspace to kill the X server.
3. Log in as root. (There are no unprivileged user accounts at this
4. Do any other system setup stuff you feel like doing.
5. Edit /etc/yum.conf so it points to a 20040714 snapshot of fc-devel
(I imagine 20040715 would work too but I only tested 20040715 with
fresh installs, not upgrades via yum).
6. Use "yum upgrade" to bring the system up to 20040714.
Actual Results: After RHGB finishes its business, the X server stops
and starts again. There is a gray screen, and perhaps a little disk
activity, but nothing else happens even after waiting an hour or more
on 1.3-1.4GHz machines.
Expected Results: Firstboot should not stop at a gray screen and
should proceed as normal.
I'm reproducing this 100% of the time on both of the test machines
I've tried so far. Also, adding "selinux=0" to the kernel command line
does not make the problem go away.
If you kill the frozen firstboot, log in as root, and examine
/var/log/Xorg.0.log.old, then there are messages at the end of the log
that range from somewhat odd to totally bizarre, depending on the test
I erased the log messages from one of my test machines, but if it
doesn't take too long I may try to reproduce it again and I'll post
the messages to this bug. On the other test machine (using the "vesa"
driver on a VIA C3M266 motherboard's onboard video), there's this one
line at the end of the log:
AUDIT: Fri Jul 16 02:46:16 2004: 3645 X: client 4 rejected from local host
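For anyone who wants to check their own Xorg.0.log.old for the same kind of message, here is a minimal sketch in Python. It assumes the AUDIT lines have exactly the shape of the one quoted above; the function name `find_rejected_clients` and the regular expression are illustrative only and are not part of any X.org tool.

```python
import re

# Matches AUDIT lines shaped like the one quoted in this report, e.g.
# "AUDIT: <timestamp>: <pid> X: client <n> rejected from <origin>".
# The exact format is an assumption based on that single sample line.
AUDIT_RE = re.compile(
    r"^AUDIT: (?P<when>.+?): (?P<pid>\d+) X: "
    r"client (?P<client>\d+) rejected from (?P<origin>.+)$"
)


def find_rejected_clients(lines):
    """Return one dict per 'client N rejected' AUDIT line found in lines."""
    hits = []
    for line in lines:
        match = AUDIT_RE.match(line.strip())
        if match:
            hits.append({
                "when": match.group("when"),
                "pid": int(match.group("pid")),
                "client": int(match.group("client")),
                "origin": match.group("origin"),
            })
    return hits
```

To use it on a real log, open /var/log/Xorg.0.log.old and pass the file object (or its lines) to `find_rejected_clients`; lines that do not match are simply skipped.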
Right now I'm using ext2 filesystems on my test boxes, FWIW. Yesterday
I also reproduced this bug on xfs, so the type of filesystem isn't
making a difference.
I just tested again with kernel 2.6.7-1.478 rather than 1.486. The
weird Xorg.0.log.old message is gone now, but the problem remains. So
maybe the messages are harmless and completely unrelated.
Actually, AFAICT sometimes rhgb freezes with a gray screen when it
should quit, and sometimes it's firstboot that freezes right after it
starts up. (I could be misperceiving the whole situation, however!)
Anyway, it turns out that downgrading rhgb from 0.12.2-1 to 0.11.2-7
makes the problem go away...
Still happening with 2004-08-22 rawhide snapshot, although there's a
traceback first (I'll attach that at some point in the next 48 hours).
The traceback I'm getting is probably the same as bug 130567.
*** This bug has been marked as a duplicate of 129532 ***
My workaround of downgrading rhgb still works, but that doesn't affect
the traceback. So, the traceback I mentioned in comment #3 appears to
be unrelated to this bug.
Do we have a bracket of which versions of rhgb do and do not work?
No idea, I have snapshot of rawhide from mid-july and so far
I have never been able to reproduce that problem. I have seen
an X server without any output on X86_64, but it wasn't specific
to the one running rhgb and after a bunch of nightly update failures.
Doing a diff of the source between the version 5 months ago and
last week shows only:
- a change in gtk widget code to accommodate xinerama,
which I doubt can produce that effect
- a patch for close-on-exec of a socket
- correctly unmounting the ramfs needed if exec'ing the X
- and a 0 -> NULL pointer cleanup fix.
I don't see how any of those can generate the stated result. And
I can't reproduce it to chase where this may occur.
err ... I have no snapshot of rawhide from mid-july ...
Will try to reproduce this tomorrow and check
Quoting from (my) comment #1:
> Anyway, it turns out that downgrading rhgb from 0.12.2-1 to 0.11.2-7
> makes the problem go away...
0.12.2-1 breaks, 0.11.2-7 works. Does that help?
BTW, 0.11.2-7 also happens to be the rhgb version from FC3 test1.
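If intermediate rhgb builds between the known-good and known-bad versions are available, the bracket could be narrowed by bisection. The following is only a sketch under stated assumptions: the helper name `first_bad_version` is made up for illustration, the caller supplies the ordered list of builds and a check that reports whether a given build hangs, and it assumes every build after the first bad one is also bad.

```python
def first_bad_version(versions, is_bad):
    """Bisect an ordered list of versions to find the first bad one.

    Assumes versions[0] is known good, versions[-1] is known bad, and
    that once a version is bad every later version is bad as well.
    is_bad is a caller-supplied check (for example: install that build,
    reboot, and see whether firstboot hangs at the gray screen).
    """
    lo, hi = 0, len(versions) - 1  # versions[lo] is good, versions[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # the breakage is at mid or earlier
        else:
            lo = mid  # the breakage is after mid
    return versions[hi]
```

With only the two versions reported here (0.11.2-7 good, 0.12.2-1 bad), the function returns 0.12.2-1 immediately; it only becomes useful once the builds in between are added to the list.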
Okay, problem reproduced. From here it is now possible to detect
what change messed things up and get a proper fix.
The grey screen appears for 2 reasons in rawhide:
- firstboot can simply crash, see #129532
- the patch supplied to fix xinerama handling, #115209 (or more
precisely the part of the patch affecting splash.h and splash.c),
makes firstboot hang
In the latter case it's not obvious to find why the stack trace is not
[Switching to Thread -151071648 (LWP 3006)]
0x00d99782 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#0 0x00d99782 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x0068f23d in poll () from /lib/tls/libc.so.6
#2 0x0018ff13 in g_main_context_acquire () from /usr/lib/libglib-2.0.so.0
#3 0x0019022f in g_main_loop_run () from /usr/lib/libglib-2.0.so.0
#4 0x00eba1de in gtk_main () from /usr/lib/libgtk-x11-2.0.so.0
#5 0x00282c16 in init_gtk ()
#6 0x00d1fcb4 in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.1.0
#7 0x00d210ae in PyEval_EvalCodeEx () from /usr/lib/libpython2.3.so.1.0
#8 0x00cdce6e in PyFunction_SetClosure () from
#9 0x00cc9617 in PyObject_Call () from /usr/lib/libpython2.3.so.1.0
#10 0x00cd0dac in PyMethod_New () from /usr/lib/libpython2.3.so.1.0
#11 0x00cc9617 in PyObject_Call () from /usr/lib/libpython2.3.so.1.0
#12 0x00d1b2a0 in PyEval_CallObjectWithKeywords ()
#13 0x00cccaa9 in PyInstance_New () from /usr/lib/libpython2.3.so.1.0
#14 0x00cc9617 in PyObject_Call () from /usr/lib/libpython2.3.so.1.0
#15 0x00d1eccf in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.1.0
#16 0x00d210ae in PyEval_EvalCodeEx () from /usr/lib/libpython2.3.so.1.0
#17 0x00d21372 in PyEval_EvalCode () from /usr/lib/libpython2.3.so.1.0
#18 0x00d3a8b7 in PyErr_Display () from /usr/lib/libpython2.3.so.1.0
#19 0x00d3b9e2 in PyRun_SimpleFileExFlags () from
#20 0x00d3ca34 in PyRun_AnyFileExFlags () from
#21 0x00d4172e in Py_Main () from /usr/lib/libpython2.3.so.1.0
#22 0x080485b2 in main ()
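To read a trace like the one above more quickly, the frames can be pulled apart with a few lines of Python. This is a rough sketch, not a gdb feature: the name `parse_frames` and the regular expression are assumptions based only on the frame lines shown here (frames with arguments or source locations would need a broader pattern).

```python
import re

# Matches frame lines of the form seen in this trace:
#   "#N 0xADDR in FUNC () from /path/to/lib"
# The trailing "from <lib>" part may be absent or cut off.
FRAME_RE = re.compile(
    r"^#(\d+)\s+0x[0-9a-fA-F]+ in (\S+) \(\)(?:\s+from\s*(\S*))?\s*$"
)


def parse_frames(text):
    """Return a list of (frame_number, function, library_or_None) tuples."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            library = match.group(3) or None  # empty or missing -> None
            frames.append((int(match.group(1)), match.group(2), library))
    return frames
```

Applied to the trace above, the innermost frames come out as `_dl_sysinfo_int80`, `poll`, `g_main_context_acquire`, `g_main_loop_run`, `gtk_main`, which is consistent with the process sitting in the GTK main loop waiting for events rather than having crashed.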
One simple fix is to just reverse that patch. A better way would
be to find exactly what in the patch makes the whole thing hang,
probably the window manager code of the patch. rhgb runs without a
window manager ... except when firstboot starts since firstboot
itself starts metacity.
Okay, we now have a fix for this; it's in
With that and a quick fix to #129532 from first boot (removing
mouse configuration), I have firstboot back on today's rawhide.
I need to push this into rawhide, maybe over the weekend.
This has been pushed to Rawhide; this should be fixed there.