Description of Problem: The X server is very slow to start. With RH7.0,
the X server started fairly reliably within 5 seconds, using various
XFree86 versions from 4.0.1 to 4.0.3. With RH7.1, a fair percentage
of starts take longer than 30 seconds, and most take more than 5
seconds.
Version-Release number of selected component (if applicable):
4.0.3-21 (default from CDs and/or up2date)
How Reproducible: It's always much slower than with RH7.0. At least
50% of the time it takes longer than 5 seconds. About 5-10% of the
time it takes longer than 30 seconds.
Steps to Reproduce:
1. log in
2. run startx
Actual Results: X server takes many seconds to start, often
longer than 30 seconds.
Expected Results: X server should start within a few seconds.
(This is an Alpha 21264 with fast disks, etc.)
The graphics card is a Vanta TNT2 16MB, running 1600x1200 at 16-bit
color. (I'd like to use 24-bit color, but that doesn't work.) I
have a Permedia 3 Oxygen VX1 32MB card on order. I'll attach
the X configuration file and a log file from an X server start that
took several seconds (but not longer than 30).
Created attachment 33418
X configuration file
Created attachment 33419
log file from a run that took several seconds (less than 30)
Unable to reproduce this on my Alpha here. Can you do an strace or
otherwise determine where it is hanging?
Is it the actual X server itself that is hanging, or is it the
desktop stuff? Perhaps a DNS lookup gone bad?
I have reason to believe it is the X server itself rather
than "desktop" stuff, mostly because I don't run any "desktop"
stuff. The problem affects my users who use Gnome/KDE the same
way it affects those who do not use Gnome/KDE. Also, at the
point of hang, the X server has not caused the switch from VC1
to VC7, because the VC1 text-mode login sequence is still visible.
One of my hunches is the hang is caused while BTTV-related kernel
modules load during X server initialization.
Where should the strace command be inserted? I tried inserting
it where startx calls xinit, but that did not seem to produce any
visible output. xinit seems to be a compiled binary rather than a
script, which makes it difficult to insert an strace into it where
it calls something else.
I instrumented the startx script with echo "..." statements
at every non-trivial step, and put an strace in front of the
xinit command. (After upgrading to kernel 2.4.9-12, teeing
the output of startx to a file intermittently prevents the
server from starting, but that's a separate issue.) It
appears the delay is in the mcookie command. Further testing
shows that reading just 4 bytes from /dev/random can take as
long as (or maybe even longer than) 20 seconds.
Was a change made to the kernel code behind /dev/random between
RH7.0 and RH7.1 that would increase the likelihood of long delays
in getting just a few random bits? This suggests a workaround of
using "mcookie -f /dev/urandom", but that would weaken the
security of the xauth cookie. Are there better ideas available?
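To make the trade-off concrete, here is a minimal Python sketch that produces a 128-bit cookie (32 hex digits) from a selectable source. The helper make_cookie is hypothetical and does not reproduce mcookie's internals; it only shows that the choice of device decides whether generating the cookie can block.

```python
COOKIE_BYTES = 16  # 128 bits, printed as 32 hexadecimal digits


def make_cookie(source: str = "/dev/urandom", nbytes: int = COOKIE_BYTES) -> str:
    """Read `nbytes` random bytes from `source` and return them as hex.

    Hypothetical helper: it illustrates the choice of random source only
    and does not reproduce how mcookie builds its output.
    """
    raw = b""
    with open(source, "rb", buffering=0) as dev:
        # Loop because a raw read may return fewer bytes than requested;
        # with /dev/random this is where the wait for entropy would occur.
        while len(raw) < nbytes:
            chunk = dev.read(nbytes - len(raw))
            if not chunk:
                raise EOFError(f"{source} ended after {len(raw)} bytes")
            raw += chunk
    return raw.hex()
```

make_cookie("/dev/random") returns the same kind of value but may block. Either way the result has the 32-hex-digit shape that startx hands to xauth as an MIT-MAGIC-COOKIE-1 cookie.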
Actually the info you just provided makes a lot of sense. If the
entropy pool is empty, reading /dev/random can take a while.
If you move the mouse around, and hit random keys on the keyboard,
does it speed things up?
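One way to check whether mouse and keyboard activity is refilling the pool is to watch the kernel's entropy estimate, which Linux kernels that provide it expose in /proc/sys/kernel/random/entropy_avail. The Python sketch below samples that value periodically; the function names are hypothetical, and the reader is injectable so the sampling loop can be exercised without /proc.

```python
import time
from typing import Callable, List

ENTROPY_AVAIL = "/proc/sys/kernel/random/entropy_avail"


def read_entropy_avail(path: str = ENTROPY_AVAIL) -> int:
    """Return the kernel's current entropy estimate, in bits."""
    with open(path) as f:
        return int(f.read().strip())


def watch_entropy(samples: int = 10, interval: float = 1.0,
                  read: Callable[[], int] = read_entropy_avail) -> List[int]:
    """Take `samples` readings of the entropy estimate, `interval` seconds apart."""
    readings: List[int] = []
    for i in range(samples):
        readings.append(read())
        if i + 1 < samples:  # no need to sleep after the last sample
            time.sleep(interval)
    return readings
```

Running print(watch_entropy()) while moving the mouse should show the numbers rising if input events are being credited to the pool.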
Arjan, do you have any comments perhaps to add here that might help
shed some light?
Typing gibberish on the keyboard or moving the mouse
(a TrackMan Marble, actually) around did the trick. I
now have an "If you can see this, ..." message that
instructs users what to do when they try to start X.
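A notice like that could be shown only when a wait is likely. The Python sketch below decides from an entropy estimate in bits (for example, the value in /proc/sys/kernel/random/entropy_avail). The threshold and the hint wording are hypothetical assumptions, not the message described above, and the kernel's exact rule for when /dev/random blocks is not modelled.

```python
from typing import Optional

# Hypothetical default: 32 bits, matching the 4-byte read in this report.
# The kernel's exact rule for when /dev/random blocks is not modelled here.
DEFAULT_NEEDED_BITS = 32

# Hypothetical wording, not the message used in this report.
HINT = ("X is waiting for random data. Move the mouse or type on the "
        "keyboard to speed it up.")


def startup_notice(entropy_bits: int,
                   needed_bits: int = DEFAULT_NEEDED_BITS) -> Optional[str]:
    """Return a hint for the user when the entropy estimate looks too low."""
    if entropy_bits >= needed_bits:
        return None
    return HINT
```

Printing the hint only when the estimate is low keeps the message from appearing on the starts that would have been fast anyway.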
Thank you for the solution/workaround. I find it highly
amusing that you actually _CAN_ sometimes make a computer
go faster by moving the cursor around or typing gibberish
on the keyboard.
At this point, I'm okay with closing this report.
As a workaround: causing some extra disk I/O will add to the secure
entropy pool... and that can even be scripted ;)
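A scripted version of this might simply read the start of many files. The Python sketch below is one hypothetical way to do it; the default directory and limits are arbitrary assumptions. Only reads that actually reach the disk can help, so files already in the page cache contribute little, and whether block-device activity is credited to the pool at all depends on the kernel and driver.

```python
import os


def generate_disk_io(root: str = "/usr", max_files: int = 2000,
                     chunk: int = 4096) -> int:
    """Read the first `chunk` bytes of up to `max_files` files under `root`.

    Returns how many files were actually read. Symlinks, non-regular files
    (FIFOs and device nodes could block or misbehave) and unreadable files
    are skipped.
    """
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if count >= max_files:
                return count
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            try:
                with open(full, "rb") as f:
                    f.read(chunk)
            except OSError:
                continue
            count += 1
    return count
```

Running generate_disk_io() just before startx is the intended use; a second run over the same tree will mostly hit the page cache, so pointing it at a different or larger tree may work better.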
Thanks for the suggestion of using scripted disk I/O
to refill the entropy pool. It appears to be working
quite nicely, taking only a second or four to do the
job, and that's with a pretty lame script.
I'm okay with closing this report out, unless you want
to leave it open to try to get more information about
why this became an issue between 7.0 and 7.1.
Closing bug as NOTABUG, as it is merely an entropy issue. Starting XFree86
via startx requires entropy to be present, since mcookie reads /dev/random,
and that means ensuring that enough entropy-generating activity occurs. If
pressed, I would say the real fix belongs in the kernel, by having it gather
entropy from more sources; however, I doubt that will happen anytime soon or
become a priority just for this issue. So the best workaround for the time
being is to keep doing what you've been doing: make sure that activity which
keeps the entropy pool filled actually occurs.