Description of Problem:
X server is very slow to start. With RH7.0, the X server started fairly reliably within 5 seconds, using various XFree versions, from 4.0.1 to 4.0.3. With RH7.1, a fair percentage of starts take longer than 30 seconds, and a start usually takes more than 5 seconds.

Version-Release number of selected component (if applicable):
4.0.3-21 (default from CDs and/or up2date)

How Reproducible:
It's always much slower than with RH7.0. At least 50% of the time it takes longer than 5 seconds. About 5-10% of the time it takes longer than 30 seconds.

Steps to Reproduce:
1. log in
2. run startx

Actual Results:
X server takes many seconds to start, often longer than 30 seconds.

Expected Results:
X server should start within a few seconds. (This is an Alpha 21264 with fast disks, etc.)

Additional Information:
Graphics card is a Vanta TNT2 16MB, running 1600x1200 at 16-bit color. (I'd like to do 24-bit color, but that doesn't work.) I have a Permedia 3 Oxygen VX1 32MB card on order. I'll attach the X configuration file and a log file from an X server that took several seconds (but not longer than 30).
Created attachment 33418: X configuration file
Created attachment 33419: log file from a run that took several seconds (less than 30)
Unable to reproduce this on my Alpha here. Can you do an strace or otherwise determine where it is hanging? Is it the actual X server itself that is hanging, or is it the desktop stuff? Perhaps a DNS lookup gone bad?
I have reason to believe it is the X server itself rather than "desktop" stuff, mostly because I don't run any "desktop" stuff. The problem affects my users who use Gnome/KDE the same way it affects those who do not. Also, at the point of the hang, the X server has not yet caused the switch from VC1 to VC7, because the VC1 text-mode login sequence is still visible. One of my hunches is that the hang occurs while BTTV-related kernel modules load during X server initialization. Where should the strace command be inserted? I tried inserting it where startx calls xinit, but that did not seem to produce any visible output. xinit seems to be a compiled binary rather than a script, which makes it difficult to insert an strace at the point where it calls something else.
I instrumented the startx script with echo "..." statements at every non-trivial step, and put an strace in front of the xinit command. (After upgrading to kernel 2.4.9-12, teeing the output of startx to a file intermittently causes the server to not start up, but that's a separate issue.) It appears the delay is in the mcookie command. Further testing shows that reading just 4 bytes from /dev/random can take as long as (or maybe even longer than) 20 seconds. Was a change made to the kernel code behind /dev/random between RH7.0 and RH7.1 that would increase the likelihood of long delays in getting just a few random bits? This suggests a workaround of using "mcookie -f /dev/urandom", but that would decrease the security of the xauth stuff. Are there better ideas available? Thanks.
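The /dev/random measurement described above can be repeated with a small timing helper. This is only a sketch, not the test the reporter actually ran: the function name `time_read` is made up for illustration, and the 4-byte default simply mirrors the figure quoted in the comment.

```python
import os
import time


def time_read(path, nbytes=4):
    """Read up to nbytes from path and return (data, elapsed_seconds).

    Hypothetical helper for illustration. On a kernel where /dev/random
    blocks when the entropy pool is empty, the os.read() call below is
    where the delay would show up.
    """
    start = time.monotonic()
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        remaining = nbytes
        while remaining > 0:
            chunk = os.read(fd, remaining)  # may block on /dev/random
            if not chunk:  # end of file (only happens on regular files)
                break
            chunks.append(chunk)
            remaining -= len(chunk)
    finally:
        os.close(fd)
    return b"".join(chunks), time.monotonic() - start
```

Calling `time_read("/dev/random")` and `time_read("/dev/urandom")` back to back should show whether the blocking device is the slow one; on the affected system the first call would be expected to account for most of the startx delay.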
Actually the info you just provided makes a lot of sense. If the entropy pool is empty, reading /dev/random can take a while. If you move the mouse around, and hit random keys on the keyboard, does it speed things up? Arjan, do you have any comments perhaps to add here that might help shed some light?
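One way to check the empty-pool theory directly is to look at the kernel's own entropy estimate before starting X. The sketch below assumes the kernel exposes /proc/sys/kernel/random/entropy_avail (documented in random(4), but not confirmed for the reporter's system); the function names and the 64-bit threshold are arbitrary illustrations, not values taken from this report.

```python
# Path documented in random(4); assumed to exist on the system in question.
ENTROPY_AVAIL = "/proc/sys/kernel/random/entropy_avail"


def entropy_available(path=ENTROPY_AVAIL):
    """Return the kernel's current entropy estimate, in bits."""
    with open(path) as f:
        return int(f.read().strip())


def pool_is_low(bits, threshold=64):
    """Return True if the estimate is below a (hypothetical) threshold."""
    return bits < threshold
```

If `pool_is_low(entropy_available())` is true right before running startx, a slow mcookie would be consistent with the explanation above.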
Typing gibberish on the keyboard or moving the mouse (trackman marble, actually) around did the trick. I now have an "If you can see this, ..." message to instruct users what to do when they try to start X. Thank you for the solution/workaround. I find it highly amusing that you actually _CAN_ sometimes make a computer go faster by moving the cursor around or typing gibberish on the keyboard. At this point, I'm okay with closing this report.
As a workaround: if you can cause some extra disk I/O, that will add to the secure entropy pool... and that can even be scripted ;)
Thanks for the suggestion of using scripted disk I/O to refill the entropy pool. It appears to be working quite nicely, taking only a second or four to do the job, and that's with a pretty lame script. I'm okay with closing this report out, unless you want to leave it open to try to get more information about why this became an issue between 7.0 and 7.1.
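The script used above was not posted, so the following is only a guess at what scripted disk I/O might look like. It writes a scratch file in small synced chunks to force real disk activity, then deletes it. The name `churn_disk`, the chunk count, and the chunk size are all made up for illustration, and whether this activity is credited to the entropy pool depends on the kernel and the storage driver.

```python
import os
import tempfile


def churn_disk(directory, chunks=16, chunk_size=64 * 1024):
    """Write and sync a scratch file in directory; return bytes written.

    Hypothetical helper. The payload is zero-filled on purpose so the
    script does not itself draw on the kernel's random devices.
    """
    block = b"\0" * chunk_size
    written = 0
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(chunks):
            written += os.write(fd, block)
            os.fsync(fd)  # push each chunk out to the disk, not just the cache
    finally:
        os.close(fd)
        os.remove(path)  # leave nothing behind
    return written
```

A call such as `churn_disk("/tmp")` just before mcookie runs in startx would be the place to try it, assuming /tmp lives on a real disk rather than a memory-backed filesystem.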
Closing bug as NOTABUG, as it is merely an entropy issue. XFree86 requires entropy to be present, and that means ensuring that enough entropy-generating activity occurs. If pressed, I would say the proper fix belongs in the kernel, which could gather entropy from more sources; however, that isn't likely to happen anytime soon, and I doubt it would be made a priority just for this issue. So the best workaround for the time being is to keep doing as you have been, and ensure that activity which keeps the entropy pool filled occurs.