I want to apologize profusely to the QA guys at RH. This should have been an early test machine for 7.2, but since it is my wife's mobile work-box, I didn't want to endanger it. I have a Dell Inspiron 8000 with the ATI Rage Mobility M4 chipset. With 7.1, I had an occasional problem where, when the X session ended, the machine would go into a 'white screen' mode in which everything was whited out and the keyboard was dead. The machine could be ssh'd into, but killing the X session wouldn't bring it back, and you could not always reboot it cleanly; it would hang somewhere in the last processes. One could also replicate the problem by just pressing CTRL-ALT-F1: the 'video switch' would bring up the white screen. With 7.2, this seems to be much, much worse. I am able to end X safely maybe 1 out of 5 times without a white screen. I have been able to work around this twice by pressing the <Function> and <Font> keys at the same time, but the rest of the time the machine is still unusable. I edited the /etc/X11/XF86Config-4 file and turned off every option in the file I could (DRI, double-buffer, etc.). This did not have any noticeable effect. I also tried changing to 8 bit and 24 bit mode. The 24 bit mode had the effect of the X server going into a vertically split mode at 50% across the screen, but otherwise did not affect the percentage of white screen crashes (~80%). Using different screen modes from 1280x1024 down to 640x480 also did not make it less likely to occur. At the moment, I am down to trying to get the frame buffer going, since I don't have this problem during an install or upgrade. I would also like any advice from Dell, XFree86, or kernel people on what I can try. I can tinker with the box a bit more at the moment, since, well, until it's fixed I am to live in the dog house. Further info: BIOS APM set to using 8 megs of ram. Dell Inspiron has 128 Megs Ram
>BIOS APM set to using 8 megs of ram. You mean AGP aperture? If so, the AGP aperture should be set to 1/2 of the system's RAM, so in your case 64 MB. Not likely the cause of the problem, but... I assume you're using runlevel 3, but if not, can you try it? When this problem occurs, can you blindly type "startx" again and see if X starts up ok? If it does, this is most likely related to the VTSWITCH bugs people have been experiencing.
8 MB was the amount of memory reported by the BIOS. It isn't settable, so I am guessing this is the amount that the card has hard-wired to it. The keyboard is rarely operational when the problem occurs. I am going to make sure that HOT-Keys are enabled in the kernel so that I can try to get it out of RAW mode, if that is what is happening. I did notice one thing: it does seem to be in some weird video state. At first you think you can see the X and some other things, but after a while the video goes to either complete white or gray and white lines. Hmmm, actually this looks to be kernel level. The box falls off the net when this occurs; my ssh session is locked up and gone. The box seems to make one last sync of discs and that is it. [ALT-SYSRQ keys do not respond, etc.] I am going to set up runlevel 4 to have no daemons (apm et al) and then boot into it. I will try startx there. If the problem doesn't occur, I will go into runlevel 3, which will have apmd et al on, and then see if the problem occurs. It being late, I will try it in the morning so I don't screw something up :)
Ok, the problem really seems to be some interaction between the kernel and X. The box is wedging real hard and I can't see what is causing it. I am going to see if I can bring it in so someone at RH can put a serial console on it. I will also upgrade the kernel to 2.4.9-X to see if that helps any (if I can get a download to work :)). Stephen
Ok, I have updated to kernel-2.4.9-7 and XFree86-4.1.0-4. The machine continues to hard-lock at certain changes between X and non-X, though with less frequency. I am going to downgrade the kernel from the i686 version to the i386 version and see if that helps any. After that, it's serial console and looking for any tips that might help.
Created attachment 34901: Working startx.out
Ok, changing to the i386 kernel seems to have helped (restarting and stopping X 5 times without a crash is a big bonus). I am looking at differing messages in /var/log/messages from the agpgart module, and am wondering if this might point to the issue.
Here are the i686 messages:
messages:Oct 22 22:42:18 localhost kernel: Linux agpgart interface v0.99 (c) Jeff Hartmann
messages:Oct 22 22:42:18 localhost kernel: agpgart: Maximum main memory to use for agp memory: 94M
messages:Oct 22 22:42:18 localhost kernel: agpgart: agpgart: Detected an Intel i815, but could not find the secondary device. Assuming a non-integrated video card.
messages:Oct 22 22:42:18 localhost kernel: agpgart: Detected Intel i815 chipset
messages:Oct 22 22:42:18 localhost kernel: agpgart: AGP aperture is 64M @ xe4000000
messages:Oct 22 22:42:18 localhost kernel: agpgart: AGP mode is 1x
Here are the i386 2.4.9-7 messages:
messages:Oct 24 14:55:07 localhost kernel: Linux agpgart interface v0.99 (c) Jeff Hartmann
messages:Oct 24 14:55:07 localhost kernel: agpgart: Maximum main memory to use for agp memory: 94M
messages:Oct 24 14:55:07 localhost kernel: agpgart: agpgart: Detected an Intel i815, but could not find the secondary device. Assuming a non-integrated video card.
messages:Oct 24 14:55:07 localhost kernel: agpgart: Detected Intel i815 chipset
messages:Oct 24 14:55:07 localhost kernel: agpgart: AGP aperture is 64M @ 0xe4000000
The startx.out is included as an attachment.
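For anyone who wants to compare the two agpgart excerpts above without eyeballing them, here is a minimal Python sketch. It assumes the lines look like the grep output pasted in this report (an optional "messages:" file prefix, a syslog timestamp, the hostname, then "kernel:"). The names PREFIX, payloads and diff_payloads are made up for this example and are not part of any tool. The sketch only shows which message texts appear in one excerpt and not the other; it says nothing about why they differ, and a missing line could simply mean the paste was cut short.

```python
import re

# Assumed prefix shape, taken from the pasted grep output, e.g.
# "messages:Oct 22 22:42:18 localhost kernel: ".
PREFIX = re.compile(
    r"^(?:\S+?:)?[A-Z][a-z]{2}\s+\d+\s+\d\d:\d\d:\d\d\s+\S+\s+kernel:\s*"
)

def payloads(lines):
    """Return the message text of each non-empty line, prefix removed."""
    return [PREFIX.sub("", line.strip()) for line in lines if line.strip()]

def diff_payloads(first, second):
    """Return (messages only in first, messages only in second)."""
    a, b = payloads(first), payloads(second)
    return [m for m in a if m not in b], [m for m in b if m not in a]

# The two excerpts exactly as pasted in this report.
I686 = [
    "messages:Oct 22 22:42:18 localhost kernel: Linux agpgart interface v0.99 (c) Jeff Hartmann",
    "messages:Oct 22 22:42:18 localhost kernel: agpgart: Maximum main memory to use for agp memory: 94M",
    "messages:Oct 22 22:42:18 localhost kernel: agpgart: agpgart: Detected an Intel i815, but could not find the secondary device. Assuming a non-integrated video card.",
    "messages:Oct 22 22:42:18 localhost kernel: agpgart: Detected Intel i815 chipset",
    "messages:Oct 22 22:42:18 localhost kernel: agpgart: AGP aperture is 64M @ xe4000000",
    "messages:Oct 22 22:42:18 localhost kernel: agpgart: AGP mode is 1x",
]
I386 = [
    "messages:Oct 24 14:55:07 localhost kernel: Linux agpgart interface v0.99 (c) Jeff Hartmann",
    "messages:Oct 24 14:55:07 localhost kernel: agpgart: Maximum main memory to use for agp memory: 94M",
    "messages:Oct 24 14:55:07 localhost kernel: agpgart: agpgart: Detected an Intel i815, but could not find the secondary device. Assuming a non-integrated video card.",
    "messages:Oct 24 14:55:07 localhost kernel: agpgart: Detected Intel i815 chipset",
    "messages:Oct 24 14:55:07 localhost kernel: agpgart: AGP aperture is 64M @ 0xe4000000",
]

only_i686, only_i386 = diff_payloads(I686, I386)
for msg in only_i686:
    print("only in i686 excerpt:", msg)
for msg in only_i386:
    print("only in i386 excerpt:", msg)
```

With the excerpts as pasted, the differences come down to the aperture address text and the "AGP mode is 1x" line, which appears only in the i686 excerpt.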
That just means you have the i815 chipset but not its integrated video. If you aren't already, try upgrading to our latest kernel. Some kernel-level console switching bugs were fixed recently. If that doesn't do it, we'll have to explore something else.
Any word on if the new XFree86 + kernel update fixes this?
Have upgraded to the latest errata i386 (versus i686) versions of kernel and glibc. I have also upgraded to the latest XFree86. I have run for 8 hours and not had a lockup leaving X. I will pound on it for a couple more days and either give more info or close the ticket. Or, if adventurous, try moving it to the i686 versions of kernel and glibc. One problem I have found after the updates is that even though I said in Xconfigurator that my Inspiron's LCD was Generic 1280x1024, X starts up, says it doesn't know what that is, and goes to 1024x768.
Created attachment 46780: Current startx.out
I have had lockups again with the system in the last 2 days. At first I thought it was sound related, but turning off all sound etc. didn't help any. I can get the system to lock up by doing the following:
1. Cause the system to fsck.
2. Log in as a user.
3. Run startx.
4. Switch to a virtual console with ALT-Fx, or log out of KDE.
Instant lockup. So the problem is something to do with the kernel buffer cache? Thanks for bearing with me.
Ok, I found a way to avoid the white screen of death after an fsck on the Inspiron:
1. startx
2. opened up an xterm
3. su - root
4. /etc/init.d/apmd stop
5. end X session
The system recovered normally 2 out of 2 times :). Hope this helps figure out where it might be crashing.
Wanted to update the bug. The problem still occurs if the 'APM' on the system has been 'activated' (battery low, close case, manually put system into sleep). If Dell has anything that I can try in the BIOS (update it?), I will gladly do so.
Please look at bug #65136 and see if it sounds similar. The bugfix for that bug will be available soon. This bug should be closed as a duplicate of that bug if they are the same.
Ack. Well I thought it was closable. This bug should be considered 72672 now I think.
I have only had an occasional lockup after going to the 2.4.18-18.x kernel and 8.0. I am going to install 8.0.93 and see if the problem shows up there. If it does, then I will open a new bug report.