After a fresh install of Fedora Core 2 on my Dell Dimension XPS T500, the system refuses to boot. This system has never run FC2 -- I installed it on a new hard drive and still (thankfully) have my old Red Hat 8 on the previous hard drive, which I boot into to actually get use of my system. I've done a lot of debugging already, so bear with me.

The grub phase of the boot seems to work just fine. The grub menu comes up and I can select the kernel to boot. The screen then blanks out and I get this (note this is from a custom build of the kernel that I'll discuss below):

    Booting 'Fedora Core (2.6.8-1.521custom)'

    root (hd0,0)
     Filesystem type is ext2fs, partition type 0x83
    kernel /vmlinuz-2.6.8-1.521custom ro root=LABEL=/1 rhgb quiet
       [Linux-bzImage, setup=0x1400, size=0x1299b6]
    initrd /initrd-2.6.8-1.521custom.img
       [Linux-initrd @ 0x40c0000, 0x2d3d8 bytes]

And that is where it sits. (Yes, label '/1' is correct -- my Red Hat 8 hard drive is where the ISOs I installed from were stored, and it has partitions labeled / and /boot -- and thus anaconda used /1 and /boot.)

I have a friend with an XPS T600 (which should differ only by the clock speed of the CPU) who has absolutely no problems with FC2. I compared BIOS revision levels (and actually downgraded to match his) and even made every setting in my BIOS match his to a T. I have removed all unnecessary devices (sound card, modem, CD-ROM drives) and the bootup is the same. I have pulled out all three sticks of my memory and tried one at a time with no change. I upgraded to the 2.6.8 released kernel from Fedora, with no "real" change to the problem.

At this point I was able to actually get it to boot by being patient and letting it sit there a LONG time. By LONG I mean anywhere from 15 minutes to an hour (I'm not sure how long it took as I was away both times when it booted).
Both times it was when I manually typed in the commands at the grub command prompt -- but I don't know if that has any real bearing, as I've also done that and had it sit there for an hour without booting.

I ran across another person who was having long timeouts of 1.5 minutes at this SAME point, and he fixed his by recompiling the kernel with EDD disabled (CONFIG_EDD=n) -- thus I naturally tried that (hence the custom kernel).

Thinking that it might POSSIBLY be grub, I installed LILO on the system and dumped it onto the MBR. When I hit enter at the LILO prompt (man do I miss the old LILO days -- more sentimentality than usability, grub is better in that sense :) I get "Loading Linux", then some "." come across the screen, and then it too stops. In neither case do I EVER get "Uncompressing Kernel", or "Loading zImage", or whatever the hell you're supposed to get.

The strange thing is that I grabbed a random "old" XPS R450 (PII 450) from the office, threw the hard drive in there, and it gave the same results, hanging at the same place. I found this odd, so I moved the hard drive into my P4 system (home grown with Intel mobo) and it booted up just fine. I sent the hard drive home with my coworker with the XPS T600 system and it booted up just fine, no problems.

The 2.4 kernels appear to have no problems on this system. I'm going to keep prodding but would LOVE anyone else's thoughts and ideas on things to try to beat this bug. Thanks!
Is there any way to turn on REALLY REALLY REALLY early debug information (even if I have to compile it in -- or worse, add the code in myself)? Even if it is as simple as: printf ("1\n"); ... printf ("2\n");
The first thing to do is remove the quiet flag from the kernel command line. That flag tells the kernel not to print anything...
Okay, so I went in and did some tweaking. I went into the kernel and configured it down to the barest possible configuration. I will attach my .config file shortly. With this .config file I get a little further: I actually get the "Uncompressing Linux... Ok, booting the kernel." message. Now the system stops here. I'm messing with putting debug output into the various boot-time assembler files to see if I can detect where things are wedging up. I'm also going to sleuth out where the injection point into the kernel is and put some type of print debug in there as well... Any assistance from someone who knows the kernel would be greatly appreciated. :)
Created attachment 103014 [details] my .config file for "stripped down kernel"
I removed the 'quiet' flag as you suggested and here is what I get now before the system stops booting (perhaps I stripped it down too much):

    BIOS-provided physical RAM map:
    (The bios ram map, please let me know if you need it)
    0MB HIGHMEM available.
    512MB LOWMEM available.
    zapping low mappings.
    DMI 2.1 present.

And it stops there now. The good thing is that it DOES uncompress the kernel and appears to begin booting. This is a large step forward as far as I am concerned.
Just on a hunch, I went ahead and tried pci=noacpi acpi=off, and it still stopped at the "DMI 2.1 present." point. I will continue again tomorrow with adding in debug output, trying to pinpoint where I'm blocking up. (Unless, of course, there is a simpler way to debug this.)
I'm still working on different things to see if I can pinpoint the problem. Any pointers would be great. It seems to be somewhere in the arch/i386/boot/ code. Since I am only somewhat decent at assembler, this is going to be more difficult for me than if it were in the C code. Thanks,
Okay, after a crash course in assembler, I spent an even greater deal of time futzing around with the files in arch/i386/boot. My lockups are being caused by the store_edid function in the video.S file. (Note, these tests were done using 2.6.0; however, the changes to the arch/i386/boot directory since that kernel revision have been relatively minor -- and I got the same lockups with 2.6.8.1.)

I can counteract this problem by disabling CONFIG_VIDEO_SELECT in the .config file. With it disabled, the store_edid function is not used, which allows my kernel to boot properly. My next step is to download the SRPM for 2.6.8.1 and recompile it with the stock config file for the 686 non-SMP kernel with CONFIG_VIDEO_SELECT deselected. I will report back my (hopefully!) success at that time.

In case it matters, or if anyone cares, this system has an old Diamond Viper V770D Ultra Nvidia (OEM from Dell). I haven't had any problems with it under the 2.4 kernel running X and all -- though I did note that the 2.4 kernel did NOT have this EDID function.

Let me know if I can be of any assistance debugging this further in hopes of a potential fix. I'm extremely computer literate and a strong programmer. Thanks.
Color me intrigued... I found this referencing my problem: http://lkml.org/lkml/2003/5/20/110 Reading that, it sounds like it would be trivial to call the installation check function first; perhaps that would prevent the problem. I am confused as to why you memset the memory range to 0x13131313 first and then fill it with the EDID information. If CONFIG_VIDEO_SELECT is disabled, that range of memory isn't initialized at all. I may, just for giggles, attempt to put in the test call and have it jump over the offending call if the check fails. Should the memory range be initialized to 0x13131313 or not in that case? Thanks,
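For illustration, the "check first, then read" idea in the comment above could look roughly like the C sketch below. This is not the code in video.S (which is real-mode assembler) and not the attached patch. The `vbe_ddc_call` callback is a hypothetical stand-in for "int 0x10 with AX=0x4F15", and the two fake BIOS functions exist only to exercise the logic. It assumes the VBE convention that AX=0x004F means success, with BL=0x00 as the installation check and BL=0x01 as the EDID read. It also assumes the buffer keeps its 0x13 fill when the check fails, which is exactly the question the comment leaves open.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EDID_SIZE 128      /* size of one EDID block in bytes */
#define VBE_OK    0x004F   /* AX status a VBE BIOS returns on success */

/* Hypothetical stand-in for "int 0x10 with AX=0x4F15". bl selects the
 * subfunction: 0x00 = installation check, 0x01 = read EDID into buf.
 * Returns the AX status word. Not a real kernel or BIOS interface. */
typedef uint16_t (*vbe_ddc_call)(uint8_t bl, uint8_t *buf);

/* Returns 1 if an EDID block was read, 0 if the read was skipped or failed. */
static int store_edid_checked(vbe_ddc_call bios, uint8_t edid[EDID_SIZE])
{
    assert(bios != NULL && edid != NULL);

    /* Pre-fill with 0x13 bytes (0x13131313 per dword) so that a buffer
     * that was never written by the BIOS is recognizable later. */
    memset(edid, 0x13, EDID_SIZE);

    /* Installation check first: skip the read if DDC is not reported. */
    if (bios(0x00, NULL) != VBE_OK)
        return 0;

    /* Only now issue the call that may hang on some firmware. */
    return bios(0x01, edid) == VBE_OK;
}

/* Fake BIOS without DDC support: AL != 0x4F means "function not supported". */
static uint16_t bios_without_ddc(uint8_t bl, uint8_t *buf)
{
    (void)bl;
    (void)buf;
    return 0x0100;
}

/* Fake BIOS with DDC support: the read returns a block that starts with
 * the fixed 8-byte EDID header and is zero elsewhere. */
static uint16_t bios_with_ddc(uint8_t bl, uint8_t *buf)
{
    if (bl == 0x01) {
        memset(buf, 0x00, EDID_SIZE);
        memset(buf + 1, 0xFF, 6);   /* header: 00 FF FF FF FF FF FF 00 */
    }
    return VBE_OK;
}
```

With `bios_without_ddc` the function returns 0 and the buffer still holds the 0x13 fill; with `bios_with_ddc` it returns 1 and the buffer starts with the EDID header. Whether a firmware that hangs in the read call would answer the installation check sanely is something only a test on the affected machine can show.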
Created attachment 111033 [details] check edid function presence. Patch to do the check will probably be this. Can you do a build with this, and let me know how that works out for you? I'm going to try it on a few boxes here to be sure nothing regresses before I commit this to the Fedora tree. If it works out ok, I'll push it upstream.
Fedora Core 2 has now reached end of life, and no further updates will be provided by Red Hat. The Fedora legacy project will be producing further kernel updates for security problems only. If this bug has not been fixed in the latest Fedora Core 2 update kernel, please try to reproduce it under Fedora Core 3, and reopen if necessary, changing the product version accordingly. Thank you.