Description of problem:
I have a 64-bit F8 Test3 system that works fine with the 2.6.23-6.fc8 kernel, but it won't boot with the 2.6.23.1-23.fc8 kernel. During boot it fails the file system check. Running fsck on all of the partitions reveals no problems. If the 2.6.23-6.fc8 kernel is selected from the grub menu, the system boots fine.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Select the 2.6.23.1-23.fc8 kernel during boot
2. Fails the file system check
3.

Actual results:

Expected results:

Additional info:
"Fails the file system check" is not very useful information. What does it print? If you can, take a picture of the screen with a digital camera and attach that.
Created attachment 233671
Boot screen

This is a little fuzzy but it's readable. The error message looks similar to an fsck failure except that it doesn't specify a partition. Running fsck manually on all of the partitions finds no errors.
I've uploaded a snapshot of the screen on the boot failure. Here is the list of the partitions on this system.

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda5              7749536   4412372   2937148  61% /
/dev/sda7              7874528    148800   7325712   2% /os_y
/dev/sda8            441346556   2779924 416147528   1% /user
/dev/sda1              7874528   3018168   4456344  41% /gutsy
/dev/sda6              7874528    148800   7325712   2% /os_x
tmpfs                  1497208         0   1497208   0% /dev/shm

Here is some hardware info:

00:00.0 RAM memory: nVidia Corporation C51 Host Bridge (rev a2)
00:00.1 RAM memory: nVidia Corporation C51 Memory Controller 0 (rev a2)
00:00.2 RAM memory: nVidia Corporation C51 Memory Controller 1 (rev a2)
00:00.3 RAM memory: nVidia Corporation C51 Memory Controller 5 (rev a2)
00:00.4 RAM memory: nVidia Corporation C51 Memory Controller 4 (rev a2)
00:00.5 RAM memory: nVidia Corporation C51 Host Bridge (rev a2)
00:00.6 RAM memory: nVidia Corporation C51 Memory Controller 3 (rev a2)
00:00.7 RAM memory: nVidia Corporation C51 Memory Controller 2 (rev a2)
00:02.0 PCI bridge: nVidia Corporation C51 PCI Express Bridge (rev a1)
00:03.0 PCI bridge: nVidia Corporation C51 PCI Express Bridge (rev a1)
00:04.0 PCI bridge: nVidia Corporation C51 PCI Express Bridge (rev a1)
00:05.0 VGA compatible controller: nVidia Corporation C51 [Quadro NVS 210S/GeForce 6150LE] (rev a2)
00:09.0 RAM memory: nVidia Corporation MCP51 Host Bridge (rev a2)
00:0a.0 ISA bridge: nVidia Corporation MCP51 LPC Bridge (rev a2)
00:0a.1 SMBus: nVidia Corporation MCP51 SMBus (rev a2)
00:0a.2 RAM memory: nVidia Corporation MCP51 Memory Controller 0 (rev a2)
00:0b.0 USB Controller: nVidia Corporation MCP51 USB Controller (rev a2)
00:0b.1 USB Controller: nVidia Corporation MCP51 USB Controller (rev a2)
00:0d.0 IDE interface: nVidia Corporation MCP51 IDE (rev a1)
00:0e.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller (rev a1)
00:0f.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller (rev a1)
00:10.0 PCI bridge: nVidia Corporation MCP51 PCI Bridge (rev a2)
00:10.1 Audio device: nVidia Corporation MCP51 High Definition Audio (rev a2)
00:14.0 Bridge: nVidia Corporation MCP51 Ethernet Controller (rev a1)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 15
model name      : AMD Athlon(tm) 64 Processor 3800+
stepping        : 0
cpu MHz         : 1000.000
cache size      : 512 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow rep_good
bogomips        : 2010.49
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp
(In reply to comment #2)
> Created an attachment (id=233671)
> Boot screen
>
> This is a little fuzzy but it's readable. The error message looks similar to an
> fsck failure except that it doesn't specify a partition. Running fsck manually
> on all of the partitions finds no errors.
>

This message is usually from a bad entry in /etc/fstab.
The /etc/fstab was generated by the F8 installer. It works fine with the older kernel; the problem is with this kernel.

LABEL=/        /         ext3    defaults        1 1
LABEL=/os_y    /os_y     ext3    defaults        1 2
LABEL=/user    /user     ext3    defaults        1 2
LABEL=/gutsy   /gutsy    ext3    defaults        1 2
LABEL=/os_x    /os_x     ext3    defaults        1 2
tmpfs          /dev/shm  tmpfs   defaults        0 0
devpts         /dev/pts  devpts  gid=5,mode=620  0 0
sysfs          /sys      sysfs   defaults        0 0
proc           /proc     proc    defaults        0 0
/dev/sda3      swap      swap    defaults        0 0
I'm having the exact same problem. It started booting properly again with kernel 2.6.23.1-30.fc8 but stopped again tonight with kernel 2.6.23.1-31.fc8 (I tried 5 times with -31 with no success, switched back to version -30, and it works like a charm).

I first thought there was a problem with my VGs, but no... While investigating I found that if I boot in single-user mode (by adding "single" to the kernel boot entry) I get to a root prompt without any problems. Then I can just type "init 5" and the system works properly, although I cannot boot normally at all!?

The difference is 100% reproducible when comparing kernels 2.6.23.1-30.fc8 (good) and 2.6.23.1-31.fc8 (bad).

Sadly I haven't found the source package for kernel version 30, so I cannot tell exactly what the differences between the two releases are.

[root@gustav ~]# lspci
00:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
00:01.0 ISA bridge: nVidia Corporation CK804 ISA Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation CK804 SMBus (rev a2)
00:02.0 USB Controller: nVidia Corporation CK804 USB Controller (rev a2)
00:02.1 USB Controller: nVidia Corporation CK804 USB Controller (rev a3)
00:04.0 Multimedia audio controller: nVidia Corporation CK804 AC'97 Audio Controller (rev a2)
00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev a2)
00:07.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev a3)
00:08.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev a3)
00:09.0 PCI bridge: nVidia Corporation CK804 PCI Bridge (rev a2)
00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev a3)
00:0b.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0c.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0d.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:09.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev 80)
01:0a.0 Ethernet controller: Marvell Technology Group Ltd. 88E8001 Gigabit Ethernet Controller (rev 13)
05:00.0 VGA compatible controller: nVidia Corporation NV43 [GeForce 6600 GT] (rev a2)

- vin
Kernel 2.6.23.1-31.fc8 is just as broken as 2.6.23.1-30.fc8. I did an update and then tried the new kernel; it fails identically to the -30. The -06 is still working.
Adding "ignore_loglevel" to the kernel options should make any possible kernel warnings print during startup.

Adding this line:

    set -x

to /etc/rc.sysinit (right after "set -m") will make rc.sysinit print each command before running it.

Another possibility: edit /etc/fstab and, one by one, change the last two numbers on each line to "0 0", rebooting after each change. But make sure the filesystems are shut down cleanly before each boot...

Also, adding "fastboot" to the kernel options will skip all filesystem checking during bootup.
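To illustrate the /etc/fstab suggestion: the last number on each line is the fsck pass number, and only entries with a nonzero value are checked at boot. Here is a minimal Python sketch that lists those entries, assuming the fstab contents posted earlier in this report. The name boot_checked is made up for illustration, and nothing here touches the real /etc/fstab.

```python
# fstab contents as posted in this report (sample data, not read from disk)
FSTAB = """\
LABEL=/        /         ext3    defaults        1 1
LABEL=/os_y    /os_y     ext3    defaults        1 2
LABEL=/user    /user     ext3    defaults        1 2
LABEL=/gutsy   /gutsy    ext3    defaults        1 2
LABEL=/os_x    /os_x     ext3    defaults        1 2
tmpfs          /dev/shm  tmpfs   defaults        0 0
devpts         /dev/pts  devpts  gid=5,mode=620  0 0
sysfs          /sys      sysfs   defaults        0 0
proc           /proc     proc    defaults        0 0
/dev/sda3      swap      swap    defaults        0 0
"""


def boot_checked(fstab_text):
    """Return (spec, mount point, pass number) for entries fsck checks at boot.

    Hypothetical helper: the sixth fstab field is the fsck pass number,
    and 0 (or a missing field) means the filesystem is not checked.
    """
    entries = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        passno = int(fields[5]) if len(fields) > 5 else 0
        if passno != 0:
            entries.append((fields[0], fields[1], passno))
    # Pass 1 (normally the root filesystem) is checked before pass 2.
    return sorted(entries, key=lambda entry: entry[2])


for spec, mount_point, passno in boot_checked(FSTAB):
    print(f"pass {passno}: {spec} on {mount_point}")
```

Setting one entry at a time to "0 0" removes it from this list, which narrows down which filesystem triggers the failure.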
Created attachment 239731
Error screen snapshot

I put a set -m into /etc/rc.sysinit. Here is another snapshot of the error screen.
The new kernel is bootable again for me... (2.6.23.1-35.fc8) Give it a try?

Also, why don't you pass the argument vga=0x305 on the kernel boot line in grub to get a better resolution and hence more info on screen?

- vin
.35 doesn't work either. I've also built standard 2.6.23 and 2.6.23.1 kernels, and a 2.6.23.1 kernel without ext4; those don't work either. I have one partition that was formatted by Gutsy instead of F8. I set that to not autoload, but that didn't help either. I'm now in the process of building a 2.6.23.1 kernel without the extended EXT3 attributes. I'll post the results for that when I have them.
I've identified the kernel feature that's responsible for the problem: it's POSIX Access Control Lists. A 2.6.23.1 kernel built with these switches works:

# File systems
#
CONFIG_EXT2_FS=m
# CONFIG_EXT2_FS_XATTR is not set
# CONFIG_EXT2_FS_XIP is not set
CONFIG_EXT3_FS=m
# CONFIG_EXT3_FS_XATTR is not set
# CONFIG_EXT4DEV_FS is not set

These switches don't work:

#
# File systems
#
CONFIG_EXT2_FS=m
CONFIG_EXT2_FS_XATTR=y
CONFIG_EXT2_FS_POSIX_ACL=y
# CONFIG_EXT2_FS_SECURITY is not set
# CONFIG_EXT2_FS_XIP is not set
CONFIG_EXT3_FS=m
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
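For anyone comparing the two fragments by eye, a small Python sketch can list exactly which options differ. It assumes only the config lines posted above; the helper names parse_config and diff_configs are made up for illustration, and "absent" simply means the option does not appear in that fragment.

```python
import re

# The two fragments as posted in this comment.
WORKS = """\
CONFIG_EXT2_FS=m
# CONFIG_EXT2_FS_XATTR is not set
# CONFIG_EXT2_FS_XIP is not set
CONFIG_EXT3_FS=m
# CONFIG_EXT3_FS_XATTR is not set
# CONFIG_EXT4DEV_FS is not set
"""

FAILS = """\
CONFIG_EXT2_FS=m
CONFIG_EXT2_FS_XATTR=y
CONFIG_EXT2_FS_POSIX_ACL=y
# CONFIG_EXT2_FS_SECURITY is not set
# CONFIG_EXT2_FS_XIP is not set
CONFIG_EXT3_FS=m
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
"""

SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map option name to its value; '# ... is not set' lines become 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = SET_RE.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = UNSET_RE.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def diff_configs(first, second):
    """Return {option: (value in first, value in second)} where they differ."""
    changed = {}
    for name in sorted(set(first) | set(second)):
        a = first.get(name, "absent")
        b = second.get(name, "absent")
        if a != b:
            changed[name] = (a, b)
    return changed


for name, (a, b) in diff_configs(parse_config(WORKS), parse_config(FAILS)).items():
    print(f"{name}: works={a} fails={b}")
```

Note that, as posted, the two fragments differ in the XATTR options as well as the POSIX_ACL ones, so the fragments alone do not separate the two features.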
I wonder... might these be somehow related? https://bugzilla.redhat.com/show_bug.cgi?id=210111
A little bit more background on this box. This is a test system I've just put together so it has nothing on it but Fedora 8 and Gutsy. I partitioned the disk with F8 Test3, I have SELinux disabled. Is it possible that F8 screwed something up in the file system when I turned off SELinux during the first boot?
It's a reporting problem: the error message fails to give any information about which partition has the problem or what the problem is. E2fsck reported that all of the partitions were clean when I ran it without switches; however, when I ran it with the -f switch it found problems with some of the attribute counts. After I fixed the problems I was able to boot. So the main problem isn't with 2.6.23, which found the file system errors; it's with 2.6.22.x and earlier, which didn't find a problem. The problem with 2.6.23 is that it needs better error reporting. At the very least it should specify which partition has the file system problem, and it would be better if it also specified what the problems are.
(In reply to comment #9)
> Created an attachment (id=239731)
> Error screen snapshot
>
> I put a set -m into /etc/rc.sysinit.
>

The output has scrolled off the screen. Booting with "vga=792" should give high-resolution mode with more visible lines.
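The vga= values mentioned in this report are written in two different bases (0x305 is hexadecimal, 792 is decimal), which makes them hard to compare. A tiny Python sketch of the conversion follows; the function name vga_mode is made up for illustration. As an assumption to check against the kernel's VESA framebuffer documentation, both numbers should correspond to 1024x768 modes, at 8-bit and 24-bit colour depth respectively.

```python
def vga_mode(value):
    """Parse a vga= value given as decimal ("792") or hex ("0x305")."""
    return int(value, 0)  # base 0 lets Python honour the 0x prefix


for value in ("0x305", "792"):
    number = vga_mode(value)
    print(f"vga={value}: decimal {number}, hex {number:#x}")
```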
I've fixed the file system errors on this system, so there isn't any more input that I can provide on this problem. As I said in my previous post, I think it would be a good idea to improve the error reporting. When the boot fails because of a file system check error, it should report the bad partition. It would also be nice if there were an automatic recovery option in addition to dropping you into the CLI and having you run e2fsck manually. The auto recovery choice would run the fscks with a switch that would do all the fixes without asking.

One more thing. It would be nice if the kernel had a switch that allowed it to log to a USB flash key. Right now your only choice other than the console is a serial port connection to a remote console, which is not very convenient. Plugging in a USB flash key would be much easier. It would make the debugging of problems like this simpler.
The problem came back yesterday with the latest kernel (2.6.23.1-37.fc8). Again, it hung right before mounting the file systems, stating that the root password was needed... I rebooted using the previous working kernel (2.6.23.1-35.fc8) and it discovered that it needed to do an automatic fsck on one of my filesystems... the system did it and booted just fine. I am wondering whether the problem lies right there, when automatic disk checking is needed... Was the hang due to the disk checking itself, or simply because it tried to set the fsck flag for the next reboot? Or would booting between Fedora 7 and rawhide affect the fs check flags? Anyhow, I tried rebooting with the latest kernel and it now works like a charm (although no disk checking is needed this time...).
Just upgraded to the latest kernel (2.6.23.1-41.fc8) and got the same problem, except that this time there was no file system to check? After a second reboot (by pressing CTRL-D) it booted properly, still using the new kernel... I don't get it?

This might be pointless, but could it be related to this: http://lkml.org/lkml/2007/10/29/280

- vin
Have you run e2fsck with a -f switch on all of your partitions? I really had a file system problem even though e2fsck said my partitions were clean. When I forced a check it found the problems.
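To make the forced check concrete, here is a minimal Python sketch that only prints the e2fsck command lines and runs nothing. It assumes the ext3 device names from the partition list posted earlier in this report; the function name forced_check_commands is made up for illustration. The -f switch forces a check even when the filesystem is marked clean, -n opens it read-only and answers "no" to every question, and -y answers "yes" so that fixes are applied.

```python
# ext3 partitions from the df output in this report (sample data)
EXT3_DEVICES = ["/dev/sda5", "/dev/sda7", "/dev/sda8", "/dev/sda1", "/dev/sda6"]


def forced_check_commands(devices, repair=False):
    """Build e2fsck command lines for a forced check of each device.

    Hypothetical helper: with repair=False the check is read-only (-n);
    with repair=True every fix is accepted without asking (-y).
    """
    answer = "-y" if repair else "-n"
    return [["e2fsck", "-f", answer, device] for device in devices]


for command in forced_check_commands(EXT3_DEVICES):
    print(" ".join(command))
```

The filesystems should be unmounted (or the system booted into a rescue environment) before the printed commands are run by hand, especially with repair enabled.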
What happens if you disable rhgb?
I see this on two P4-1.7 machines, so it's not x64 specific. When I disable rhgb and remove quiet, it boots, and keeps booting afterwards, sometimes. I have to zero the disk and reinstall to repeat it. Deleting the partition table and booting/installing the f8rc3 media won't do it. Kernel is 2.6.23.1-42.fc8.
I had found that booting in single-user mode works... (by default I always boot without rhgb and quiet, but always add vga=0x305). At the moment it doesn't happen anymore... it seems "hard" to reproduce systematically. I still strongly suspect that the problem occurs when a disk-check flag must be set on a specific FS due to many mounts, maybe in conjunction with another partition that needs checking...
This was all due to a bug in rhgb. Everything should be fine after updating that.
Is anyone still seeing these problems after running all the latest updates?