From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.0+) Gecko/20020507

Description of problem:
When fsck -f runs from /etc/rc.sysinit, either because the mount count has reached the maximum mount count on an ext3 filesystem or because a /forcefsck file exists, fsck hangs while checking a 28GB partition, but not on smaller partitions.

Version-Release number of selected component (if applicable):
initscripts-6.67-1 / e2fsprogs-1.27-3

How reproducible:
Always

Steps to Reproduce:
1. touch /forcefsck or tune2fs -c N -C N /dev/big-partition
2. reboot
3. wait ...

Actual Results:
fsck hangs checking /dev/big-partition and the system wedges.

Expected Results:
fsck successfully completes checking /dev/big-partition and the system boots.

Additional info:
I am running 7.3-Valhalla and kernel 2.4.18-4smp (i.e. the one with the ext3 fs corrupting panic problem fixed). In my case, /dev/big-partition is a 28GB partition on a SCSI disc (IBM DRHS36V). Other partitions, up to 7.9GB (on another SCSI disc) and 4.9GB (on the same disc), fsck fine in this situation. The partition in question will fsck when the system boots from the 7.2 sysadmin survival disc (which also enabled me to remove the /forcefsck file, thank god). It also fscks fine when I run fsck -f manually on it.
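The two trigger conditions described above can be sketched as a small shell check. This is only an illustration of the reported conditions, not code taken from rc.sysinit; the function name `expect_forced_fsck` and its argument order are hypothetical, and the mount counts would have to be read from the filesystem separately (for example from tune2fs -l).

```shell
# Hypothetical helper, not taken from rc.sysinit: succeeds (status 0) if a
# full check would be expected at the next boot.
#   $1 = current mount count
#   $2 = maximum mount count
#   $3 = path of the force flag file (defaults to /forcefsck)
expect_forced_fsck() {
    mount_count=$1
    max_count=$2
    flag=${3:-/forcefsck}

    # Flag file present: per the report, fsck -f is then run at boot.
    [ -f "$flag" ] && return 0

    # A maximum of 0 or -1 disables mount-count based checking.
    [ "$max_count" -gt 0 ] && [ "$mount_count" -ge "$max_count" ]
}
```

For example, `expect_forced_fsck 20 20 && echo "full check expected"` would print the message, since the mount count has reached the maximum.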
Yesterday I witnessed this twice more (after a bug in my development code crashed two machines hard so they had to be reset). I now believe that the problem is a timeout somewhere in the communication between the fsck engine and the boot process/fsck UI. Here are some more details (both machines are SCSI SMP, up to date w/ RHN, all file systems are ext3):

Machine no. 1: During reboot post-reset, fs journal replay apparently wedges for 15G partitions - 8 hour hang, keyboard dead, etc. Next, I rebooted from the install CD in rescue mode and removed the partition in question from the fstab. After another reboot into single user, I ran e2fsck on the partition. It reported that the partition was clean, but being skeptical, I ran e2fsck -f. This hung for about an hour, leaving the keyboard dead in all virtual terminals, with top hung as well. Rebooting yet again, I ran tune2fs -l on the partition and discovered that the label has the clean bit set *and* the last check is timestamped during the hour of hanging.

Machine no. 2: During reboot post-reset, fs journal replay apparently wedges for 23G partitions - 4 hour hang, keyboard dead, top hung in second virtual terminal. Rebooting in rescue mode from the install CD and running tune2fs on the partition reports the partition clean, and the last mount and write times are about 90 minutes into the 4 hour hang.
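For anyone wanting to compare the same superblock fields on their own machine, here is a minimal sketch that filters tune2fs -l output down to the state, mount-count and timestamp lines mentioned above. The function name `fsck_fields` is hypothetical, and the device name in the usage comment is a placeholder.

```shell
# Hypothetical helper: reads `tune2fs -l` output on stdin and keeps only the
# lines relevant to forced checks and to the timestamps discussed above.
fsck_fields() {
    grep -E '^(Filesystem state|Mount count|Maximum mount count|Last mount time|Last write time|Last checked):'
}

# Usage on a real system (as root; the device name is a placeholder):
#   tune2fs -l /dev/big-partition | fsck_fields
```

Comparing the "Last checked" and "Last write time" values against the wall-clock time of the hang is what suggested, in both cases above, that the check itself had finished while the machine still appeared wedged.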
It can sometimes be a kernel VM problem if you fsck large partitions. I don't see fsck problems on the machines I have available, so I need more information on what is going wrong here. I don't think this is related to how the initscripts invoke fsck.

Thanks,
Florian La Roche