Red Hat Bugzilla – Bug 64787
fsck called from rc.sysinit fails on larger partitions
Last modified: 2007-04-18 12:42:32 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.0+) Gecko/20020507
Description of problem:
When fsck -f runs from /etc/rc.sysinit, because of either mount-counts reaching
max-mount-counts on an ext3 filesystem or the existence of a /forcefsck file, I
get a hang fsck-ing a 28GB partition, but not on smaller partitions.
Version-Release number of selected component (if applicable):
initscripts-6.67-1 / e2fsprogs-1.27-3
Steps to Reproduce:
1. touch /forcefsck or tune2fs -c N -C N /dev/big-partition
3. wait ...
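For anyone trying to reproduce this, the two triggers named above (a /forcefsck file, or the mount count reaching the maximum mount count) can be checked ahead of the reboot. The following is only a rough sketch: it assumes the "Mount count" and "Maximum mount count" labels as printed by tune2fs -l, the function names are hypothetical, and the sample text is illustrative rather than taken from the affected machines.

```python
def parse_mount_counts(tune2fs_output):
    """Extract (mount_count, max_mount_count) from `tune2fs -l` text."""
    fields = {}
    for line in tune2fs_output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that are not "label: value" pairs
            fields[key.strip()] = value.strip()
    return int(fields["Mount count"]), int(fields["Maximum mount count"])


def full_check_expected(tune2fs_output, forcefsck_exists):
    """Return True if a full check is expected at the next boot."""
    mount_count, max_count = parse_mount_counts(tune2fs_output)
    # A maximum of 0 or -1 disables count-based checking.
    count_reached = max_count > 0 and mount_count >= max_count
    return forcefsck_exists or count_reached


# Illustrative sample, not output from the reporter's system.
SAMPLE = """\
Filesystem state:         clean
Mount count:              20
Maximum mount count:      20
"""

print(full_check_expected(SAMPLE, forcefsck_exists=False))  # prints True
```

On a real system the text would come from running tune2fs -l on the device, and forcefsck_exists from a check such as os.path.exists("/forcefsck").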
Actual Results: fsck hangs checking /dev/big-partition and system wedges.
Expected Results: fsck successfully completes checking /dev/big-partition and
I am running 7.3-Valhalla and kernel 2.4.18-4smp (i.e. the one with the ext3 fs
corrupting panic problem fixed). In my case, /dev/big-partition is a 28GB
partition on a (IBM DRHS36V) SCSI disc.
Other partitions, up to 7.9GB (on another SCSI disc) and 4.9GB (on the same
disc) do fsck fine in this situation.
The partition in question will fsck when the system boots from the 7.2 sysadmin
survival disc (which also enabled me to remove the /forcefsck file, thank god).
It also fsck's fine when I run fsck -f manually on it.
Yesterday I witnessed this twice more (after a bug in my development code
crashed two machines hard so they had to be reset). I now believe that the
problem is a timeout somewhere in the communication between the fsck
engine and the boot process/fsck UI.
Here are some more details (both machines are SCSI SMP, up to date w/ RHN, all
file systems are ext3):
Machine no.1: During reboot post-reset, fs journal replay apparently wedges for
15G partitions - 8 hour hang, keyboard dead, etc. Next, I rebooted from the
install CD in rescue mode, and removed the partition in question from the fstab.
Another reboot into single-user mode, and I run e2fsck on the partition. It reports
that the partition is clean, but being skeptical, I run e2fsck -f. This hangs
for about an hour, leaving keyboard dead in all virtual terminals, top hung as
well. Re-booting yet again, I run tune2fs -l on the partition and discover that
the label has the clean bit set *and* the last check is timestamped during the hour
Machine no. 2: During reboot post-reset, fs journal replay apparently wedges for
23G partitions - 4 hour hang, keyboard dead, top hung in second virtual
terminal. Rebooting in rescue mode from install CD and running tune2fs on the
partition reports partition clean and last mount and write times are about 90
minutes into the 4 hour hang.
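The timestamp comparison described for both machines can be done mechanically. This is a minimal sketch, assuming the "Filesystem state", "Last mount time", "Last write time" and "Last checked" labels and the ctime-style dates printed by tune2fs -l. The helper names, the sample values and the hang window are all made up for illustration and are not the data from the machines above.

```python
from datetime import datetime

# ctime-style format, e.g. "Tue May 14 10:31:05 2002"
CTIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_fields(text):
    """Turn `tune2fs -l` style "label: value" lines into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def stamped_during(fields, name, start, end):
    """Return True if the named timestamp lies inside [start, end]."""
    ts = datetime.strptime(fields[name], CTIME_FORMAT)
    return start <= ts <= end


# Illustrative values only.
SAMPLE = """\
Filesystem state:         clean
Last mount time:          Tue May 14 10:31:05 2002
Last write time:          Tue May 14 10:31:07 2002
Last checked:             Tue May 14 10:30:40 2002
"""

# Hypothetical 4 hour window during which the console appeared dead.
HANG_START = datetime(2002, 5, 14, 9, 0, 0)
HANG_END = datetime(2002, 5, 14, 13, 0, 0)

fields = parse_fields(SAMPLE)
print(fields["Filesystem state"])
for name in ("Last mount time", "Last write time", "Last checked"):
    print(name, stamped_during(fields, name, HANG_START, HANG_END))
```

A clean state with timestamps inside the window would be consistent with the check or journal replay actually finishing while the console appeared hung.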
It can sometimes be a kernel VM problem if you fsck large partitions.
I don't see fsck problems on machines I have available, so I need more
information on what is going wrong here. I don't think this is related to
how the initscripts invoke fsck.
Florian La Roche