Bug 64787

Summary: fsck called from rc.sysinit fails on larger partitions
Product: [Retired] Red Hat Linux
Reporter: Joe Christy <joe.christy>
Component: e2fsprogs
Assignee: Florian La Roche <laroche>
Status: CLOSED WORKSFORME
QA Contact: Brock Organ <borgan>
Severity: medium
Priority: medium    
Version: 7.3   
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2003-08-12 12:08:41 UTC

Description Joe Christy 2002-05-10 22:03:48 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.0+) Gecko/20020507

Description of problem:
When fsck -f runs from /etc/rc.sysinit, either because the mount count has
reached the maximum mount count on an ext3 filesystem or because a /forcefsck
file exists, fsck hangs on a 28GB partition, but not on smaller partitions.

Version-Release number of selected component (if applicable):
initscripts-6.67-1 / e2fsprogs-1.27-3


How reproducible:
Always

Steps to Reproduce:
1. touch /forcefsck or tune2fs -c N -C N /dev/big-partition
2. reboot
3. wait ...
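
Step 1 can be sketched as a small shell helper. This is only an illustration:
force_check_cmds is a hypothetical name, the device path is the placeholder
used in this report, and the count of 20 is an arbitrary stand-in for N. The
helper only prints the commands (a dry run), so nothing is modified until they
are run by hand as root.

```shell
#!/bin/sh
# Dry-run sketch: print the commands from step 1 instead of running them.
# force_check_cmds is a hypothetical helper name, not part of e2fsprogs.
force_check_cmds() {
    dev=$1      # block device, e.g. the placeholder /dev/big-partition
    count=$2    # value used for both -c (max mount count) and -C (mount count)

    # Option A: flag file (per the report, rc.sysinit then runs fsck -f).
    echo "touch /forcefsck"
    # Option B: set the mount count equal to the maximum mount count.
    echo "tune2fs -c $count -C $count $dev"
}

force_check_cmds /dev/big-partition 20
```

Either printed command alone should be enough to trigger the check on the next
reboot, as described in step 1.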
	

Actual Results:  fsck hangs checking /dev/big-partition and system wedges.

Expected Results:  fsck successfully completes checking /dev/big-partition and
system boots.

Additional info:

I am running 7.3-Valhalla and kernel 2.4.18-4smp (i.e. the one with the ext3
fs-corrupting panic problem fixed). In my case, /dev/big-partition is a 28GB
partition on an IBM DRHS36V SCSI disc.

Other partitions, up to 7.9GB (on another SCSI disc) and 4.9GB (on the same
disc), fsck fine in this situation.

The partition in question will fsck when the system boots from the 7.2 sysadmin
survival disc (which also enabled me to remove the /forcefsck file, thank god).
It also fsck's fine when I run fsck -f manually on it.

Comment 1 Joe Christy 2002-07-18 17:51:21 UTC
Yesterday I witnessed this twice more (after a bug in my development code
crashed two machines hard so they had to be reset). I now believe that the
problem is a timeout somewhere in the communication between the fsck engine
and the boot process/fsck UI.

Here are some more details (both machines are SCSI SMP, up to date w/ RHN, all
file systems are ext3):

Machine no. 1: During reboot post-reset, fs journal replay apparently wedges on
a 15G partition: 8 hour hang, keyboard dead, etc. Next, I rebooted from the
install CD in rescue mode and removed the partition in question from the fstab.
After another reboot into single-user mode, I ran e2fsck on the partition. It
reported that the partition was clean, but being skeptical, I ran e2fsck -f.
This hung for about an hour, leaving the keyboard dead in all virtual
terminals, and top hung as well. Rebooting yet again, I ran tune2fs -l on the
partition and discovered that the label has the clean bit set *and* the last
check is timestamped during the hour of hanging.

Machine no. 2: During reboot post-reset, fs journal replay apparently wedges on
a 23G partition: 4 hour hang, keyboard dead, top hung in a second virtual
terminal. Rebooting in rescue mode from the install CD and running tune2fs on
the partition shows the partition clean, with last mount and write times about
90 minutes into the 4 hour hang.
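
The post-hang checks on both machines come down to reading a few superblock
fields from tune2fs -l. A minimal sketch, assuming the field labels that
e2fsprogs prints ("Filesystem state", "Mount count", "Maximum mount count",
"Last mount time", "Last write time", "Last checked"). fs_check_summary is a
hypothetical helper name; it reads tune2fs -l output on stdin so it can also be
tried on saved output.

```shell
#!/bin/sh
# fs_check_summary (hypothetical helper): keep only the superblock fields
# relevant to this report from `tune2fs -l` output supplied on stdin.
fs_check_summary() {
    grep -E '^(Filesystem state|Mount count|Maximum mount count|Last mount time|Last write time|Last checked):'
}

# Usage (as root; the device name is the placeholder from this report):
#   tune2fs -l /dev/big-partition | fs_check_summary
```

Comparing the "Last checked" and "Last write time" values against the wall
clock time of the hang is what suggests the check itself finished while the
machine still appeared wedged.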

Comment 2 Florian La Roche 2002-08-02 08:35:12 UTC
It can sometimes be a kernel VM problem if you fsck large partitions.
I don't see fsck problems on the machines I have available, so I need more
information on what is going wrong here. I don't think this is related to
how the initscripts invoke fsck.

Thanks,

Florian La Roche