64787 – fsck called from rc.sysinit fails on larger partitions

Keywords:
Status:	CLOSED WORKSFORME
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	e2fsprogs
Sub Component:
Version:	7.3
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Florian La Roche
QA Contact:	Brock Organ
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2002-05-10 22:03 UTC by Joe Christy
Modified:	2007-04-18 16:42 UTC (History)
CC List:	0 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2003-08-12 12:08:41 UTC
Embargoed:

Description Joe Christy 2002-05-10 22:03:48 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.0+) Gecko/20020507

Description of problem:
When fsck -f runs from /etc/rc.sysinit, either because the mount count has
reached the maximum mount count on an ext3 filesystem or because a /forcefsck
file exists, it hangs while checking a 28GB partition, but not while checking
smaller partitions.

Version-Release number of selected component (if applicable):
initscripts-6.67-1 / e2fsprogs-1.27-3


How reproducible:
Always

Steps to Reproduce:
1. touch /forcefsck or tune2fs -c N -C N /dev/big-partition
2. reboot
3. wait ...
	

Actual Results:  fsck hangs checking /dev/big-partition and system wedges.

Expected Results:  fsck successfully completes checking /dev/big-partition and
system boots.

Additional info:

I am running 7.3-Valhalla and kernel 2.4.18-4smp (i.e. the one with the fix for
the ext3 filesystem-corrupting panic). In my case, /dev/big-partition is a 28GB
partition on an IBM DRHS36V SCSI disc.

Other partitions, up to 7.9GB (on another SCSI disc) and 4.9GB (on the same
disc), fsck fine in this situation.

The partition in question will fsck when the system boots from the 7.2 sysadmin
survival disc (which also enabled me to remove the /forcefsck file, thank god).
It also fsck's fine when I run fsck -f manually on it.
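
The two triggers described above (the mount count reaching the maximum mount
count, or a /forcefsck file) can be inspected before rebooting. The following is
only a rough sketch, not something taken from initscripts or e2fsprogs: it
assumes that `tune2fs -l` prints `Mount count` and `Maximum mount count` as
"Key: value" lines, and the helper names (`parse_tune2fs`, `check_is_due`,
`tune2fs_fields`) are hypothetical. It ignores the time-based check interval.

```python
import os
import subprocess


def parse_tune2fs(text):
    """Turn the 'Key: value' lines printed by `tune2fs -l` into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # lines without a colon (e.g. the version banner) are skipped
            fields[key.strip()] = value.strip()
    return fields


def check_is_due(fields, forcefsck_exists=False):
    """Return True if a boot-time check would be expected (assumed logic)."""
    if forcefsck_exists:
        return True
    mount_count = int(fields.get("Mount count", 0))
    max_count = int(fields.get("Maximum mount count", -1))
    # Assumption: a non-positive maximum means mount-count checking is off.
    return max_count > 0 and mount_count >= max_count


def tune2fs_fields(device):
    """Run `tune2fs -l` on a device (needs root) and parse its output."""
    result = subprocess.run(
        ["tune2fs", "-l", device],
        capture_output=True, text=True, check=True,
    )
    return parse_tune2fs(result.stdout)
```

As root, `check_is_due(tune2fs_fields(device), os.path.exists("/forcefsck"))`
would then indicate whether the next boot is likely to run the check on that
device.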

Comment 1 Joe Christy 2002-07-18 17:51:21 UTC

Yesterday I witnessed this twice more (after a bug in my development code
crashed two machines hard, so they had to be reset). I now believe that the
problem is a timeout somewhere in the communication between the fsck engine and
the boot process/fsck UI.

Here are some more details (both machines are SCSI SMP, up to date w/ RHN, all
file systems are ext3):

Machine no. 1: During reboot post-reset, fs journal replay apparently wedges for
15G partitions - 8 hour hang, keyboard dead, etc. Next, I rebooted from the
install CD in rescue mode and removed the partition in question from the fstab.
After another reboot into single-user mode, I ran e2fsck on the partition. It
reported that the partition was clean but, being skeptical, I ran e2fsck -f.
This hung for about an hour, leaving the keyboard dead in all virtual terminals
and top hung as well. Rebooting yet again, I ran tune2fs -l on the partition and
discovered that the label has the clean bit set *and* the last check is
timestamped during the hour of hanging.

Machine no. 2: During reboot post-reset, fs journal replay apparently wedges for
23G partitions - 4 hour hang, keyboard dead, top hung in second virtual
terminal. Rebooting in rescue mode from the install CD and running tune2fs on
the partition reports that the partition is clean and that the last mount and
write times are about 90 minutes into the 4 hour hang.
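
The timestamp comparison used above (did the recorded check or write time fall
inside the period when the machine appeared hung?) can be sketched as follows.
This is only an illustration: it assumes tune2fs prints its `Last checked`,
`Last mount time` and `Last write time` values in the C-locale ctime layout,
and `parse_ctime` and `fell_within` are hypothetical helper names.

```python
from datetime import datetime

# Assumed layout of tune2fs timestamps, e.g. "Thu Jul 18 10:30:00 2002".
CTIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_ctime(value):
    """Parse a ctime-style timestamp string into a datetime."""
    return datetime.strptime(value.strip(), CTIME_FORMAT)


def fell_within(stamp, hang_start, hang_end):
    """True if the timestamp lies inside the observed hang window."""
    return hang_start <= parse_ctime(stamp) <= hang_end
```

A timestamp that falls inside the window suggests that the check itself made
progress or finished while the console looked dead, which is consistent with
the timeout theory rather than with fsck being stuck.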

Comment 2 Florian La Roche 2002-08-02 08:35:12 UTC

Running fsck on large partitions can sometimes expose a kernel VM problem.
I don't see fsck problems on the machines I have available, so I need more
information on what is going wrong here. I don't think this is related to
how the initscripts invoke fsck.

Thanks,

Florian La Roche
