From Bugzilla Helper:
User-Agent: Mozilla/4.77 [en] (X11; U; Linux 2.4.2-2 i686; Nav)

Description of problem:
I have installed 3 copies of RedHat 7.1 in the last few months, and each time badblocks has reported 1 or 3 bad blocks just before the last block on the "/" partition (but on no others). It must be an end-condition bug, since it has happened on 3 different computers, 2 of which are EXACTLY the same except for the CPU.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Install RH 7.1 (/boot=50Mb, /=most of the rest of a large drive); in each case / is >= 4G
2. Log in after the install and run "badblocks -v /dev/hda?" for the "/" partition

Actual Results:
Results from 3 machines (output of "df" followed by the output of each "badblocks" run):

1)
**** Fri Oct 5 05:22:01 EST 2001
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/hda5             18034128  12774808   4343216  75% /
/dev/hda2                54447      3489     48147   7% /boot
/dev/hda1              1027856      6976   1020880   1% /dos
**** Fri Oct 5 05:35:06 EST 2001 badblocks /=/dev/hda5
Checking for bad blocks in read-only mode
From block 0 to 18322101
18322100
Pass completed, 1 bad blocks found.
**** Fri Oct 5 05:55:38 EST 2001 badblocks /boot=/dev/hda2
Checking for bad blocks in read-only mode
From block 0 to 56227
Pass completed, 0 bad blocks found.

2)
**** Fri Oct 5 05:12:00 EST 2001
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/hda5             18034128  13764700   3353324  81% /
/dev/hda2                54447      3489     48147   7% /boot
/dev/hda1              1027856      7088   1020768   1% /dos
**** Fri Oct 5 05:21:38 EST 2001 badblocks /=/dev/hda5
Checking for bad blocks in read-only mode
From block 0 to 18322101
18322100
Pass completed, 1 bad blocks found.
**** Fri Oct 5 05:39:31 EST 2001 badblocks /boot=/dev/hda2
Checking for bad blocks in read-only mode
From block 0 to 56227
Pass completed, 0 bad blocks found.

3)
**** Fri Oct 5 05:32:00 EST 2001
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/hda5              3897800   2583584   1116220  70% /
/dev/hda2                54447      3849     47787   8% /boot
/dev/hda1              2096160   2002656     93504  96% /dos
/dev/hdb1             20063360  17683392   2379968  89% /video
**** Fri Oct 5 05:36:30 EST 2001 badblocks /=/dev/hda5
Checking for bad blocks in read-only mode
From block 0 to 3959991
3959988
3959989
3959990
Pass completed, 3 bad blocks found.
**** Fri Oct 5 05:41:08 EST 2001 badblocks /boot=/dev/hda2
Checking for bad blocks in read-only mode
From block 0 to 56227
Pass completed, 0 bad blocks found.

Expected Results:
No bad blocks

Additional info:
As stated above, this appears to be an end-condition error in mke2fs rather than a hard drive problem, since it has happened on 3 completely separate computers with no other errors showing up. I run badblocks each day on all 3 computers and no other errors have shown up in the last 3 months on the first 2 computers. The third computer I installed yesterday, and its first run of badblocks gave the error shown (3 bad blocks rather than only 1, but in the exact same place). The coincidence is too high for these to be hard drive errors.
OK, just to lend more weight to the end-condition bug idea: I have just installed ANOTHER RedHat 7.1 machine, and AGAIN it has a few bad blocks at the end of the / partition:

**** Sun Oct 14 05:42:00 EST 2001
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/hdc5              3637796   2441168   1011836  71% /
/dev/hdc1                49838      3832     43433   9% /boot
/dev/hda1              4208784    261284   3947500   7% /dos
**** Sun Oct 14 05:53:14 EST 2001 badblocks /=/dev/hdc5
Checking for bad blocks in read-only mode
From block 0 to 3695863
3695860
3695861
3695862
Pass completed, 3 bad blocks found.
**** Sun Oct 14 06:02:45 EST 2001 badblocks /boot=/dev/hdc1
Checking for bad blocks in read-only mode
From block 0 to 51471
Pass completed, 0 bad blocks found.
Compaq Deskpro, Fuji 17G drive. If you need more info let me know. sarsenault.

[root@... /root]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/hda6             5.6G  1.1G  4.2G  20% /
/dev/hda7             1.8G   33M  1.7G   2% /backup
/dev/hda1              53M  9.1M   41M  18% /boot
/dev/hda5             7.7G  751M  6.5G  11% /home
none                  109M     0  108M   0% /dev/shm
[root@... /root]# badblocks -v /dev/hda1
Checking for bad blocks in read-only mode
From block 0 to 56196
Pass completed, 0 bad blocks found.
[root@... /root]# badblocks -v /dev/hda5
Checking for bad blocks in read-only mode
From block 0 to 8193118
8193116
8193117
Pass completed, 2 bad blocks found.
[root@... /root]# badblocks -v /dev/hda6
Checking for bad blocks in read-only mode
From block 0 to 5919921
5919920
Pass completed, 1 bad blocks found.
[root@... /root]# badblocks -v /dev/hda7
Checking for bad blocks in read-only mode
From block 0 to 1951866
1951864
1951865
Pass completed, 2 bad blocks found.

This is not reassuring, but given this bug report I don't believe the results are real hard drive errors. I sure hope someone comes up with an answer.
This is a kernel problem. I think a partial fix is in our current errata kernel; a really clean fix can only go into the development kernel 2.5.x.

cu,

Florian La Roche
Can you check if your partition is an odd number of sectors in size?
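A couple of ways to check the raw partition size, assuming the standard util-linux tools are installed (device name taken from the reports above):

sfdisk -l -uS /dev/hda    # list partition sizes in 512-byte sectors
cat /proc/partitions      # sizes in 1k blocks (sectors = blocks * 2)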
Which kernel are you using, exactly? This sounds to me as if there's a block size problem manifesting on filesystems using a 1k blocksize. If the buffered IO during the badblocks test happens to use a 4k blocksize by default, you'd get exactly these symptoms.
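A quick way to test that theory against the numbers already posted: if the kernel is doing 4k IO while badblocks counts 1k blocks, the number of phantom bad blocks should equal the partition's 1k-block count modulo 4. Simple arithmetic (expr is just one way to compute it):

$ expr 18322101 % 4      # machines 1 and 2
1
$ expr 3959991 % 4       # machine 3
3

That matches the 1 and 3 bad blocks reported above exactly.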
OK - the partitions were created with the standard 7.1 install, so I'd guess my current kernel is not going to explain anything, but anyway it is: 2.4.9-12 (from the redhat updates)

Having looked at this again, I see two actual problems:

1) The number of blocks on each partition (reported by df) is quite a bit smaller than the number used by badblocks to check the whole partition - is this expected behaviour? Should df report the true size of the partition, or does ext2 not use the whole partition and thus waste about 1% or 2% of it? Or is this extra space used for something else? (The actual gap for hda5 is computed after the /proc/partitions output below.)

2) Badblocks is checking past the size specified by df

Output of /proc/partitions (for machine 1 or 2 at top - they are the same):

major minor  #blocks  name     rio rmerge rsect ruse wio wmerge wsect wuse running use aveq

   3     0  19938240 hda 4414849 15753092 160234604 7482097 324656 2487693 22511544 5061971 -20 15050850 9646918
   3     1   1028128 hda1 294 0 304 210 0 0 0 0 0 210 210
   3     2     56227 hda2 10907 173918 369650 10170 5 8 32 510 0 9890 10680
   3     3         1 hda3 0 0 0 0 0 0 0 0 0 0 0
   3     5  18322101 hda5 4403538 15578289 159856690 7469557 324595 2486748 22503568 5057831 0 14774260 12797228
   3     6    530113 hda6 110 885 7960 2160 56 937 7944 3630 0 3250 5790
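For what it's worth, the gap in question 1 is easy to put a number on, taking hda5's /proc/partitions size and df's 1k-blocks figure from the outputs above:

$ expr 18322101 - 18034128
287973

i.e. roughly 1.6% of the partition is unaccounted for by df.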
"df" counts usable blocks, but there are reserved blocks over and above that count for the inode tables. So you would expect "df" to show smaller than the partition size. "tune2fs -l" will list the superblock on a device, and will tell you the true total block size that the filesystem has been created with. /dev/hda5 is exactly 18322101 blocks long. That's quite large, so hda5 is going to have a blocksize of 4k. You have hda5 mounted, so the kernel has already been forced into using that 4k blocksize for all IO access to that partition. You have not given "badblocks" a block size argument, so it has assumed the smallest possible, 1k, to give the greatest possible coverage on the device. Badblocks has then tried to access the 1k blocks beyond the last complete 4k block in the partition (because the partition has an odd size), and because the kernel is already using a 4k blocksize, it tries to pad the 1k read out to a complete 4k block and fails because that extends beyond the end of the device. Solution: use the "-b 4096" option to badblocks to tell it what the blocksize really is. This is not strictly a bug, more a restriction on the kernel's ability to deal with multiple blocksizes at once on a device. Please reopen this bug report if the "-b 4096" doesn't fix things for you.