Bug 131782 - filesystem errors after upgrade
Summary: filesystem errors after upgrade
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Arjan van de Ven
Reported: 2004-09-04 14:22 UTC by Sammy
Modified: 2007-11-30 22:10 UTC (History)

Doc Type: Bug Fix
Last Closed: 2004-09-08 18:32:27 UTC


Attachments
Additional info concerning the odd /boot sizes (1.58 KB, text/plain)
2004-09-05 20:09 UTC, Mike
Strace per request from FC1 against FC3T1 shared boot partition (3.75 KB, text/plain)
2004-09-05 21:40 UTC, Mike
Second strace of "df -h" (6.13 KB, text/plain)
2004-09-05 21:53 UTC, Mike
dump2efs /dev/hda1 (4.76 KB, text/plain)
2004-09-05 23:26 UTC, Mike
A small test showing the .538 kernel has a problem. (1.70 KB, text/plain)
2004-09-07 03:56 UTC, Mike

Description Sammy 2004-09-04 14:22:02 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (compatible; Konqueror/3.2; Linux; X11; en_US, en) (KHTML, like Gecko)

Description of problem:
After upgrading to the new kernel, udev-030-18, dev and
MAKEDEV 3.11, and mkinitrd-4.1.9, and rebooting, I get some weird behavior.
When I run the "df" command, the size of the filesystem shows up as
a huge negative number. Both partitions, / and /boot, show up
as being 100% and sometimes 101% occupied. The system
still functions fine. If I run fsck on the filesystems, it fixes a lot of
inodes, and after remounting the sizes are roughly correct. I say
roughly because the percentage used is not exactly
what it was, but close. I kept the same kernel but went back to the old
udev, dev, etc., and things seem to have been stable for a day.

I have an MPT Fusion U320 SCSI disk. I know the new 2.6.9-rc1-bk7
also makes a significant jump in the fusion driver.

Version-Release number of selected component (if applicable):
kernel-2.6-1.541

How reproducible:
Always

Steps to Reproduce:
1. Update to latest rawhide
2. Reboot
3. Run df
    

Additional info:

Comment 1 Mike 2004-09-05 20:09:41 UTC
Created attachment 103493 [details]
Additional info concerning the odd /boot sizes

Comment 2 Leonard den Ottolander 2004-09-05 20:51:36 UTC
Is this a kernel or coreutils issue?


Comment 3 Tim Waugh 2004-09-05 21:11:09 UTC
Please supply 'strace' output so we can be sure which package is at fault.

Comment 4 Mike 2004-09-05 21:32:15 UTC
Re: comment #3: I'd be happy to provide one with a little coaching.
I've run strace on various programs from the command line.
What would you suggest as a test, i.e. "strace what?"

Comment 5 Tim Waugh 2004-09-05 21:33:56 UTC
Just run 'strace df -h 2>log', for example.
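
Once such a log exists, the block counters can be pulled out of it mechanically. The following is only a rough sketch in Python: the function name parse_statfs_calls and the regular expressions are illustrative assumptions, not part of strace or coreutils, and the pattern only covers statfs/statfs64 lines in the form strace printed in this report.

```python
import re

# Assumed layout of a successful statfs()/statfs64() line as printed by strace.
# statfs64 carries an extra size argument before the struct, hence the optional group.
_STATFS_RE = re.compile(
    r'statfs(?:64)?\("(?P<path>[^"]+)",\s*(?:\d+,\s*)?\{(?P<body>.*?)\}\)\s*=\s*0',
    re.S,
)
_FIELD_RE = re.compile(r'\b(f_blocks|f_bfree|f_bavail)=(\d+)')


def parse_statfs_calls(log_text):
    """Return {path: {counter_name: int}} for each successful statfs call in the log."""
    results = {}
    for match in _STATFS_RE.finditer(log_text):
        counters = {name: int(value)
                    for name, value in _FIELD_RE.findall(match.group("body"))}
        results[match.group("path")] = counters
    return results


# Sample line in the format strace printed for /boot in this report.
SAMPLE = (
    'statfs("/boot", {f_type="EXT2_SUPER_MAGIC", f_bsize=1024, '
    'f_blocks=101086, f_bfree=118793, f_bavail=113574, f_files=26104, '
    'f_ffree=26030, f_fsid={0, 0}, f_namelen=255, f_frsize=0}) = 0'
)

for path, counters in parse_statfs_calls(SAMPLE).items():
    print(path, counters)
```

To use it on a real capture, read the file written by the strace command (here "log") and pass its contents to parse_statfs_calls.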

Comment 6 Mike 2004-09-05 21:40:57 UTC
Created attachment 103494 [details]
Strace per request from FC1 against FC3T1 shared boot partition

Comment 7 Mike 2004-09-05 21:53:45 UTC
Created attachment 103495 [details]
Second strace of "df -h"

I didn't see anything in the previous strace which concerned the /boot
partition (hda1) so am including this variation.

Comment 8 Leonard den Ottolander 2004-09-05 23:13:54 UTC
statfs("/boot", {f_type="EXT2_SUPER_MAGIC", f_bsize=1024,
f_blocks=101086, f_bfree=118793, f_bavail=113574, f_files=26104,
f_ffree=26030, f_fsid={0, 0}, f_namelen=255, f_frsize=0}) = 0

f_bfree (and f_bavail) > f_blocks

Ouch!

That seems to conflict with what I expect should be:
f_blocks >= f_bfree >= f_bavail

So is the file system actually corrupted, or is the kernel reporting
wrong figures?

Could you please attach the output from "dumpe2fs /dev/hda1"?
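
The ordering check above can also be run directly against a mounted filesystem. This is a minimal sketch in Python using os.statvfs; the helper names check_counters and check_mount are made up for illustration, and treating f_bavail <= f_bfree as an invariant is the expectation stated in this comment rather than a guarantee for every filesystem type.

```python
import os


def check_counters(f_blocks, f_bfree, f_bavail):
    """Return a list of violated orderings (empty if the counters look sane)."""
    problems = []
    if f_bfree > f_blocks:
        problems.append(f"f_bfree ({f_bfree}) > f_blocks ({f_blocks})")
    if f_bavail > f_bfree:
        problems.append(f"f_bavail ({f_bavail}) > f_bfree ({f_bfree})")
    return problems


def check_mount(path):
    """Ask the kernel for the counters of the filesystem holding `path`."""
    st = os.statvfs(path)
    return check_counters(st.f_blocks, st.f_bfree, st.f_bavail)


# Counters from the statfs("/boot") call quoted above.
for problem in check_counters(101086, 118793, 113574):
    print(problem)

# A tool that derives "used" as total minus free ends up with a negative value here.
print("used blocks:", 101086 - 118793)
```

Calling check_mount("/boot") on an affected system before and after fsck would show whether the reported counters change, though that alone does not tell whether the on-disk data or the kernel's accounting is at fault.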


Comment 9 Mike 2004-09-05 23:26:52 UTC
Created attachment 103496 [details]
dump2efs /dev/hda1

Comment 10 Sammy 2004-09-06 20:18:29 UTC
Arjan, 
 
I am still having these errors with 1.540! It is very weird because I
can also boot 1.538, and that does not seem consistent either. In
particular, "df" seems to give radically different answers. I run
fsck on my root filesystem, which then shows 27% occupation, which I know
is close to the true amount. Then I reboot and log in to find "df" showing
19%, 20%, etc. I even played with coreutils: I downgraded it to 5.2.1-17,
and the used amount changed from 18% to 21%, but a second later
it fell to 20%. Then I went back to 5.2.1-23 and ran df to see 17%.

I know you removed the ext3 resize patch from 1.541 in 1.540. Is there
another culprit?

Comment 11 Leonard den Ottolander 2004-09-06 20:24:34 UTC
If the filesystem is corrupted, reverting the kernel will not fix that.
All it does is prevent more filesystems from getting corrupted.


Comment 12 Mike 2004-09-06 20:59:59 UTC
In a conversation on the Fedora-Test-List today, Stephen Tweedie
indicated that this problem (odd "df" results) may have been introduced in
the .538 kernel, which is in line with my logs.

http://marc.theaimsgroup.com/?l=fedora-test-list&m=109449522208471&w=2


Comment 13 Sammy 2004-09-07 00:31:57 UTC
OK...fsck does fix it temporarily. 
 
The odd thing is that I have the same setup on my DELL Inspiron notebook and 
I am not having any problems there. My office system is a SCSI system. Do 
you guys have IDE disks? 

Comment 14 Mike 2004-09-07 01:56:23 UTC
Mine is an IDE on a Sony VAIO Laptop:

Probing IDE interface ide0...
hda: TOSHIBA MK6026GAX, ATA DISK drive

One of the folks in the conversation I mentioned earlier this evening
noted that he had a problem with his /boot partition after editing his
/boot/grub/grub.conf file using "vi".   I don't recollect a specific
instance like that in my own case but it's possible that I did
something while running under the .538 kernel to change a file on the
/boot partition.  I don't know the nature of the bug that may have
been introduced in the .538 kernel so it would be hard to tell if SCSI
vs IDE is a factor.

Comment 15 Mike 2004-09-07 03:56:40 UTC
Created attachment 103526 [details]
A small test showing the .538 kernel has a problem.

I posted this to the Fedora-Test-List.

Comment 16 Sammy 2004-09-07 13:10:42 UTC
O.K. I rebooted using the FC2 boot CD, ran fsck on all filesystems (many 
inodes fixed) and then rebooted into 1.540. "df" seems to be reporting 
correctly now. I removed 1.538 kernel rpm. I am keeping my fingers 
crossed but 1.538 may be the culprit. 

Comment 17 Leonard den Ottolander 2004-09-07 13:50:25 UTC
Using .540 will likely cause corruption again. The issue was
introduced in .538, not removed in .540. I think Arjan wrongly assumed
the issue was introduced in .541, which is why he reverted the kernel in
RawHide to .540.

I believe you need to use .533 to be safe.


Comment 18 Sammy 2004-09-07 14:20:05 UTC
Perhaps....but so far the system is fine with .540 and my laptop that has 
been on 1.541 and 1.540 never had this problem. It could have been 
something in the -bkX series that got fixed in the meantime. 

Comment 19 Sammy 2004-09-08 13:52:55 UTC
I installed 1.549 on both machines and everything is still fine. With the
bad kernels, the /boot filesystem always got messed up whenever I installed
a new kernel. Does this mean this issue is resolved?

Comment 20 Leonard den Ottolander 2004-09-08 18:32:27 UTC
Yes. This appears to be an issue with .538 *only*.


