Bug 506376

Summary: ext3 file system became inconsistent
Product: [Fedora] Fedora Reporter: Aram Agajanian <agajania>
Component: dmraid Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED NOTABUG QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: low    
Version: 11 CC: agk, bmr, dwysocha, esandeen, hdegoede, heinzm, itamar, kernel-maint, lvm-team, mbroz, prockai, quintela
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-07-03 21:02:53 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
Attachments:
  Description                          Flags
  kernel messages                      none
  /var/log/dmesg                       none
  a new dmesg output with new errors   none

Description Aram Agajanian 2009-06-16 23:52:10 UTC
Created attachment 348196 [details]
kernel messages

Description of problem:

I recently installed F11 on my PC.  I installed it into a newly formatted ext4 partition.  However, I kept several of the ext3 partitions from my F9 system intact.  These partitions include /home, /opt, and the F9 root partition, which I have been mounting on /mnt/oldroot.

When I rebooted, the F9 root partition (ext3) was inconsistent.  I started to enter Y to the fsck questions.  After a few errors, I decided to enter N to all the rest.  There were a lot of them, and eventually I just rebooted without getting to the end.

In order to boot into F11, I had to comment out the line for this partition in /etc/fstab.  The commented out line looks like:

#/dev/mapper/vg00-root2  /mnt/oldroot            ext3    defaults        1 2

I have also noticed some kernel messages which seem to indicate a problem.  I am attaching these messages in a file.


Version-Release number of selected component (if applicable):

kernel-2.6.29.4-167.fc11.x86_64


How reproducible:

I don't know why this happened or if there is any troubleshooting that I can do.


Steps to Reproduce:
1.
2.
3.
  

Actual results:

The /dev/mapper/vg00-root2 filesystem became inconsistent.


Expected results:

The /dev/mapper/vg00-root2 filesystem should not have become inconsistent.


Additional info:

My PC is a Dell Optiplex 755 with the Q35 Express chipset.  I believe that the ICH is ICH9 DO.

There are two drives in a dmraid configuration (RAID 1).

I don't know if this is related, but when I run the find command in my home directory it aborts.

Comment 1 Chuck Ebbert 2009-06-17 23:05:17 UTC
Can you post the output of these commands:

  lvs
  pvs
  vgs

Comment 2 Aram Agajanian 2009-06-17 23:10:58 UTC
[root@frogn ~]# lvs
  LV     VG   Attr   LSize  Origin Snap%  Move Log Copy%  Convert
  files  vg00 -wi-ao 40.00G                                      
  home   vg00 -wi-ao 40.00G                                      
  root   vg00 -wi-ao 20.00G                                      
  root2  vg00 -wi-a- 20.00G                                      
  swap   vg00 -wi-ao  6.00G                                      
  tmp    vg00 -wi-ao 10.00G                                      
  vartmp vg00 -wi-ao 20.00G                                      
  winxp  vg00 -wi-a- 30.00G                                      

[root@frogn ~]# pvs
  PV         VG   Fmt  Attr PSize   PFree 
  /dev/dm-2  vg00 lvm2 a-   232.59G 46.59G

[root@frogn ~]# vgs
  VG   #PV #LV #SN Attr   VSize   VFree 
  vg00   1   8   0 wz--n- 232.59G 46.59G

Comment 3 Chuck Ebbert 2009-06-18 15:32:33 UTC
dm-5: rw=0, want=2952931456, limit=83886080
attempt to access beyond end of device

  83886080 = 40G
2952931456 = 1408G

Can you attach the entire /var/log/dmesg file?
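
The arithmetic above can be reproduced with a short sketch. It assumes the want= and limit= values in the kernel message are counted in 512-byte sectors; the helper names parse_beyond_end and sectors_to_gib are made up for illustration.

```python
import re

SECTOR_BYTES = 512  # assumption: the kernel message counts 512-byte sectors


def parse_beyond_end(line):
    """Return (device, want, limit) in sectors, or None if the line does not match."""
    m = re.search(r"(\S+): rw=\d+, want=(\d+), limit=(\d+)", line)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), int(m.group(3))


def sectors_to_gib(sectors):
    """Convert a sector count to GiB (2**30 bytes)."""
    return sectors * SECTOR_BYTES / 2**30


line = "dm-5: rw=0, want=2952931456, limit=83886080"
device, want, limit = parse_beyond_end(line)
print(f"{device}: limit {sectors_to_gib(limit):.2f} GiB, want {sectors_to_gib(want):.2f} GiB")
# prints: dm-5: limit 40.00 GiB, want 1408.07 GiB
```

So the request pointed roughly 35 times past the end of a 40G volume, which is far more than an off-by-one at the device boundary.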

Comment 4 Aram Agajanian 2009-06-18 15:44:01 UTC
Created attachment 348502 [details]
/var/log/dmesg

The timestamp on this file is 2009-06-16 19:02.

Comment 5 Aram Agajanian 2009-06-18 16:51:35 UTC
When I was running F9, there was a problem with dmraid on some early kernels:

https://bugzilla.redhat.com/show_bug.cgi?id=443466

For a while, dmraid wasn't working on this computer.  A few months later, I got it to start working again.  (I don't remember exactly what I did; I think I adjusted the kernel parameters.)  It's possible that some metadata became messed up, but I never had any problems with dmraid after that on F9.

I am using dmraid on F11 on a different PC with a newer Intel chipset.  I don't have any problems there.

Comment 6 Aram Agajanian 2009-06-18 21:17:02 UTC
I just remembered how I got dmraid working under F9.  It involved the initrd image.  I unpacked an initrd image from F8 and looked at the init script inside.  I then copied a couple of lines from init in F8's initrd to the init in F9's initrd.  After that, RAID seemed to work OK.

Comment 7 Aram Agajanian 2009-06-19 18:51:40 UTC
Created attachment 348696 [details]
a new dmesg output with new errors

I just noticed a bunch of errors in dmesg which I would like to call to your attention.  The e1000 driver seems to be involved.  I don't know if this is related to the filesystem inconsistency that I experienced earlier.

I don't know exactly what I was doing that caused these errors.  I was doing some downloading earlier.  I will try to test that and report back.

Comment 8 Eric Sandeen 2009-06-24 14:59:16 UTC
Just to be clear, you had been using the F9 system up until the F11 upgrade without any problems?

Comment 9 Aram Agajanian 2009-06-24 15:27:06 UTC
After I updated the initrd in F9, dmraid seemed to be working fine.

I haven't had any further inconsistency errors in F11.  (Two ext3 filesystems are no longer being mounted.)  However, when I run the find command on the /home filesystem (ext3), it still aborts.

Comment 10 Eric Sandeen 2009-06-24 20:07:57 UTC
Let me rephrase the question ... when did the problems with the F9/ext3 partitions start?

Boot-time fsck when booting F11 would not have touched them unless they were flagged as having errors from a previous mount, or if you happened to exceed the maximum mount count.  So I wonder if something went bad under F9; perhaps you can look at the F9 system logs.
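
As a rough illustration of those two triggers, here is a sketch that reads the "Filesystem state", "Mount count" and "Maximum mount count" fields from saved `tune2fs -l` output.  The sample text is made up and is not from this system, the function names are hypothetical, and the logic is simplified (it ignores the time-based check interval, for example).

```python
def parse_tune2fs(text):
    """Turn 'Key:   value' lines from `tune2fs -l` output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def fsck_expected(fields):
    """Simplified guess at whether a boot-time fsck would check this filesystem."""
    state = fields.get("Filesystem state", "")
    # Anything other than plain "clean" (e.g. "clean with errors") triggers a check.
    if state and state != "clean":
        return True
    max_count = int(fields.get("Maximum mount count", "-1"))
    count = int(fields.get("Mount count", "0"))
    # A maximum of -1 or 0 means mount-count checking is disabled.
    return max_count > 0 and count >= max_count


# Hypothetical sample, not taken from the reporter's machine.
SAMPLE = """Filesystem state:         clean with errors
Mount count:              12
Maximum mount count:      30
"""

print(fsck_expected(parse_tune2fs(SAMPLE)))
```

With the sample above the error flag alone is enough to expect a check, even though the mount count is well below the maximum.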

Comment 11 Aram Agajanian 2009-06-26 13:50:45 UTC
After the initrd fix mentioned earlier, I never noticed any problems related to filesystems when running F9.  I just looked at and grepped the messages and dmesg log files from F9.  I didn't find anything that would indicate to me that there was a problem related to filesystems when running F9.

Comment 12 Aram Agajanian 2009-06-26 14:17:30 UTC
To answer the earlier question, the first problem that I noticed was the inconsistency error that occurred when I rebooted F11 on Tuesday, 6/16.  I believe that the inconsistency error occurred the second time that I booted F11.  I had installed F11 and booted for the first time four days earlier.

I just looked through the messages files from F11.  I found one relevant error message:

Jun 12 16:16:52 frogn kernel: EXT3-fs error (device dm-9): htree_dirblock_to_tree: bad entry in directory #1295877: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0

Comment 13 Eric Sandeen 2009-06-26 14:59:35 UTC
> Jun 12 16:16:52 frogn kernel: EXT3-fs error (device dm-9):
> htree_dirblock_to_tree: bad entry in directory #1295877: rec_len is smaller
> than minimal - offset=0, inode=0, rec_len=0, name_len=0

Ok, thanks, that explains the fsck on the subsequent boot.  Also, ext3 somewhat easily corrupts directory entries when a drive with write caching loses power, if barriers aren't enabled ... and ext3 disables barriers by default, and dm in the past didn't pass them through even if you enabled them.

Is it possible that the box lost power while this device (dm-9) was mounted?
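
For reference, a small sketch that lists ext3 entries in an fstab that do not ask for barriers with the barrier=1 mount option.  The first sample line is the one quoted in the description (uncommented here); the /home line with barrier=1 is hypothetical, as is the function name.  Given the dm behavior mentioned above, setting the option does not by itself guarantee that barriers reach the disk.

```python
def ext3_without_barriers(fstab_text):
    """Return mount points of ext3 entries that do not request barrier=1."""
    flagged = []
    for raw in fstab_text.splitlines():
        line = raw.strip()
        # Skip blank lines and commented-out entries.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 4:
            continue
        device, mount_point, fstype, options = parts[:4]
        if fstype == "ext3" and "barrier=1" not in options.split(","):
            flagged.append(mount_point)
    return flagged


# Second line is a hypothetical entry for illustration only.
SAMPLE_FSTAB = """\
/dev/mapper/vg00-root2  /mnt/oldroot  ext3  defaults            1 2
/dev/mapper/vg00-home   /home         ext3  defaults,barrier=1  1 2
"""

print(ext3_without_barriers(SAMPLE_FSTAB))  # prints: ['/mnt/oldroot']
```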

Comment 14 Aram Agajanian 2009-06-26 16:03:17 UTC
dm-9 was my root partition for F9, so it was always mounted.

As best as I can remember, the computer hasn't lost power recently.  (It is on a UPS.)  I'm sure that it has lost power at some point.

When I first installed F9, the video driver was locking up the computer.  I had to force the computer to power down in order to restart it.  I did that many times, but it was at least 9 months ago.

dm-5 is my home partition.  Perhaps the error messages in the first attachment have something to do with why the find command doesn't work.  I think I'll try to reformat this partition as ext4 and then restore from backup.

Comment 15 Aram Agajanian 2009-07-03 21:02:53 UTC
I reformatted the partition and restored the home filesystem from backup.  When I did this, I noticed that the disk space numbers in df didn't match those from du -s.  I remembered that this was the case even in F9.  So, the problems with these ext3 filesystems existed in F9 ever since I tried to fix dmraid.  However, fsck wasn't triggered for the root partition on F9 but it was on F11.

Thank you for all the responses.

Comment 16 Eric Sandeen 2009-07-04 01:13:05 UTC
Ok, thanks for the update.  If you run into any more problems, let us know!

thanks,
-Eric