Bug 477739

Summary: Ext3 RAID 1 file system corruption with LVM2 snapshot volume overflow (possibly)
Product: Red Hat Enterprise Linux 5 Reporter: Richard Chapman <rchapman>
Component: lvm2 Assignee: LVM and device-mapper development team <lvm-team>
Status: CLOSED CURRENTRELEASE QA Contact: Cluster QE <mspqa-list>
Severity: high Docs Contact:
Priority: low    
Version: 5.2 CC: agk, dwysocha, heinzm, iannis, jbrassow, mbroz, prockai, rchapman
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-10-05 14:18:27 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Richard Chapman 2008-12-23 07:29:27 UTC
Description of problem:
I had some kernel errors and two system crashes on a system which had run fine for more than a year. After the second crash, the filesystem was so corrupt that the system would not boot. Both the journal and the volume group descriptor were reported to be corrupt. I have rebuilt the system from backups, and it is working fine again.
I have an ext3 root file-system running on a software raid 1 array with LVM2.
At the time of the incident I was running 2.6.18-92.1.18.el5.
I had created a snapshot volume in VolGroup00. I believe the snapshot volume was physically located between LogVol00 (swap) and LogVol01 (root), because I shrank LogVol00 to make space for the snapshot.
I forgot about the snapshot and left it in place for several days, until I got the following kernel errors in my logwatch.
 --------------------- Kernel Begin ------------------------ 

 
 WARNING: Kernel Errors Present
    Buffer I/O error on device dm-2, ...: 20 Time(s)
 
 ---------------------- Kernel End ------------------------- 
--------------------- Kernel Begin ------------------------ 

 WARNING: Kernel Errors Present
    Buffer I/O error on device dm-2, ...: 5 Time(s)
    EXT3-fs error (device dm-2): e ...: 750 Time(s)
    EXT3-fs error (dffset 0 ...: 1 Time(s)
 
 ---------------------- Kernel End ------------------------- 
At that time I deleted the snapshot volume, and everything seemed fine. A few days later the system crashed, but rebooted OK. A few days after that it crashed again, and this time the root filesystem was totally corrupt.

There is only circumstantial evidence, but my theory is that the file system suffered some minor corruption at the time of the snapshot overflow, and that the corruption was exacerbated by subsequent events.

If there is any more information that would be useful, please contact me and I'll see if I can provide it. I can be contacted at chapman dot richard at gmail dot com.

This may be related to Bug 461289, but in this case it was definitely the root file system which was corrupted, not just the snapshot. It isn't clear to me which was corrupted in Bug 461289.


How reproducible:

I haven't attempted to reproduce the problem.


Comment 2 Heinz Mauelshagen 2010-10-05 14:18:27 UTC
Closing because of long-term dormancy. Reopen if the problem still exists in the current release.