477856 – RAID got out of sync resulting in severe file system corruption

Summary: RAID got out of sync resulting in severe file system corruption

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	10
Hardware:	All
OS:	Linux
Priority:	low
Severity:	high
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-12-24 11:03 UTC by Kasper Dupont
Modified:	2009-12-18 07:23 UTC
CC List:	1 user
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2009-12-18 07:23:18 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
dmesg output (32.81 KB, application/octet-stream) 2008-12-26 21:43 UTC, Kasper Dupont

Description Kasper Dupont 2008-12-24 11:03:31 UTC

Description of problem:
I run F10 with a RAID 1 setup across two USB disks. At one point I had forgotten to turn on both USB disks before booting. The system booted with a degraded array - so far it appeared to be working as intended.

When connecting the other disk and trying to resync, the system crashed during resync. After this had happened the system would no longer boot because it said it could not find the md device.

I booted the installer in rescue mode and found that even there I could not activate the RAID. When I tried a read-only fsck of the underlying partitions, one of them appeared to have zero length when read through the partition device. The partition was listed with the correct length in /proc/partitions, and the same area of the disk could be read from the appropriate offset on the raw disk.

I shut the system down and moved each disk to a working Fedora 8 system, where I ran fsck on each of them to find out which mirror was in the most consistent state. One of them came out clean after journal replay. I changed the partition type on the worse of the two so it would not get auto-detected. I also verified that I could activate the RAID device on the Fedora 8 system, and made a backup of the files from it.

After putting the disks back on the original system I was able to boot. I changed partition type of the bad mirror back to auto detect and tried to put the partition back in the RAID. The resync started and again the system crashed during resync.

I decided to do the resync on the Fedora 8 system (which was also the system where I had done the initial sync of the RAID). But when doing a read-only fsck of each of the disks I found that both of them had lots of blocks claimed by different inodes. It was not the same files that were corrupted on each mirror.

Most of the corrupted files had been written during install and were already on both mirrors before the system was first booted with a single mirror. They had not been touched during the process, so no matter which mirror was accessed, those files should have stayed pinned to the same locations. I have no clue how their sectors could have been reused for something else.

I am wondering whether the lack of a battery-backed clock contributed to the system getting into this state. I had inserted a small script, run before init, that syncs the hardware clock from NTP and reboots if the clock had not been synced before. It ran with the root file system read-only, but perhaps timestamps in the RAID headers got messed up. I can understand that the lack of a battery-backed hardware clock leads to wrong timestamps on files and to log files with useless times, but it shouldn't lead to file system corruption.
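For illustration only, here is a minimal sketch of how a pre-init script like the one described above might decide that the clock has not been synced yet. The function name clock_looks_unset, the cutoff date and the pool host in the usage comment are all assumptions; the reporter's actual script is not shown in this report.

```shell
#!/bin/sh
# clock_looks_unset [NOW] [THRESHOLD]
# Hypothetical helper (not the reporter's script): succeeds when the system
# time, in seconds since the epoch, is earlier than a cutoff that no correctly
# set clock on this installation could show.
clock_looks_unset() {
    now=${1:-$(date +%s)}          # defaults to the current system time
    threshold=${2:-1199145600}     # assumed cutoff: 2008-01-01 00:00:00 UTC
    [ "$now" -lt "$threshold" ]
}

# Assumed usage from a script run before init (needs root and networking):
#   if clock_looks_unset; then
#       ntpdate 0.fedora.pool.ntp.org && hwclock --systohc && reboot
#   fi
```

Comparing against a fixed cutoff is only one possible way to detect an unset clock; a marker file would need a writable file system, which the read-only root rules out at that stage.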

Version-Release number of selected component (if applicable):
kernel-2.6.27.5-117.fc10.i586

How reproducible:
Don't want to try.

Steps to Reproduce:
See above.

Actual results:
Root file system corrupted beyond repair.

Expected results:
No file system corruption.

Additional info:
See above.

Comment 1 Kasper Dupont 2008-12-24 11:09:58 UTC

Forgot to mention, the machine consists of this board http://pcengines.ch/alix2c3.htm with a 4GB CF card and 2x1TB USB storage.

Contents of /proc/mdstat when the system was working:
Personalities : [raid1] [raid6] [raid5] [raid4] 
md6 : active raid1 sda6[0] sdb6[1]
      2096384 blocks [2/2] [UU]

md1 : active raid1 sdb1[0] sdc1[2] sda1[1]
      3879552 blocks [3/3] [UUU]

md7 : active raid1 sda7[0] sdb7[1]
      970542720 blocks [2/2] [UU]

unused devices: <none>

(md7 is the root device which got corrupted)
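As a rough sketch of how the degraded state could be spotted from this output, the function below reads /proc/mdstat-style text and prints every array whose status field contains an underscore (a missing member). The function name degraded_arrays is hypothetical, and the parsing assumes the layout shown above, where the status such as [UU] ends the "blocks" line.

```shell
#!/bin/sh
# degraded_arrays: read /proc/mdstat-style text on stdin and print the name of
# every md array whose member status (for example [U_]) shows a missing disk.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }               # remember the current array
        name != "" && /\[[U_]+\][ \t]*$/ {        # status line such as "[2/2] [UU]"
            if ($NF ~ /_/) print name             # "_" marks a missing member
            name = ""
        }
    '
}

# Assumed usage on a live system:
#   degraded_arrays < /proc/mdstat
```

For the output quoted above, this would print nothing, since all three arrays report every member as up.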

Comment 2 Kasper Dupont 2008-12-25 00:49:39 UTC

After restoring the system from backup it will no longer let me log in as root. When trying to ssh in I get the error message:
Unable to get valid context for root
When trying to log in on the console I get the error message:
login: no shell: Permission denied.

Comment 3 Kasper Dupont 2008-12-25 22:01:25 UTC

The complete sequence of steps I had to perform to make the system boot again was as follows:
1. "fsck.ext3 -y -C 0 /dev/md7" on a running system to bring the file system into a consistent state.
2. Moved the remains of the corrupted installation into a subdirectory.
3. Restored from backup.
4. Created .autorelabel in the root.
5. Used fdisk and mdadm on a running system to put the other partition back in the raid.
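The command-line parts of the steps above can be sketched roughly as follows. The function only prints the commands instead of running them. Its name print_readd_plan and the partition passed in the usage comment are assumptions, since the report does not say which partition was re-added. The fdisk step is interactive and left out; it would set the partition type back to fd (Linux raid autodetect).

```shell
#!/bin/sh
# print_readd_plan MD_DEVICE PARTITION
# Hypothetical helper: prints (does not execute) the commands behind steps 1,
# 4 and 5, followed by a look at /proc/mdstat to watch the resync.
print_readd_plan() {
    md=$1
    part=$2
    cat <<EOF
fsck.ext3 -y -C 0 $md
touch /.autorelabel
mdadm $md --add $part
cat /proc/mdstat
EOF
}

# Assumed usage (the partition name is a guess, not taken from the report):
#   print_readd_plan /dev/md7 /dev/sdb7
```

Printing the plan first lets the device names be double-checked before anything touches the array, which matters here because the USB disks were seen to change device names.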

Comment 4 Kasper Dupont 2008-12-25 22:40:48 UTC

I'd like to try adding code to the initrd to sync the clock using ntp before loading usb and raid drivers. The --net-dev option looks promising, but I still need to get some ntp code into the initrd as well. Any suggestions on how to proceed? Is it feasible to use the ntpdate and hwclock executables from /sbin within an initrd image, or am I better off trying to add this code to the nash executable itself?

Comment 5 Kasper Dupont 2008-12-26 00:10:34 UTC

Proof of concept showing that running ntpdate from the initrd is possible. I am still wondering if I might need to reboot after setting the clock to get rid of all traces of the initial value of the system clock that may be left around in the kernel.

I'll need to clean this up a bit more. I am going to run ntpdate from initrd for a while to see if it keeps working well. I still don't know if the wrong clock was a contributing factor to the initial breakage.

--- /sbin/mkinitrd      2008-11-12 20:47:40.000000000 +0100
+++ bin/mkinitrd        2008-12-26 00:59:11.000000000 +0100
@@ -1308,6 +1308,9 @@
 inst /sbin/nash "$MNTIMAGE" /bin/nash
 inst /sbin/modprobe "$MNTIMAGE" /bin/modprobe
 inst /sbin/rmmod "$MNTIMAGE" /bin/rmmod
+inst /usr/sbin/ntpdate "$MNTIMAGE" /bin/ntpdate
+inst /lib/libnss_files.so.2 "$MNTIMAGE" /lib/libnss_files.so.2
+inst /etc/services "$MNTIMAGE" /etc/services
 
 if [ -e /etc/fstab.sys ]; then
     inst /etc/fstab.sys "$MNTIMAGE"
@@ -1601,7 +1604,7 @@
 #!/bin/nash
 
 mount -t proc /proc /proc
-setquiet
+#setquiet
 echo Mounting proc filesystem
 echo Mounting sysfs filesystem
 mount -t sysfs /sys /sys
@@ -1635,7 +1638,7 @@
 done
 [ -n "$I18N" ] && emit "/lib/udev/console_init tty0"
 
-emit "daemonize --ignore-missing /bin/plymouthd"
+emit "#daemonize --ignore-missing /bin/plymouthd"
 
 # If we have drm loaded, include modesetting drivers
 if [ "x$PROBE" == "xyes" -a -d /sys/class/drm ]; then
@@ -1695,6 +1698,22 @@
     done
 fi
 
+emit "ntpdate $(
+    for N in {0..3}.fedora.pool.ntp.org
+    do
+        host $N
+    done |
+        awk '/has address (.*)/ {print $4}' |
+        sort -u |
+        while read N
+        do
+            echo "$RANDOM $N"
+        done |
+        sort |
+        awk '{print $2}' |
+        paste -s - -d' '
+)"
+
 emit_iscsi
 
 if [ "$scsi_wait_scan" == "yes" ]; then

Comment 6 Kasper Dupont 2008-12-26 21:43:31 UTC

Created attachment 327868
dmesg output

The problem showed up again. First the system reported I/O errors and locked up. After that the system was able to boot but kicked one disk out of the raid because it was considered outdated. While the system was using only one disk I tried running a sha1sum of all files on the system, but it reported I/O errors again before it finished, and locked up again. I tried booting again. At that point the root file system had become corrupted - the kernel was reporting stale NFS file handles on the root file system and failed to boot.

I tried booting with each individual disk. With one of them I was able to boot. The one it did manage to boot with was not up to date. Again I tried running a sha1sum of all files on the disk. And again it reported I/O errors. I managed to grab dmesg output this time, and I have attached it.

It appears that part of the problem is with some USB instability. Notice that the same pair of disks are stable on another machine running Fedora 8. Is there some problem in the USB drivers?

Comment 7 Kasper Dupont 2009-03-12 21:12:47 UTC

The corruption also happens on kernel version 2.6.27.19-170.2.35.fc10.i586.

I noticed that if the machine loses connectivity to the USB disks during a RAID recovery, the resync keeps running and accessing both disks. At some point this corrupts the file system on the disk it was copying from.

I did at one point manage to get sufficient logs from the kernel to see that the kernel claims the USB device was disconnected, then the raid reports failure, and then the kernel reports a USB disk is connected and assigns it a different device name.

I also noticed that even while it is working, the kernel frequently reports that the USB device is being reset:
 usb 1-1: reset high speed USB device using ehci_hcd and address 3
 usb 1-2: reset high speed USB device using ehci_hcd and address 2
These appear to be harmless. I'll try increasing the reset timeout to see if that improves USB stability, but even when the USB connection fails, the kernel shouldn't start copying in the wrong direction.
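To get a feel for how often these resets happen per device, kernel log text could be summarised with something like the sketch below. The function name usb_reset_counts is hypothetical, and the matching assumes the message wording quoted above.

```shell
#!/bin/sh
# usb_reset_counts: read kernel log text on stdin and print "<device> <count>"
# for every USB device that logged a "reset high speed USB device" message.
usb_reset_counts() {
    awk '
        /reset high speed USB device/ {
            # The device id (for example 1-1) follows the lowercase "usb" token.
            for (i = 1; i < NF; i++) {
                if ($i == "usb") {
                    dev = $(i + 1)
                    sub(/:$/, "", dev)    # drop the trailing colon
                    count[dev]++
                    break
                }
            }
        }
        END { for (d in count) print d, count[d] }
    ' | LC_ALL=C sort
}

# Assumed usage on a live system:
#   dmesg | usb_reset_counts
```

A count that climbs quickly on one port only might point at a cable or enclosure rather than the driver, though that is speculation and not something the report establishes.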

Comment 8 Bug Zapper 2009-11-18 09:42:43 UTC

This message is a reminder that Fedora 10 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 10.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '10'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 10's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 10 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 9 Bug Zapper 2009-12-18 07:23:18 UTC

Fedora 10 changed to end-of-life (EOL) status on 2009-12-17. Fedora 10 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.
