Red Hat Bugzilla – Bug 116685
reboot while raid 1 primary waits to be reconstructed brings array up fully synced
Last modified: 2007-11-30 17:10:37 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040217
Description of problem:
I had two raid 1 devices across two different hard disks. One of the
disks had a temporary failure, and the devices got degraded.
After a reboot, I raidhotadded the two failed partitions, one to each
raid device, and they started syncing.
Figuring this would be a good opportunity to test raid 1 features in
the new kernel, I thought I'd raidsetfaulty the newly-added
partitions. The one that was syncing was marked as faulty, but
syncing continued. The other was marked as faulty, but it couldn't be
raidhotremoved any more. After a reboot into FC1, the one that was
syncing when marked as faulty is brought back on-line and finishes
syncing. The other never completes syncing, and produces a lot of
noise in /var/log/messages (I'll attach one of them; the raid devices
are <hdc5><hdm5> and <hdc6><hdm6>). hdm6 is the one that won't be
activated and, if marked as faulty, can't be removed (both mdadm and
raidhotremove report the syscall to do so failed). The only way I
could find to get the array back to a functional state was to mark the
partitions with some type other than raid auto-detect, such that the
arrays would be brought up as failed, reboot into the latest FC1
kernel and then change the partition types back and raidhotadd the partitions again.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. raidhotadd partitions to two degraded raid1 devices that can't be
synced at the same time
2. mark the one that hasn't started syncing yet as faulty
3. try to raidhotremove it
Actual Results: Operation failed
Expected Results: The partition should be removed, not brought back
after reboot as a spare that never gets synced.
Created attachment 97985
extract from FC1 /var/log/messages with the unsyncable/unremovable spare (hdm6)
Here's a 100% reliable procedure to duplicate the problem:
- create two separate, degraded raid 1 devices, using partitions from
the same disk
- raidhotadd one partition to each raid 1 device. One of them will
start syncing, and the other will wait for it
- before the first raid 1 device finishes syncing, raidsetfaulty the
partition raidhotadded to the second raid 1 device.
- attempt to raidhotremove the partition marked as faulty: it will fail
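The raidtools commands used above are long obsolete; with mdadm, raidhotadd/raidsetfaulty/raidhotremove correspond to --add/--fail/--remove. A rough sketch of the recipe, assuming illustrative device names (md0/md1 built from sda5/sda6 on one disk, with sdb5/sdb6 on a second); by default it only echoes the mdadm commands, since running them for real requires root and scratch devices:

```shell
#!/bin/sh
# Dry-run sketch of the reproduction recipe above, translated from the
# obsolete raidtools commands to mdadm equivalents (raidhotadd -> --add,
# raidsetfaulty -> --fail, raidhotremove -> --remove).  All device names
# are illustrative.  Setting MDADM=mdadm (as root) would run the commands
# for real; the default just prints them.
MDADM="${MDADM:-echo mdadm}"

repro() {
    # Two degraded RAID-1 arrays, each missing its second member.
    $MDADM --create /dev/md0 --level=1 --raid-devices=2 /dev/sda5 missing
    $MDADM --create /dev/md1 --level=1 --raid-devices=2 /dev/sda6 missing

    # Hot-add one partition from the second disk to each array; the kernel
    # resyncs one array and delays the other, since both targets share a disk.
    $MDADM --add /dev/md0 /dev/sdb5
    $MDADM --add /dev/md1 /dev/sdb6

    # While md0 is still resyncing, fail the delayed member of md1 ...
    $MDADM --fail /dev/md1 /dev/sdb6
    # ... and try to remove it: on the affected kernels this step fails.
    $MDADM --remove /dev/md1 /dev/sdb6
}

repro
```

Run for real, /proc/mdstat should show one array recovering and the other queued behind it.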
Another, probably related problem:
- create two raid devices, just like in the previous case.
- raidhotadd one partition to each device, just like in the previous case.
- reboot before the first one finishes syncing
- the second raid device will be marked as fully synced, even though
it was never synced. Potential for data loss, or even failure to
boot, is pretty high, depending on how critical the filesystem is.
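The vulnerable window is visible in /proc/mdstat before the reboot: the array whose reconstruction is queued reports resync=DELAYED while the other shows recovery progress. A minimal sketch of detecting that state; the snapshot below is fabricated for illustration (modeled loosely on the hdc/hdm devices in this report), not captured output:

```shell
#!/bin/sh
# Print any md array whose resync is queued behind another one.
# Reads /proc/mdstat-shaped text on stdin.
find_delayed() {
    awk '/^md[0-9]+ :/ { dev = $1 }
         /resync=DELAYED/ { print dev " is waiting to resync - unsafe to reboot" }'
}

# Illustrative snapshot (not captured from a real system):
sample='md0 : active raid1 hdm5[2] hdc5[0]
      1048576 blocks [2/1] [U_]
      [=====>...............]  recovery = 27.5% (288512/1048576)
md1 : active raid1 hdm6[2] hdc6[0]
      1048576 blocks [2/1] [U_]
        resync=DELAYED'

printf '%s\n' "$sample" | find_delayed
```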
It seems to me that the problem has to do with the resync thread
already marking the devices as active, such that when they come back
up after a reboot, they appear to be good.
FWIW, the "probably related problem" above (that an unfinished sync to a
spare will not restart syncing after reboot) is fixed in 2.6.4-1.305.
I haven't checked the first problem yet.
*** Bug 129608 has been marked as a duplicate of this bug. ***
Looks like this is all fixed in rawhide.
I was mistaken. The problem of rebooting while one array syncs and
another waits for syncing, such that the array comes back up as if the
sync had completed, is still present.
Still a problem with the 2.6.9-based update?
2.6.9-1.715_FC3 handles the reboot-with-array-waiting-to-resync case correctly.
Ugh. The problem in which a reboot, while there's an array waiting to
reconstruct its primary copy, causes the array to come back up fully
synced, even though nothing was actually synced, is back in
2.6.10-1.1155_FC4 (or maybe it was never fixed and I goofed when
testing :-(). I got it on x86_64 this time. Odds are that it affects
RHEL4 as well :-(
I have seen similar symptoms in FC3 with kernel-2.6.11-1.35_FC3. This bug should
have higher priority as it may cause major data loss.
I have a system with a failing disk, which I wanted to replace with a raid on
two new disks. I did the following.
First I partitioned one of the new disks and created degraded raid-1 arrays. I
copied the data from the old disk to the raid and shut down. While working with
the degraded raid, I noticed that when the processes are signaled during
shutdown, some "immediate safe mode" messages are produced by md.
Then I replaced the old disk with another new disk which I partitioned and added
to the raids. Before recovery had completed, I shut down the system. At shutdown
a lot of error messages were produced, but they didn't get logged.
After this, the system couldn't boot. It would fail to mount the root fs and
panic. I put the old disk in a tray to boot from it. I found that all the raid
devices for which the recovery had been delayed had been marked as fully synced,
and reading from the noninitialized disk had caused mounting the root device to fail.
I was able to recover the raid by marking the noninitialized partitions as
faulty. But if the problem is not noticed and solved immediately, it may cause
serious data loss.
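The recovery described above (evict the member that was wrongly marked in-sync, then re-add it so the kernel performs a real reconstruction) would look roughly like this with mdadm; the device names are illustrative, and the default only prints the commands:

```shell
#!/bin/sh
# Dry-run sketch: evict a member that was wrongly marked in-sync, then
# re-add it to force an actual reconstruction.  Names are illustrative;
# set MDADM=mdadm (as root) to run for real.
MDADM="${MDADM:-echo mdadm}"

recover() {
    $MDADM --fail /dev/md1 /dev/sdb6     # mark the never-synced member faulty
    $MDADM --remove /dev/md1 /dev/sdb6   # drop it from the array
    $MDADM --add /dev/md1 /dev/sdb6     # re-add it: this triggers a full resync
}

recover
```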
I noticed that this does not affect only new disks: an unclean raid-1
with multiple disks is not resynced on boot the way it was on FC1.
Kasper, I think what you're observing is just the result of an optimization in
the RAID subsystem, that's present in newer kernels: when all members of a RAID
1 device are stable and up-to-date, they're marked as clean, such that, should
an unexpected reboot ensue, they don't have to be resynced, since they're known
to be in sync.
It is possible that the missing sync on unexpected reboot is an optimization. I
haven't yet had a chance to investigate that further. How soon after the last
write is it supposed to mark the raid clean?
The missing sync of a newly added disk however is clearly a bug.
Alex, does this still affect current rawhide?
I tried to trigger it again, and this time it worked, although I had to follow a
slightly different procedure (run mdadm -A by hand to bring it up, since
initrd.img will no longer bring up all raids it can find). I guess this means