Bug 547128 - 'check' or 'repair' are not working
Summary: 'check' or 'repair' are not working
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: mdadm
Version: 12
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Doug Ledford
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 566828
 
Reported: 2009-12-13 17:46 UTC by NM
Modified: 2010-04-28 03:09 UTC
CC List: 7 users

Fixed In Version: initscripts-9.09-1.fc13
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 566828
Environment:
Last Closed: 2010-04-28 03:09:26 UTC
Type: ---
Embargoed:



Description NM 2009-12-13 17:46:05 UTC
User-Agent:       Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.5) Gecko/20091105 Fedora/3.5.5-1.fc12 Firefox/3.5.5

After noticing a non-zero mismatch_cnt on one of my devices, I ran 'repair' and then 'check'. The count is still non-zero.

Reproducible: Always

Steps to Reproduce:
As root: 
1. echo repair >/sys/block/md2/md/sync_action
2. echo check  >/sys/block/md2/md/sync_action
3. cat /sys/block/md2/md/mismatch_cnt
Actual Results:  
cat /sys/block/md2/md/mismatch_cnt  shows 256

Expected Results:  
mismatch_cnt must be 0

I am running FC12: 
1) 2.6.31.6-166.fc12.x86_64 
2) mdadm - v3.0.3 - 22nd October 2009

Thanks
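
For reference, a more careful version of the sequence above waits for each action to finish before starting the next; md reports the current state in sync_action, which reads "idle" when nothing is running. This is a minimal sketch, assuming the array is md2; if the second action is started while the first is still running it may be refused, or mismatch_cnt may reflect an earlier, incomplete pass.

  md=/sys/block/md2/md

  # Start a repair and wait for it to finish (sync_action reads "idle" when done).
  echo repair > "$md/sync_action"
  while [ "$(cat "$md/sync_action")" != "idle" ]; do sleep 10; done

  # Only then start the check, and wait for it as well.
  echo check > "$md/sync_action"
  while [ "$(cat "$md/sync_action")" != "idle" ]; do sleep 10; done

  # The count now reflects the completed check pass.
  cat "$md/mismatch_cnt"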

Comment 1 Raman Gupta 2009-12-14 16:53:12 UTC
I may be having the same problem... running Fedora 12:

# uname -a
Linux xx 2.6.31.6-162.fc12.x86_64 #1 SMP Fri Dec 4 00:06:26 EST 2009 x86_64 x86_64 x86_64 GNU/Linux

Weekly RAID check sent me an email about the mismatch_cnt. Here is the mdadm version:

# rpm -q --whatprovides /etc/cron.weekly/99-raid-check 
mdadm-3.0.3-2.fc12@x86_64  

Status of my RAID-1 as of the email from 99-raid-check:

# cat /sys/block/md0/md/mismatch_cnt              
31104                   

smartctl reports all drives operating perfectly normally -- and there are no I/O errors in the logs.

After running a check/repair as described by the reporter, I see:

# cat /sys/block/md0/md/mismatch_cnt
2304

Fewer, but still non-zero. I've read through all the Google material indicating this is *not* necessarily a problem, but it is still worrisome. Obviously the person who wrote the 99-raid-check script thought it was worth an email...

Comment 2 NM 2009-12-15 02:42:51 UTC
Either the 'repair' option should repair, or 'check' is not working. Am I missing something?

Comment 3 Doug Ledford 2010-02-19 19:38:48 UTC
I am removing the mismatch_cnt check on raid1 devices.  It is possible for normal operations on a raid1 device to result in a non-0 mismatch_cnt without it being an error.  And even if you run a repair, ongoing usage (such as swap file writes during the repair process) could result in a non-0 count after the fact.  In general, this check is important on raid4/5/6 devices, but too easily fooled on raid1 devices (and probably raid10 too, but I'm not skipping that yet, as I haven't seen any bug reports about raid10 mismatch_cnt problems).  This will be fixed in the next mdadm update.
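
For illustration only (the shipped raid-check script is not reproduced in this report), the kind of per-array skip being described could look roughly like the sketch below; the md sysfs "level" file reports the personality, e.g. "raid1".

  # Illustrative sketch, not the actual /etc/cron.weekly/99-raid-check script.
  for md in /sys/block/md*/md; do
      [ -f "$md/mismatch_cnt" ] || continue
      level=$(cat "$md/level")
      count=$(cat "$md/mismatch_cnt")
      # Skip the mismatch_cnt report on raid1: a non-zero count can be benign there.
      [ "$level" = "raid1" ] && continue
      if [ "$count" -ne 0 ]; then
          echo "WARNING: mismatch_cnt is not 0 on ${md%/md}" | mail -s raid-check root
      fi
  done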

Comment 4 Doug Ledford 2010-02-19 19:40:07 UTC
*** Bug 554217 has been marked as a duplicate of this bug. ***

Comment 5 Ray Todd Stevens 2010-02-19 20:24:57 UTC
Interesting. My bug is listed as a duplicate. In this bug the repair system is not working. However, I can actually fix my problem. The issue is that it won't stay fixed. :-( Grrrr.

Comment 6 NM 2010-02-19 20:57:24 UTC
My concern is that if the issue is not completely understood, then by removing the mismatch_cnt check on raid1 devices we are just hiding a problem. That is a sure path to a future disaster. It is not my call, however (I am not an expert in the field).

Comment 7 Doug Ledford 2010-02-19 21:11:53 UTC
The issue *is* completely understood.  We are not hiding a problem; the problem is well known (and technically not a problem).  For instance, if you have a swap partition, or a swap file, on a raid1 device, the kernel will issue a write to the raid subsystem.  The raid subsystem then issues two writes to disk and merely points each disk at the memory location of the swap data.  If something writes to that data between when disk1 and disk2 complete their respective writes, then the data on disk1 and disk2 will not agree (and will cause a non-0 mismatch_cnt).

Now, at the time the program changed the swap data between when disk1 and disk2 completed their writes, the overall swap write was still considered "in process" by the upper block layers, so they know that the in-process swap write is no longer valid and mark it as such.  If the kernel decides to swap the data out again, it will overwrite the inconsistent areas on disk (which it very well may not: the act of updating the swap data reset the "how recently was this data used" timestamp on the data we were going to swap out, so it's very likely the kernel will pick something else to swap out next).  If the swapper decides not to swap the data out again, then we end up with more or less abandoned areas in the swap file that don't agree between the two disks, but we don't care, because we won't ever read them back in without first getting a good, consistent write to them.

Likewise, certain mmapped files are treated the same way.  Under some conditions we will have inconsistent data between the two disks, but we will always correct it unless we end up throwing it away, in which case correcting it would be a waste of time.

The real problem is that we don't have a good way of distinguishing these sectors of thrown-away data, which don't match and don't need to match, from sectors of data that the filesystem thinks *should* be consistent but aren't.
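
The raid1 behaviour described above can be mimicked from userspace. The following is only an analogy, not the kernel code path: two "mirror" copies read straight from a shared source that keeps changing while the copies are in flight, the way raid1 points both member writes at the same in-memory page.

  # Userspace analogy only -- not the kernel code path.
  dd if=/dev/zero of=source.img bs=64k count=1024 2>/dev/null

  # Both "mirrors" copy directly from the shared, still-changing source...
  dd if=source.img of=mirror1.img bs=64k 2>/dev/null &
  dd if=source.img of=mirror2.img bs=64k 2>/dev/null &
  # ...while the "application" keeps rewriting that source.
  dd if=/dev/urandom of=source.img bs=64k count=1024 conv=notrunc 2>/dev/null &
  wait

  # Timing dependent, but frequently non-zero -- and, per the comment above, harmless.
  cmp -l mirror1.img mirror2.img | wc -l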

Comment 8 NM 2010-02-19 21:52:40 UTC
Thanks for the explanation. I have two more questions:

1) Why are other raid types (4/5/6) not affected by this problem?
2) I was under the impression that if a program's memory is in swap space, the kernel protects it from use by others. How is the scenario described above possible at all?

Apologies if these are naive questions.

Comment 9 Doug Ledford 2010-02-19 21:58:43 UTC
1) All raid types that use parity (4/5/6) are sensitive to whether or not the data changes between the time that the parity calculation is made and when the data is written to disk.  As a result, we use a buffered write in these raid levels.  We actually copy the current data to a private area, then do the xor on the private data, then write the private data.  If the application does further writes to the data *after* it has submitted the write to the raid layer, we simply have to do another write.  We don't allow the updated data to interfere with our in-process write.  But since this is costly in terms of performance, we don't do this on raid1 arrays.

2) An application never uses swap space itself, only the kernel uses swap space.  The kernel picks what memory it wants to swap out and starts the swap process.  The application merely writes to its own memory while the swap is in process, not to the swap space itself.  But, because the write happens while the swap out is in process, the disks can end up with different data.  Once the kernel realizes that the application has written into its own memory, the kernel no longer thinks that the memory is a good candidate for swapping out, so it cancels the swap request leaving the already written blocks in an inconsistent state.
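
Following the same userspace analogy as before (again an illustration, not the kernel's implementation), the buffered approach snapshots the data into a private copy first; later writes to the source can then no longer make the two mirrors disagree.

  # Same analogy, but buffered the way the parity levels effectively are.
  dd if=/dev/zero of=source.img bs=64k count=1024 2>/dev/null
  cp source.img snapshot.img                      # private copy taken up front

  dd if=snapshot.img of=mirror1.img bs=64k 2>/dev/null &
  dd if=snapshot.img of=mirror2.img bs=64k 2>/dev/null &
  dd if=/dev/urandom of=source.img bs=64k count=1024 conv=notrunc 2>/dev/null &
  wait

  cmp mirror1.img mirror2.img && echo "mirrors agree"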

Comment 10 Ray Todd Stevens 2010-02-19 22:09:45 UTC
OK, interesting. In my case the swap partition is not on raid of any kind. I have this problem occurring regularly on 9 machines. These machines have various configurations, but none of them have swap on raid. They all have swap directly on the drives.

About half have swap on the second of the two drives that also contain the partitions running RAID1. I have been moving to a configuration with two swap partitions, one on each drive, so that I can recover from a failure faster. Both types of configuration have this problem.

The mismatch is on a data partition, and it keeps coming back. So do I have a separate problem? Incidentally, this problem is new to Fedora 12.

Comment 11 Doug Ledford 2010-02-19 22:28:49 UTC
Todd, swap is just one of the ways this can happen with raid1 devices.  It can also happen with mmap'ed files and very likely in other situations I'm not qualified to speak to (as they involve knowing the inner semantics of filesystems and that's beyond my scope of expertise).  The real point is that this is simply a valid situation on raid1 devices, period.

As for being new to Fedora 12: it's not that the situation is new, it's that the raid-check script is new. We just never checked before.

Comment 12 Fedora Update System 2010-02-20 00:02:39 UTC
mdadm-3.1.1-0.gcd9a8b5.3.fc12 has been submitted as an update for Fedora 12.
http://admin.fedoraproject.org/updates/mdadm-3.1.1-0.gcd9a8b5.3.fc12

Comment 13 Fedora Update System 2010-02-20 00:02:52 UTC
mdadm-3.1.1-0.gcd9a8b5.3.fc13 has been submitted as an update for Fedora 13.
http://admin.fedoraproject.org/updates/mdadm-3.1.1-0.gcd9a8b5.3.fc13

Comment 14 Fedora Update System 2010-02-20 03:49:09 UTC
mdadm-3.1.1-0.gcd9a8b5.3.fc13 has been pushed to the Fedora 13 testing repository.  If problems still persist, please make note of it in this bug report.  If you want to test the update, you can install it with
  su -c 'yum --enablerepo=updates-testing update mdadm'
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F13/FEDORA-2010-1714

Comment 15 Fedora Update System 2010-02-20 07:35:21 UTC
mdadm-3.1.1-0.gcd9a8b5.3.fc12 has been pushed to the Fedora 12 testing repository.  If problems still persist, please make note of it in this bug report.  If you want to test the update, you can install it with
  su -c 'yum --enablerepo=updates-testing update mdadm'
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F12/FEDORA-2010-1891

Comment 16 Fedora Update System 2010-04-09 20:13:06 UTC
mdadm-3.1.2-9.fc13,initscripts-9.09-1.fc13 has been submitted as an update for Fedora 13.
http://admin.fedoraproject.org/updates/mdadm-3.1.2-9.fc13,initscripts-9.09-1.fc13

Comment 17 Fedora Update System 2010-04-13 01:40:38 UTC
mdadm-3.1.2-9.fc13, initscripts-9.09-1.fc13 has been pushed to the Fedora 13 testing repository.  If problems still persist, please make note of it in this bug report.  If you want to test the update, you can install it with
  su -c 'yum --enablerepo=updates-testing update mdadm initscripts'
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/mdadm-3.1.2-9.fc13,initscripts-9.09-1.fc13

Comment 18 Fedora Update System 2010-04-28 03:08:37 UTC
initscripts-9.09-1.fc13, mdadm-3.1.2-10.fc13 has been pushed to the Fedora 13 stable repository.  If problems still persist, please make note of it in this bug report.

