Description of problem:
I had three multi-legged mirrors on a 4-node x86_64 cluster. Each of the three
had a GFS filesystem on top of it and each was running I/O (xiogen/xdoio). I
then failed /dev/sdh1, which was a leg in all three mirrors. That ended up
causing all clvmd operations to hang.

[root@link-08 ~]# lvs -a -o +devices
  LV                  VG     Attr   LSize  Origin Snap%  Move Log           Copy%  Devices
  mirror1             corey  mwi-ao 5.00G                     mirror1_mlog  100.00 mirror1_mimage_0(0),mirror1_mimage_1(0)
  [mirror1_mimage_0]  corey  iwi-ao 5.00G                                          /dev/sdh1(0)
  [mirror1_mimage_1]  corey  iwi-ao 5.00G                                          /dev/sda1(0)
  [mirror1_mlog]      corey  lwi-ao 4.00M                                          /dev/sdg1(0)
  mirror2             corey  mwi-ao 5.00G                     mirror2_mlog  100.00 mirror2_mimage_0(0),mirror2_mimage_1(0),mirror2_mimage_2(0)
  [mirror2_mimage_0]  corey  iwi-ao 5.00G                                          /dev/sdh1(1280)
  [mirror2_mimage_1]  corey  iwi-ao 5.00G                                          /dev/sdb1(0)
  [mirror2_mimage_2]  corey  iwi-ao 5.00G                                          /dev/sdc1(0)
  [mirror2_mlog]      corey  lwi-ao 4.00M                                          /dev/sda1(1280)
  mirror3             corey  mwi-ao 5.00G                     mirror3_mlog  100.00 mirror3_mimage_0(0),mirror3_mimage_1(0),mirror3_mimage_2(0),mirror3_mimage_3(0)
  [mirror3_mimage_0]  corey  iwi-ao 5.00G                                          /dev/sdh1(2560)
  [mirror3_mimage_1]  corey  iwi-ao 5.00G                                          /dev/sdd1(0)
  [mirror3_mimage_2]  corey  iwi-ao 5.00G                                          /dev/sde1(0)
  [mirror3_mimage_3]  corey  iwi-ao 5.00G                                          /dev/sdf1(0)
  [mirror3_mlog]      corey  lwi-ao 4.00M                                          /dev/sda1(1281)

After the failure:

[root@link-07 ~]# dmsetup table
corey-mirror1_mlog: 0 8192 linear 8:97 384
corey-mirror3_mimage_3: 0 10485760 linear 8:81 384
corey-mirror3_mimage_2: 0 10485760 linear 8:65 384
corey-mirror2_mlog: 0 8192 linear 8:1 10486144
corey-mirror3: 0 10485760 mirror clustered_disk 4 253:11 1024 LVM-KEY63wD4MPsiaIcOe600esYDGASesbrNHaArgVnIEdototLjaZ1n3gGBfGLUE5UX block_on_error 4 253:12 0 253:13 0 253:14 0 253:15 0
corey-mirror3_mimage_1: 0 10485760 linear 8:49 384
corey-mirror2: 0 10485760 mirror clustered_disk 4 253:6 1024 LVM-KEY63wD4MPsiaIcOe600esYDGASesbrNsm3bgSmAfNwdBCvVj5czrJoP1pfeWWtA block_on_error 3 253:7 0 253:8 0 253:9 0
corey-mirror2_mimage_2: 0 10485760 linear 8:33 384
corey-mirror3_mimage_0: 0 10485760 error
corey-mirror3_mlog: 0 8192 linear 8:1 10494336
corey-mirror2_mimage_1: 0 10485760 linear 8:17 384
corey-mirror1: 0 10485760 mirror clustered_disk 4 253:2 1024 LVM-KEY63wD4MPsiaIcOe600esYDGASesbrN5YTb2ydV3fDD0WE0n8EAexTiiSO16TmN block_on_error 2 253:3 0 253:4 0
corey-mirror2_mimage_0: 0 10485760 error
corey-mirror1_mimage_1: 0 10485760 linear 8:1 384
corey-mirror1_mimage_0: 0 10485760 error
VolGroup00-LogVol01: 0 4063232 linear 3:2 151781760
VolGroup00-LogVol00: 0 151781376 linear 3:2 384

[root@link-07 ~]# dmsetup info
Name: corey-mirror1_mlog
State: SUSPENDED
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 2
Number of targets: 1
UUID: LVM-KEY63wD4MPsiaIcOe600esYDGASesbrN5YTb2ydV3fDD0WE0n8EAexTiiSO16TmN

Name: corey-mirror3_mimage_3
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 15
Number of targets: 1
UUID: LVM-KEY63wD4MPsiaIcOe600esYDGASesbrNRN3uVucbxgF6dUUyvce4F6Z56Q5xWbjU

Name: corey-mirror3_mimage_2
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 14
Number of targets: 1
UUID: LVM-KEY63wD4MPsiaIcOe600esYDGASesbrNXYFWtroAvH9vpDleSAD7xZ25yeWEm1MU

Name: corey-mirror2_mlog
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 6
Number of targets: 1
UUID: LVM-KEY63wD4MPsiaIcOe600esYDGASesbrNsm3bgSmAfNwdBCvVj5czrJoP1pfeWWtA

Name: corey-mirror3
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 5
Major, minor: 253, 16
Number of targets: 1
UUID: LVM-KEY63wD4MPsiaIcOe600esYDGASesbrNXQk2lh5R7BJyMeqiBZNLtN2b19Ln5jtl

Name: corey-mirror3_mimage_1
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 13
Number of targets: 1
UUID: LVM-KEY63wD4MPsiaIcOe600esYDGASesbrNZ8j7Y5lKdS57yFTWnnoxM3tzuWCam2DL

Name: corey-mirror2
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 2
Major, minor: 253, 10
Number of targets: 1
UUID: LVM-KEY63wD4MPsiaIcOe600esYDGASesbrN7VblEpnv25VsheGU7csOAdAZ9r5GY0W1

Name: corey-mirror2_mimage_2
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 9
Number of targets: 1
UUID: LVM-KEY63wD4MPsiaIcOe600esYDGASesbrN84BMKoGOMZu2GMGL7rR5EBaVBYkVvbBd

Name: corey-mirror3_mimage_0
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 12
Number of targets: 1
UUID: LVM-KEY63wD4MPsiaIcOe600esYDGASesbrNpvomnKVRe1Ktx5r1AanmniQWmcUy3Nh4

Name: corey-mirror3_mlog
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 11
Number of targets: 1
UUID: LVM-KEY63wD4MPsiaIcOe600esYDGASesbrNHaArgVnIEdototLjaZ1n3gGBfGLUE5UX

Name: corey-mirror2_mimage_1
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 8
Number of targets: 1
UUID: LVM-KEY63wD4MPsiaIcOe600esYDGASesbrNZbv7CvUlWG20vzlPTYhik4u43DhI7M98

Name: corey-mirror1
State: SUSPENDED
Tables present: LIVE & INACTIVE
Open count: 1
Event number: 2
Major, minor: 253, 5
Number of targets: 1
UUID: LVM-KEY63wD4MPsiaIcOe600esYDGASesbrN2RIp5WgoVj46mrlYYU1fdrBKP4BjJuus

Name: corey-mirror2_mimage_0
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 7
Number of targets: 1
UUID: LVM-KEY63wD4MPsiaIcOe600esYDGASesbrNQGetVoz6YbvBiX7cVlh8Oul5vPLZKVvv

Name: corey-mirror1_mimage_1
State: SUSPENDED
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 4
Number of targets: 1
UUID: LVM-KEY63wD4MPsiaIcOe600esYDGASesbrNUgs5K6Msk6mfVRtM8vh3BD5x2JEmUDSm

Name: corey-mirror1_mimage_0
State: SUSPENDED
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 3
Number of targets: 1
UUID: LVM-KEY63wD4MPsiaIcOe600esYDGASesbrNQ0xDdJtYU863Myf9i4VuJAQZXhjFKpyX

Name: VolGroup00-LogVol01
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 1
Number of targets: 1
UUID: LVM-8qGbKfLuKYoljGNFE1gsS77AYQM3dC4xQjIYEP6InPgUU5nsDPYSZl5EAEKqRWcY

Name: VolGroup00-LogVol00
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 0
Number of targets: 1
UUID: LVM-8qGbKfLuKYoljGNFE1gsS77AYQM3dC4xrapcuzOGNgADTzIRUNTk0MZBbtWAy

Version-Release number of selected component (if applicable):
2.6.9-49.ELlargesmp
cmirror-kernel-largesmp-2.6.9-24.0

I'll attach all the stack traces for the hung/running processes.
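For anyone decoding the 'dmsetup table' lines above, my reading of the mirror target arguments (informal labels, not authoritative documentation) is:

  <start> <length> mirror <log_type> <#log_args> <log_dev> <region_size> <log_uuid> [block_on_error] <#legs> <leg1_dev> <leg1_offset> ...

So the corey-mirror1 line describes a clustered mirror using 253:2 (corey-mirror1_mlog) as its log device with a 1024-sector region size, and two legs still in the map, 253:3 and 253:4 - while 253:3 (corey-mirror1_mimage_0) now maps to the 'error' target because of the failed /dev/sdh1.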
Created attachment 149388 [details] log from link-02
Created attachment 149389 [details] log from link-04
Created attachment 149390 [details] log from link-07
Created attachment 149391 [details] log from link-08
This bug is reproducible.

Name: corey-mirror4
State: SUSPENDED
Tables present: LIVE & INACTIVE
Open count: 1
Event number: 1
Major, minor: 253, 5
Number of targets: 1
UUID: LVM-r25APQaO2jVckDoLKsoAARInYt1p0mqop2zn17MV0vt2xIcSGS1tjxAfGr0KB9X5
After trying this a few times with complicated 3- and 4-legged cmirror
configurations, I attempted the simplest case: one cmirror, one GFS filesystem,
one failure. It appears to hang just like the more difficult cases above. This
failure scenario used to work greater than 90% of the time, so it appears that
a regression has been introduced into cmirror-kernel-largesmp-2.6.9-24.0.
That, or this is a largesmp issue. I'll attempt this now with smp.

[root@link-08 ~]# lvs -a -o +devices
  LV                  VG     Attr   LSize  Origin Snap%  Move Log           Copy%  Devices
  cmirror             feist  mwi-ao 8.00G                     cmirror_mlog  100.00 cmirror_mimage_0(0),cmirror_mimage_1(0)
  [cmirror_mimage_0]  feist  iwi-ao 8.00G                                          /dev/sdh1(0)
  [cmirror_mimage_1]  feist  iwi-ao 8.00G                                          /dev/sda1(0)
  [cmirror_mlog]      feist  lwi-ao 4.00M                                          /dev/sdg1(0)

[ killed /dev/sdh ]

[root@link-08 ~]# lvs -a -o +devices
  /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
  LV                  VG     Attr   LSize  Origin Snap%  Move Log           Copy%  Devices
  cmirror             feist  mwi-so 8.00G                     cmirror_mlog   99.95 cmirror_mimage_0(0),cmirror_mimage_1(0)
  [cmirror_mimage_0]  feist  iwi-so 8.00G
  [cmirror_mimage_1]  feist  iwi-so 8.00G                                          /dev/sda1(0)
  [cmirror_mlog]      feist  lwi-so 4.00M                                          /dev/sdg1(0)

[root@link-08 ~]# lvs -a -o +devices
  /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
[HANG]

[root@link-08 ~]# dmsetup info
Name: feist-cmirror
State: SUSPENDED
Tables present: LIVE & INACTIVE
Open count: 1
Event number: 1
Major, minor: 253, 5
Number of targets: 1
UUID: LVM-Bdy1hJ3CV5h3FjQYzF6oUj1Diy8WlHuFrP78jR8t27LUNs2TuoUI4bNvnf9VIJcG

Name: feist-cmirror_mlog
State: SUSPENDED
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 2
Number of targets: 1
UUID: LVM-Bdy1hJ3CV5h3FjQYzF6oUj1Diy8WlHuFHuA3j3gV3DsqnwONzJrvi6wAHTW0402u

Name: feist-cmirror_mimage_1
State: SUSPENDED
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 4
Number of targets: 1
UUID: LVM-Bdy1hJ3CV5h3FjQYzF6oUj1Diy8WlHuFgFCtJ639S8sFOi12g2F6JvO1ZBZxLW8w

Name: feist-cmirror_mimage_0
State: SUSPENDED
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 3
Number of targets: 1
UUID: LVM-Bdy1hJ3CV5h3FjQYzF6oUj1Diy8WlHuFcIpRkYVU2XcQNyqdZkrjWlaXhdCh6q1m

Name: VolGroup00-LogVol01
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 1
Number of targets: 1
UUID: LVM-kXi6raoZjwmxhIMi9yE47lLMxD7rCSgkT1h6aidVTpJgaACOvfJduKZ88jI03FEf

Name: VolGroup00-LogVol00
State: ACTIVE
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 253, 0
Number of targets: 1
UUID: LVM-kXi6raoZjwmxhIMi9yE47lLMxD7rCSgkXAqtIL9TdeqFs3GAoSPDj3JVguOzhxCd
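For reference, a rough sketch of this simplest case as plain commands (the names match the setup above, but the GFS step is omitted and the failure-injection method is an assumption about what the test harness does, not a transcript):

  # two-way clustered mirror in VG feist; a third PV is listed so the
  # mirror log can land on its own device
  lvcreate -m 1 -n cmirror -L 8G feist /dev/sdh1 /dev/sda1 /dev/sdg1

  # fail the leg - one way is to offline the SCSI device backing it
  echo offline > /sys/block/sdh/device/state

  # the first lvs after the failure shows the suspended mirror;
  # the subsequent invocations are the ones that hang
  lvs -a -o +devices
  dmsetup info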
Reproduced on an smp cluster as well.
Reproduced this issue without I/O or GFS on top of the cmirror. It no longer appears that a proper down convert works after a cmirror leg failure. This issue will need to block the release of cluster mirrors.
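For anyone following along, a quick way to check whether the down-convert happened (this only inspects the LV layout, nothing more; <vg> is a placeholder):

  # after a successful down-conversion the LV is linear on the surviving leg:
  # its attr no longer starts with 'm' and the mimage/mlog sub-LVs are gone
  lvs -a -o lv_name,lv_attr,devices <vg>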
I can easily reproduce this. It appears that the suspended devices are not the problem, as those appear on the machines that are making progress. It's the machine that has a LIVE & INACTIVE table for the mirror (but no suspended devices) that is causing the problem. This machine also seems to be the log server (every time).
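A quick way to spot that node (nothing fancy - it just greps the same 'dmsetup info' output shown in the earlier comments):

  # print the Name/State lines for any mapping whose table is LIVE & INACTIVE
  dmsetup info | grep -B 2 'LIVE & INACTIVE'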
Reverting one of the previous changes put in for another bug seems to fix the problem. Not sure why yet...
Notes from check-in:

The problem here appears to be timeouts related to clvmd. During failures
under heavy load, clvmd commands (suspend/resume/activate/deactivate) can take
a long time, and clvmd assumes too quickly that they have failed. This results
in the fault handling being left half done. Further calls to vgreduce (by hand
or by dmeventd) will not help because the _on-disk_ version of the metadata is
consistent - that is, the faulty device has already been removed.

The most significant change in this patch is the removal of the
'is_remote_recovering' function. This function was designed to check whether a
remote node was recovering a region so that writes to the region could be
delayed. However, even with this function, it was possible for a remote node
to begin recovery on a region _after_ the function was called but before the
write (mark request) took place. Because of this, checking is done during the
mark request stage - rendering the call to 'is_remote_recovering' meaningless.
Given the useless nature of this function, it has been pulled. The benefits of
its removal are increased performance and much faster (more than an order of
magnitude) response during the mirror suspend process. The faster suspend
process leads to fewer clvmd timeouts and a reduced probability that bug
231230 will be triggered.

However, when a mirror device is reconfigured, the mirror sub-devices are
removed. This is done by activating them cluster-wide before their removal.
With high enough load during recovery, these operations can still take a long
time - even though they are linear devices. This too has the potential to
cause clvmd to time out and trigger bug 231230. There is no cluster logging
fix for this issue. The delay on the linear devices must be determined. A
temporary work-around would be to increase the timeout of clvmd (e.g. clvmd -t #).
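As a concrete illustration of that work-around (90 is an arbitrary example value, the default of 60 seconds is from memory - check 'clvmd -h' on the installed version - and how the daemon gets stopped and restarted is site-specific):

  # restart clvmd with a longer command timeout than the default
  clvmd -t 90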
assigned -> post
post -> modified
Initial tests seem to show that simple mirror failure cases once again work. However, before marking this verified: when a cmirror is not fully synced and a failure takes place, clvmd appears to hang. Is this a new bug, or an unresolved portion of this bug?
<insert the usual questions> Also, if you can capture the clvmd debug output, that might help us figure out what clvmd is stuck on. To do this, you can run 'clvmd -d >& clvmd_output.txt &'
I added my latest non-synced cmirror leg failure attempt to bz 217895, as it resulted in lost election results from the cmirror server and caused a node to be fenced.
Lately I can no longer recreate the hang mentioned in comment #14; however,
the mirror -> linear down conversion is still not taking place. Instead it
appears to be corrupted.

SCENARIO - [fail_non_syncd_primary_leg]
Creating mirror on link-08 using device /dev/sdh1 (that we will fail) for primary leg
lvcreate -m 1 -n fail_non_syncd_primary -L 500M helter_skelter /dev/sdh1:0-500 /dev/sdf1:0-500 /dev/sdb1:0-50
mirror is only 65.60% synced right now
Disabling device sdh on link-02
Disabling device sdh on link-04
Disabling device sdh on link-07
Disabling device sdh on link-08

Attempting I/O to cause mirror conversion
dd: writing to `/dev/helter_skelter/fail_non_syncd_primary': Input/output error
1+0 records in
0+0 records out

Verifying the down conversion from mirror to linear
/dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
/dev/dm-5: read failed after 0 of 4096 at 0: Input/output error
/dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
/dev/dm-3: read failed after 0 of 4096 at 524222464: Input/output error
[...]
/dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
Couldn't find device with uuid 'Xhx3VZ-NGbq-1Mzy-n2rk-3xS1-aG2k-LJYjqz'.
Couldn't find all physical volumes for volume group helter_skelter.
Volume group "helter_skelter" not found

The down conversion didn't appear to work, as /dev/sdf1 should now be a part
of a linear volume.

I see these I/O errors on all the consoles:

device-mapper: A read failure occurred on a mirror device.
device-mapper: Unable to retry read.
scsi2 (1:1): rejecting I/O to offline device
scsi2 (1:1): rejecting I/O to offline device
device-mapper: A read failure occurred on a mirror device.
device-mapper: Unable to retry read.
scsi2 (1:1): rejecting I/O to offline device
scsi2 (1:1): rejecting I/O to offline device
device-mapper: A read failure occurred on a mirror device.
device-mapper: Unable to retry read.
scsi2 (1:1): rejecting I/O to offline device
Jon - are you seeing any issues w/ non sync'ed cmirror failures, or am I still the only one?
I am not seeing this at all. I wonder if it has something to do with new LVM packages. [When you update to new versions of the software, please indicate that.] I will try updating my LVM/device-mapper packages.
I've updated and I still don't see it (after about ten tries). I'll keep trying, but I hope you can help me reproduce.
Jon and I have narrowed this issue down to the write size being used to trigger the down conversion.

The following will cause the down conversion to work properly:

  dd if=/dev/zero of=/dev/vg/cmirror bs=4M count=1

The following will not:

  dd if=/dev/zero of=/dev/vg/cmirror count=1
I've seen this now. It depends on the arguments (and tools) you use to do the
"write". Corey has been using 'dd if=/dev/zero of=/dev/vg/lv count=10'. The dd
actually does a read before writing if the request sizes are small enough.
Looking at the mirror code for reads:

static void do_reads(struct mirror_set *ms, struct bio_list *reads)
{
        struct bio *bio;
        struct mirror *m;

        while ((bio = bio_list_pop(reads))) {
                /*
                 * We can only read balance if the region is in sync.
                 */
                if (likely(rh_in_sync(&ms->rh, bio_to_region(&ms->rh, bio), 0)))
                        m = choose_mirror(ms);
                else {
                        m = ms->default_mirror;

                        /* If the default fails, we give up. */
                        if (unlikely(m && atomic_read(&m->error_count)))
                                m = NULL;
                }

                if (likely(m))
                        read_async_bio(m, bio);
                else
                        bio_endio(bio, bio->bi_size, -EIO);
        }
}

We see that we call rh_in_sync without the ability to block (the third
argument). Cluster mirroring (unlike single-machine mirroring) must always
block. Therefore, it is never allowed to 'choose_mirror'. Since the primary
device has failed, 'bio_endio(bio, bio->bi_size, -EIO);' is called. This fails
the read, and the writes from dd never take place... meaning an event never
gets triggered and the mirror does not get down-converted.

I believe it is a bug in the kernel that we do not allow blocking in
rh_in_sync. Since we do not, reads are never allowed to balance [even when the
mirrors are in sync]. There should also be a debate about whether or not to
trigger an event if a read fails... I don't think so, but that can be
discussed in another bug.

assigned -> modified
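To make that suggestion concrete, a minimal sketch follows - this is not the actual patch, the file path is my assumption about where do_reads() lives, and whether this call site is really allowed to sleep would need to be verified:

--- drivers/md/dm-raid1.c       (illustrative sketch, not a shipped change)
+++ drivers/md/dm-raid1.c
@@ static void do_reads(struct mirror_set *ms, struct bio_list *reads)
-               if (likely(rh_in_sync(&ms->rh, bio_to_region(&ms->rh, bio), 0)))
+               /* allow blocking so a cluster log can answer the in-sync query */
+               if (likely(rh_in_sync(&ms->rh, bio_to_region(&ms->rh, bio), 1)))
                        m = choose_mirror(ms);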
Marking this bug verified. The easy-to-reproduce cmirror leg failure regression no longer happens. All other, less reproducible cmirror failure cases should be documented in other BZs.
Fixed in current release (4.7).