Bug 531637

Summary:	core conversion or log allocation doesn't take place when lowest ID node doesn't experience the failure
Product:	Red Hat Enterprise Linux 5	Reporter:	Corey Marthaler <cmarthal>
Component:	Documentation-cluster	Assignee:	Steven J. Levine <slevine>
Status:	CLOSED CURRENTRELEASE	QA Contact:	ecs-bugs
Severity:	high	Docs Contact:
Priority:	high
Version:	5.4	CC:	agk, ccaulfie, coughlan, dwysocha, heinzm, iannis, jbrassow, jha, mbroz, mhideo, prockai
Target Milestone:	rc	Keywords:	Documentation
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	With clustered mirrors, the mirror log management is completely the responsibility of the cluster node with the currently lowest cluster ID. Therefore, when the device holding the cluster mirror log becomes unavailable on a subset of the cluster, the clustered mirror can continue operating without any impact, as long as the cluster node with lowest ID retains access to the mirror log. Since the mirror is undisturbed, no automatic corrective action (repair) is issued, either. When the lowest-ID cluster node loses access to the mirror log, however, automatic action will kick in (regardless of accessibility of the log from other nodes).	Story Points:	---
Clone Of:
Clones:	642400		Environment:
Last Closed:	2011-04-14 04:48:57 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	642400, 656090

Description Corey Marthaler 2009-10-28 22:45:09 UTC

Description of problem:
If the mirror log device fails on more than one node (but still only a subset of the cluster), the mirror will auto convert to a core log. If the device fails on just one node in the cluster, it won't.

This only appears to affect log devices. When primary and secondary leg devices fail on just one node (or any subset of the cluster), the mirror will down convert properly.


# Test output:
Scenario: Kill disk log of synced 2 leg mirror(s)                               

********* Mirror hash info for this scenario *********
* names:             syncd_log_2legs_1                
* sync:              1                                
* disklog:           /dev/sdh1                        
* failpv(s):         /dev/sdh1                        
* failnode(s):       taft-03-bond                     
* leg devices:       /dev/sdg1 /dev/sdf1              
* additional lv:     /dev/sdh1 /dev/sdg1              
******************************************************


Disabling device sdh on taft-03-bond

Attempting I/O to cause mirror down conversion(s) on taft-03-bond
10+0 records in                                                  
10+0 records out                                                 
41943040 bytes (42 MB) copied, 0.104603 seconds, 401 MB/s        
Verifying the down conversion of the failed mirror(s)            
Verifying FAILED device /dev/sdh1 is *NOT* in the volume(s)      
failed device /dev/sdh1 should no longer be in volume on taft-01-bond



Version-Release number of selected component (if applicable):
2.6.18-160.el5

lvm2-2.02.46-10.el5    BUILT: Fri Sep 18 09:38:06 CDT 2009
lvm2-cluster-2.02.46-10.el5    BUILT: Fri Sep 18 09:39:48 CDT 2009
device-mapper-1.02.32-1.el5    BUILT: Thu May 21 02:18:23 CDT 2009
cmirror-1.1.39-2.el5    BUILT: Mon Jul 27 15:39:05 CDT 2009
kmod-cmirror-0.1.22-1.el5    BUILT: Mon Jul 27 15:28:46 CDT 2009


How reproducible:
every time

Comment 1 Corey Marthaler 2009-12-01 19:27:42 UTC

I just reproduced this by killing the log on 2/4 nodes in the cluster.

Scenario: Kill disk log of non synced 2 leg mirror(s)                           

********* Mirror hash info for this scenario *********
* names:              nonsyncd_log_2legs_1            
* sync:               0                               
* disklog:            /dev/sdh1                       
* failpv(s):          /dev/sdh1                       
* failnode(s):        taft-03 taft-04                 
* leg devices:        /dev/sdf1 /dev/sde1             
* leg fault policy:   remove                          
* log fault policy:   remove                          
******************************************************

Creating mirror(s) on taft-04...
taft-04: lvcreate -m 1 -n nonsyncd_log_2legs_1 -L 600M helter_skelter /dev/sdf1:0-1000 /dev/sde1:0-1000 /dev/sdh1:0-150                                                                                                                                       

Continuing on without fully syncd mirrors, currently at...
        ( 3.58% )                                         

Creating gfs on top of mirror(s) on taft-01...
Mounting mirrored gfs filesystems on taft-01...
Mounting mirrored gfs filesystems on taft-02...
Mounting mirrored gfs filesystems on taft-03...
Mounting mirrored gfs filesystems on taft-04...

Writing verification files (checkit) to mirror(s) on...
        ---- taft-01 ----                              
        ---- taft-02 ----                              
        ---- taft-03 ----                              
        ---- taft-04 ----                              

Sleeping 10 seconds to get some outsanding GFS I/O locks before the failure                         
Verifying files (checkit) on mirror(s) on...                                                        
        ---- taft-01 ----                                                                           
        ---- taft-02 ----                                                                           
        ---- taft-03 ----                                                                           
        ---- taft-04 ----                                                                           

Disabling device sdh on taft-03
Disabling device sdh on taft-04

Attempting I/O to cause mirror down conversion(s) on taft-03
10+0 records in                                             
10+0 records out                                            
41943040 bytes (42 MB) copied, 0.107447 seconds, 390 MB/s   
Verifying the down conversion of the failed mirror(s)       
  /dev/mapper/helter_skelter-nonsyncd_log_2legs_1_mlog: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh1: read failed after 0 of 512 at 145669664768: Input/output error
  /dev/sdh1: read failed after 0 of 512 at 0: Input/output error
  /dev/sdh1: read failed after 0 of 512 at 4096: Input/output error
 [...]
Verifying FAILED device /dev/sdh1 is *NOT* in the volume(s)
  /dev/mapper/helter_skelter-nonsyncd_log_2legs_1_mlog: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
 [...]
  /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'tXGiD0-3zwu-VXo7-YtTK-omuL-xIqr-dAKOnE'.
log policy (if failed) is remove: remove
Verifying LOG device /dev/sdh1 is *NOT* in the linear(s)
  /dev/mapper/helter_skelter-nonsyncd_log_2legs_1_mlog: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh1: read failed after 0 of 512 at 145669554176: Input/output error
 [...]
  /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'tXGiD0-3zwu-VXo7-YtTK-omuL-xIqr-dAKOnE'.
Verifying LEG device /dev/sdf1 *IS* in the volume(s)
  /dev/mapper/helter_skelter-nonsyncd_log_2legs_1_mlog: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
  [...]
  /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'tXGiD0-3zwu-VXo7-YtTK-omuL-xIqr-dAKOnE'.
Verifying LEG device /dev/sde1 *IS* in the volume(s)
  /dev/mapper/helter_skelter-nonsyncd_log_2legs_1_mlog: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
  [...]
  /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'tXGiD0-3zwu-VXo7-YtTK-omuL-xIqr-dAKOnE'.
Verify the dm devices associated with /dev/sdh1 are no longer present
nonsyncd_log_2legs_1_mlog on taft-01 should no longer be there
FI_engine: recover() method failed

Comment 2 Corey Marthaler 2009-12-01 19:44:06 UTC

Here's the mirror view on each node after the partial failure:


[root@taft-01 sts-rhel5.4]# lvs -a -o +devices
  LV                              VG             Attr   LSize   Origin Snap%  Move Log                       Copy%  Convert Devices                          
  nonsyncd_log_2legs_1            helter_skelter mwi-ao 600.00M                    nonsyncd_log_2legs_1_mlog 100.00         nonsyncd_log_2legs_1_mimage_0(0),nonsyncd_log_2legs_1_mimage_1(0)                                                                                                                             
  [nonsyncd_log_2legs_1_mimage_0] helter_skelter iwi-ao 600.00M                                                             /dev/sdf1(0)                     
  [nonsyncd_log_2legs_1_mimage_1] helter_skelter iwi-ao 600.00M                                                             /dev/sde1(0)                     
  [nonsyncd_log_2legs_1_mlog]     helter_skelter lwi-ao   4.00M                                                             /dev/sdh1(0)                     

[root@taft-02 sts-rhel5.4]# lvs -a -o +devices
  LV                              VG             Attr   LSize   Origin Snap%  Move Log                       Copy%  Convert Devices                  
  nonsyncd_log_2legs_1            helter_skelter mwi-ao 600.00M                    nonsyncd_log_2legs_1_mlog 100.00         nonsyncd_log_2legs_1_mimage_0(0),nonsyncd_log_2legs_1_mimage_1(0)                                                                                                                             
  [nonsyncd_log_2legs_1_mimage_0] helter_skelter iwi-ao 600.00M                                                             /dev/sdf1(0)                     
  [nonsyncd_log_2legs_1_mimage_1] helter_skelter iwi-ao 600.00M                                                             /dev/sde1(0)                     
  [nonsyncd_log_2legs_1_mlog]     helter_skelter lwi-ao   4.00M                                                             /dev/sdh1(0)                     

[root@taft-03 sts-rhel5.4]# lvs -a -o +devices
  /dev/mapper/helter_skelter-nonsyncd_log_2legs_1_mlog: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error                                           
  /dev/mapper/helter_skelter-nonsyncd_log_2legs_1_mlog: read failed after 0 of 4096 at 4128768: Input/output error
  /dev/mapper/helter_skelter-nonsyncd_log_2legs_1_mlog: read failed after 0 of 4096 at 4186112: Input/output error
  /dev/mapper/helter_skelter-nonsyncd_log_2legs_1_mlog: read failed after 0 of 4096 at 0: Input/output error      
  /dev/mapper/helter_skelter-nonsyncd_log_2legs_1_mlog: read failed after 0 of 4096 at 4096: Input/output error   
  /dev/mapper/helter_skelter-nonsyncd_log_2legs_1_mlog: read failed after 0 of 4096 at 0: Input/output error      
  /dev/sdh1: read failed after 0 of 512 at 145669554176: Input/output error
  /dev/sdh1: read failed after 0 of 512 at 145669664768: Input/output error
  /dev/sdh1: read failed after 0 of 512 at 0: Input/output error
  /dev/sdh1: read failed after 0 of 512 at 4096: Input/output error
  /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'tXGiD0-3zwu-VXo7-YtTK-omuL-xIqr-dAKOnE'.
 [...]
  LV                              VG             Attr   LSize   Origin Snap%  Move Log                       Copy%  Convert Devices
  nonsyncd_log_2legs_1            helter_skelter mwi-ao 600.00M                    nonsyncd_log_2legs_1_mlog 100.00         nonsyncd_log_2legs_1_mimage_0(0),nonsyncd_log_2legs_1_mimage_1(0)
  [nonsyncd_log_2legs_1_mimage_0] helter_skelter iwi-ao 600.00M                                                             /dev/sdf1(0)
  [nonsyncd_log_2legs_1_mimage_1] helter_skelter iwi-ao 600.00M                                                             /dev/sde1(0)
  [nonsyncd_log_2legs_1_mlog]     helter_skelter lwi-ao   4.00M                                                             unknown device(0)

[root@taft-04 sts-rhel5.4]# lvs -a -o +devices
  /dev/mapper/helter_skelter-nonsyncd_log_2legs_1_mlog: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
  /dev/mapper/helter_skelter-nonsyncd_log_2legs_1_mlog: read failed after 0 of 4096 at 4128768: Input/output error
  /dev/mapper/helter_skelter-nonsyncd_log_2legs_1_mlog: read failed after 0 of 4096 at 4186112: Input/output error
  /dev/mapper/helter_skelter-nonsyncd_log_2legs_1_mlog: read failed after 0 of 4096 at 0: Input/output error
  /dev/mapper/helter_skelter-nonsyncd_log_2legs_1_mlog: read failed after 0 of 4096 at 4096: Input/output error
  /dev/mapper/helter_skelter-nonsyncd_log_2legs_1_mlog: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh1: read failed after 0 of 512 at 145669554176: Input/output error
  /dev/sdh1: read failed after 0 of 512 at 145669664768: Input/output error
  /dev/sdh1: read failed after 0 of 512 at 0: Input/output error
  /dev/sdh1: read failed after 0 of 512 at 4096: Input/output error
  /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'tXGiD0-3zwu-VXo7-YtTK-omuL-xIqr-dAKOnE'.
 [...]
  LV                              VG             Attr   LSize   Origin Snap%  Move Log                       Copy%  Convert Devices
  nonsyncd_log_2legs_1            helter_skelter mwi-ao 600.00M                    nonsyncd_log_2legs_1_mlog 100.00         nonsyncd_log_2legs_1_mimage_0(0),nonsyncd_log_2legs_1_mimage_1(0)
  [nonsyncd_log_2legs_1_mimage_0] helter_skelter iwi-ao 600.00M                                                             /dev/sdf1(0)
  [nonsyncd_log_2legs_1_mimage_1] helter_skelter iwi-ao 600.00M                                                             /dev/sde1(0)
  [nonsyncd_log_2legs_1_mlog]     helter_skelter lwi-ao   4.00M                                                             unknown device(0)

Comment 4 Corey Marthaler 2010-02-02 00:40:49 UTC

This bug also exists when the log device fails on 3 of the 4 nodes.

[...]
Disabling device sdf on taft-02
Disabling device sdf on taft-04
Disabling device sdf on taft-03

Attempting I/O to cause mirror down conversion(s) on taft-02
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.105072 seconds, 399 MB/s
Verifying current sanity of lvm after the failure
Verifying FAILED device /dev/sdf1 is *NOT* in the volume(s)
failed device /dev/sdf1 should no longer be in volume on taft-01

log on taft-01:
[nonsyncd_log_2legs_1_mlog] helter_skelter lwi-ao 4.00M /dev/sdf1(0)

log on the other tafts taft-[234]:
[nonsyncd_log_2legs_1_mlog] helter_skelter lwi-ao 4.00M unknown device(0)
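The per-node difference shown above can be checked mechanically. The following is a minimal Python sketch, not part of the test suite used in this report: it assumes each node's mirror-log row from `lvs -a -o +devices` has already been collected as a string, and the helper names `log_device` and `nodes_without_log` are hypothetical.

```python
import re


def log_device(lvs_row):
    """Return the device backing a mirror-log LV from one lvs row.

    Assumes the row belongs to a log LV, where the columns between
    LSize and Devices are empty (as in the output quoted above).
    """
    # LV, VG, Attr, LSize, then everything else is the Devices column.
    fields = lvs_row.split(None, 4)
    if len(fields) < 5:
        raise ValueError(f"unexpected row: {lvs_row!r}")
    # Drop the trailing extent offset, e.g. "/dev/sdf1(0)" -> "/dev/sdf1".
    return re.sub(r"\(\d+\)$", "", fields[4].strip())


def nodes_without_log(rows_by_node):
    """Return the sorted node names whose view reports 'unknown device'."""
    return sorted(
        node
        for node, row in rows_by_node.items()
        if log_device(row) == "unknown device"
    )


# Rows copied from the output in this comment.
ROW_OK = "[nonsyncd_log_2legs_1_mlog] helter_skelter lwi-ao 4.00M /dev/sdf1(0)"
ROW_LOST = "[nonsyncd_log_2legs_1_mlog] helter_skelter lwi-ao 4.00M unknown device(0)"

views = {
    "taft-01": ROW_OK,
    "taft-02": ROW_LOST,
    "taft-03": ROW_LOST,
    "taft-04": ROW_LOST,
}
print(nodes_without_log(views))
```

The sketch only reports which nodes have lost sight of the log device; it does not decide whether a repair should have happened.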

Comment 5 Corey Marthaler 2010-02-02 00:43:39 UTC

In comment #4, the fault policy was allocate, so a new log should have appeared.

Comment 6 Petr Rockai 2010-05-19 14:49:33 UTC

Corey, as far as I understand clustered mirrors, the log is only written by the cluster node with the lowest ID. Presumably, what happens is that you are failing the log on cluster nodes that are not writing to the log, so the conversion does not happen.

I think this behaviour is correct. It would be helpful if you could verify that this is the case, though: if a log device is failed on the lowest-id node, it should be replaced or removed as dictated by policy. If it is only failed on nodes different from the lowest-id one, nothing should happen. The clustered mirror should function properly as long as the log is available on that one node.

If you can confirm this is the case, we can close this bug. It may be necessary to update documentation to clarify this?

Comment 7 Corey Marthaler 2010-07-15 22:40:51 UTC

Petr, that does appear to be the case. If I fail the log on only the lowest ID node, everything appears to be repaired properly. However, if the log isn't failed on the lowest ID node, then the repair doesn't happen.

If this isn't going to be fixed then we should update the docs with this issue.

Comment 8 Petr Rockai 2010-07-16 10:11:23 UTC

Corey, in that case, yes, we need to update docs. I should point out that the cmirror will keep functioning properly -- only the lowest-id cluster node actually needs the log to be accessible.

To change this behaviour, a considerable redesign of cmirror would be necessary, which I don't think is feasible, nor useful: the current behaviour in this regard is, in my opinion, very reasonable.

Comment 9 Petr Rockai 2010-07-16 10:56:07 UTC

Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
With clustered mirrors, the mirror log management is completely the responsibility of the cluster node with the currently lowest cluster ID. Therefore, when the device holding the cluster mirror log becomes unavailable on a subset of the cluster, the clustered mirror can continue operating without any impact, as long as the cluster node with the lowest ID retains access to the mirror log. Since the mirror is undisturbed, no automatic corrective action (repair) is issued, either. When the lowest-ID cluster node loses access to the mirror log, however, automatic action will kick in (regardless of accessibility of the log from other nodes).
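
The rule in this note can be modelled in a few lines of Python. This is only an illustrative sketch of the behaviour described here, not cmirror code: the function name `expected_log_action` is hypothetical, and the cluster IDs are assumed (the report does not list them; taft-01 is taken to be the lowest-ID node because it kept access to the log in every test run). The policy values "remove" and "allocate" are the log fault policies mentioned in comments 1 and 5.

```python
def expected_log_action(node_ids, failed_nodes, log_fault_policy):
    """Return the automatic action expected after a mirror-log device failure.

    node_ids: dict mapping node name -> cluster ID (assumed values).
    failed_nodes: set of node names that lost access to the log device.
    log_fault_policy: "remove" or "allocate".
    """
    if log_fault_policy not in ("remove", "allocate"):
        raise ValueError(f"unknown policy: {log_fault_policy!r}")
    # Only the node with the lowest cluster ID manages the mirror log.
    lowest = min(node_ids, key=node_ids.get)
    if lowest not in failed_nodes:
        # The mirror keeps running undisturbed, so no repair is issued.
        return "none"
    # The lowest-ID node lost the log: the configured policy applies,
    # regardless of whether other nodes can still reach the device.
    return log_fault_policy


# Assumed IDs; the report does not state the real ones.
NODE_IDS = {"taft-01": 1, "taft-02": 2, "taft-03": 3, "taft-04": 4}

# Scenarios from this report: failures on 1, 2 and 3 of the 4 nodes,
# none of which included taft-01.
for failed in ({"taft-03"}, {"taft-03", "taft-04"},
               {"taft-02", "taft-03", "taft-04"}):
    print(sorted(failed), expected_log_action(NODE_IDS, failed, "remove"))
```

Under these assumptions all three reported scenarios yield no automatic action, which matches the test results above, while a failure that includes taft-01 would yield the configured policy.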

Comment 10 Tom Coughlan 2010-10-05 14:13:50 UTC

This BZ only requires a 5.6 Tech Note. No patch. I believe Ryan will see the flag and add the Tech Note, independent of the state of the BZ.

This text should also go into whichever manual covers this area. I am reassigning to Documentation-cluster. I assume this is needed for both the RHEL 5 and RHEL 6 documentation.

Petr, do you think we need a 6.1 Tech Note as well?

Comment 11 Petr Rockai 2010-10-05 14:47:55 UTC

I believe a manual update is good enough for 6.1; a separate technical note is probably not needed.

Comment 12 Steven J. Levine 2010-12-02 19:01:36 UTC

Petr:

I am updating the LVM docs (in RHEL 5.6 and 6.1) with this info and trying to determine where to put the information.

When you say that "automatic action will kick in", do you mean the action specified by the mirror_log_fault_policy parameter in the configuration file?

-Steven

Comment 13 Steven J. Levine 2010-12-15 22:12:46 UTC

I have added the paragraph in Comment 9 to the draft of the RHEL 5.6 document. This will be available when RHEL 5.6 is released.

Comment 14 Steven J. Levine 2010-12-21 21:10:17 UTC

The 5.6 document is complete and checked in. This note will be included in the document at the 5.6 release.