Bug 431621

Summary: RHEL5 cmirror tracker: simultaneous creation can cause sync to get stuck
Product: Red Hat Enterprise Linux 5 Reporter: Corey Marthaler <cmarthal>
Component: cmirror Assignee: Jonathan Earl Brassow <jbrassow>
Status: CLOSED CURRENTRELEASE QA Contact: Cluster QE <mspqa-list>
Severity: low Docs Contact:
Priority: low    
Version: 5.2 CC: agk, ccaulfie, dwysocha, heinzm, iannis, mbroz
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-04-27 15:05:21 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 430797    

Description Corey Marthaler 2008-02-05 22:54:37 UTC
Description of problem:
I created a cmirror from all 4 nodes in the cluster at the same time. One of the
nodes got stuck while attempting to sync.

[root@taft-01 ~]# lvcreate -m 1 -n $(hostname) -L 2G taft
  Logical volume "taft-01" created

[root@taft-02 ~]# lvcreate -m 1 -n $(hostname) -L 2G taft
  Logical volume "taft-02" created

[root@taft-03 ~]# lvcreate -m 1 -n $(hostname) -L 2G taft
  Logical volume "taft-03" created

[root@taft-04 ~]# lvcreate -m 1 -n $(hostname) -L 2G taft
  Logical volume "taft-04" created
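
The four creations above were issued by hand on each node. A minimal sketch of a helper that launches them at nearly the same moment follows. The node and volume group names are taken from this report; the REMOTE variable, the function name, and passwordless root ssh between nodes are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: start the same lvcreate on all four nodes at nearly the same time.
# Node and VG names are from this report; REMOTE and passwordless root ssh
# are assumptions made for illustration.
NODES="taft-01 taft-02 taft-03 taft-04"
VG="taft"
REMOTE="${REMOTE:-ssh}"   # set REMOTE=echo to print the commands instead

create_mirrors_concurrently() {
    for node in $NODES; do
        # Single quotes keep $(hostname) literal so it expands on the remote node.
        # stdin comes from /dev/null so background ssh never reads the terminal.
        $REMOTE "root@$node" 'lvcreate -m 1 -n $(hostname) -L 2G '"$VG" < /dev/null &
    done
    wait   # return only after every background lvcreate has finished
}
```

Calling create_mirrors_concurrently with REMOTE=echo prints the four command lines without touching the cluster, which is a quick way to check the quoting first.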


[root@taft-01 ~]# lvs -a -o +devices
  LV                 VG   Attr    LSize Origin Snap%  Move Log           Copy% Convert Devices
  taft-01            taft mwi-a-  2.00G                    taft-01_mlog  36.33         taft-01_mimage_0(0),taft-01_mimage_)
  [taft-01_mimage_0] taft Iwi-ao  2.00G                                                /dev/sdb1(0)
  [taft-01_mimage_1] taft Iwi-ao  2.00G                                                /dev/sdc1(0)
  [taft-01_mlog]     taft lwi-ao  4.00M                                                /dev/sdh1(0)

  taft-02            taft mwi-a-  2.00G                    taft-02_mlog 100.00         taft-02_mimage_0(0),taft-02_mimage_)
  [taft-02_mimage_0] taft iwi-ao  2.00G                                                /dev/sdh1(1)
  [taft-02_mimage_1] taft iwi-ao  2.00G                                                /dev/sdb1(512)
  [taft-02_mlog]     taft lwi-ao  4.00M                                                /dev/sdc1(514)

  taft-03            taft mwi-a-  2.00G                    taft-03_mlog 100.00         taft-03_mimage_0(0),taft-03_mimage_)
  [taft-03_mimage_0] taft iwi-ao  2.00G                                                /dev/sdf1(0)
  [taft-03_mimage_1] taft iwi-ao  2.00G                                                /dev/sdg1(0)
  [taft-03_mlog]     taft lwi-ao  4.00M                                                /dev/sdc1(513)

  taft-04            taft mwi-a-  2.00G                    taft-04_mlog 100.00         taft-04_mimage_0(0),taft-04_mimage_)
  [taft-04_mimage_0] taft iwi-ao  2.00G                                                /dev/sdd1(0)
  [taft-04_mimage_1] taft iwi-ao  2.00G                                                /dev/sde1(0)
  [taft-04_mlog]     taft lwi-ao  4.00M                                                /dev/sdc1(512)
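
In the listing above, taft-01 sits at 36.33 in the Copy% column while the other three mirrors show 100.00. One way to spot this without reading the table by eye is to compare two samples of the copy percentage taken some time apart. The sketch below assumes each sample was saved with something like `lvs --noheadings -o lv_name,copy_percent taft > sample`; the function names and the two-file approach are illustrative, not part of LVM.

```shell
#!/bin/sh
# Read "lv_name copy_percent" pairs on stdin and print the mirrors that are
# not fully synced. Lines with no copy percentage (non-mirror LVs) are skipped.
unsynced_mirrors() {
    awk 'NF == 2 && $2 + 0 < 100 { print $1, $2 }'
}

# Compare two saved samples and print mirrors that are below 100 in both
# and whose Copy% did not change between them.
# usage: stuck_mirrors BEFORE_FILE AFTER_FILE
stuck_mirrors() {
    awk 'NF == 2 && $2 + 0 < 100 {
             if (FILENAME == ARGV[1])
                 before[$1] = $2
             else if (($1 in before) && before[$1] == $2)
                 print $1, $2
         }' "$1" "$2"
}
```

A mirror that is merely slow still advances between samples, so only a value that stays put is reported. How long to wait between samples depends on the mirror size and the storage, and is left to the caller.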


Feb  5 16:39:34 taft-01 clogd[7471]: [peFjT2Ho] Cluster log created
Feb  5 16:39:34 taft-01 clogd[7471]: [peFjT2Ho] Unable to send DM_CLOG_GET_SYNC_COUNT to cluster: Invalid exchange
Feb  5 16:39:34 taft-01 clogd[7471]: Bad callback on local/4
Feb  5 16:39:34 taft-01 kernel: device-mapper: dm-log-clustered: Server error while processing request [DM_CLOG_GET_SYNC_COUNT]: -52
Feb  5 16:39:34 taft-01 clogd[7471]: Setting my cluster id: 1
Feb  5 16:39:34 taft-01 clogd[7471]: [peFjT2Ho] Non-master resume: bits pre-loaded
Feb  5 16:39:34 taft-01 [6781]: Monitoring mirror device taft-taft--01 for events
Feb  5 16:39:43 taft-01 clogd[7471]: [JNb0cmUL] Cluster log created
Feb  5 16:39:43 taft-01 clogd[7471]: [JNb0cmUL] Unable to send DM_CLOG_GET_SYNC_COUNT to cluster: Invalid exchange
Feb  5 16:39:43 taft-01 clogd[7471]: Bad callback on local/4
Feb  5 16:39:43 taft-01 kernel: device-mapper: dm-log-clustered: Server error while processing request [DM_CLOG_GET_SYNC_COUNT]: -52
Feb  5 16:39:43 taft-01 clogd[7471]: [JNb0cmUL] Master resume: reading disk log
Feb  5 16:39:43 taft-01 lvm[6781]: Monitoring mirror device taft-taft--04 for events
Feb  5 16:39:46 taft-01 clogd[7471]: [DOxLidpf] Cluster log created
Feb  5 16:39:46 taft-01 clogd[7471]: [DOxLidpf] Unable to send DM_CLOG_GET_SYNC_COUNT to cluster: Invalid exchange
Feb  5 16:39:46 taft-01 clogd[7471]: Bad callback on local/4
Feb  5 16:39:46 taft-01 clogd[7471]: [DOxLidpf] Master resume: reading disk log
Feb  5 16:39:46 taft-01 kernel: device-mapper: dm-log-clustered: Server error while processing request [DM_CLOG_GET_SYNC_COUNT]: -52
Feb  5 16:39:46 taft-01 lvm[6781]: Monitoring mirror device taft-taft--03 for events
Feb  5 16:39:50 taft-01 clogd[7471]: [Rk0MI94T] Cluster log created
Feb  5 16:39:50 taft-01 clogd[7471]: [Rk0MI94T] Unable to send DM_CLOG_GET_SYNC_COUNT to cluster: Invalid exchange
Feb  5 16:39:50 taft-01 clogd[7471]: Bad callback on local/4
Feb  5 16:39:50 taft-01 clogd[7471]: [Rk0MI94T] Master resume: reading disk log
Feb  5 16:39:50 taft-01 kernel: device-mapper: dm-log-clustered: Server error while processing request [DM_CLOG_GET_SYNC_COUNT]: -52
Feb  5 16:39:50 taft-01 lvm[6781]: Monitoring mirror device taft-taft--02 for events
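
The kernel's -52 is errno 52 on Linux, EBADE, whose message text is "Invalid exchange", so the kernel and clogd lines above report the same failure. To see at a glance which cluster logs hit it and how each one resumed, the clogd lines can be summarized per log ID. The sketch below assumes standard syslog field positions with each message on a single line, as it would be in /var/log/messages; the function name is hypothetical.

```shell
#!/bin/sh
# Summarize clogd messages per cluster log ID (the bracketed 8-character tag).
# Reads syslog lines on stdin and prints: ID, resume role, send-failure count.
clog_summary() {
    awk '$5 ~ /^clogd\[/ && $6 ~ /^\[.*\]$/ {
             id = substr($6, 2, length($6) - 2)          # strip the brackets
             if (!(id in seen)) { seen[id] = 1; order[++n] = id }
             if ($0 ~ /Unable to send/) errors[id]++
             if ($0 ~ /Non-master resume/)  role[id] = "non-master"
             else if ($0 ~ /Master resume/) role[id] = "master"
         }
         END {
             for (i = 1; i <= n; i++)
                 print order[i], role[order[i]], errors[order[i]] + 0
         }'
}
```

Typical use would be `grep clogd /var/log/messages | clog_summary`. The summary only restates what the log contains; it does not by itself show which of these messages is tied to the stuck sync.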


Version-Release number of selected component (if applicable):
lvm2-2.02.32-1.el5
lvm2-cluster-2.02.32-1.el5
cmirror-1.1.9-1.el5
kmod-cmirror-0.1.5-2.el5

Comment 1 Corey Marthaler 2008-02-05 22:59:18 UTC
Deactivating and then reactivating the volume group got the mirror to finally
sync properly.
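
For reference, the workaround described in this comment can be sketched as below. The exact commands used are not given in this report; `vgchange -an` followed by `vgchange -ay` is the usual way to deactivate and reactivate a volume group and is assumed here. The DRY_RUN switch and the function name are hypothetical conveniences, not LVM features.

```shell
#!/bin/sh
# Sketch of the workaround: deactivate the volume group, then reactivate it.
# Set DRY_RUN to any non-empty value to print the commands instead of
# running them.
reactivate_vg() {
    vg="$1"
    run=""
    if [ -n "${DRY_RUN:-}" ]; then run="echo"; fi
    # Reactivate only if deactivation succeeded (it fails while any LV is in use).
    $run vgchange -an "$vg" && $run vgchange -ay "$vg"
}
```

For the volume group in this report the call would be `reactivate_vg taft`. Deactivation requires that no logical volume in the group is open, so any mounted filesystems on it have to be unmounted first.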

Comment 2 RHEL Program Management 2008-02-05 23:07:51 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 4 Jonathan Earl Brassow 2008-02-13 18:31:12 UTC
If you can recreate this, it will show more info when attempting to send the
request to the cluster, which would be nice to have.


Comment 5 Corey Marthaler 2008-02-29 15:27:40 UTC
I haven't been able to reproduce this lately; marking verified.
cmirror-1.1.15-1.el5
kmod-cmirror-0.1.8-1.el5

Comment 7 Alasdair Kergon 2010-04-27 15:05:21 UTC
Assuming this VERIFIED fix got released.  Closing.
Reopen if it's not yet resolved.