Bug 431785

Summary: RHEL5 cmirror tracker: resync can get stuck
Product: Red Hat Enterprise Linux 5 Reporter: Corey Marthaler <cmarthal>
Component: cmirrorAssignee: Jonathan Earl Brassow <jbrassow>
Status: CLOSED CURRENTRELEASE QA Contact: Cluster QE <mspqa-list>
Severity: low Docs Contact:
Priority: low    
Version: 5.2CC: agk, ccaulfie, dwysocha, heinzm, iannis, mbroz
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-04-27 15:03:39 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Corey Marthaler 2008-02-06 22:45:46 UTC
Description of problem:
This may be related to bz 431714. I've been able to reproduce a scenario where a
mirror attempting to resync gets stuck.

lvchange --resync -y mirror_sanity/resync_regular

[root@grant-02 ~]# lvs
  LV             VG            Attr   LSize  Origin Snap%  Move Log            
    Copy%  Convert
  LogVol00       VolGroup00    -wi-ao 72.44G
  LogVol01       VolGroup00    -wi-ao  1.94G
  resync_nosync  mirror_sanity mwi-a-  2.00G                   
resync_nosync_mlog  100.00
  resync_regular mirror_sanity mwi-a-  2.00G                   
resync_regular_mlog  10.55

# It never gets beyond 10.55 percent

# Grant-02:
Feb  6 15:57:41 grant-02 qarshd[11680]: Running cmdline: lvchange -ay
mirror_sanity/resync_nosync
Feb  6 15:57:41 grant-02 clogd[26975]: [zPCqDSfd] Cluster log created
Feb  6 15:57:41 grant-02 clogd[26975]: [zPCqDSfd] Non-master resume: bits pre-loaded
Feb  6 15:57:41 grant-02 lvm[3357]: Monitoring mirror device
mirror_sanity-resync_nosync for events
Feb  6 15:57:41 grant-02 clogd[26975]:   Region already in-sync: 0
Feb  6 15:57:41 grant-02 clogd[26975]:   Region already in-sync: 6
Feb  6 15:57:41 grant-02 xinetd[2789]: EXIT: qarsh status=0 pid=11680
duration=0(sec)
Feb  6 15:57:41 grant-02 clogd[26975]:   Region already in-sync: 7

# Grant-01 (mirror server):
Feb  6 15:57:36 grant-01 lvm[3425]: Monitoring mirror device
mirror_sanity-resync_regular for events
Feb  6 15:57:45 grant-01 lvm[3425]: No longer monitoring mirror device
mirror_sanity-resync_regular for events                                        
                                                             Feb  6 15:57:45
grant-01 clogd[29619]: [eVDBier3] clog_postsuspend: leaving CPG                
          Feb  6 15:57:45 grant-01 clogd[29619]: [eVDBier3] clog_postsuspend:
finalizing                            Feb  6 15:57:45 grant-01 clogd[29619]:
[eVDBier3] Cluster log removed                                     Feb  6
15:57:45 grant-01 lvm[3425]: No longer monitoring mirror device
mirror_sanity-resync_nosync for events                                         
                                                             Feb  6 15:57:45
grant-01 clogd[29619]: [zPCqDSfd] clog_postsuspend: leaving CPG                
          Feb  6 15:57:45 grant-01 clogd[29619]: [zPCqDSfd] clog_postsuspend:
finalizing                            Feb  6 15:57:45 grant-01 clogd[29619]:
[zPCqDSfd] Cluster log removed                                     Feb  6
15:57:49 grant-01 clogd[29619]: [eVDBier3] Cluster log created                 
                   Feb  6 15:57:49 grant-01 clogd[29619]: [eVDBier3] Master
resume: reading disk log                         Feb  6 15:57:49 grant-01
clogd[29619]: Header not valid                                                 
 Feb  6 15:57:49 grant-01 clogd[29619]:   magic     : 0  (expected: 4d695272)  
                           Feb  6 15:57:49 grant-01 clogd[29619]:   version   :
0                                                    Feb  6 15:57:49 grant-01
clogd[29619]:   nr_regions: 0                                                  
 Feb  6 15:57:49 grant-01 clogd[29619]: *** Invalid argument ***               
                           Feb  6 15:57:49 grant-01 [3425]: Monitoring mirror
device mirror_sanity-resync_regular for events         Feb  6 15:57:49 grant-01
clogd[29619]: [zPCqDSfd] Cluster log created                                   
 Feb  6 15:57:49 grant-01 clogd[29619]: [zPCqDSfd] Non-master resume: bits
pre-loaded                      Feb  6 15:57:49 grant-01 lvm[3425]: Monitoring
mirror device mirror_sanity-resync_nosync for events       Feb  6 15:57:49
grant-01 clogd[29619]:   Region already in-sync: 6                             
          Feb  6 15:57:49 grant-01 clogd[29619]:   Region already in-sync: 7   
                                    Feb  6 15:57:51 grant-01 clogd[29619]: []
Unable to send (null) to cluster: Invalid exchange              Feb  6 15:57:51
grant-01 clogd[29619]: Bad callback on local/4                                 
          Feb  6 15:57:56 grant-01 kernel: device-mapper: dm-log-clustered:
Timed out waiting for cluster log server [DM_CLOG_SET_REGION_SYNC]             
                                                                  Feb  6
15:57:56 grant-01 kernel: device-mapper: dm-log-clustered: Retrying request
[DM_CLOG_SET_REGION_SYNC]                                                      
                                                 Feb  6 15:57:56 grant-01
kernel: device-mapper: dm-log-clustered: [102282][0][0][0][0][0] [3]             


Grant-03:
Feb  6 15:57:57 grant-03 lvm[3369]: No longer monitoring mirror device
mirror_sanity-resync_regular for events
Feb  6 15:57:58 grant-03 clogd[1698]: [eVDBier3] clog_postsuspend: leaving CPG
Feb  6 15:57:58 grant-03 clogd[1698]: [eVDBier3] clog_postsuspend: finalizing
Feb  6 15:57:58 grant-03 clogd[1698]: [eVDBier3] Cluster log removed
Feb  6 15:57:58 grant-03 lvm[3369]: No longer monitoring mirror device
mirror_sanity-resync_nosync for events
Feb  6 15:57:58 grant-03 clogd[1698]: [zPCqDSfd] clog_postsuspend: leaving CPG
Feb  6 15:57:58 grant-03 clogd[1698]: [zPCqDSfd] clog_postsuspend: finalizing
Feb  6 15:57:58 grant-03 clogd[1698]: [zPCqDSfd] Cluster log removed
Feb  6 15:58:01 grant-03 clogd[1698]: [eVDBier3] Cluster log created
Feb  6 15:58:01 grant-03 clogd[1698]: [eVDBier3] Non-master resume: bits pre-loaded
Feb  6 15:58:01 grant-03 [3369]: Monitoring mirror device
mirror_sanity-resync_regular for events
Feb  6 15:58:01 grant-03 clogd[1698]: [zPCqDSfd] Cluster log created
Feb  6 15:58:01 grant-03 clogd[1698]: [zPCqDSfd] Master resume: reading disk log
Feb  6 15:58:01 grant-03 clogd[1698]: Header not valid
Feb  6 15:58:01 grant-03 clogd[1698]:   magic     : 0  (expected: 4d695272)
Feb  6 15:58:01 grant-03 clogd[1698]:   version   : 0
Feb  6 15:58:01 grant-03 clogd[1698]:   nr_regions: 0
Feb  6 15:58:01 grant-03 clogd[1698]: *** Invalid argument ***
Feb  6 15:58:01 grant-03 lvm[3369]: Monitoring mirror device
mirror_sanity-resync_nosync for events
Feb  6 15:58:02 grant-03 clogd[1698]:   Region already in-sync: 0
Feb  6 15:58:02 grant-03 clogd[1698]:   Region already in-sync: 6
Feb  6 15:58:02 grant-03 clogd[1698]:   Region already in-sync: 7
Feb  6 15:58:42 grant-03 xinetd[2794]: START: qarsh pid=18158 from=10.15.80.47
Feb  6 15:58:42 grant-03 qarshd[18158]: Talking to peer 10.15.80.47:56305
Feb  6 15:58:42 grant-03 qarshd[18158]: Running cmdline: lvs --noheadings -o
copy_percent mirror_sanity/resync_regular
Feb  6 15:58:42 grant-03 xinetd[2794]: EXIT: qarsh status=0 pid=18158
duration=0(sec)


Version-Release number of selected component (if applicable):
cmirror-1.1.11-1.el5
kmod-cmirror-0.1.5-2.el5
lvm2-2.02.32-1.el5
lvm2-cluster-2.02.32-1.el5

Comment 1 RHEL Program Management 2008-02-06 22:57:09 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 3 Corey Marthaler 2008-08-22 15:41:26 UTC
Mirror resyncing has long been verified to now work.

Comment 5 Alasdair Kergon 2010-04-27 15:03:39 UTC
Assuming this VERIFIED fix got released.  Closing.
Reopen if it's not yet resolved.