
Bug 238629

Summary: dm-cmirror: Remote recovery conflict: (3337311 >= 24607)/K8rNJHz9
Product: [Retired] Red Hat Cluster Suite
Component: cmirror
Version: 4
Hardware: ppc64
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Reporter: Nate Straz <nstraz>
Assignee: Jonathan Earl Brassow <jbrassow>
QA Contact: Cluster QE <mspqa-list>
CC: agk, cmarthal, dwysocha, edamato, mbroz, prockai
Keywords: Reopened
Fixed In Version: 4.6
Doc Type: Bug Fix
Last Closed: 2008-05-30 18:18:33 UTC
Attachments:
complete dmesg output from node basic (flags: none)

Description Nate Straz 2007-05-01 21:20:57 UTC
Description of problem:

While running I/O on one node of a cluster mirror, my I/O load hangs. Upon closer inspection I found that all nodes complained. Here is the tail of each node's dmesg. I will attach the complete output from basic.

[nstraz@try 3]$ tail -n 12 *.dmesg
==> basic.dmesg <==
dm-cmirror: Remote recovery conflict: (3337311 >= 24639)/K8rNJHz9
dm-cmirror: Remote recovery conflict: (3337311 >= 24639)/K8rNJHz9
dm-cmirror: Remote recovery conflict: (3337311 >= 24639)/K8rNJHz9
dm-cmirror: Remote recovery conflict: (3337311 >= 24639)/K8rNJHz9
dm-cmirror: Remote recovery conflict: (3337311 >= 24639)/K8rNJHz9
dm-cmirror: Remote recovery conflict: (3337311 >= 24639)/K8rNJHz9
dm-cmirror: Remote recovery conflict: (3337311 >= 24639)/K8rNJHz9
dm-cmirror: Remote recovery conflict: (3337311 >= 24639)/K8rNJHz9
dm-cmirror: Remote recovery conflict: (3337311 >= 24639)/K8rNJHz9
dm-cmirror: Remote recovery conflict: (3337311 >= 24639)/K8rNJHz9
dm-cmirror: Remote recovery conflict: (3337311 >= 24639)/K8rNJHz9
dm-cmirror: Remote recovery conflict: (3337311 >= 24639)/K8rNJHz9

==> doral.dmesg <==
cdrom: open failed.
cdrom: open failed.
dm-cmirror: Creating K8rNJHz9 (1)
dm-cmirror: start_server called
dm-cmirror: cluster_log_serverd ready for work
dm-cmirror: Node joining
dm-cmirror: server_id=dead, server_valid=0, K8rNJHz9
dm-cmirror: trigger = LRT_GET_RESYNC_WORK
dm-cmirror: LRT_ELECTION(10): (K8rNJHz9)
dm-cmirror:   starter     : 4
dm-cmirror:   co-ordinator: 4
dm-cmirror:   node_count  : 0

==> kent.dmesg <==
dm-cmirror: Creating K8rNJHz9 (1)
dm-cmirror: start_server called
dm-cmirror: cluster_log_serverd ready for work
dm-cmirror: Node joining
dm-cmirror: server_id=dead, server_valid=0, K8rNJHz9
dm-cmirror: trigger = LRT_GET_SYNC_COUNT
dm-cmirror: LRT_ELECTION(10): (K8rNJHz9)
dm-cmirror:   starter     : 2
dm-cmirror:   co-ordinator: 2
dm-cmirror:   node_count  : 0
dm-cmirror: Node joining
dm-cmirror: Node joining

==> newport.dmesg <==
cdrom: open failed.
dm-cmirror: Creating K8rNJHz9 (1)
dm-cmirror: start_server called
dm-cmirror: cluster_log_serverd ready for work
dm-cmirror: Node joining
dm-cmirror: server_id=dead, server_valid=0, K8rNJHz9
dm-cmirror: trigger = LRT_GET_RESYNC_WORK
dm-cmirror: LRT_ELECTION(10): (K8rNJHz9)
dm-cmirror:   starter     : 3
dm-cmirror:   co-ordinator: 3
dm-cmirror:   node_count  : 0
dm-cmirror: Node joining


Version-Release number of selected component (if applicable):
lvm2-cluster-2.02.21-7.el4.ppc64
lvm2-2.02.21-5.el4.ppc
cmirror-1.0.1-1.ppc64
device-mapper-1.02.17-3.el4.ppc
device-mapper-1.02.17-3.el4.ppc64


How reproducible:
I've hit it twice already.  It should be easy to hit again on the same hardware.


Steps to Reproduce:
lvm_try, mirror_2 volume config
  
Actual results:
See above

Expected results:


Additional info:

Comment 1 Nate Straz 2007-05-01 21:20:57 UTC
Created attachment 153896 [details]
complete dmesg output from node basic.

Comment 2 Jonathan Earl Brassow 2007-05-02 14:18:20 UTC
Writes are disallowed to regions that have not yet been recovered.  This makes I/O suck, I know.  Kernel changes are required to fix this problem without delaying I/O.  This will be done in 4.6.

For now, writes simply get delayed until the mirror has synced past the region being attempted.
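
A minimal sketch of that delay semantics, assuming a hypothetical in-order resync counter (this models the described behavior only; it is not the actual 4.6 kernel patch):

#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of "writes get delayed until the mirror has
 * synced past the region being attempted".  Not dm-cmirror code. */
struct mirror_state {
	uint64_t sync_count;  /* number of regions recovered so far, in order */
};

/* A write is safe once recovery has moved past its region. */
static int write_allowed(const struct mirror_state *m, uint64_t region)
{
	return region < m->sync_count;
}

int main(void)
{
	struct mirror_state m = { .sync_count = 24639 };
	uint64_t region = 3337311;  /* region from the log above */

	/* Simulate recovery advancing; the write stays queued meanwhile. */
	while (!write_allowed(&m, region))
		m.sync_count += 1;

	printf("write to region %llu released at sync_count %llu\n",
	       (unsigned long long)region,
	       (unsigned long long)m.sync_count);
	return 0;
}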

Comment 3 Nate Straz 2007-05-03 16:07:24 UTC
Keep this bug open for a 4.6 errata.

Comment 4 Jonathan Earl Brassow 2007-07-11 16:19:09 UTC
The 2.6.9-55.16.ELsmp kernel has the necessary patches.

cmirror-kernel code updated 2007-07-11.


Comment 5 Jonathan Earl Brassow 2007-08-22 17:28:30 UTC
*** Bug 243773 has been marked as a duplicate of this bug. ***

Comment 6 Jonathan Earl Brassow 2007-08-22 17:34:56 UTC
post -> modified.

Comment 7 Jonathan Earl Brassow 2007-09-04 16:12:36 UTC
*** Bug 252007 has been marked as a duplicate of this bug. ***

Comment 8 Corey Marthaler 2007-09-25 13:50:21 UTC
This bug may not be completely fixed yet and may be related to currently open bz 290821.

Comment 9 Nate Straz 2008-05-30 18:18:33 UTC
Closing this out since it missed the errata process.