Bug 238629 - dm-cmirror: Remote recovery conflict: (3337311 >= 24607)/K8rNJHz9
Status: CLOSED CURRENTRELEASE
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: cmirror
Version: 4
Hardware: ppc64 Linux
Priority: medium
Severity: medium
Assigned To: Jonathan Earl Brassow
QA Contact: Cluster QE
Keywords: Reopened
Duplicates: 243773 252007
Reported: 2007-05-01 17:20 EDT by Nate Straz
Modified: 2010-01-11 21:03 EST
Fixed In Version: 4.6
Doc Type: Bug Fix
Last Closed: 2008-05-30 14:18:33 EDT

Attachments
complete dmesg output from node basic. (122.38 KB, text/plain)
2007-05-01 17:20 EDT, Nate Straz

Description Nate Straz 2007-05-01 17:20:57 EDT
Description of problem:

While running I/O from one node against a cluster mirror, my I/O load hung.
On closer inspection I found that all nodes were complaining.  Here is the tail
of each node's dmesg.  I will attach the complete output from basic.

[nstraz@try 3]$ tail -n 12 *.dmesg
==> basic.dmesg <==
dm-cmirror: Remote recovery conflict: (3337311 >= 24639)/K8rNJHz9
dm-cmirror: Remote recovery conflict: (3337311 >= 24639)/K8rNJHz9
dm-cmirror: Remote recovery conflict: (3337311 >= 24639)/K8rNJHz9
dm-cmirror: Remote recovery conflict: (3337311 >= 24639)/K8rNJHz9
dm-cmirror: Remote recovery conflict: (3337311 >= 24639)/K8rNJHz9
dm-cmirror: Remote recovery conflict: (3337311 >= 24639)/K8rNJHz9
dm-cmirror: Remote recovery conflict: (3337311 >= 24639)/K8rNJHz9
dm-cmirror: Remote recovery conflict: (3337311 >= 24639)/K8rNJHz9
dm-cmirror: Remote recovery conflict: (3337311 >= 24639)/K8rNJHz9
dm-cmirror: Remote recovery conflict: (3337311 >= 24639)/K8rNJHz9
dm-cmirror: Remote recovery conflict: (3337311 >= 24639)/K8rNJHz9
dm-cmirror: Remote recovery conflict: (3337311 >= 24639)/K8rNJHz9

==> doral.dmesg <==
cdrom: open failed.
cdrom: open failed.
dm-cmirror: Creating K8rNJHz9 (1)
dm-cmirror: start_server called
dm-cmirror: cluster_log_serverd ready for work
dm-cmirror: Node joining
dm-cmirror: server_id=dead, server_valid=0, K8rNJHz9
dm-cmirror: trigger = LRT_GET_RESYNC_WORK
dm-cmirror: LRT_ELECTION(10): (K8rNJHz9)
dm-cmirror:   starter     : 4
dm-cmirror:   co-ordinator: 4
dm-cmirror:   node_count  : 0

==> kent.dmesg <==
dm-cmirror: Creating K8rNJHz9 (1)
dm-cmirror: start_server called
dm-cmirror: cluster_log_serverd ready for work
dm-cmirror: Node joining
dm-cmirror: server_id=dead, server_valid=0, K8rNJHz9
dm-cmirror: trigger = LRT_GET_SYNC_COUNT
dm-cmirror: LRT_ELECTION(10): (K8rNJHz9)
dm-cmirror:   starter     : 2
dm-cmirror:   co-ordinator: 2
dm-cmirror:   node_count  : 0
dm-cmirror: Node joining
dm-cmirror: Node joining

==> newport.dmesg <==
cdrom: open failed.
dm-cmirror: Creating K8rNJHz9 (1)
dm-cmirror: start_server called
dm-cmirror: cluster_log_serverd ready for work
dm-cmirror: Node joining
dm-cmirror: server_id=dead, server_valid=0, K8rNJHz9
dm-cmirror: trigger = LRT_GET_RESYNC_WORK
dm-cmirror: LRT_ELECTION(10): (K8rNJHz9)
dm-cmirror:   starter     : 3
dm-cmirror:   co-ordinator: 3
dm-cmirror:   node_count  : 0
dm-cmirror: Node joining


Version-Release number of selected component (if applicable):
lvm2-cluster-2.02.21-7.el4.ppc64
lvm2-2.02.21-5.el4.ppc
cmirror-1.0.1-1.ppc64
device-mapper-1.02.17-3.el4.ppc
device-mapper-1.02.17-3.el4.ppc64


How reproducible:
I've hit it twice already.  It should be easy to hit again on the same hardware.


Steps to Reproduce:
lvm_try, mirror_2 volume config
  
Actual results:
See above

Expected results:


Additional info:
Comment 1 Nate Straz 2007-05-01 17:20:57 EDT
Created attachment 153896
complete dmesg output from node basic.
Comment 2 Jonathan Earl Brassow 2007-05-02 10:18:20 EDT
Writes are disallowed to regions that have not yet been recovered.  This makes I/O suck, I know.
Kernel changes are required to fix this problem without delaying I/O.  This will be done in 4.6.

For now, writes simply get delayed until the mirror has synced past the region being attempted.
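
A minimal sketch of the behaviour described in this comment, for anyone reading the logs. It assumes the two numbers in the "Remote recovery conflict" message are the region a write targeted and the point recovery had reached; that reading is inferred from the message text and this comment, not confirmed from the cmirror source. The names parse_conflict, Conflict and write_is_delayed are hypothetical and are not part of dm-cmirror.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout of the message seen in this report:
#   dm-cmirror: Remote recovery conflict: (<region> >= <sync point>)/<log id>
CONFLICT_RE = re.compile(
    r"dm-cmirror: Remote recovery conflict: \((\d+) >= (\d+)\)/(\S+)"
)


class Conflict(NamedTuple):
    region: int      # region the write targeted (assumed meaning)
    sync_point: int  # how far recovery had progressed (assumed meaning)
    log_id: str      # identifier printed after the slash


def parse_conflict(line: str) -> Optional[Conflict]:
    """Return the fields of a conflict line, or None for any other line."""
    m = CONFLICT_RE.search(line)
    if m is None:
        return None
    return Conflict(int(m.group(1)), int(m.group(2)), m.group(3))


def write_is_delayed(region: int, sync_point: int) -> bool:
    """Hypothetical gate: hold the write until recovery has passed the region."""
    return region >= sync_point


if __name__ == "__main__":
    line = "dm-cmirror: Remote recovery conflict: (3337311 >= 24639)/K8rNJHz9"
    c = parse_conflict(line)
    if c is not None:
        print(c.region, c.sync_point, write_is_delayed(c.region, c.sync_point))
```

Under that assumption, the line from basic's dmesg describes a write to region 3337311 being held while recovery was at 24639; the write would be released only once the second number passes the first.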
Comment 3 Nate Straz 2007-05-03 12:07:24 EDT
Keep this bug open for a 4.6 errata.
Comment 4 Jonathan Earl Brassow 2007-07-11 12:19:09 EDT
2.6.9-55.16.ELsmp kernel has the necessary patches.

cmirror-kernel code updated July/11/2007
Comment 5 Jonathan Earl Brassow 2007-08-22 13:28:30 EDT
*** Bug 243773 has been marked as a duplicate of this bug. ***
Comment 6 Jonathan Earl Brassow 2007-08-22 13:34:56 EDT
post -> modified.
Comment 7 Jonathan Earl Brassow 2007-09-04 12:12:36 EDT
*** Bug 252007 has been marked as a duplicate of this bug. ***
Comment 8 Corey Marthaler 2007-09-25 09:50:21 EDT
This bug may not be completely fixed yet, and it may be related to the currently
open bz 290821.
Comment 9 Nate Straz 2008-05-30 14:18:33 EDT
Closing this out since it missed the errata process.
