Bug 675306

Summary: "Mangled reply" during service relocation
Product: Red Hat Enterprise Linux 6
Component: rgmanager
Version: 6.0
Hardware: x86_64
OS: Linux
Status: CLOSED DUPLICATE
Severity: medium
Priority: medium
Target Milestone: rc
Target Release: ---
Reporter: Corey Marthaler <cmarthal>
Assignee: Lon Hohberger <lhh>
QA Contact: Cluster QE <mspqa-list>
CC: cluster-maint
Doc Type: Bug Fix
Last Closed: 2011-02-07 19:37:40 UTC

Description Corey Marthaler 2011-02-04 20:50:47 UTC
Description of problem:
I saw this after 26 iterations of HA service relocation. There was no NFS client I/O taking place during these iterations.

================================================================================
Iteration 26 started at Fri Feb  4 12:54:30 CST 2011
Verifying that all services are started on all nodes in cluster
Sleeping 1 minute(s) in between each relocation...
Relocating nfs1 from grant-01 to grant-03

service:nfs1 owner should be [grant-03], not [none].
service:nfs1 is stuck in the [stopped] state.
Failed relocation attempt

[root@grant-01 ~]# clustat
Cluster Status for GRANT @ Fri Feb  4 14:26:10 2011
Member Status: Quorate

 Member Name     ID   Status
 ------ ----     ---- ------
 grant-01           1 Online, Local, rgmanager
 grant-02           2 Online, rgmanager
 grant-03           3 Online, rgmanager

 Service Name    Owner (Last)    State
 ------- ----    ----- ------    -----
 service:nfs1    (grant-01)      stopped


[root@grant-03 ~]# clustat
Cluster Status for GRANT @ Fri Feb  4 14:27:03 2011
Member Status: Quorate

 Member Name     ID   Status
 ------ ----     ---- ------
 grant-01           1 Online, rgmanager
 grant-02           2 Online, rgmanager
 grant-03           3 Online, Local, rgmanager

 Service Name    Owner (Last)    State
 ------- ----    ----- ------    -----
 service:nfs1    (grant-01)      stopped
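
For anyone scripting the same check, here is a rough Python sketch of how the service table in the clustat output above could be tested for "started on the target node". It is not the QE harness that printed the "owner should be [grant-03]" message. The function names parse_services and relocation_ok are made up for illustration, and the sketch assumes each service row has exactly three whitespace-separated fields, with an owner in parentheses meaning "last owner" as the column header suggests.

```python
# Sample copied from the clustat output above; column spacing is not significant.
CLUSTAT_SERVICES = """\
 Service Name   Owner (Last)  State
 ------- ----   ----- ------  -----
 service:nfs1   (grant-01)    stopped
"""


def parse_services(text):
    """Return {service: (owner, is_last_owner, state)} from a clustat service table.

    Hypothetical helper: assumes three whitespace-separated fields per service row.
    """
    services = {}
    for line in text.splitlines():
        fields = line.split()
        # Service rows start with "service:"; header and ruler lines are skipped.
        if len(fields) != 3 or not fields[0].startswith("service:"):
            continue
        name, owner, state = fields
        # Parentheses mark the last owner, i.e. nobody owns the service right now.
        is_last = owner.startswith("(") and owner.endswith(")")
        services[name] = (owner.strip("()"), is_last, state)
    return services


def relocation_ok(services, name, target):
    """True only if the service is started and currently owned by the target node."""
    owner, is_last, state = services[name]
    return state == "started" and owner == target and not is_last


services = parse_services(CLUSTAT_SERVICES)
print(services["service:nfs1"])
print(relocation_ok(services, "service:nfs1", "grant-03"))
```

With the table above, the check fails for grant-03 because the service is stopped and grant-01 is only listed as the last owner.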

Feb  4 12:55:23 grant-01 qarshd[21754]: Running cmdline: clusvcadm -r nfs1 -m grant-03
Feb  4 12:55:23 grant-01 rgmanager[2162]: Stopping service service:nfs1
Feb  4 12:55:23 grant-01 rgmanager[21788]: Removing IPv4 address 10.15.89.208/24 from eth0
Feb  4 12:55:33 grant-01 rgmanager[21831]: Removing export: *:/mnt/grant1
Feb  4 12:55:33 grant-01 rgmanager[21863]: Stopping NFS daemons
Feb  4 12:55:33 grant-01 mountd[20550]: Caught signal 15, un-registering and exiting.
Feb  4 12:55:33 grant-01 kernel: nfsd: last server has exited, flushing export cache
Feb  4 12:55:34 grant-01 rgmanager[21967]: Stopping rpc.statd
Feb  4 12:55:35 grant-01 rgmanager[22100]: unmounting /mnt/grant1
Feb  4 12:55:35 grant-01 rgmanager[2162]: Service service:nfs1 is stopped
Feb  4 12:55:40 grant-01 rgmanager[2162]: #60: Mangled reply from member #3 during RG relocate


Feb  4 12:56:32 grant-03 rgmanager[2156]: #37: Error receiving header from 1 sz=0 CTX 0x8acbf0
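
To line up the two numbered rgmanager messages, a small Python sketch like the one below can pull them out of the syslog lines and compute the gap between them. This is only an illustration: numbered_messages is a hypothetical helper, the year 2011 is taken from the clustat timestamps because syslog lines carry none, and the gap is only meaningful if the clocks on grant-01 and grant-03 were in sync, which the logs do not confirm.

```python
import re
from datetime import datetime

# A few of the lines quoted above, copied verbatim.
LOG_LINES = [
    "Feb  4 12:55:23 grant-01 rgmanager[2162]: Stopping service service:nfs1",
    "Feb  4 12:55:35 grant-01 rgmanager[2162]: Service service:nfs1 is stopped",
    "Feb  4 12:55:40 grant-01 rgmanager[2162]: #60: Mangled reply from member #3 during RG relocate",
    "Feb  4 12:56:32 grant-03 rgmanager[2156]: #37: Error receiving header from 1 sz=0 CTX 0x8acbf0",
]

# Matches only rgmanager lines whose message starts with a "#NN:" code.
PATTERN = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) (\S+) rgmanager\[\d+\]: #(\d+): (.*)$"
)


def numbered_messages(lines, year=2011):
    """Return (timestamp, host, code, text) for each numbered rgmanager message."""
    events = []
    for line in lines:
        match = PATTERN.match(line)
        if not match:
            continue
        stamp, host, code, text = match.groups()
        # Collapse the padded day ("Feb  4") before parsing; the year is assumed.
        stamp = " ".join(stamp.split())
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        events.append((when, host, int(code), text))
    return events


events = numbered_messages(LOG_LINES)
gap = (events[1][0] - events[0][0]).total_seconds()
for when, host, code, text in events:
    print(when.time(), host, f"#{code}", text)
print("seconds between the two messages:", gap)
```

For the lines above this finds message #60 on grant-01 and message #37 on grant-03, 52 seconds apart by the logged timestamps.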


Version-Release number of selected component (if applicable):
2.6.32-94.el6.x86_64
rgmanager-3.0.12-10.el6.x86_64

Comment 1 Lon Hohberger 2011-02-07 19:37:40 UTC

*** This bug has been marked as a duplicate of bug 635152 ***