Bug 1299858
Summary: | While running ganesha ha cases IO hanged, VIPs got lost | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Apeksha <akhakhar> |
Component: | gluster-nfs | Assignee: | Kaleb KEITHLEY <kkeithle> |
Status: | CLOSED NEXTRELEASE | QA Contact: | Saurabh <saujain> |
Severity: | medium | Docs Contact: | |
Priority: | low | ||
Version: | rhgs-3.1 | CC: | jthottan, kkeithle, ndevos, nlevinki, rhs-bugs, skoduri, storage-qa-internal |
Target Milestone: | --- | Keywords: | ZStream |
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | 3.1.3 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2016-06-06 11:14:55 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Apeksha
2016-01-19 12:29:50 UTC
While debugging we have not seen any errors logged that explain why the VIPs were lost. As Apeksha mentioned, restarting the pacemaker service brought the services back. Since the issue has been hit only once, keep this bug at low priority for now; it may be worth documenting as a known issue.

Hit this issue again on the new build, glusterfs-3.7.5-17.el7rhgs.x86_64. We saw the following logs when the issue happened:

```
Jan 21 22:26:57 vm4 crmd[28274]: notice: Operation vm1-cluster_ip-1_start_0: ok (node=vm4, call=60, rc=0, cib-update=37, confirmed=true)
Jan 21 22:26:57 vm4 crmd[28274]: notice: Operation vm4-cluster_ip-1_start_0: ok (node=vm4, call=61, rc=0, cib-update=38, confirmed=true)
Jan 21 22:27:17 vm4 lrmd[28271]: warning: nfs-grace_monitor_5000 process (PID 30924) timed out
Jan 21 22:27:17 vm4 lrmd[28271]: warning: nfs-grace_monitor_5000:30924 - timed out after 20000ms
Jan 21 22:27:17 vm4 crmd[28274]: error: Operation nfs-grace_monitor_5000: Timed Out (node=vm4, call=59, timeout=20000ms)
Jan 21 22:27:17 vm4 IPaddr(vm4-cluster_ip-1)[31319]: INFO: IP status = ok, IP_CIP=
Jan 21 22:27:17 vm4 IPaddr(vm1-cluster_ip-1)[31318]: INFO: IP status = ok, IP_CIP=
Jan 21 22:27:17 vm4 crmd[28274]: notice: Operation vm4-cluster_ip-1_stop_0: ok (node=vm4, call=67, rc=0, cib-update=42, confirmed=true)
Jan 21 22:27:17 vm4 crmd[28274]: notice: Operation vm1-cluster_ip-1_stop_0: ok (node=vm4, call=65, rc=0, cib-update=43, confirmed=true)
```

Perhaps stopping the cluster_ip resources put them into the stopped state on all the nodes, resulting in the loss of the VIPs. Not sure what could have triggered that.

Sorry for the typo above. What I meant is that all the cluster_ip resources being in the stopped state resulted in the VIPs getting lost. One possible cause is that the ganesha_active attribute got unset. This might have been fixed by the recent changes that Kaleb posted. Kaleb, can you point us to a possible downstream patch? QE can then retest with a version that contains the fix.
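As a diagnostic aid for anyone reproducing this: the suspected failure mode above (all cluster_ip resources stopped because a node attribute was unset) can be checked with standard pacemaker tooling. This is a hedged sketch only; the exact attribute name the ganesha resource agents key on may differ by version (`ganesha_active` here follows the comment above, and `vm4` is the node name from the logs).

```shell
# Show resource state across the cluster; the *-cluster_ip-1 resources
# should be Started on their assigned nodes, not Stopped everywhere.
pcs status resources

# Query the node attribute that the VIP location constraints are believed
# to depend on (attribute name is an assumption based on this report).
attrd_updater --query --name ganesha_active --node vm4
```

If the attribute query comes back empty on every node, that would be consistent with the theory that the cluster_ip resources had nowhere eligible to run, which is why restarting the pacemaker service (which re-establishes the attribute) brought the VIPs back.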
Closing; this has not been seen during 3.1.3 testing.