Bug 1589785

Summary:	Heketi reports "Failed to get list of pods", periodic health checks stop, and all POST requests (volume create) hang at the heketi end
Product:	[Red Hat Storage] Red Hat Gluster Storage	Reporter:	Neha Berry <nberry>
Component:	heketi	Assignee:	Michael Adam <madam>
Status:	CLOSED WONTFIX	QA Contact:	Neha Berry <nberry>
Severity:	medium	Docs Contact:
Priority:	unspecified
Version:	cns-3.9	CC:	hchiramm, jmulligan, kramdoss, rhs-bugs, rtalur, sankarshan, storage-qa-internal
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2019-02-07 22:21:50 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1641915

Comment 7 Raghavendra Talur 2018-06-28 13:00:55 UTC

It does look like an issue with kubeexec.

The "Failed to get list of pods" error is seen when the master node does not reply to heketi's query asking which pods are the gluster nodes. This means that either the master is not responding or the communication path from the heketi pod to the master is broken.

How to debug:
1. Check whether oc commands are working.
2. Log on to the heketi pod and use heketi-cli to fetch details that do not require the kubeexec path, for example heketi-cli volume list. If that works, heketi is responding.
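
The two checks above can be sketched as a small script. This is only an illustration, not something heketi ships: the helper names command_succeeds and triage are hypothetical, "oc get pods" is just one example of an oc command, and the script assumes both oc and heketi-cli are on PATH (and heketi-cli is already configured) wherever it runs. If the two checks have to be run from different places, run them by hand and pass the results to triage() directly. A timeout is used because a hung command counts as a failure here.

```python
import subprocess


def command_succeeds(argv, timeout=30):
    """Return True if argv exits with status 0 within `timeout` seconds.

    A missing binary or a command that hangs past the timeout counts as a failure.
    """
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def triage(oc_ok, heketi_ok):
    """Map the two check results to a rough conclusion (hypothetical helper)."""
    if not heketi_ok:
        # A call that does not need the kubeexec path failed too.
        return "heketi is not responding"
    if not oc_ok:
        return "heketi is responding; master is not answering oc"
    # Both checks pass: heketi is ruled out, nothing more can be said.
    return "heketi is responding; master or heketi-to-master path still suspect"


if __name__ == "__main__":
    # Assumption: both commands are available where this script runs.
    oc_ok = command_succeeds(["oc", "get", "pods"])
    heketi_ok = command_succeeds(["heketi-cli", "volume", "list"])
    print(triage(oc_ok, heketi_ok))
```

Even when both checks pass, the result only rules out heketi; it does not distinguish a slow master from a broken path between the heketi pod and the master.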

At this point it is verified that heketi is not the culprit, but we don't know whether the master or the communication path is the problem. I don't have any way to debug further after this point.