Bug 1589785

Summary: Heketi reports "Failed to get list of pods", periodic health checks stopped, and all POST requests (volume create) are hung at the heketi end
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Neha Berry <nberry>
Component: heketi Assignee: Michael Adam <madam>
Status: CLOSED WONTFIX QA Contact: Neha Berry <nberry>
Severity: medium Docs Contact:
Priority: unspecified    
Version: cns-3.9 CC: hchiramm, jmulligan, kramdoss, rhs-bugs, rtalur, sankarshan, storage-qa-internal
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-02-07 22:21:50 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1641915    

Comment 7 Raghavendra Talur 2018-06-28 13:00:55 UTC
It does look like an issue with kubeexec.

The "Failed to get list of pods" error is seen when the master node does not reply to heketi's query asking which pods are the gluster nodes. This means either the master is not responding or the communication path from the heketi pod to the master is broken.

How to debug:
1. Check whether oc commands are working.
2. Log on to the heketi pod and use heketi-cli to run a command that does not go through the kubeexec path, such as heketi-cli volume list. If that works, heketi is responding.
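The two checks above can be scripted. The following is only a rough sketch, not something taken from heketi or this bug: the helper names command_ok and diagnose are hypothetical, and the wording of each verdict is an interpretation of the steps above. It assumes that oc is on the PATH where check 1 runs, and that heketi-cli is run inside the heketi pod with its server address and credentials already configured.

```python
import subprocess


def command_ok(cmd, timeout=30):
    """Return True if cmd exits with status 0 within timeout seconds.

    A missing binary or a hung command counts as a failure.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def diagnose(oc_ok, heketi_ok):
    """Map the two check results to a verdict (hypothetical wording).

    oc_ok:     result of check 1, e.g. command_ok(["oc", "get", "pods"])
    heketi_ok: result of check 2, run inside the heketi pod, e.g.
               command_ok(["heketi-cli", "volume", "list"])
    """
    if not heketi_ok:
        return "heketi is not responding; investigate heketi itself"
    if not oc_ok:
        return "heketi is responding but oc commands fail; the master may not be responding"
    return ("heketi is responding; the master or the path from the "
            "heketi pod to the master remains suspect")
```

For example, print(diagnose(command_ok(["oc", "get", "pods"]), heketi_ok)) prints the verdict, where heketi_ok is the result of running the heketi-cli check inside the pod.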

If both checks pass, it is verified that heketi itself is not the culprit, but we still don't know whether the master or the communication path is the problem. I don't have any way to debug beyond this point.