Bug 1655863

Summary: Liveliness probe of cirros pod fails with error message and unable to rsh into the pod
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Kshithij Iyer <kiyer>
Component: CNS-deployment Assignee: Michael Adam <madam>
Status: CLOSED WORKSFORME QA Contact: Prasanth <pprakash>
Severity: medium Docs Contact:
Priority: unspecified    
Version: ocs-3.11 CC: akhakhar, hchiramm, jarrpa, kiyer, kramdoss, madam, pprakash, rhs-bugs, rtalur, sankarshan
Target Milestone: --- Keywords: ZStream
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-02-08 08:28:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Kshithij Iyer 2018-12-04 06:30:07 UTC
Description of problem:
On an independent mode OCS 3.11 setup, the liveness probe of a cirros pod fails with the following message:

Liveness probe failed: rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:110: decoding init error from pipe caused \"read parent: connection reset by peer\"" 

After this, the pod keeps restarting in a loop and eventually ends up in "CrashLoopBackOff" status.

When I try to rsh into the pod, it displays the following message:
# oc rsh cirrossc1-1-dtpb7 
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:110: decoding init error from pipe caused \"read parent: connection reset by peer\""

command terminated with exit code 126

Version-Release number of selected component (if applicable):
Gluster version:
3.12.2-25

Heketi version:
7.0.0

How reproducible:
4/4

Steps to Reproduce:
1. Create a storage class.
2. Create a PVC on the storage class.
3. Create a cirros pod that uses the PVC.
4. Run oc rsh against the pod and oc describe on it. (oc rsh fails, and oc describe shows the message listed under Actual results.)
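
A rough sketch of these steps as commands. The report does not include the actual manifests, so the storage class name, PVC name, and requested size are hypothetical placeholders; only the pod name comes from the rsh output above.

```shell
#!/bin/sh
# Sketch only. SC_NAME, PVC_NAME and the 1Gi size are assumptions,
# not values taken from this report.
SC_NAME="glusterfs-sc"
PVC_NAME="cirros-claim"
POD="${POD:-cirrossc1-1-dtpb7}"   # pod name as seen in the report
MANIFEST="$(mktemp)"

# Step 2: a PVC against the storage class. ReadWriteOnce is shown here;
# the report saw the same failure with ReadWriteMany.
cat > "$MANIFEST" <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${PVC_NAME}
spec:
  storageClassName: ${SC_NAME}
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF

if command -v oc >/dev/null 2>&1; then
  oc create -f "$MANIFEST"
  oc get pvc "$PVC_NAME"
  # Step 4: once the cirros pod that mounts the claim exists,
  # inspect its events and try to run a command inside it.
  oc describe pod "$POD"
  oc rsh "$POD" true
else
  echo "oc not found; manifest written to $MANIFEST"
fi
```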

Actual results:
oc describe pod shows the following error:
 
Liveness probe failed: rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:110: decoding init error from pipe caused \"read parent: connection reset by peer\"" 

oc rsh shows the following error:

rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:110: decoding init error from pipe caused \"read parent: connection reset by peer\""

command terminated with exit code 126

Expected results:
oc describe pod should not show any error, and oc rsh should succeed.

Additional info:
I tried this scenario with both accessModes, "ReadWriteMany" and "ReadWriteOnce", in pvc.yml and got the same results each time.

Comment 2 Niels de Vos 2019-02-06 18:57:37 UTC
This is unlikely to be a problem caused by OCS.

Can you 'oc rsh' into a pod that does not have any PVC attached? What about other non-cirros container images?

During development we have seen similar issues with containers that set up a "/dev" volumeMount while CRI-O is used (while it works fine with Docker). Can you share the deployment details that you use to reproduce this problem?
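
One way to run the checks asked about above is sketched here. The pod name and image are hypothetical placeholders, and check_rsh is a helper introduced only for this illustration.

```shell
#!/bin/sh
# Sketch only: POD and IMAGE are hypothetical placeholders.
POD="rsh-check"
IMAGE="busybox"

# Run a trivial command inside a pod and report whether the exec worked.
# A failure here corresponds to the "exit code 126" case in the description.
check_rsh() {
  if oc rsh "$1" true; then
    echo "rsh ok: $1"
  else
    echo "rsh failed: $1"
  fi
}

if command -v oc >/dev/null 2>&1; then
  # A pod with no PVC attached, using a non-cirros image.
  oc run "$POD" --image="$IMAGE" --restart=Never --command -- sleep 3600
  # Shows the pod status and the node it was scheduled on.
  oc get pod "$POD" -o wide
  check_rsh "$POD"
  # On that node, "docker version" reports the container runtime version.
else
  echo "oc not found; skipping checks"
fi
```

If check_rsh succeeds for this pod but fails for the cirros pod with the PVC, the volume setup is worth a closer look; if both fail, the container runtime on the node is the more likely cause.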

Comment 3 Kshithij Iyer 2019-02-08 07:24:49 UTC
This bug was observed because of the docker version in that particular build. I have not observed it since. You can close this bug as "worksforme" or "fixed".

Comment 4 Michael Adam 2019-02-08 08:28:56 UTC
Thanks, closing.