Bug 1655863 - Liveliness probe of cirros pod fails with error message and unable to rsh into the pod
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: CNS-deployment
Version: ocs-3.11
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Michael Adam
QA Contact: Prasanth
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-12-04 06:30 UTC by Kshithij Iyer
Modified: 2019-02-08 08:28 UTC

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-02-08 08:28:56 UTC
Embargoed:



Description Kshithij Iyer 2018-12-04 06:30:07 UTC
Description of problem:
On an independent-mode OCS 3.11 setup, the liveness probe of a cirros pod fails with the following message:

Liveness probe failed: rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:110: decoding init error from pipe caused \"read parent: connection reset by peer\"" 

After this, the pod keeps restarting in a loop and eventually ends up in "CrashLoopBackOff" status.

When I try to rsh into the pod, it displays the following message:
# oc rsh cirrossc1-1-dtpb7 
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:110: decoding init error from pipe caused \"read parent: connection reset by peer\""

command terminated with exit code 126

Version-Release number of selected component (if applicable):
Gluster version:
3.12.2-25

Heketi version:
7.0.0

How reproducible:
4/4

Steps to Reproduce:
1. Create a storage class.
2. Create a PVC on the storage class.
3. Create a cirros pod that uses the PVC.
4. Run oc rsh into the pod and oc describe on the pod. (oc rsh will fail and oc describe will show the following message.)
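
Steps 2 and 3 could look roughly like the manifests below. This is only a sketch for illustration: the actual deployment details were not attached to this report, so every name, the storage class name, the size, the image reference, the mount path, and the probe command are assumptions. An exec-type liveness probe is assumed because the reported error is an "exec failed" error from the container runtime, the same path oc rsh uses.

```yaml
# Hypothetical reproduction manifests. All names, sizes, the image reference,
# the mount path, and the probe command are assumptions, not from the report.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cirros-claim                     # hypothetical name
spec:
  storageClassName: glusterfs-storage    # hypothetical; the class created in step 1
  accessModes:
    - ReadWriteOnce                      # the report saw the same failure with ReadWriteMany
  resources:
    requests:
      storage: 1Gi                       # hypothetical size
---
apiVersion: v1
kind: Pod
metadata:
  name: cirros-test                      # hypothetical name
spec:
  containers:
    - name: cirros
      image: cirros                      # hypothetical image reference
      volumeMounts:
        - name: data
          mountPath: /mnt                # hypothetical mount path
      livenessProbe:
        exec:                            # assumed probe type, see note above
          command: ["sh", "-c", "ls /mnt"]
        initialDelaySeconds: 15
        periodSeconds: 10
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: cirros-claim
```

The pod name in this report (cirrossc1-1-dtpb7) suggests the pod was actually created through a DeploymentConfig; a bare Pod is used here only to keep the sketch short.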

Actual results:
oc describe pod shows the following error:
 
Liveness probe failed: rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:110: decoding init error from pipe caused \"read parent: connection reset by peer\"" 

oc rsh shows the following error:

rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:110: decoding init error from pipe caused \"read parent: connection reset by peer\""

command terminated with exit code 126

Expected results:
oc describe pod should not show any error, and oc rsh should succeed.

Additional info:
I have tried this scenario with both accessModes, "ReadWriteMany" and "ReadWriteOnce", in the pvc.yml and still end up with the same results.

Comment 2 Niels de Vos 2019-02-06 18:57:37 UTC
This is unlikely to be a problem caused by OCS.

Can you 'oc rsh' into a pod that does not have any PVC attached? What about other non-cirros container images?

During development we have seen similar issues with containers that set up a "/dev" volumeMount while CRI-O is used (while it works fine with Docker). Can you share the deployment details that you use to reproduce this problem?
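
For reference, a "/dev" volumeMount of the kind mentioned above might look like the pod spec fragment below. This is a sketch only: the volume name is hypothetical, and backing the mount with a hostPath volume is an assumption, since the comment does not say how the volume was defined.

```yaml
# Hypothetical pod spec fragment; the volume name and the hostPath
# backing are assumptions, not taken from this bug.
spec:
  containers:
    - name: example                # hypothetical container name
      image: example-image         # hypothetical image
      volumeMounts:
        - name: host-dev
          mountPath: /dev          # the "/dev" volumeMount referred to above
  volumes:
    - name: host-dev
      hostPath:
        path: /dev
```

If the deployment used to reproduce this bug contains a mount like this, that would be worth mentioning along with the container runtime in use.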

Comment 3 Kshithij Iyer 2019-02-08 07:24:49 UTC
This bug was caused by the docker version in that particular build. I have not observed it since. You can close this bug as "worksforme" or "fixed".

Comment 4 Michael Adam 2019-02-08 08:28:56 UTC
Thanks, closing.
