1655863 – Liveliness probe of cirros pod fails with error message and unable to rsh into the pod

Keywords:
Status:	CLOSED WORKSFORME
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	CNS-deployment
Sub Component:
Version:	ocs-3.11
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Michael Adam
QA Contact:	Prasanth
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-12-04 06:30 UTC by Kshithij Iyer
Modified:	2019-02-08 08:28 UTC
CC List:	10 users
Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-02-08 08:28:56 UTC
Embargoed:
Dependent Products:

Description Kshithij Iyer 2018-12-04 06:30:07 UTC

Description of problem:
On an independent-mode OCS 3.11 setup, the liveness probe fails with the following message:

Liveness probe failed: rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:110: decoding init error from pipe caused \"read parent: connection reset by peer\""

After this, the pod keeps restarting in a loop and eventually ends up in "CrashLoopBackOff" status.

When I try to rsh into the pod, it displays the following message:
# oc rsh cirrossc1-1-dtpb7
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:110: decoding init error from pipe caused \"read parent: connection reset by peer\""

command terminated with exit code 126

Version-Release number of selected component (if applicable):
Gluster version:
3.12.2-25

Heketi version:
7.0.0

How reproducible:
4/4

Steps to Reproduce:
1. Create a storage class.
2. Create a PVC on the storage class.
3. Create a cirros pod that uses the PVC.
4. Run oc rsh against the pod and oc describe on it. (oc rsh fails, and oc describe shows the liveness probe error quoted in the description.)
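
The steps above could be sketched roughly as follows. This is only an illustration, not the reporter's actual setup: the manifests from the report are not attached, so every name (glusterfs-example, cirros-claim, cirros-example), the Heketi URL, the volume size, the mount path, and the exec-style liveness probe are assumptions. The report's pod name also suggests a deployment-managed pod, whereas this sketch uses a bare Pod for brevity. The script only writes the manifests and prints the oc commands to run against a cluster.

```shell
#!/bin/sh
# Hypothetical reproduction sketch. All names, sizes, the Heketi URL, and the
# probe command are placeholders; none of them come from the bug report.
set -eu

workdir="./cirros-repro"
mkdir -p "$workdir"

# Step 1: storage class using the in-tree GlusterFS provisioner.
cat > "$workdir/storageclass.yaml" <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: glusterfs-example
provisioner: kubernetes.io/glusterfs
parameters:
  resturl: "http://heketi.example.com:8080"
EOF

# Step 2: PVC on that storage class (the report tried RWO and RWX).
cat > "$workdir/pvc.yaml" <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cirros-claim
spec:
  storageClassName: glusterfs-example
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF

# Step 3: cirros pod mounting the PVC, with an assumed exec liveness probe.
cat > "$workdir/pod.yaml" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: cirros-example
spec:
  containers:
    - name: cirros
      image: cirros
      command: ["/bin/sh", "-c", "sleep 3600"]
      livenessProbe:
        exec:
          command: ["/bin/sh", "-c", "ls /mnt"]
        initialDelaySeconds: 10
        periodSeconds: 10
      volumeMounts:
        - name: data
          mountPath: /mnt
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: cirros-claim
EOF

# Step 4: commands to run manually against a cluster.
echo "oc create -f $workdir/storageclass.yaml"
echo "oc create -f $workdir/pvc.yaml"
echo "oc create -f $workdir/pod.yaml"
echo "oc describe pod cirros-example"
echo "oc rsh cirros-example"
```

Because both the liveness probe and oc rsh go through the container runtime's exec path, a failure there would be expected to show up in both places, which matches the two identical error messages in this report.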

Actual results:
oc describe pod shows the following error:

oc rsh shows the following error:

rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:110: decoding init error from pipe caused \"read parent: connection reset by peer\""

command terminated with exit code 126

Expected results:
oc describe pod should not show any error, and oc rsh should succeed.

Additional info:
I tried this scenario with both accessModes, "ReadWriteMany" and "ReadWriteOnce", in pvc.yml and ended up with the same result.

Comment 2 Niels de Vos 2019-02-06 18:57:37 UTC

This is unlikely to be a problem caused by OCS.

Can you 'oc rsh' into a pod that does not have any PVC attached? What about other non-cirros container images?

During development we have seen similar issues with containers that set up a "/dev" volumeMount when CRI-O is used (it works fine with Docker). Can you share the deployment details that you use to reproduce this problem?
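
A minimal sketch of the control experiment asked about above: a cirros pod with no PVC attached, to check whether oc rsh fails independently of the storage. The pod name (cirros-nopvc), the sleep command, and the probe are hypothetical placeholders; the script only writes the manifest and prints the commands to run.

```shell
#!/bin/sh
# Hypothetical control pod without any volume; names and probe are assumptions.
set -eu

workdir="./cirros-control"
mkdir -p "$workdir"

cat > "$workdir/pod-no-pvc.yaml" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: cirros-nopvc
spec:
  containers:
    - name: cirros
      image: cirros
      command: ["/bin/sh", "-c", "sleep 3600"]
      livenessProbe:
        exec:
          command: ["/bin/sh", "-c", "ls /"]
        initialDelaySeconds: 10
        periodSeconds: 10
EOF

# Commands to run manually against a cluster.
echo "oc create -f $workdir/pod-no-pvc.yaml"
echo "oc rsh cirros-nopvc"
```

If oc rsh fails with the same "exec failed" error for this pod, the storage layer is an unlikely cause and the container runtime on the node is the more probable suspect.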

Comment 3 Kshithij Iyer 2019-02-08 07:24:49 UTC

This bug was caused by the docker version in that particular build. I have not observed it since. You can close this bug as "worksforme" or "fixed".

Comment 4 Michael Adam 2019-02-08 08:28:56 UTC

Thanks, closing.
