Description of problem:
On an independent mode OCS 3.11 setup, the liveness probe fails with the following message:

Liveness probe failed: rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:110: decoding init error from pipe caused \"read parent: connection reset by peer\""

After this, the pod keeps restarting in a loop and eventually ends up in "CrashLoopBackOff" status.

When I try to rsh into the pod, it displays the following message:

# oc rsh cirrossc1-1-dtpb7
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:110: decoding init error from pipe caused \"read parent: connection reset by peer\""
command terminated with exit code 126

Version-Release number of selected component (if applicable):
Gluster version: 3.12.2-25
Heketi version: 7.0.0

How reproducible:
4/4

Steps to Reproduce:
1. Create a storage class.
2. Create a PVC on the storage class.
3. Create a cirros pod that uses the PVC.
4. Try an oc rsh into the pod and run oc describe on it. (oc rsh fails and oc describe shows the message below.)

Actual results:
oc describe pod shows the following error:

Liveness probe failed: rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:110: decoding init error from pipe caused \"read parent: connection reset by peer\""

oc rsh shows the following error:

rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:110: decoding init error from pipe caused \"read parent: connection reset by peer\""
command terminated with exit code 126

Expected results:
oc describe pod should not show any error, and oc rsh should succeed.

Additional info:
I have tried this scenario with both accessModes, "ReadWriteMany" and "ReadWriteOnce", in the pvc.yml, and I end up with the same results in both cases.
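
For readers who want to retry the scenario, steps 2 and 3 might look roughly like the sketch below. These are not the manifests used in this report: the helper name make_repro_manifest, the object names, the image reference, the probe command, and the mount path are all hypothetical placeholders, and the storage class from step 1 is assumed to exist already.

```shell
# Sketch of reproduction steps 2 and 3 (PVC plus a cirros pod using it).
# Every name below is a hypothetical placeholder, not taken from the report.
# Usage: make_repro_manifest STORAGE_CLASS [ACCESS_MODE]
make_repro_manifest() {
    sc="$1"
    mode="${2:-ReadWriteOnce}"   # the report saw the failure with both modes
    cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: repro-claim
spec:
  storageClassName: ${sc}
  accessModes:
    - ${mode}
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: repro-cirros
spec:
  containers:
    - name: cirros
      image: cirros
      livenessProbe:
        exec:
          command: ["sh", "-c", "ls /mnt"]
        initialDelaySeconds: 3
        periodSeconds: 3
      volumeMounts:
        - name: data
          mountPath: /mnt
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: repro-claim
EOF
}
```

The output can be piped to the cluster with `make_repro_manifest my-storage-class ReadWriteMany | oc create -f -`, where my-storage-class stands for whatever class was created in step 1. Step 4 is then `oc rsh repro-cirros` followed by `oc describe pod repro-cirros`.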
This is unlikely to be a problem caused by OCS. Can you 'oc rsh' into a pod that does not have any PVC attached? What about other, non-cirros container images? During development we have seen similar issues with containers that set up a "/dev" volumeMount when CRI-O is used (the same setup works fine with Docker). Can you share the deployment details that you use to reproduce this problem?
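
The comparison asked for above could be scripted along these lines. This is only a sketch: check_exec is a hypothetical helper name, it assumes `oc` is already logged in to the affected cluster and project, and it assumes the image provides a `true` command.

```shell
# Sketch: report whether exec into a given pod works.
# check_exec is a hypothetical helper name, not part of the oc CLI.
check_exec() {
    pod="$1"
    # 'oc rsh POD COMMAND' runs COMMAND inside the pod instead of a shell.
    oc rsh "$pod" true >/dev/null 2>&1
    rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "$pod: exec OK"
    else
        echo "$pod: exec FAILED (exit code $rc)"
    fi
}
```

Running it once against the failing pod (for example `check_exec cirrossc1-1-dtpb7`) and once against a pod without any PVC shows whether the exec failure follows the volume or affects every container on the node. In the second case the container runtime is the more likely culprit.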
This bug was caused by the Docker version in that particular build. I have not observed it since. You can close this bug as "worksforme" or "fixed".
Thanks, closing.