Description of problem:
When trying to upgrade my gluster pods from 3.9 to 3.11.1, the upgrade fails with the errors below and the pod is stuck in the Terminating state.

LAST SEEN FIRST SEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
2018-11-22 14:20:32 +0530 IST 2018-11-22 14:20:32 +0530 IST 1 glusterfs-storage-7mg42.1569661f82236cca Pod spec.containers{glusterfs} Warning Unhealthy kubelet, dhcp46-23.lab.eng.blr.redhat.com Liveness probe failed: rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:83: executing setns process caused \"exit status 15\""
2018-11-22 14:20:42 +0530 IST 2018-11-22 14:20:42 +0530 IST 1 glusterfs-storage-7mg42.15696621a751a461 Pod spec.containers{glusterfs} Warning Unhealthy kubelet, dhcp46-23.lab.eng.blr.redhat.com Readiness probe failed: rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:83: executing setns process caused \"exit status 15\""
2018-11-22 14:20:57 +0530 IST 2018-11-22 14:20:32 +0530 IST 2 glusterfs-storage-7mg42.1569661f82236cca Pod spec.containers{glusterfs} Warning Unhealthy kubelet, dhcp46-23.lab.eng.blr.redhat.com Liveness probe failed: rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:83: executing setns process caused \"exit status 15\""
2018-11-22 14:21:07 +0530 IST 2018-11-22 14:20:42 +0530 IST 2 glusterfs-storage-7mg42.15696621a751a461 Pod spec.containers{glusterfs} Warning Unhealthy kubelet, dhcp46-23.lab.eng.blr.redhat.com Readiness probe failed: rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:83: executing setns process caused \"exit status 15\""
2018-11-22 14:21:22 +0530 IST 2018-11-22 14:20:32 +0530 IST 3 glusterfs-storage-7mg42.1569661f82236cca Pod spec.containers{glusterfs} Warning Unhealthy kubelet, dhcp46-23.lab.eng.blr.redhat.com Liveness probe failed: rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:83: executing setns process caused \"exit status 15\""
2018-11-22 14:21:32 +0530 IST 2018-11-22 14:20:42 +0530 IST 3 glusterfs-storage-7mg42.15696621a751a461 Pod spec.containers{glusterfs} Warning Unhealthy kubelet, dhcp46-23.lab.eng.blr.redhat.com Readiness probe failed: rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:83: executing setns process caused \"exit status 15\""
2018-11-22 14:21:47 +0530 IST 2018-11-22 14:20:32 +0530 IST 4 glusterfs-storage-7mg42.1569661f82236cca Pod spec.containers{glusterfs} Warning Unhealthy kubelet, dhcp46-23.lab.eng.blr.redhat.com Liveness probe failed: rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:83: executing setns process caused \"exit status 15\""
2018-11-22 14:21:57 +0530 IST 2018-11-22 14:20:42 +0530 IST 4 glusterfs-storage-7mg42.15696621a751a461 Pod spec.containers{glusterfs} Warning Unhealthy kubelet, dhcp46-23.lab.eng.blr.redhat.com Readiness probe failed: rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:83: executing setns process caused \"exit status 15\""
2018-11-22 14:22:00 +0530 IST 2018-11-22 13:05:08 +0530 IST 3 glusterfs-storage-7mg42.156962022ca9bb1c Pod spec.containers{glusterfs} Normal Killing kubelet, dhcp46-23.lab.eng.blr.redhat.com Killing container with id docker://glusterfs:Need to kill Pod
2018-11-22 14:22:00 +0530 IST 2018-11-22 14:22:00 +0530 IST 1 glusterfs-storage-7mg42.15696633e9d1f91b Pod Warning FailedKillPod kubelet, dhcp46-23.lab.eng.blr.redhat.com error killing pod: failed to "KillContainer" for "glusterfs" with KillContainerError: "rpc error: code = Unknown desc = operation timeout: context deadline exceeded"

Version-Release number of selected component (if applicable):
[root@dhcp46-160 ~]# oc version
oc v3.9.51
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://dhcp46-160.lab.eng.blr.redhat.com:8443
openshift v3.9.51
kubernetes v1.9.1+a0ce1bc657

How reproducible:
Hit it once

Steps to Reproduce:
1. Install ocp3.9 + cns 3.9
2. Delete the gluster daemonset
3. Edit the gluster template to have the following image name and version and create the ds again.

  displayName: Daemonset Node Labels
  name: NODE_LABELS
  value: '{ "glusterfs": "storage-host" }'
- displayName: GlusterFS container image name
  name: IMAGE_NAME
  required: true
  value: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/ocs/rhgs-server-rhel7
- displayName: GlusterFS container image version
  name: IMAGE_VERSION
  required: true
  value: 3.11.1
- description: A unique name to identify which heketi service manages this cluster, useful for running multiple heketi instances
  displayName: GlusterFS cluster name
  name: CLUSTER_NAME
  value: storage

4. Run the command to create the ds again:
oc process glusterfs | oc create -f -
5. Verify that the ds has been created.

[root@dhcp46-160 ~]# oc get ds
NAME                DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
glusterfs-storage   4         4         3         0            3           glusterfs=storage-host   1h

6. Now delete the gluster pod by running the command "oc delete pod <glusterfs_pod>"

Actual results:
The gluster pod is stuck in the Terminating state and the errors below are seen in the events.

[root@dhcp46-160 ~]# oc get events
LAST SEEN FIRST SEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
4m 2h 29 cirros-block-1-87d9l.1569625ea154505a Pod spec.containers{cirros} Warning Unhealthy kubelet, dhcp46-160.lab.eng.blr.redhat.com Liveness probe errored: rpc error: code = DeadlineExceeded desc = context deadline exceeded
3m 3h 53 glusterfs-storage-7mg42.156962022ca9bb1c Pod spec.containers{glusterfs} Normal Killing kubelet, dhcp46-23.lab.eng.blr.redhat.com Killing container with id docker://glusterfs:Need to kill Pod
1h 1h 4 glusterfs-storage-7mg42.1569661f82236cca Pod spec.containers{glusterfs} Warning Unhealthy kubelet, dhcp46-23.lab.eng.blr.redhat.com Liveness probe failed: rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:83: executing setns process caused \"exit status 15\""
1h 1h 4 glusterfs-storage-7mg42.15696621a751a461 Pod spec.containers{glusterfs} Warning Unhealthy kubelet, dhcp46-23.lab.eng.blr.redhat.com Readiness probe failed: rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:83: executing setns process caused \"exit status 15\""
1h 1h 6 glusterfs-storage-7mg42.15696633e9d1f91b Pod Warning FailedKillPod kubelet, dhcp46-23.lab.eng.blr.redhat.com error killing pod: failed to "KillContainer" for "glusterfs" with KillContainerError: "rpc error: code = Unknown desc = operation timeout: context deadline exceeded"

Expected results:
Should be successfully able to upgrade the pods to 3.11.1

Additional info:
[root@dhcp46-160 ~]# oc get pods -l glusterfs
NAME                                          READY     STATUS        RESTARTS   AGE
glusterblock-storage-provisioner-dc-1-h26pt   1/1       Running       1          22h
glusterfs-storage-7mg42                       0/1       Terminating   1          4h
glusterfs-storage-878qw                       1/1       Running       1          3h
glusterfs-storage-bxv4h                       1/1       Running       1          4h
glusterfs-storage-glbfg                       1/1       Running       1          4h
heketi-storage-1-qwsxk                        1/1       Running       0          2h

[root@dhcp46-160 ~]# oc get nodes
NAME                                STATUS    ROLES     AGE       VERSION
dhcp46-107.lab.eng.blr.redhat.com   Ready     <none>    22h       v1.9.1+a0ce1bc657
dhcp46-160.lab.eng.blr.redhat.com   Ready     master    22h       v1.9.1+a0ce1bc657
dhcp46-222.lab.eng.blr.redhat.com   Ready     compute   22h       v1.9.1+a0ce1bc657
dhcp46-23.lab.eng.blr.redhat.com    Ready     compute   22h       v1.9.1+a0ce1bc657
dhcp46-236.lab.eng.blr.redhat.com   Ready     compute   22h       v1.9.1+a0ce1bc657
dhcp47-134.lab.eng.blr.redhat.com   Ready     compute   22h       v1.9.1+a0ce1bc657
dhcp47-147.lab.eng.blr.redhat.com   Ready     compute   22h       v1.9.1+a0ce1bc657
dhcp47-37.lab.eng.blr.redhat.com    Ready     compute   22h       v1.9.1+a0ce1bc657
dhcp47-80.lab.eng.blr.redhat.com    Ready     compute   22h       v1.9.1+a0ce1bc657

[root@dhcp46-160 ~]# docker images
REPOSITORY                                                                        TAG       IMAGE ID       CREATED      SIZE
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/ocs/rhgs-volmanager-rhel7   3.11.1    cba22f33b513   8 days ago   424 MB
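
For anyone triaging a similar report, the stuck pod in the "oc get pods" listing above can be picked out mechanically. The following is only a rough sketch: the function name terminating_pods is hypothetical, and it assumes the listing is the plain whitespace-separated table that "oc get pods" prints, with NAME and STATUS columns and no spaces inside a cell.

```python
from typing import List


def terminating_pods(listing: str) -> List[str]:
    """Return the names of pods whose STATUS column reads 'Terminating'.

    Assumes `listing` is the default table output of `oc get pods`:
    a header row followed by one whitespace-separated row per pod.
    """
    lines = [ln for ln in listing.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    name_idx = header.index("NAME")
    status_idx = header.index("STATUS")
    stuck = []
    for row in lines[1:]:
        cols = row.split()
        # Skip short or malformed rows instead of failing on them.
        if len(cols) > status_idx and cols[status_idx] == "Terminating":
            stuck.append(cols[name_idx])
    return stuck


if __name__ == "__main__":
    sample = """\
NAME                      READY     STATUS        RESTARTS   AGE
glusterfs-storage-7mg42   0/1       Terminating   1          4h
glusterfs-storage-878qw   1/1       Running       1          3h
"""
    print(terminating_pods(sample))
```

The sample rows are copied from the listing in this report; the script does not call oc itself.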
Closing this bug as it works for me. The problem was that the image version did not have a "v" prefix in the value.

- displayName: GlusterFS container image version
  name: IMAGE_VERSION
  required: true
  value: 3.11.1

Changed it to v3.11.1:

- displayName: GlusterFS container image version
  name: IMAGE_VERSION
  required: true
  value: v3.11.1
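
To avoid repeating this mistake, the IMAGE_VERSION value could be checked before the template is processed. The sketch below is a guess at such a check, not part of any CNS tooling: the function name normalize_image_version is hypothetical, and the "v<digits>.<digits>..." tag pattern is an assumption based only on the fix above. Tag naming can differ per image (the rhgs-volmanager-rhel7 image listed earlier carries the tag 3.11.1 without a "v"), so confirm the tags actually published in the registry first.

```python
import re

# Assumption: the server image is tagged as "v<major>.<minor>[.<patch>...]".
TAG_PATTERN = re.compile(r"^v\d+(\.\d+)+$")


def normalize_image_version(value: str) -> str:
    """Return the tag with a leading 'v', adding it if only the number was given.

    Raises ValueError for anything that does not look like a dotted version.
    """
    value = value.strip()
    if TAG_PATTERN.match(value):
        return value
    candidate = "v" + value
    if TAG_PATTERN.match(candidate):
        return candidate
    raise ValueError(f"unexpected IMAGE_VERSION format: {value!r}")


if __name__ == "__main__":
    print(normalize_image_version("3.11.1"))
```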