+++ This bug was initially created as a clone of Bug #1339295 +++

When a pod is destroyed, the Cinder disk does not get detached from the instance.

May 18 19:06:11 node1.openshift.com atomic-openshift-node[28748]: I0518 19:06:11.690529 28793 kubelet.go:2443] SyncLoop (SYNC): 1 pods; jenkins-1-4pw73_maci(7a9f5e89-1d1a-11e6-b3dc-fa163e5e26b0)
May 18 19:06:11 node1.openshift.com atomic-openshift-node[28748]: I0518 19:06:11.690575 28793 kubelet.go:3258] Generating status for "jenkins-1-4pw73_maci(7a9f5e89-1d1a-11e6-b3dc-fa163e5e26b0)"
May 18 19:06:11 node1.openshift.com atomic-openshift-node[28748]: I0518 19:06:11.690602 28793 kubelet.go:3225] pod waiting > 0, pending
May 18 19:06:11 node1.openshift.com atomic-openshift-node[28748]: I0518 19:06:11.690702 28793 manager.go:277] Ignoring same status for pod "jenkins-1-4pw73_maci(7a9f5e89-1d1a-11e6-b3dc-fa163e5e26b0)", status: {Phase:Pending Conditions:[{Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2016-05-18 19:03:44 +0200 CEST Reason:ContainersNotReady Message:containers with unready status: [jenkins]}] Message: Reason: HostIP:192.168.192.113 PodIP: StartTime:2016-05-18 19:03:44 +0200 CEST ContainerStatuses:[{Name:jenkins State:{Waiting:0xc2097a8560 Running:<nil> Terminated:<nil>} LastTerminationState:{Waiting:<nil> Running:<nil> Terminated:<nil>} Ready:false RestartCount:0 Image:registry.access.redhat.com/openshift3/jenkins-1-rhel7:latest ImageID: ContainerID:}]}
May 18 19:06:11 node1.openshift.com atomic-openshift-node[28748]: I0518 19:06:11.695057 28793 cinder.go:235] Cinder SetUp ee0a4fde-97c2-42dc-bc2c-c612cf85a963 to /var/lib/origin/openshift.local.volumes/pods/7a9f5e89-1d1a-11e6-b3dc-fa163e5e26b0/volumes/kubernetes.io~cinder/pv-cinder-dzpgg
May 18 19:06:11 node1.openshift.com atomic-openshift-node[28748]: I0518 19:06:11.695073 28793 keymutex.go:49] LockKey(...) called for id "ee0a4fde-97c2-42dc-bc2c-c612cf85a963"
May 18 19:06:11 node1.openshift.com atomic-openshift-node[28748]: I0518 19:06:11.695082 28793 keymutex.go:52] LockKey(...) for id "ee0a4fde-97c2-42dc-bc2c-c612cf85a963" completed.
May 18 19:06:11 node1.openshift.com atomic-openshift-node[28748]: I0518 19:06:11.695095 28793 nsenter_mount.go:175] findmnt: directory /var/lib/origin/openshift.local.volumes/pods/7a9f5e89-1d1a-11e6-b3dc-fa163e5e26b0/volumes/kubernetes.io~cinder/pv-cinder-dzpgg does not exist
May 18 19:06:11 node1.openshift.com atomic-openshift-node[28748]: I0518 19:06:11.760611 28793 openstack.go:1037] ee0a4fde-97c2-42dc-bc2c-c612cf85a963 kubernetes-dynamic-pv-cinder-dzpgg [map[server_id:fb40d180-2733-4208-8ea3-b9258b9d35bc attachment_id:78c1f486-928c-4b8d-9b3f-c3e4f2e1b68a host_name:<nil> volume_id:ee0a4fde-97c2-42dc-bc2c-c612cf85a963 device:/dev/vdc id:ee0a4fde-97c2-42dc-bc2c-c612cf85a963]]
May 18 19:06:11 node1.openshift.com atomic-openshift-node[28748]: E0518 19:06:11.760649 28793 openstack.go:972] Disk "ee0a4fde-97c2-42dc-bc2c-c612cf85a963" is attached to a different compute: "fb40d180-2733-4208-8ea3-b9258b9d35bc", should be detached before proceeding
May 18 19:06:11 node1.openshift.com atomic-openshift-node[28748]: I0518 19:06:11.760657 28793 cinder.go:252] AttachDisk failed: Disk "ee0a4fde-97c2-42dc-bc2c-c612cf85a963" is attached to a different compute: "fb40d180-2733-4208-8ea3-b9258b9d35bc", should be detached before proceeding
May 18 19:06:11 node1.openshift.com atomic-openshift-node[28748]: I0518 19:06:11.760663 28793 keymutex.go:58] UnlockKey(...) called for id "ee0a4fde-97c2-42dc-bc2c-c612cf85a963"
May 18 19:06:11 node1.openshift.com atomic-openshift-node[28748]: I0518 19:06:11.760668 28793 keymutex.go:65] UnlockKey(...) for id. Mutex found, trying to unlock it. "ee0a4fde-97c2-42dc-bc2c-c612cf85a963"
May 18 19:06:11 node1.openshift.com atomic-openshift-node[28748]: I0518 19:06:11.760673 28793 keymutex.go:68] UnlockKey(...) for id "ee0a4fde-97c2-42dc-bc2c-c612cf85a963" completed.
May 18 19:06:11 node1.openshift.com atomic-openshift-node[28748]: E0518 19:06:11.760706 28793 kubelet.go:1796] Unable to mount volumes for pod "jenkins-1-4pw73_maci(7a9f5e89-1d1a-11e6-b3dc-fa163e5e26b0)": Disk "ee0a4fde-97c2-42dc-bc2c-c612cf85a963" is attached to a different compute: "fb40d180-2733-4208-8ea3-b9258b9d35bc", should be detached before proceeding; skipping pod
May 18 19:06:11 node1.openshift.com atomic-openshift-node[28748]: E0518 19:06:11.760716 28793 pod_workers.go:138] Error syncing pod 7a9f5e89-1d1a-11e6-b3dc-fa163e5e26b0, skipping: Disk "ee0a4fde-97c2-42dc-bc2c-c612cf85a963" is attached to a different compute: "fb40d180-2733-4208-8ea3-b9258b9d35bc", should be detached before proceeding
May 18 19:06:11 node1.openshift.com atomic-openshift-node[28748]: I0518 19:06:11.760736 28793 server.go:606] Event(api.ObjectReference{Kind:"Pod", Namespace:"maci", Name:"jenkins-1-4pw73", UID:"7a9f5e89-1d1a-11e6-b3dc-fa163e5e26b0", APIVersion:"v1", ResourceVersion:"6481", FieldPath:""}): type: 'Warning' reason: 'FailedMount' Unable to mount volumes for pod "jenkins-1-4pw73_maci(7a9f5e89-1d1a-11e6-b3dc-fa163e5e26b0)": Disk "ee0a4fde-97c2-42dc-bc2c-c612cf85a963" is attached to a different compute: "fb40d180-2733-4208-8ea3-b9258b9d35bc", should be detached before proceeding
May 18 19:06:11 node1.openshift.com atomic-openshift-node[28748]: I0518 19:06:11.760775 28793 server.go:606] Event(api.ObjectReference{Kind:"Pod", Namespace:"maci", Name:"jenkins-1-4pw73", UID:"7a9f5e89-1d1a-11e6-b3dc-fa163e5e26b0", APIVersion:"v1", ResourceVersion:"6481", FieldPath:""}): type: 'Warning' reason: 'FailedSync' Error syncing pod, skipping: Disk "ee0a4fde-97c2-42dc-bc2c-c612cf85a963" is attached to a different compute: "fb40d180-2733-4208-8ea3-b9258b9d35bc", should be detached before proceeding

Version:
oc v3.2.0.20
kubernetes v1.2.0-36-g4a3f9c5

Steps To Reproduce:
1. Set up containerized OSE on RHEL Atomic Host (RHEL-AH) on top of OSP.
2. Use a Cinder volume in OSE, for example with jenkins-persistent (this works fine).
3. Scale down to 0.
4. Scale up to 1.

Current Result:
The pod can't start on a different node.

Expected Result:
The pod starts on a different node and uses the Cinder volume.

--- Additional comment from Miheer Salunke on 2016-05-24 23:20:50 CST ---

Upstream issue filed: https://github.com/openshift/origin/issues/8926

--- Additional comment from on 2016-05-25 21:58:59 CST ---

Disk detach is triggered by pod deletion. If the pod is not removed, the disk is still attached to the node. Before you scale up to 1, can you check whether the pod is completely removed? The pod is not immediately removed when it is told to scale down to 0.

--- Additional comment from Miheer Salunke on 2016-05-25 22:21:19 CST ---

Hi,

How can we confirm that the pod is completely removed? E.g.:
# oc get pods
# docker ps
Will the above commands suffice, or do some additional checks need to be done?
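For reference, a minimal sketch of such a check, assuming it is enough to look at both the OpenShift and the OpenStack side (the namespace, deployment name, and volume ID below are simply the ones from the log excerpt above, used as placeholders):

$ oc get pods -n maci              # the old pod must be gone completely, not merely Terminating
$ docker ps | grep jenkins         # on the node that hosted the pod: no container should remain
$ cinder show ee0a4fde-97c2-42dc-bc2c-c612cf85a963 | grep -E 'status|attachments'
                                   # the volume should report status "available" and an empty attachments list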
--- Additional comment from on 2016-05-25 23:00:40 CST ---

Yes, oc get pods.

--- Additional comment from Marcel Wysocki on 2016-05-26 17:07:21 CST ---

@hchen in my latest test that is attached to the support case, you can see I did delete the pod, and the RC did spin up a new one. The old one was gone, but the disk didn't get detached and stays attached to the same OSP instance forever.

--- Additional comment from on 2016-05-26 21:31:04 CST ---

@Marcel, I cannot see the support case - I got a 404 error on the page.

There are known race conditions on volume attach and detach; we are trying to resolve them in Kubernetes 1.3. GCE PD/Cinder/EBS are being refactored. See https://github.com/kubernetes/kubernetes/pull/25888

--- Additional comment from Marcel Wysocki on 2016-05-27 05:10:48 CST ---

Strange, https://access.redhat.com/support/cases/#/case/01616524 works just fine here.

It may or may not be the race conditions, because the detach has never worked even once, neither with 3.1 nor with 3.2.

--- Additional comment from Miheer Salunke on 2016-05-27 18:05:21 CST ---

@hchen,

[cloud-user@installer-a1-sjzgz274 ~]$ cinder create 5
+---------------------------------------+--------------------------------------+
| Property                              | Value                                |
+---------------------------------------+--------------------------------------+
| attachments                           | []                                   |
| availability_zone                     | nova                                 |
| bootable                              | false                                |
| consistencygroup_id                   | None                                 |
| created_at                            | 2016-05-25T06:54:12.000000           |
| description                           | None                                 |
| encrypted                             | False                                |
| id                                    | 48a2da46-911e-4c34-99cb-5320f01280b4 |
| metadata                              | {}                                   |
| multiattach                           | False                                |
| name                                  | None                                 |
| os-vol-tenant-attr:tenant_id          | ebf0c505d29947dc84227461b590ed7d     |
| os-volume-replication:driver_data     | None                                 |
| os-volume-replication:extended_status | None                                 |
| replication_status                    | disabled                             |
| size                                  | 5                                    |
| snapshot_id                           | None                                 |
| source_volid                          | None                                 |
| status                                | available                            |
| user_id                               | 40700b118f4a4e15ae06d3b68149fe10     |
| volume_type                           | None                                 |
+---------------------------------------+--------------------------------------+

[cloud-user@installer-a1-sjzgz274 ~]$ echo '
> apiVersion: v1
> kind: PersistentVolume
> metadata:
>   name: pv0001
> spec:
>   capacity:
>     storage: 5Gi
>   accessModes:
>     - ReadWriteOnce
>   cinder:
>     fsType: ext4
>     volumeID: 48a2da46-911e-4c34-99cb-5320f01280b4
> ' | oc create -f -
persistentvolume "pv0001" created

[cloud-user@installer-a1-sjzgz274 ~]$ oc get pv
NAME      CAPACITY   ACCESSMODES   STATUS      CLAIM             REASON    AGE
pv0001    5Gi        RWO           Available                               40s

[cloud-user@installer-a1-sjzgz274 ~]$ oc process jenkins-persistent -n openshift | oc create -f -
service "jenkins" created
route "jenkins" created
persistentvolumeclaim "jenkins" created
deploymentconfig "jenkins" created

[cloud-user@installer-a1-sjzgz274 ~]$ oc get pv
NAME      CAPACITY   ACCESSMODES   STATUS    CLAIM             REASON    AGE
pv0001    5Gi        RWO           Bound     default/jenkins             22m

[cloud-user@installer-a1-sjzgz274 ~]$ oc get pvc
NAME      STATUS    VOLUME    CAPACITY   ACCESSMODES   AGE
jenkins   Bound     pv0001    5Gi        RWO           33s

[cloud-user@installer-a1-sjzgz274 ~]$ oc describe po/$(oc get po|grep jenkins|awk '{print $1}')|grep Node
Node:           infra-a1-ed21g636.test.osp.sfa.se/192.168.192.113
[cloud-user@installer-a1-sjzgz274 ~]$ oc delete po/$(oc get po|grep jenkins|awk '{print $1}')
pod "jenkins-1-y6op1" deleted
[cloud-user@installer-a1-sjzgz274 ~]$ oc describe po/$(oc get po|grep jenkins|awk '{print $1}')|grep Node
Node:           infra-a2-ta6r080x.test.osp.sfa.se/192.168.192.116
[cloud-user@installer-a1-sjzgz274 ~]$ oc describe po/$(oc get po|grep jenkins|awk '{print $1}')
Name:           jenkins-1-lcz2t
Namespace:      default
Node:           infra-a2-ta6r080x.test.osp.sfa.se/192.168.192.116
Start Time:     Wed, 25 May 2016 11:19:39 +0200
Labels:         deployment=jenkins-1,deploymentconfig=jenkins,name=jenkins
Status:         Pending
IP:
Controllers:    ReplicationController/jenkins-1
Containers:
  jenkins:
    Container ID:
    Image:          registry.access.redhat.com/openshift3/jenkins-1-rhel7:latest
    Image ID:
    Port:
    QoS Tier:
      cpu:          BestEffort
      memory:       Guaranteed
    Limits:
      memory:       512Mi
    Requests:
      memory:       512Mi
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Liveness:       http-get http://:8080/login delay=30s timeout=3s period=10s #success=1 #failure=3
    Readiness:      http-get http://:8080/login delay=3s timeout=3s period=10s #success=1 #failure=3
    Environment Variables:
      JENKINS_PASSWORD: password
Conditions:
  Type      Status
  Ready     False
Volumes:
  jenkins-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  jenkins
    ReadOnly:   false
  default-token-7adg3:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-7adg3
Events:
  FirstSeen  LastSeen  Count  From                                         SubobjectPath  Type     Reason       Message
  ---------  --------  -----  ----                                         -------------  -------  ------       -------
  1m         1m        1      {default-scheduler }                                        Normal   Scheduled    Successfully assigned jenkins-1-lcz2t to infra-a2-ta6r080x.test.osp.sfa.se
  1m         2s        7      {kubelet infra-a2-ta6r080x.test.osp.sfa.se}                 Warning  FailedMount  Unable to mount volumes for pod "jenkins-1-lcz2t_default(ceaf70a7-2259-11e6-90d4-fa163e8828a1)": Disk "48a2da46-911e-4c34-99cb-5320f01280b4" is attached to a different compute: "3d824247-b1cd-4e8d-a77d-8d5a4722ade8", should be detached before proceeding
  1m         2s        7      {kubelet infra-a2-ta6r080x.test.osp.sfa.se}                 Warning  FailedSync   Error syncing pod, skipping: Disk "48a2da46-911e-4c34-99cb-5320f01280b4" is attached to a different compute: "3d824247-b1cd-4e8d-a77d-8d5a4722ade8", should be detached before proceeding

[cloud-user@installer-a1-sjzgz274 ~]$ oc get events
FIRSTSEEN LASTSEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
11m 11m 1 infra-a2-ta6r080x.test.osp.sfa.se Node Normal NodeNotReady {controllermanager } Node infra-a2-ta6r080x.test.osp.sfa.se status is now: NodeNotReady
11m 11m 1 infra-a2-ta6r080x.test.osp.sfa.se Node Normal NodeReady {kubelet infra-a2-ta6r080x.test.osp.sfa.se} Node infra-a2-ta6r080x.test.osp.sfa.se status is now: NodeReady
10m 10m 1 jenkins-1-deploy Pod Normal Scheduled {default-scheduler } Successfully assigned jenkins-1-deploy to infra-a2-ta6r080x.test.osp.sfa.se
10m 10m 1 jenkins-1-deploy Pod spec.containers{deployment} Normal Pulled {kubelet infra-a2-ta6r080x.test.osp.sfa.se} Container image "openshift3/ose-deployer:v3.2.0.44" already present on machine
10m 10m 1 jenkins-1-deploy Pod spec.containers{deployment} Normal Created {kubelet infra-a2-ta6r080x.test.osp.sfa.se} Created container with docker id b2d27f2f0616
10m 10m 1 jenkins-1-deploy Pod spec.containers{deployment} Normal Started {kubelet infra-a2-ta6r080x.test.osp.sfa.se} Started container with docker id b2d27f2f0616
1m 1m 1 jenkins-1-lcz2t Pod Normal Scheduled {default-scheduler } Successfully assigned jenkins-1-lcz2t to infra-a2-ta6r080x.test.osp.sfa.se
1m 9s 8 jenkins-1-lcz2t Pod Warning FailedMount {kubelet infra-a2-ta6r080x.test.osp.sfa.se} Unable to mount volumes for pod "jenkins-1-lcz2t_default(ceaf70a7-2259-11e6-90d4-fa163e8828a1)": Disk "48a2da46-911e-4c34-99cb-5320f01280b4" is attached to a different compute: "3d824247-b1cd-4e8d-a77d-8d5a4722ade8", should be detached before proceeding
1m 9s 8 jenkins-1-lcz2t Pod Warning FailedSync {kubelet infra-a2-ta6r080x.test.osp.sfa.se} Error syncing pod, skipping: Disk "48a2da46-911e-4c34-99cb-5320f01280b4" is attached to a different compute: "3d824247-b1cd-4e8d-a77d-8d5a4722ade8", should be detached before proceeding
10m 10m 1 jenkins-1-y6op1 Pod Normal Scheduled {default-scheduler } Successfully assigned jenkins-1-y6op1 to infra-a1-ed21g636.test.osp.sfa.se
10m 10m 1 jenkins-1-y6op1 Pod spec.containers{jenkins} Normal Pulling {kubelet infra-a1-ed21g636.test.osp.sfa.se} pulling image "registry.access.redhat.com/openshift3/jenkins-1-rhel7:latest"
9m 9m 1 jenkins-1-y6op1 Pod spec.containers{jenkins} Normal Pulled {kubelet infra-a1-ed21g636.test.osp.sfa.se} Successfully pulled image "registry.access.redhat.com/openshift3/jenkins-1-rhel7:latest"
9m 9m 1 jenkins-1-y6op1 Pod spec.containers{jenkins} Normal Created {kubelet infra-a1-ed21g636.test.osp.sfa.se} Created container with docker id 13074f285497
9m 9m 1 jenkins-1-y6op1 Pod spec.containers{jenkins} Normal Started {kubelet infra-a1-ed21g636.test.osp.sfa.se} Started container with docker id 13074f285497
9m 9m 2 jenkins-1-y6op1 Pod spec.containers{jenkins} Warning Unhealthy {kubelet infra-a1-ed21g636.test.osp.sfa.se} Readiness probe failed: Get http://10.1.5.3:8080/login: dial tcp 10.1.5.3:8080: connection refused
9m 9m 1 jenkins-1-y6op1 Pod spec.containers{jenkins} Warning Unhealthy {kubelet infra-a1-ed21g636.test.osp.sfa.se} Readiness probe failed: HTTP probe failed with statuscode: 503
1m 1m 1 jenkins-1-y6op1 Pod spec.containers{jenkins} Normal Killing {kubelet infra-a1-ed21g636.test.osp.sfa.se} Killing container with docker id 13074f285497: Need to kill pod.
10m 10m 1 jenkins-1 ReplicationController Normal SuccessfulCreate {replication-controller } Created pod: jenkins-1-y6op1
1m 1m 1 jenkins-1 ReplicationController Normal SuccessfulCreate {replication-controller } Created pod: jenkins-1-lcz2t
10m 10m 1 jenkins DeploymentConfig Normal DeploymentCreated {deploymentconfig-controller } Created new deployment "jenkins-1" for version 1
10m 10m 1 jenkins DeploymentConfig Warning FailedUpdate {deployment-controller } Cannot update deployment default/jenkins-1 status to Pending: replicationcontrollers "jenkins-1" cannot be updated: the object has been modified; please apply your changes to the latest version and try again

[cloud-user@installer-a1-sjzgz274 ~]$ oc logs jenkins-1-lcz2t
Error from server: container "jenkins" in pod "jenkins-1-lcz2t" is waiting to start: ContainerCreating

Docker Logs:
http://foobar.gsslab.pnq.redhat.com/01616524/30-sosreport-infra-a1-ed21g636.test.osp.sfa.se.01616524-20160525112647.tar.xz/sosreport-infra-a1-ed21g636.test.osp.sfa.se.01616524-20160525112647/sos_commands/logs/
http://foobar.gsslab.pnq.redhat.com/01616524/40-sosreport-infra-a2-ta6r080x.test.osp.sfa.se.01616524-20160525113958.tar.xz/sosreport-infra-a2-ta6r080x.test.osp.sfa.se.01616524-20160525113958/sos_commands/logs/

--- Additional comment from Marcel Wysocki on 2016-05-30 16:06:17 CST ---

@hchen, do you need any more information for this?

--- Additional comment from on 2016-05-31 22:06:39 CST ---

This is probably a persistent volume issue. Can you delete the PV and PVC, then recreate the PV and PVC before you scale up the rc?

--- Additional comment from Marcel Wysocki on 2016-06-01 21:42:44 CST ---

@hchen deleting the PV and PVC also leaves the volume attached to the instance.
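As a stop-gap while the detach never happens, the stale attachment can usually be cleared manually at the OpenStack layer. A minimal sketch, assuming the nova and cinder CLIs are available and reusing the server and volume IDs from the transcript above purely as placeholders (this is a workaround, not the fix tracked in this bug):

$ nova volume-detach 3d824247-b1cd-4e8d-a77d-8d5a4722ade8 48a2da46-911e-4c34-99cb-5320f01280b4
$ cinder show 48a2da46-911e-4c34-99cb-5320f01280b4 | grep -E 'status|attachments'
  # the status should eventually return to "available", after which the pod can mount the volume on another node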
--- Additional comment from on 2016-06-02 00:42:26 CST ---

I cannot reproduce it on OS1. Can I log in to your setup?

--- Additional comment from Miheer Salunke on 2016-06-02 01:32:27 CST ---

Marcel, can we access your setup? If yes, please let us know a time convenient for you so that we can access your setup.

--- Additional comment from on 2016-06-15 06:41:50 CST ---

The containerized kubelet failed to unmount/detach the Cinder volume. A fix has been proposed to upstream Kubernetes: https://github.com/kubernetes/kubernetes/pull/27380

--- Additional comment from Marcel Wysocki on 2016-06-15 17:15:31 CST ---

If this makes it upstream, will there be a backport to OSE 3.2?

--- Additional comment from on 2016-06-15 20:59:20 CST ---

@Marcel, can you comment on the PR to request a backport?

--- Additional comment from Marcel Wysocki on 2016-06-28 18:23:36 CST ---

https://github.com/kubernetes/kubernetes/pull/28018

Can we get this into an errata release?

--- Additional comment from Troy Dawson on 2016-07-23 03:39:08 CST ---

Can we get a separate bugzilla for OSE 3.2 versus 3.3? This has been merged and is in OSE v3.3.0.9 or newer.
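A quick way to confirm that a given cluster already carries that build (a minimal sketch; the exact output format varies slightly between releases):

$ oc version           # client and server versions; the server should report v3.3.0.9 or newer
$ openshift version    # on a master or node: the openshift, kubernetes, and etcd versions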
Verified this on:
oc v3.3.0.9
kubernetes v1.3.0+57fb9ac

Using a containerized installation. The Cinder volume is detached from the node when the pod is deleted or scaled down (with the PV and PVC remaining). Viewed from the OpenStack console, the volume has 'Available' status.
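The verification above roughly corresponds to the following sequence (a sketch; the dc name and volume ID are taken from the earlier reproduction and are only examples):

$ oc scale dc/jenkins --replicas=0
$ oc get pods                       # wait until the jenkins pod has disappeared completely
$ cinder show 48a2da46-911e-4c34-99cb-5320f01280b4 | grep -E 'status|attachments'
                                    # expected with the fix: status "available" and an empty attachments list
$ oc scale dc/jenkins --replicas=1
$ oc get pods -o wide               # the new pod can now start on a different node and mount the volume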
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933