Description of problem:

After creating a new app in dev-preview-int, the app's PV never mounts to the node.

[root@dev-preview-int-master-d41bf ~]# oc new-app cakephp-mysql-example -n dakinitest

[root@dev-preview-int-master-d41bf ~]# oc get pods -n dakinitest
NAME                               READY     STATUS              RESTARTS   AGE
cakephp-mysql-example-1-build      0/1       Completed           0          5m
cakephp-mysql-example-1-deploy     1/1       Running             0          1m
cakephp-mysql-example-1-hook-pre   0/1       CrashLoopBackOff    2          1m
mysql-1-deploy                     1/1       Running             0          5m
mysql-1-lmny0                      0/1       ContainerCreating   0          5m

[root@dev-preview-int-master-d41bf ~]# oc get pvc -n dakinitest
NAME      STATUS    VOLUME         CAPACITY   ACCESSMODES   AGE
mysql     Bound     pv-aws-2pkrx   1Gi        RWO           5m

[root@dev-preview-int-master-d41bf ~]# oc get events -w -n dakinitest
2016-08-25T23:11:36Z 2016-08-25T23:11:36Z 1 mysql-3-xvlqq Pod Warning FailedMount {kubelet ip-172-31-14-249.ec2.internal} Unable to mount volumes for pod "mysql-3-xvlqq_dakinitest(fe28677c-6b18-11e6-9552-0a07af694611)": timeout expired waiting for volumes to attach/mount for pod "mysql-3-xvlqq"/"dakinitest". list of unattached/unmounted volumes=[mysql-data default-token-rlzzv]
2016-08-25T23:17:18Z 2016-08-25T23:12:50Z 3 mysql-1-lmny0 Pod Warning FailedMount {kubelet ip-172-31-14-247.ec2.internal} Unable to mount volumes for pod "mysql-1-lmny0_dakinitest(2a4dc3df-6b19-11e6-9552-0a07af694611)": timeout expired waiting for volumes to attach/mount for pod "mysql-1-lmny0"/"dakinitest". list of unattached/unmounted volumes=[mysql-data]

[root@dev-preview-int-master-d41bf ~]# oc get pvc -n dakinitest -o yaml
apiVersion: v1
items:
- apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    annotations:
      openshift.io/generated-by: OpenShiftNewApp
      pv.kubernetes.io/bind-completed: "yes"
      pv.kubernetes.io/bound-by-controller: "yes"
      v1.2-volume.experimental.kubernetes.io/provisioning-required: volume.experimental.kubernetes.io/provisioning-completed
    creationTimestamp: 2016-08-25T23:10:38Z
    labels:
      app: cakephp-mysql-example
      template: cakephp-mysql-example
    name: mysql
    namespace: dakinitest
    resourceVersion: "11534638"
    selfLink: /api/v1/namespaces/dakinitest/persistentvolumeclaims/mysql
    uid: 22a7af26-6b19-11e6-9552-0a07af694611
  spec:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 1Gi
    volumeName: pv-aws-2pkrx
  status:
    accessModes:
    - ReadWriteOnce
    capacity:
      storage: 1Gi
    phase: Bound
kind: List
metadata: {}

[root@dev-preview-int-master-d41bf ~]# oc get pv pv-aws-2pkrx -o yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    kubernetes.io/createdby: aws-ebs-dynamic-provisioner
    v1.2-volume.experimental.kubernetes.io/provisioning-required: volume.experimental.kubernetes.io/provisioning-completed
    volume.alpha.kubernetes.io/storage-class: ""
  creationTimestamp: 2016-08-25T23:10:38Z
  generateName: pv-aws-
  labels:
    failure-domain.beta.kubernetes.io/region: us-east-1
    failure-domain.beta.kubernetes.io/zone: us-east-1c
  name: pv-aws-2pkrx
  resourceVersion: "11534676"
  selfLink: /api/v1/persistentvolumes/pv-aws-2pkrx
  uid: 22ad6fd9-6b19-11e6-96ed-0a66422e54e3
spec:
  accessModes:
  - ReadWriteOnce
  awsElasticBlockStore:
    fsType: ext4
    volumeID: aws://us-east-1c/vol-bd679f11
  capacity:
    storage: 1Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: mysql
    namespace: dakinitest
    resourceVersion: "11534619"
    uid: 22a7af26-6b19-11e6-9552-0a07af694611
  persistentVolumeReclaimPolicy: Delete
status:
  phase: Bound

In AWS, I can see the EBS volume is in 'Available' state, so it hasn't been attached to any node.

All of the dev-preview-int nodes contain the following in their node-config.yaml:

enable-controller-attach-detach:
- 'true'

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Can you verify the node is correctly configured with the cloud provider? "timeout expired waiting for volumes to attach/mount for pod" is an error from Kubelet when it uses the AWS API to attach the device (EBS volume) to the node. We've seen this error before when Kubelet was not correctly configured with a cloud provider. Additionally, the account used to access the AWS API must have permissions to attach volumes.
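For comparison, a kubelet correctly wired to the AWS cloud provider in OpenShift 3.x would typically carry kubeletArguments along these lines in node-config.yaml. This is an illustrative sketch only; the cloud-config path and file name are assumptions, not values taken from this cluster:

```yaml
# Illustrative node-config.yaml fragment (assumed values, not from this cluster)
kubeletArguments:
  cloud-provider:
  - "aws"
  cloud-config:
  - "/etc/origin/cloudprovider/aws.conf"
```

Separately, the IAM identity the cluster uses would need, at minimum, EC2 actions along the lines of ec2:AttachVolume, ec2:DetachVolume, and ec2:DescribeVolumes.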
Is there anything in the atomic-openshift-node log on ip-172-31-14-249.ec2.internal for vol-bd679f11?
No, not on that specific node. I did a grep across the cluster and found a different node that had some entries about it:

Aug 25 23:10:51 ip-172-31-14-247.ec2.internal atomic-openshift-node[27346]: I0825 23:10:51.050463 27346 reconciler.go:180] VerifyControllerAttachedVolume operation started for volume "kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11" (spec.Name: "pv-aws-2pkrx") pod "2a4dc3df-6b19-11e6-9552-0a07af694611" (UID: "2a4dc3df-6b19-11e6-9552-0a07af694611")
Aug 25 23:10:51 ip-172-31-14-247.ec2.internal atomic-openshift-node[27346]: E0825 23:10:51.050745 27346 nestedpendingoperations.go:233] Operation for "\"kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11\"" failed. No retries permitted until 2016-08-25 23:10:51.55071811 +0000 UTC (durationBeforeRetry 500ms). Error: Volume "kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11" (spec.Name: "pv-aws-2pkrx") pod "2a4dc3df-6b19-11e6-9552-0a07af694611" (UID: "2a4dc3df-6b19-11e6-9552-0a07af694611") has not yet been added to the list of VolumesInUse in the node's volume status.
Aug 25 23:10:51 ip-172-31-14-247.ec2.internal atomic-openshift-node[27346]: I0825 23:10:51.561499 27346 reconciler.go:180] VerifyControllerAttachedVolume operation started for volume "kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11" (spec.Name: "pv-aws-2pkrx") pod "2a4dc3df-6b19-11e6-9552-0a07af694611" (UID: "2a4dc3df-6b19-11e6-9552-0a07af694611")
Aug 25 23:10:51 ip-172-31-14-247.ec2.internal atomic-openshift-node[27346]: E0825 23:10:51.561681 27346 nestedpendingoperations.go:233] Operation for "\"kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11\"" failed. No retries permitted until 2016-08-25 23:10:52.561652188 +0000 UTC (durationBeforeRetry 1s). Error: Volume "kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11" (spec.Name: "pv-aws-2pkrx") pod "2a4dc3df-6b19-11e6-9552-0a07af694611" (UID: "2a4dc3df-6b19-11e6-9552-0a07af694611") has not yet been added to the list of VolumesInUse in the node's volume status.
Aug 25 23:10:52 ip-172-31-14-247.ec2.internal atomic-openshift-node[27346]: I0825 23:10:52.576160 27346 reconciler.go:180] VerifyControllerAttachedVolume operation started for volume "kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11" (spec.Name: "pv-aws-2pkrx") pod "2a4dc3df-6b19-11e6-9552-0a07af694611" (UID: "2a4dc3df-6b19-11e6-9552-0a07af694611")
Aug 25 23:10:52 ip-172-31-14-247.ec2.internal atomic-openshift-node[27346]: E0825 23:10:52.576371 27346 nestedpendingoperations.go:233] Operation for "\"kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11\"" failed. No retries permitted until 2016-08-25 23:10:54.576342163 +0000 UTC (durationBeforeRetry 2s). Error: Volume "kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11" (spec.Name: "pv-aws-2pkrx") pod "2a4dc3df-6b19-11e6-9552-0a07af694611" (UID: "2a4dc3df-6b19-11e6-9552-0a07af694611") has not yet been added to the list of VolumesInUse in the node's volume status.
Aug 25 23:10:54 ip-172-31-14-247.ec2.internal atomic-openshift-node[27346]: I0825 23:10:54.611582 27346 reconciler.go:180] VerifyControllerAttachedVolume operation started for volume "kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11" (spec.Name: "pv-aws-2pkrx") pod "2a4dc3df-6b19-11e6-9552-0a07af694611" (UID: "2a4dc3df-6b19-11e6-9552-0a07af694611")
Aug 25 23:10:54 ip-172-31-14-247.ec2.internal atomic-openshift-node[27346]: E0825 23:10:54.618044 27346 nestedpendingoperations.go:233] Operation for "\"kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11\"" failed. No retries permitted until 2016-08-25 23:10:58.61801555 +0000 UTC (durationBeforeRetry 4s). Error: Volume "kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11" (spec.Name: "pv-aws-2pkrx") pod "2a4dc3df-6b19-11e6-9552-0a07af694611" (UID: "2a4dc3df-6b19-11e6-9552-0a07af694611") is not yet attached according to node status.
Aug 25 23:10:58 ip-172-31-14-247.ec2.internal atomic-openshift-node[27346]: I0825 23:10:58.659740 27346 reconciler.go:180] VerifyControllerAttachedVolume operation started for volume "kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11" (spec.Name: "pv-aws-2pkrx") pod "2a4dc3df-6b19-11e6-9552-0a07af694611" (UID: "2a4dc3df-6b19-11e6-9552-0a07af694611")
Aug 25 23:10:58 ip-172-31-14-247.ec2.internal atomic-openshift-node[27346]: E0825 23:10:58.664829 27346 nestedpendingoperations.go:233] Operation for "\"kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11\"" failed. No retries permitted until 2016-08-25 23:11:06.66479893 +0000 UTC (durationBeforeRetry 8s). Error: Volume "kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11" (spec.Name: "pv-aws-2pkrx") pod "2a4dc3df-6b19-11e6-9552-0a07af694611" (UID: "2a4dc3df-6b19-11e6-9552-0a07af694611") is not yet attached according to node status.
Aug 25 23:11:06 ip-172-31-14-247.ec2.internal atomic-openshift-node[27346]: I0825 23:11:06.715891 27346 reconciler.go:180] VerifyControllerAttachedVolume operation started for volume "kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11" (spec.Name: "pv-aws-2pkrx") pod "2a4dc3df-6b19-11e6-9552-0a07af694611" (UID: "2a4dc3df-6b19-11e6-9552-0a07af694611")
Aug 25 23:11:06 ip-172-31-14-247.ec2.internal atomic-openshift-node[27346]: E0825 23:11:06.720228 27346 nestedpendingoperations.go:233] Operation for "\"kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11\"" failed. No retries permitted until 2016-08-25 23:11:22.72020032 +0000 UTC (durationBeforeRetry 16s). Error: Volume "kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11" (spec.Name: "pv-aws-2pkrx") pod "2a4dc3df-6b19-11e6-9552-0a07af694611" (UID: "2a4dc3df-6b19-11e6-9552-0a07af694611") is not yet attached according to node status.
Aug 25 23:11:22 ip-172-31-14-247.ec2.internal atomic-openshift-node[27346]: I0825 23:11:22.785289 27346 reconciler.go:180] VerifyControllerAttachedVolume operation started for volume "kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11" (spec.Name: "pv-aws-2pkrx") pod "2a4dc3df-6b19-11e6-9552-0a07af694611" (UID: "2a4dc3df-6b19-11e6-9552-0a07af694611")
Aug 25 23:11:22 ip-172-31-14-247.ec2.internal atomic-openshift-node[27346]: E0825 23:11:22.789947 27346 nestedpendingoperations.go:233] Operation for "\"kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11\"" failed. No retries permitted until 2016-08-25 23:11:54.789927052 +0000 UTC (durationBeforeRetry 32s). Error: Volume "kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11" (spec.Name: "pv-aws-2pkrx") pod "2a4dc3df-6b19-11e6-9552-0a07af694611" (UID: "2a4dc3df-6b19-11e6-9552-0a07af694611") is not yet attached according to node status.
Aug 25 23:11:54 ip-172-31-14-247.ec2.internal atomic-openshift-node[27346]: I0825 23:11:54.837092 27346 reconciler.go:180] VerifyControllerAttachedVolume operation started for volume "kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11" (spec.Name: "pv-aws-2pkrx") pod "2a4dc3df-6b19-11e6-9552-0a07af694611" (UID: "2a4dc3df-6b19-11e6-9552-0a07af694611")
Aug 25 23:11:54 ip-172-31-14-247.ec2.internal atomic-openshift-node[27346]: E0825 23:11:54.841837 27346 nestedpendingoperations.go:233] Operation for "\"kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11\"" failed. No retries permitted until 2016-08-25 23:12:58.841811644 +0000 UTC (durationBeforeRetry 1m4s). Error: Volume "kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11" (spec.Name: "pv-aws-2pkrx") pod "2a4dc3df-6b19-11e6-9552-0a07af694611" (UID: "2a4dc3df-6b19-11e6-9552-0a07af694611") is not yet attached according to node status.
Aug 25 23:12:58 ip-172-31-14-247.ec2.internal atomic-openshift-node[27346]: I0825 23:12:58.854491 27346 reconciler.go:180] VerifyControllerAttachedVolume operation started for volume "kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11" (spec.Name: "pv-aws-2pkrx") pod "2a4dc3df-6b19-11e6-9552-0a07af694611" (UID: "2a4dc3df-6b19-11e6-9552-0a07af694611")
Aug 25 23:12:58 ip-172-31-14-247.ec2.internal atomic-openshift-node[27346]: E0825 23:12:58.859838 27346 nestedpendingoperations.go:233] Operation for "\"kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11\"" failed. No retries permitted until 2016-08-25 23:14:58.859809229 +0000 UTC (durationBeforeRetry 2m0s). Error: Volume "kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11" (spec.Name: "pv-aws-2pkrx") pod "2a4dc3df-6b19-11e6-9552-0a07af694611" (UID: "2a4dc3df-6b19-11e6-9552-0a07af694611") is not yet attached according to node status.
Aug 25 23:14:58 ip-172-31-14-247.ec2.internal atomic-openshift-node[27346]: I0825 23:14:58.931766 27346 reconciler.go:180] VerifyControllerAttachedVolume operation started for volume "kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11" (spec.Name: "pv-aws-2pkrx") pod "2a4dc3df-6b19-11e6-9552-0a07af694611" (UID: "2a4dc3df-6b19-11e6-9552-0a07af694611")
Aug 25 23:14:58 ip-172-31-14-247.ec2.internal atomic-openshift-node[27346]: E0825 23:14:58.936232 27346 nestedpendingoperations.go:233] Operation for "\"kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11\"" failed. No retries permitted until 2016-08-25 23:16:58.93620309 +0000 UTC (durationBeforeRetry 2m0s). Error: Volume "kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11" (spec.Name: "pv-aws-2pkrx") pod "2a4dc3df-6b19-11e6-9552-0a07af694611" (UID: "2a4dc3df-6b19-11e6-9552-0a07af694611") is not yet attached according to node status.
Aug 25 23:16:58 ip-172-31-14-247.ec2.internal atomic-openshift-node[27346]: I0825 23:16:58.946086 27346 reconciler.go:180] VerifyControllerAttachedVolume operation started for volume "kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11" (spec.Name: "pv-aws-2pkrx") pod "2a4dc3df-6b19-11e6-9552-0a07af694611" (UID: "2a4dc3df-6b19-11e6-9552-0a07af694611")
Aug 25 23:16:58 ip-172-31-14-247.ec2.internal atomic-openshift-node[27346]: E0825 23:16:58.950653 27346 nestedpendingoperations.go:233] Operation for "\"kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11\"" failed. No retries permitted until 2016-08-25 23:18:58.950626185 +0000 UTC (durationBeforeRetry 2m0s). Error: Volume "kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11" (spec.Name: "pv-aws-2pkrx") pod "2a4dc3df-6b19-11e6-9552-0a07af694611" (UID: "2a4dc3df-6b19-11e6-9552-0a07af694611") is not yet attached according to node status.
Aug 25 23:18:59 ip-172-31-14-247.ec2.internal atomic-openshift-node[27346]: I0825 23:18:59.047442 27346 reconciler.go:180] VerifyControllerAttachedVolume operation started for volume "kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11" (spec.Name: "pv-aws-2pkrx") pod "2a4dc3df-6b19-11e6-9552-0a07af694611" (UID: "2a4dc3df-6b19-11e6-9552-0a07af694611")
Aug 25 23:18:59 ip-172-31-14-247.ec2.internal atomic-openshift-node[27346]: E0825 23:18:59.051387 27346 nestedpendingoperations.go:233] Operation for "\"kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11\"" failed. No retries permitted until 2016-08-25 23:20:59.051368129 +0000 UTC (durationBeforeRetry 2m0s). Error: Volume "kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11" (spec.Name: "pv-aws-2pkrx") pod "2a4dc3df-6b19-11e6-9552-0a07af694611" (UID: "2a4dc3df-6b19-11e6-9552-0a07af694611") is not yet attached according to node status.
Aug 25 23:20:59 ip-172-31-14-247.ec2.internal atomic-openshift-node[27346]: I0825 23:20:59.054466 27346 reconciler.go:180] VerifyControllerAttachedVolume operation started for volume "kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11" (spec.Name: "pv-aws-2pkrx") pod "2a4dc3df-6b19-11e6-9552-0a07af694611" (UID: "2a4dc3df-6b19-11e6-9552-0a07af694611")
Aug 25 23:20:59 ip-172-31-14-247.ec2.internal atomic-openshift-node[27346]: E0825 23:20:59.059019 27346 nestedpendingoperations.go:233] Operation for "\"kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11\"" failed. No retries permitted until 2016-08-25 23:22:59.05900051 +0000 UTC (durationBeforeRetry 2m0s). Error: Volume "kubernetes.io/aws-ebs/aws://us-east-1c/vol-bd679f11" (spec.Name: "pv-aws-2pkrx") pod "2a4dc3df-6b19-11e6-9552-0a07af694611" (UID: "2a4dc3df-6b19-11e6-9552-0a07af694611") is not yet attached according to node status.
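As an aside, the retry intervals visible above (500ms, 1s, 2s, 4s, 8s, 16s, 32s, 1m4s, then a steady 2m0s) are a capped exponential backoff. A minimal Python sketch of that schedule, with parameters inferred from the log rather than taken from the kubelet source:

```python
def backoff_delays(initial=0.5, factor=2.0, cap=120.0, n=10):
    """Yield successive retry delays in seconds: start at `initial`,
    multiply by `factor` after each failure, and never exceed `cap`."""
    delay = initial
    for _ in range(n):
        yield min(delay, cap)
        delay *= factor

# The first nine delays reproduce the durationBeforeRetry values in the log:
# 0.5, 1, 2, 4, 8, 16, 32, 64 (1m4s), 120 (2m0s)
```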
I would check whether:

1. Kubelet attach/detach OR controller attach/detach is enabled _on the kubelet_
2. Controller attach/detach is enabled _on the controller manager_

This looks to me like the kubelet may be waiting for the controller to do the attach while the controller manager isn't configured to do it, so the kubelet times out.

Looks like the volume was bound correctly to the claim.
I was able to reproduce this on origin master. There are two bugs here:

1. The kubelet is not applying the controller-managed-attach-detach annotation to the node.
2. Given (1), the kubelet is not falling back to doing the attach/detach work itself.
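To make the interaction concrete, here is a toy Python model (not actual kubelet code; the function name is mine) of why these two bugs together can leave a volume permanently unattached: the attach/detach controller only manages nodes that carry the annotation, and a kubelet configured for controller-managed attach/detach will not attach on its own.

```python
# Real Kubernetes node annotation; the model around it is a simplification.
ANNOTATION = "volumes.kubernetes.io/controller-managed-attach-detach"

def attach_owner(kubelet_controller_managed, node_annotations):
    """Return which component will actually attach volumes for this node
    (simplified model of the attach/detach responsibility split)."""
    if node_annotations.get(ANNOTATION) == "true":
        # Annotation present: the attach/detach controller manages this node.
        return "controller-manager"
    if not kubelet_controller_managed:
        # No annotation and the kubelet is in legacy mode: it attaches itself.
        return "kubelet"
    # Bug scenario: the kubelet waits for the controller, but the controller
    # skips the node because the annotation was never applied.
    return "nobody"
```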
We default controller-managed-attach-detach to false, and ansible is responsible for turning it on.
In this case, the node config file appears to be in a correct state for enabling this feature.
The issue here is that the Kubelet does not update existing nodes to add the annotation that controls this feature. I've made a PR upstream to fix this issue.
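The shape of the fix, sketched as hypothetical Python rather than the actual Go patch: reconcile the annotation on every node status update, not only when the node object is first created, so pre-existing nodes pick up the setting too.

```python
# Real Kubernetes node annotation; the reconcile function is a hypothetical sketch.
ANNOTATION = "volumes.kubernetes.io/controller-managed-attach-detach"

def sync_node_annotation(node, controller_managed_enabled):
    """Keep the controller-managed-attach-detach annotation in sync with the
    kubelet's configuration on every node update (sketch of the fix)."""
    annotations = node.setdefault("annotations", {})
    if controller_managed_enabled:
        annotations[ANNOTATION] = "true"
    else:
        annotations.pop(ANNOTATION, None)
    return node
```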
I've opened a PR to origin with the upstream patches for this issue: https://github.com/openshift/origin/pull/10748
*** Bug 1372458 has been marked as a duplicate of this bug. ***
Just to recap what we looked at yesterday:

The atomic-openshift-master-controllers service is saying "AttachVolume.Attach succeeded for volume", and the EBS volume is actually being attached to the correct instance in AWS.

Meanwhile, atomic-openshift-node is saying the volume "has not yet been added to the list of VolumesInUse in the node's volume status".

So it seems the controller is doing its job and the kubelet is missing some information that would allow it to proceed.
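The two kubelet error messages in the node log correspond to two successive checks before mounting. A toy Python model of them (my names, not kubelet code) shows why a stale node status blocks the mount even after a successful attach:

```python
def verify_controller_attached(volume, node_status):
    """Simplified model of the kubelet's pre-mount verification: the volume
    must appear in the node's volumesInUse list, then in volumesAttached."""
    if volume not in node_status.get("volumesInUse", []):
        return "has not yet been added to the list of VolumesInUse"
    if volume not in node_status.get("volumesAttached", []):
        return "is not yet attached according to node status"
    return "ok"
```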
PVs are now working with version atomic-openshift-3.3.0.30-1.git.2.bb5beaa.el7.x86_64.
This has been merged into OCP and is in version v3.3.0.31, which is planned for the GA release. It is in images now and ready for testing.
Verified on:

openshift v3.3.0.31
kubernetes v1.3.0+52492b4
etcd 2.3.0+git

The issue is not reproducible now with 'enable-controller-attach-detach' set to true.
PV/PVC work fine now in online dev-preview-int. Thanks!
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1933