I am on the Dev Preview of OpenShift v3 (user account: Bernard). My pods don't seem to be able to mount a PV; the errors generated look like this:

11:36:30 AM  dbsurvey-4-yay2b  Pod  Warning  Failed mount  Unable to mount volumes for pod "dbsurvey-4-yay2b_dbsurvey(b0c580fa-bfbf-11e6-9d4e-0e3d364e19a5)": timeout expired waiting for volumes to attach/mount for pod "dbsurvey-4-yay2b"/"dbsurvey". list of unattached/unmounted volumes=[upload-data]
11:36:30 AM  dbsurvey-4-yay2b  Pod  Warning  Failed sync   Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "dbsurvey-4-yay2b"/"dbsurvey". list of unattached/unmounted volumes=[upload-data]

When I create the app from scratch (using the PHP 5.6 image), it works perfectly. I then scale the pods down to 0 and change the config to mount the PV; this works most of the time. However, if I later rebuild the image (e.g. because of changed code in Git), it fails to remount that PV. Each time I scale down to 0 first just to make sure, but when I then scale up again I keep getting that error and can no longer get a new pod to start. It keeps retrying the mount but fails every time. My only recourse at that point is to start again from scratch, which seems like the wrong approach.
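Roughly, the cycle I go through looks like this from the CLI (a sketch; "dbsurvey" is my deployment config and build config name, "upload-data" is the volume from the error above):

# scale down so no pod holds the volume
oc scale dc/dbsurvey --replicas=0

# rebuild the image after pushing new code to Git
oc start-build dbsurvey --follow

# scale back up; this is where the new pod gets stuck mounting upload-data
oc scale dc/dbsurvey --replicas=1

# the mount failures show up as warning events
oc get events -w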
P.S. I first posted on the Google group and was advised there to create a bug report, as this is apparently a known issue that requires the operations team to intervene.
This is most likely due to the fact that you have "Rolling" as your deployment strategy and your pods require a volume. In the Developer Preview, the PVCs are backed by EBS volumes, and these cannot be mounted on two different nodes. In a rolling deployment, the new pod comes up first as a canary, and only when it succeeds does the deployment proceed and the old pod get taken down. The new pod (from the new deployment) will try to mount the volume and fail, since the old pod (from the current/existing deployment) still has the volume mounted. These pods can (and most likely will) be scheduled on different nodes, so this is not going to work. The solution is to use "Recreate" as the deployment strategy when the pods rely on PVCs. Can you confirm that this is the issue and that the suggested change resolves it?
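For reference, a minimal sketch of switching the strategy from the CLI (the dc name "dbsurvey" is just an example; the same change can be made by editing the deployment config in the web console):

# switch from Rolling to Recreate so the old pod releases the EBS volume
# before the new pod tries to mount it
oc patch dc/dbsurvey -p '{"spec":{"strategy":{"type":"Recreate"}}}'

# confirm the change
oc get dc/dbsurvey -o jsonpath='{.spec.strategy.type}'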
I have a similar issue to what the original poster experienced. I changed the YAML of the deployment (Rolling to Recreate) and recreated the PVC, and it worked for 2-3 deployments. Now the PVC can't even be used anymore. Here is the log:

--> Scaling beta-23 down to zero
--> Scaling beta-24 to 1 before performing acceptance check
--> Waiting up to 10m0s for pods in deployment beta-24 to become ready
error: update acceptor rejected beta-24: pods for deployment "beta-24" took longer than 600 seconds to become ready

Let me know if you need anything else. Thanks.
My OpenShift email account is the same as my Bugzilla email address, and I haven't deleted my pod, so you can look at it. Thanks.
Yeah, this seems similar to the problem in https://bugzilla.redhat.com/show_bug.cgi?id=1404811, but the storage team has made a lot of improvements in the attach/detach code path for AWS. If you can, upgrade to 3.4 and try again; if it still doesn't work, let us know.
The Online environment has recently been upgraded. Can you please confirm if this is still an issue?
It worked better (probably flawlessly) until a few days ago, when I couldn't even mount a volume to a pod anymore. This is the error that I got:

W0210 03:01:37.993356 1 reflector.go:330] github.com/openshift/origin/pkg/deploy/strategy/support/lifecycle.go:468: watch of *api.Pod ended with: too old resource version: 850065961 (850090417)

Please look into this. Thanks.
Can you post more details:

1. What do your pod and PV YAMLs look like? If you can't post them publicly, you can email them to me.
2. What kind of volume type were you using?
3. How did you deploy OpenShift?
4. The exact version of OpenShift you were using.
5. More logs around that error would also be helpful.
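As a sketch, most of this can be gathered with the oc client (names in angle brackets are placeholders):

# 1. pod and PV/PVC definitions
oc get pod <pod-name> -o yaml
oc get pvc -o yaml
oc get pv -o yaml            # PVs are cluster-scoped; this may need admin rights on Online

# 4. exact client and server versions
oc version

# 5. events and logs around the failure
oc get events
oc logs <pod-name>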
1. Here is my pod YAML:

apiVersion: v1
kind: Pod
metadata:
  name: beta-2-deploy
  namespace: divvy
  selfLink: /api/v1/namespaces/divvy/pods/beta-2-deploy
  uid: 0343a47a-ef46-11e6-b599-0e63b9c1c48f
  resourceVersion: '850334773'
  creationTimestamp: '2017-02-10T04:04:26Z'
  labels:
    openshift.io/deployer-pod-for.name: beta-2
  annotations:
    kubernetes.io/limit-ranger: >-
      LimitRanger plugin set: cpu, memory request for container deployment;
      cpu, memory limit for container deployment
    openshift.io/deployment.name: beta-2
    openshift.io/scc: restricted
spec:
  volumes:
    - name: deployer-token-o8eyc
      secret:
        secretName: deployer-token-o8eyc
        defaultMode: 420
  containers:
    - name: deployment
      image: 'registry.ops.openshift.com/openshift3/ose-deployer:v3.4.1.2'
      env:
        - name: KUBERNETES_MASTER
          value: 'https://ip-172-31-10-24.ec2.internal'
        - name: OPENSHIFT_MASTER
          value: 'https://ip-172-31-10-24.ec2.internal'
        - name: BEARER_TOKEN_FILE
          value: /var/run/secrets/kubernetes.io/serviceaccount/token
        - name: OPENSHIFT_CA_DATA
          value: |
            -----BEGIN CERTIFICATE-----
            MIIC5jCCAdCgAwIBAgIBATALBgkqhkiG9w0BAQswJjEkMCIGA1UEAwwbb3BlbnNo
            aWZ0LXNpZ25lckAxNDYzMTU2NTg2MB4XDTE2MDUxMzE2MjMwNloXDTIxMDUxMjE2
            MjMwN1owJjEkMCIGA1UEAwwbb3BlbnNoaWZ0LXNpZ25lckAxNDYzMTU2NTg2MIIB
            IjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEArp4BlumhbaZiJxnPJPd78jqp
            scHOa71PnC8Pd/Uzg/cr6kCz8cqFadVpHyAYxR2MVPzwGEjJ2ScP2f5iVby8w10n
            408WfAv3HelPCcw5z1yp4pb2WnFNy1eglGl2fQp7Z/Od8TgO2OOpeVvLfxSL/K9V
            OXYmt9HFnfhO/0c5Cv5T7OJc997h3++006yi/qt0lGTHgeF/eUCmnZ0tosjCRhAS
            7AJrYAXN8ERI3s91mrzDMC4q3FjOLlWVa9ZrXeUrbvJYCYgbdtgG2wup2ETy2nFJ
            6meeYRYF/7JaVXsOZWkJYfH2K6Lg1wGjFyOXNZkA2jLqOlRMUZWHNnA/DTpL3wID
            AQABoyMwITAOBgNVHQ8BAf8EBAMCAKQwDwYDVR0TAQH/BAUwAwEB/zALBgkqhkiG
            9w0BAQsDggEBADQPZ3eyz2OtWdsxzG//lq1DXguV7T5KUfgp76mkZuDjp5ermC42
            m1DjFtEP8HvFTZgz+LYsAIhv7MShe/bZOieHnz4A/vc3oFi6uVrcLffR+CVjdlSP
            UDKZzOkf7/jTxOzSQImNk3AQAuIeVCcMXF4v4zVRlyMaWcTtOuNGWdEmLZUhUrjT
            E5Gh+KQOW1jFDYKeZ1RGkAMCL8aD6p7jNvmxVGzQasIleKylDteGblcEdn8M3Xjp
            hHUVIWnru5CBTwCxCqSXkxMFUsZqSIy+hiMeJPFmkDIdSBb7n2BwgcG0cXu/Zuju
            2PKZGzVqvgHhcIlwFZ2g9g1S/SwlVEGUvZs=
            -----END CERTIFICATE-----
        - name: OPENSHIFT_DEPLOYMENT_NAME
          value: beta-2
        - name: OPENSHIFT_DEPLOYMENT_NAMESPACE
          value: divvy
      resources:
        limits:
          cpu: '1'
          memory: 512Mi
        requests:
          cpu: 60m
          memory: 307Mi
      volumeMounts:
        - name: deployer-token-o8eyc
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      terminationMessagePath: /dev/termination-log
      imagePullPolicy: Always
      securityContext:
        capabilities:
          drop:
            - KILL
            - MKNOD
            - NET_RAW
            - SETGID
            - SETUID
            - SYS_CHROOT
        privileged: false
        seLinuxOptions:
          level: 's0:c227,c194'
        runAsUser: 1051690000
  restartPolicy: Never
  terminationGracePeriodSeconds: 10
  activeDeadlineSeconds: 3600
  dnsPolicy: ClusterFirst
  nodeSelector:
    type: compute
  serviceAccountName: deployer
  serviceAccount: deployer
  nodeName: ip-172-31-10-175.ec2.internal
  securityContext:
    seLinuxOptions:
      level: 's0:c227,c194'
    fsGroup: 1051690000
  imagePullSecrets:
    - name: deployer-dockercfg-vk2yt
status:
  phase: Failed
  conditions:
    - type: Initialized
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2017-02-10T04:04:26Z'
    - type: Ready
      status: 'False'
      lastProbeTime: null
      lastTransitionTime: '2017-02-10T04:14:56Z'
      reason: ContainersNotReady
      message: 'containers with unready status: [deployment]'
    - type: PodScheduled
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2017-02-10T04:04:26Z'
  hostIP: 172.31.10.175
  podIP: 10.1.102.15
  startTime: '2017-02-10T04:04:26Z'
  containerStatuses:
    - name: deployment
      state:
        terminated:
          exitCode: 1
          reason: Error
          startedAt: '2017-02-10T04:04:54Z'
          finishedAt: '2017-02-10T04:14:55Z'
          containerID: >-
            docker://2875722a329fe71b7b2eefe416395e0274f8e7aa623d2ec5a17995bf4dc65c9a
      lastState: {}
      ready: false
      restartCount: 0
      image: 'registry.ops.openshift.com/openshift3/ose-deployer:v3.4.1.2'
      imageID: >-
        docker-pullable://registry.ops.openshift.com/openshift3/ose-deployer@sha256:37adf782e29f09c815ae0bd91299e99ae84e2849b25de100c6581df36c6a7920
      containerID: >-
        docker://2875722a329fe71b7b2eefe416395e0274f8e7aa623d2ec5a17995bf4dc65c9a

I don't know how to get the PV YAML; is it the same as the deployment YAML?

2. I've tried both RWO (Read-Write-Once) and RWX (Read-Write-Many).
3. I am using the web interface (as opposed to the oc command line).
4. OpenShift 3.
5. Here is the log from the failed pod:

--> Scaling beta-1 down to zero
--> Scaling beta-2 to 1 before performing acceptance check
--> Waiting up to 10m0s for pods in deployment beta-2 to become ready
W0210 04:14:06.692864 1 reflector.go:330] github.com/openshift/origin/pkg/deploy/strategy/support/lifecycle.go:468: watch of *api.Pod ended with: too old resource version: 850302230 (850328358)
error: update acceptor rejected beta-2: pods for deployment "beta-2" took longer than 600 seconds to become ready
I don't see any persistent volumes mounted in your pod. What you are seeing is probably a different bug and unrelated to this one, which was specifically opened for persistent volumes.
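To answer the earlier question: the PV/PVC YAML is not the same as the deployment YAML. A sketch of how to fetch it with the CLI (the claim name is whatever you reference in your deployment config):

# list the claims in your project and dump one as YAML
oc get pvc
oc get pvc <claim-name> -o yaml

# the underlying PV object (may require cluster-admin rights on OpenShift Online)
oc get pv <pv-name> -o yaml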
I'm including the YAML from the deployment, which shows the persistent volume; hope this helps.

apiVersion: v1
kind: DeploymentConfig
metadata:
  name: beta
  namespace: divvy
  selfLink: /oapi/v1/namespaces/divvy/deploymentconfigs/beta
  uid: 7d91daff-ef45-11e6-b125-0e3d364e19a5
  resourceVersion: '850335639'
  generation: 5
  creationTimestamp: '2017-02-10T04:00:41Z'
  labels:
    app: beta
  annotations:
    openshift.io/generated-by: OpenShiftWebConsole
spec:
  strategy:
    type: Recreate
    recreateParams:
      timeoutSeconds: 600
    rollingParams:
      updatePeriodSeconds: 1
      intervalSeconds: 1
      timeoutSeconds: 600
      maxUnavailable: 25%
      maxSurge: 25%
    resources: {}
  triggers:
    - type: ImageChange
      imageChangeParams:
        automatic: true
        containerNames:
          - beta
        from:
          kind: ImageStreamTag
          namespace: divvy
          name: 'beta:latest'
        lastTriggeredImage: >-
          172.30.47.227:5000/divvy/beta@sha256:91ed279cee18e4f1ce31ae00a46d49192f7270c2c6253cf129cdfc7f56323e3e
    - type: ConfigChange
  replicas: 1
  test: false
  selector:
    deploymentconfig: beta
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: beta
        deploymentconfig: beta
    spec:
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: data
      containers:
        - name: beta
          image: >-
            172.30.47.227:5000/divvy/beta@sha256:91ed279cee18e4f1ce31ae00a46d49192f7270c2c6253cf129cdfc7f56323e3e
          ports:
            - containerPort: 8080
              protocol: TCP
          resources: {}
          volumeMounts:
            - name: data
              mountPath: /data
          terminationMessagePath: /dev/termination-log
          imagePullPolicy: Always
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      securityContext: {}
status:
  latestVersion: 2
  observedGeneration: 5
  replicas: 1
  availableReplicas: 1
  details:
    message: config change
    causes:
      - type: ConfigChange
  conditions:
    - type: Progressing
      status: 'False'
      lastTransitionTime: '2017-02-10T04:14:57Z'
      reason: ProgressDeadlineExceeded
      message: Replication controller "beta-2" has failed progressing
    - type: Available
      status: 'True'
      lastTransitionTime: '2017-02-10T04:15:12Z'
      message: Deployment config has minimum availability.
Okay, thank you for the DeploymentConfig YAML. But I don't see any errors in the object you posted. It shows "availableReplicas: 1" and "replicas: 1", which means whatever pods were required for the deployment are running correctly. Are you certain that the deployment you posted above is not running properly and is stuck because it is unable to mount volumes? If that is indeed the case, can you also find the pods that were created for that deployment (you can run `oc get pods`) and then post the output of the following commands:

~> `oc describe pod <pod_name_from_above>`
~> `oc logs <pod_name_from_above>`
@budi - If you are still affected by this bug, can you open a new bug with the following items:

1. Steps to reproduce
2. Whatever logs you have (such as `oc logs`)
3. Output of `oc describe pod` for the pod that is stuck

Also, you give the OpenShift version as "3". Can you be more specific? There have been a lot of fixes between 3.3 and 3.4.
I was able to reproduce the bug consistently before (I'm not sure with which version of OpenShift), but I wasn't able to reproduce the error today (using OpenShift 3.4.0.13).
New bug: https://bugzilla.redhat.com/show_bug.cgi?id=1441602
*** Bug 1441602 has been marked as a duplicate of this bug. ***
Testing on free-int:

1. Create a persistent application with the mysql-persistent template, and create some data on the persistent volume
2. After the pod is ready, scale the pod down to 0
3. Wait for a while, then scale the pod back up to 1
4. Repeat steps 2~3 100 times

The PV can be attached successfully.
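Roughly, the loop looked like this (a sketch; "mysql" is assumed to be the deployment config name created by the mysql-persistent template):

# create the application from the template (defaults assumed)
oc new-app --template=mysql-persistent

# scale down/up repeatedly and check that the volume re-attaches each time
for i in $(seq 1 100); do
    oc scale dc/mysql --replicas=0
    sleep 60              # give the EBS volume time to detach
    oc scale dc/mysql --replicas=1
    sleep 120             # wait for the new pod to mount the volume and become ready
    oc get pods           # pod should be Running, not stuck on FailedMount
done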
Waiting for the online-int env to be ready for testing.