Created attachment 1156272 [details]
event log showing error

Description of problem:
Sometimes a pod is unable to start because its PV is already attached to another instance. I see several occurrences of this in dev-preview-int; one pod has been stuck in this state for two days so far.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Encountered this error too, in dev-preview-stg.

Steps to reproduce:
1. oc login and create a project (xxia-proj).
2. Create a dc using https://raw.githubusercontent.com/openshift/origin/master/examples/gitserver/gitserver.yaml

Due to https://bugzilla.redhat.com/show_bug.cgi?id=1336318#c1 , prepare the PVCs first:

$ cat pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  creationTimestamp: null
  name: mypvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
status: {}

$ cat pvc2.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  creationTimestamp: null
  name: mypvc2
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
status: {}

$ oc create -f pvc.yaml
$ oc create -f pvc2.yaml
$ oc get pvc
NAME      STATUS    VOLUME         CAPACITY   ACCESSMODES   AGE
mypvc     Bound     pv-aws-vq862   1Gi        RWO           20h
mypvc2    Bound     pv-aws-3y95l   1Gi        RWO           20h

$ wget https://raw.githubusercontent.com/openshift/origin/master/examples/gitserver/gitserver.yaml

Change the "volumeMounts" and "volumes" sections of gitserver.yaml as follows:

        volumeMounts:
        - mountPath: /var/lib/git
          name: git
        - mountPath: /var/lib/origin
          name: origin
......
      volumes:
      - name: git
        persistentVolumeClaim:
          claimName: mypvc
      - name: origin
        persistentVolumeClaim:
          claimName: mypvc2

Then create it:

$ oc create -f gitserver.yaml
deploymentconfig "git" created

$ oc get pod -l deploymentconfig=git -o wide
NAME          READY     STATUS    RESTARTS   AGE       NODE
git-1-6rs6o   1/1       Running   1          19h       ip-172-31-9-165.ec2.internal

$ oc edit dc git   # Trigger re-deployment

Check which nodes the pods are scheduled to:

$ oc get pod -l deploymentconfig=git -o wide
NAME          READY     STATUS              RESTARTS   AGE       NODE
git-1-6rs6o   1/1       Running             1          19h       ip-172-31-9-165.ec2.internal
git-2-xn715   0/1       ContainerCreating   0          7m        ip-172-31-9-167.ec2.internal

Check the pod events:

$ oc describe pod/git-2-xn715
Name:        git-2-xn715
Namespace:   xxia-proj
Node:        ip-172-31-9-167.ec2.internal/172.31.9.167
Start Time:  Wed, 18 May 2016 10:26:10 +0800
Labels:      deployment=git-2,deploymentconfig=git,run-container=git
Status:      Pending
IP:
Controllers: ReplicationController/git-2
Containers:
  git:
    Container ID:
    Image:       openshift/origin-gitserver:latest
    Image ID:
    Port:        8080/TCP
    QoS Tier:
      memory:    Burstable
      cpu:       Burstable
    Limits:
      cpu:       500m
      memory:    256Mi
    Requests:
      cpu:       30m
      memory:    153Mi
    State:       Waiting
      Reason:    ContainerCreating
    Ready:       False
    Restart Count: 0
    Environment Variables:
      POD_NAMESPACE:        xxia-proj (v1:metadata.namespace)
      PUBLIC_URL:           http://git.$(POD_NAMESPACE).svc.cluster.local:8080
      INTERNAL_URL:         http://git:8080
      GIT_HOME:             /var/lib/git
      HOOK_PATH:            /var/lib/git-hooks
      GENERATE_ARTIFACTS:   true
      DETECTION_SCRIPT:
      ALLOW_GIT_PUSH:       true
      ALLOW_GIT_HOOKS:      true
      ALLOW_LAZY_CREATE:    true
      ALLOW_ANON_GIT_PULL:  true
      REQUIRE_SERVER_AUTH:  -
      AUTH_NAMESPACE:       $(POD_NAMESPACE)
      REQUIRE_GIT_AUTH:
      AUTOLINK_KUBECONFIG:  -
      AUTOLINK_NAMESPACE:   $(POD_NAMESPACE)
      AUTOLINK_HOOK:
Conditions:
  Type      Status
  Ready     False
Volumes:
  git:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  mypvc
    ReadOnly:   false
  origin:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  mypvc2
    ReadOnly:   false
  git-token-080j1:
    Type:       Secret (a volume populated by a Secret)
    SecretName: git-token-080j1
Events:
  FirstSeen  LastSeen  Count  From                                    SubobjectPath  Type     Reason       Message
  ---------  --------  -----  ----                                    -------------  ----     ------       -------
  8m         8m        1      {default-scheduler }                                   Normal   Scheduled    Successfully assigned git-2-xn715 to ip-172-31-9-167.ec2.internal
  7m         13s       7      {kubelet ip-172-31-9-167.ec2.internal}                 Warning  FailedMount  Unable to mount volumes for pod "git-2-xn715_xxia-proj(e28c4344-1c9f-11e6-ae12-0ee251450653)": Could not attach EBS Disk "aws://us-east-1c/vol-9e9d393b": Error attaching EBS volume: VolumeInUse: vol-9e9d393b is already attached to an instance status code: 400, request id:
  7m         13s       7      {kubelet ip-172-31-9-167.ec2.internal}                 Warning  FailedSync   Error syncing pod, skipping: Could not attach EBS Disk "aws://us-east-1c/vol-9e9d393b": Error attaching EBS volume: VolumeInUse: vol-9e9d393b is already attached to an instance status code: 400, request id:

Finally the re-deployment failed:

$ oc get pod
git-1-6rs6o    1/1   Running   1   20h
git-2-deploy   0/1   Error     0   26m
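For anyone hitting the same events: a small diagnostic sketch, assuming the kubelet message format shown above. It recovers the stuck volume ID from the FailedMount message; the aws commands that would use it are shown as comments, since they require AWS credentials and are a manual workaround, not part of the reported behavior.

```shell
# Diagnostic sketch (assumes the kubelet error format shown in the events above).
# Extract the stuck EBS volume ID from the FailedMount message:
msg='Could not attach EBS Disk "aws://us-east-1c/vol-9e9d393b": Error attaching EBS volume: VolumeInUse: vol-9e9d393b is already attached to an instance'
vol_id=$(echo "$msg" | grep -o 'vol-[0-9a-f]*' | head -n 1)
echo "$vol_id"
# With AWS credentials, one could then check where the volume is attached:
#   aws ec2 describe-volumes --volume-ids "$vol_id" --query 'Volumes[0].Attachments'
# and, as a last-resort manual workaround, detach it from the stale instance:
#   aws ec2 detach-volume --volume-id "$vol_id"
```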
Relevant bug 1329040
Tried to investigate the issue with @xxia; the likely cause is:
1. `oc edit dc` triggered a re-deployment.
2. The original pod was deleted and the new pod was scheduled to another node, but the volume was still attached to the previous instance.
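To make the mismatch concrete, an illustrative sketch using the node names from comment 1. The actual attachment lookup would go through AWS and is shown only as a comment (it needs credentials); the `attached_node` value is hard-coded to what AWS reports in this bug.

```shell
# Illustration of the state described above (node names taken from comment 1).
# The real lookup would be:
#   attached_instance=$(aws ec2 describe-volumes --volume-ids vol-9e9d393b \
#     --query 'Volumes[0].Attachments[0].InstanceId' --output text)
old_node="ip-172-31-9-165.ec2.internal"   # node of the deleted pod git-1-6rs6o
new_node="ip-172-31-9-167.ec2.internal"   # node of the stuck pod git-2-xn715
attached_node="$old_node"                 # what AWS reports in this bug
if [ "$attached_node" != "$new_node" ]; then
  echo "volume still attached to $attached_node but pod scheduled on $new_node"
fi
```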
Should be addressed by https://github.com/kubernetes/kubernetes/pull/25502 and https://github.com/kubernetes/kubernetes/pull/25888
This has been merged and is in OSE v3.3.0.9 or newer.
This failed on:
openshift v3.3.0.9
kubernetes v1.3.0+57fb9ac
etcd 2.3.0+git

Steps are as below:
1. Create a PV:

$ oc get pv -o yaml
apiVersion: v1
items:
- apiVersion: v1
  kind: PersistentVolume
  metadata:
    annotations:
      pv.kubernetes.io/bound-by-controller: "yes"
    creationTimestamp: 2016-07-25T05:45:25Z
    labels:
      failure-domain.beta.kubernetes.io/region: us-east-1
      failure-domain.beta.kubernetes.io/zone: us-east-1d
      type: local
    name: ebs
    resourceVersion: "3977"
    selfLink: /api/v1/persistentvolumes/ebs
    uid: fc06fb71-522a-11e6-bf9c-0ef1eb2be359
  spec:
    accessModes:
    - ReadWriteOnce
    awsElasticBlockStore:
      fsType: ext4
      volumeID: aws://us-east-1d/vol-2f40058b
    capacity:
      storage: 1Gi
    claimRef:
      apiVersion: v1
      kind: PersistentVolumeClaim
      name: ebs
      namespace: chao
      resourceVersion: "3975"
      uid: b9a3d667-522b-11e6-bf9c-0ef1eb2be359
    persistentVolumeReclaimPolicy: Retain
  status:
    phase: Bound
kind: List
metadata: {}

2. Create a pvc in the namespace chao.
3. wget https://raw.githubusercontent.com/openshift/origin/master/examples/deployment/recreate-example.yaml and add the following to the file:

        volumeMounts:
        - mountPath: /var/lib/test
          name: test
      volumes:
      - name: test
        persistentVolumeClaim:
          claimName: ebs

4. oc create -f recreate-example.yaml
5. The pod is running:

NAME                       READY     STATUS    RESTARTS   AGE       NODE
recreate-example-1-xufes   1/1       Running   0          9m        ip-172-18-0-79.ec2.internal

6. oadm manage-node ip-172-18-0-79.ec2.internal --schedulable=false
7. Edit the dc from recreate-example:latest to recreate-example:v1 to trigger a re-deployment:

      from:
        kind: ImageStreamTag
        name: recreate-example:v1

[root@dhcp-128-8 ~]# oc status
In project chao on server https://ec2-52-90-208-19.compute-1.amazonaws.com:443

http://recreate-example-chao.0725-hu0.qe.rhcloud.com (svc/recreate-example)
  dc/recreate-example deploys istag/recreate-example:v1
    deployment #2 running for 2 minutes - 1 pod
    deployment #1 deployed about an hour ago

8.
Check pod status:

[root@dhcp-128-8 ~]# oc describe pods recreate-example-2-a00r3
Name:        recreate-example-2-a00r3
Namespace:   chao
Node:        ip-172-18-9-202.ec2.internal/172.18.9.202
Start Time:  Mon, 25 Jul 2016 15:40:49 +0800
Labels:      deployment=recreate-example-2,deploymentconfig=recreate-example
Status:      Pending
IP:
Controllers: ReplicationController/recreate-example-2
Containers:
  deployment-example:
    Container ID:
    Image:       openshift/deployment-example@sha256:c505b916f7e5143a356ff961f2c21aee40fbd2cd906c1e3feeb8d5e978da284b
    Image ID:
    Port:        8080/TCP
    QoS Tier:
      cpu:       BestEffort
      memory:    BestEffort
    State:       Waiting
      Reason:    ContainerCreating
    Ready:       False
    Restart Count: 0
    Environment Variables:
Conditions:
  Type          Status
  Initialized   True
  Ready         False
  PodScheduled  True
Volumes:
  test:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  ebs
    ReadOnly:   false
  default-token-6p2xr:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-6p2xr
Events:
  FirstSeen  LastSeen  Count  From                                     SubobjectPath  Type     Reason       Message
  ---------  --------  -----  ----                                     -------------  ----     ------       -------
  11m        11m       1      {default-scheduler }                                    Normal   Scheduled    Successfully assigned recreate-example-2-a00r3 to ip-172-18-9-202.ec2.internal
  9m         17s       5      {kubelet ip-172-18-9-202.ec2.internal}                  Warning  FailedMount  Unable to mount volumes for pod "recreate-example-2-a00r3_chao(1b242d80-523b-11e6-bf9c-0ef1eb2be359)": timeout expired waiting for volumes to attach/mount for pod "recreate-example-2-a00r3"/"chao". list of unattached/unmounted volumes=[test]
  9m         17s       5      {kubelet ip-172-18-9-202.ec2.internal}                  Warning  FailedSync   Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "recreate-example-2-a00r3"/"chao". list of unattached/unmounted volumes=[test]
"oc describe pods" output doesn't indicate the EBS volume is attached to the wrong node. Can you get the openshift node log or kubelet log from ip-172-18-9-202.ec2.internal?
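A sketch of how the requested log could be collected and filtered. The systemd unit name is an assumption (it varies between origin and OSE installs, e.g. origin-node vs. atomic-openshift-node), so the journalctl step is shown as a comment; the filtering step is demonstrated on sample lines.

```shell
# Collecting the node log (unit name is an assumption; adjust to the install):
#   journalctl -u atomic-openshift-node --no-pager > node.log
# Filter for volume attach/detach activity; demonstrated on sample lines here:
printf '%s\n' \
  'kubelet: Could not attach EBS Disk "aws://us-east-1d/vol-2f40058b": VolumeInUse' \
  'kubelet: Starting container sync' > node.log
grep -iE 'attach|detach|vol-' node.log
```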
First, the EBS volume was mounted on node ip-172-18-0-79.ec2.internal. After re-deploying the dc, the volume should have been attached to node ip-172-18-9-202.ec2.internal, but it remained attached to ip-172-18-0-79.ec2.internal. Please see the detailed info at the end of https://github.com/kubernetes/kubernetes/issues/28671 .
Per the latest comment [1], is the issue resolved?

[1] https://github.com/kubernetes/kubernetes/issues/28671#issuecomment-240039479
I could not reproduce this issue on OCP right now.
Thanks. I'm closing it for now. If it is still a problem, please reopen it.