Description of problem:
Resizing a PV fails for an AWS EBS volume.

Version-Release number of selected component (if applicable):
oc v3.9.0-0.15.0
kubernetes v1.9.0-beta1
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://ip-172-18-14-74.ec2.internal:8443
openshift v3.9.0-0.15.0
kubernetes v1.9.0-beta1

How reproducible:
Always

Steps to Reproduce:
1. Create a pod with replicas=1
oc new-app centos/ruby-22-centos7~https://github.com/openshift/ruby-ex.git
2. Create a dynamic PVC (a sketch of this PVC is included after this report)
oc create -f https://raw.githubusercontent.com/chao007/v3-testfiles/master/persistent-volumes/ebs/dynamic-provisioning/pvc.yaml
3. Add the PVC to the pod
oc volume dc/ruby-ex --add --type=persistentVolumeClaim --mount-path=/opt1 --name=v1 --claim-name=ebsc
4. Edit the PVC to resize the PV
oc edit pvc ebsc, and change the requested storage from 4Gi to 8Gi
5. Check that the PV reports the requested size
-bash-4.2# oc get pv pvc-e6a0098e-f5c9-11e7-92cd-0edfbad2900e -o yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    kubernetes.io/createdby: aws-ebs-dynamic-provisioner
    pv.kubernetes.io/bound-by-controller: "yes"
    pv.kubernetes.io/provisioned-by: kubernetes.io/aws-ebs
  creationTimestamp: 2018-01-10T05:48:37Z
  labels:
    failure-domain.beta.kubernetes.io/region: us-east-1
    failure-domain.beta.kubernetes.io/zone: us-east-1d
  name: pvc-e6a0098e-f5c9-11e7-92cd-0edfbad2900e
  resourceVersion: "25358"
  selfLink: /api/v1/persistentvolumes/pvc-e6a0098e-f5c9-11e7-92cd-0edfbad2900e
  uid: e73b122a-f5c9-11e7-92cd-0edfbad2900e
spec:
  accessModes:
  - ReadWriteOnce
  awsElasticBlockStore:
    fsType: ext4
    volumeID: aws://us-east-1d/vol-0f5f736c2be7a26e6
  capacity:
    storage: 8Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: ebsc
    namespace: resize
    resourceVersion: "23764"
    uid: e6a0098e-f5c9-11e7-92cd-0edfbad2900e
  persistentVolumeReclaimPolicy: Delete
  storageClassName: foo
status:
  phase: Bound
6. Delete the pod
7. The new pod stays in ContainerCreating

Events:
  Type     Reason                  Age                From                                  Message
  ----     ------                  ----               ----                                  -------
  Normal   Scheduled               4m                 default-scheduler                     Successfully assigned ruby-ex-2-pk9j4 to ip-172-18-6-92.ec2.internal
  Normal   SuccessfulMountVolume   4m                 kubelet, ip-172-18-6-92.ec2.internal  MountVolume.SetUp succeeded for volume "default-token-rq4f5"
  Warning  FailedMount             18s (x2 over 2m)   kubelet, ip-172-18-6-92.ec2.internal  Unable to mount volumes for pod "ruby-ex-2-pk9j4_resize(8d0d6496-f5cc-11e7-92cd-0edfbad2900e)": timeout expired waiting for volumes to attach/mount for pod "resize"/"ruby-ex-2-pk9j4". list of unattached/unmounted volumes=[v1]
  Warning  FileSystemResizeFailed  14s (x11 over 4m)  kubelet, ip-172-18-6-92.ec2.internal  MountVolume.resizeFileSystem failed for volume "pvc-e6a0098e-f5c9-11e7-92cd-0edfbad2900e" (UniqueName: "kubernetes.io/aws-ebs/aws://us-east-1d/vol-0f5f736c2be7a26e6") pod "ruby-ex-2-pk9j4" (UID: "8d0d6496-f5cc-11e7-92cd-0edfbad2900e") : the device /dev/xvdbi is already in use

Actual results:
The PV resize fails and the new pod never leaves ContainerCreating.

Expected results:
The PV is resized and the pod is running.

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:
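For reference, the PVC from step 2 can also be created inline. This is only a sketch of what the linked pvc.yaml plausibly contains, reconstructed from the PV dump above (claim name ebsc, namespace resize, storageClassName foo, initial request 4Gi); the linked file remains authoritative. It also assumes the "foo" StorageClass has allowVolumeExpansion: true and that the ExpandPersistentVolumes feature gate is on, since otherwise the resize in step 4 would not be admitted.

# Sketch only: reconstructed from the PV dump, not the actual contents of the linked pvc.yaml
oc create -n resize -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebsc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi
  storageClassName: foo
EOF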
The root cause of this problem is that when the new pod gets scheduled to the same node where the old pod was running, and this happens quickly enough, the device is never unmounted from the node. The only thing that gets unmounted and removed is the bind mount; the global device mount path still remains on the node. The resize operation, however, tries to run fsck before resizing (because many environments force us to run fsck regardless), and fsck cannot be run while the device is mounted. I am pushing a simple fix that performs an online resize, without running fsck, when the device is mounted. That should fix this problem. A workaround is to scale the deployment down to 0 and then scale it back up (see the commands and sketch below).
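With the resources from this report (project "resize", deployment config ruby-ex), the workaround looks like this:

# Scale down so the pod is deleted and the volume is finally unmounted and detached
oc scale dc/ruby-ex -n resize --replicas=0
# Scale back up; the next mount finds the device unmounted, so fsck and the resize can proceed
oc scale dc/ruby-ex -n resize --replicas=1

And a rough shell equivalent of the decision the fix makes, not the actual kubelet code, assuming an ext4 filesystem (as in the PV dump) and the device name from the FileSystemResizeFailed event:

DEV=/dev/xvdbi   # illustrative; taken from the event in the report above

if findmnt --source "$DEV" >/dev/null; then
    # Device is still mounted (the case hit by this bug): grow ext4 online and skip fsck
    resize2fs "$DEV"
else
    # Device is not mounted: safe to fsck first, then resize offline
    e2fsck -fy "$DEV" && resize2fs "$DEV"
fi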
Opened a PR to fix this upstream: https://github.com/kubernetes/kubernetes/pull/58794
https://github.com/openshift/origin/pull/18421
Verified; this passes on:
oc v3.9.0-0.47.0
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://ip-172-18-12-213.ec2.internal:8443
openshift v3.9.0-0.47.0
kubernetes v1.9.1+a0ce1bc657
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0489