Bug 1532981 - Resizing a PV fails for an AWS EBS volume
Summary: Resizing a PV fails for an AWS EBS volume
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.9.0
Assignee: Hemant Kumar
QA Contact: Chao Yang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-01-10 07:20 UTC by Chao Yang
Modified: 2018-03-28 14:18 UTC
CC: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-03-28 14:18:26 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:0489 0 None None None 2018-03-28 14:18:45 UTC

Description Chao Yang 2018-01-10 07:20:19 UTC
Description of problem:
Resizing a PV fails for an AWS EBS volume.

Version-Release number of selected component (if applicable):
oc v3.9.0-0.15.0
kubernetes v1.9.0-beta1
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-14-74.ec2.internal:8443
openshift v3.9.0-0.15.0
kubernetes v1.9.0-beta1

How reproducible:
Always

Steps to Reproduce:
1. Create a pod with replicas=1:
oc new-app centos/ruby-22-centos7~https://github.com/openshift/ruby-ex.git
2. Create a dynamic PVC:
oc create -f https://raw.githubusercontent.com/chao007/v3-testfiles/master/persistent-volumes/ebs/dynamic-provisioning/pvc.yaml
3. Add the PVC to the pod:
oc volume dc/ruby-ex --add --type=persistentVolumeClaim --mount-path=/opt1 --name=v1 --claim-name=ebsc
4. Edit the PVC to resize the PV: run oc edit pvc ebsc and change the requested storage from 4Gi to 8Gi.
5. Check that the PV has the requested size:
-bash-4.2# oc get pv pvc-e6a0098e-f5c9-11e7-92cd-0edfbad2900e -o yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    kubernetes.io/createdby: aws-ebs-dynamic-provisioner
    pv.kubernetes.io/bound-by-controller: "yes"
    pv.kubernetes.io/provisioned-by: kubernetes.io/aws-ebs
  creationTimestamp: 2018-01-10T05:48:37Z
  labels:
    failure-domain.beta.kubernetes.io/region: us-east-1
    failure-domain.beta.kubernetes.io/zone: us-east-1d
  name: pvc-e6a0098e-f5c9-11e7-92cd-0edfbad2900e
  resourceVersion: "25358"
  selfLink: /api/v1/persistentvolumes/pvc-e6a0098e-f5c9-11e7-92cd-0edfbad2900e
  uid: e73b122a-f5c9-11e7-92cd-0edfbad2900e
spec:
  accessModes:
  - ReadWriteOnce
  awsElasticBlockStore:
    fsType: ext4
    volumeID: aws://us-east-1d/vol-0f5f736c2be7a26e6
  capacity:
    storage: 8Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: ebsc
    namespace: resize
    resourceVersion: "23764"
    uid: e6a0098e-f5c9-11e7-92cd-0edfbad2900e
  persistentVolumeReclaimPolicy: Delete
  storageClassName: foo
status:
  phase: Bound
6. Delete the pod.
7. The new pod remains stuck in ContainerCreating.
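
For reference, step 4 can also be done non-interactively. A sketch using oc patch, with the same claim name and target size as in the steps above (the PV name is the one from the dump below):

```shell
# Non-interactive equivalent of step 4: patch the PVC's requested
# storage instead of editing it interactively.
oc patch pvc ebsc -p '{"spec":{"resources":{"requests":{"storage":"8Gi"}}}}'

# Check whether the PV's reported capacity has caught up with the
# new request (step 5).
oc get pv pvc-e6a0098e-f5c9-11e7-92cd-0edfbad2900e \
  -o jsonpath='{.spec.capacity.storage}'
```

This requires a cluster with the ExpandPersistentVolumes feature enabled and a StorageClass that allows expansion, as in this report.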

Events:
  Type     Reason                  Age                From                                  Message
  ----     ------                  ----               ----                                  -------
  Normal   Scheduled               4m                 default-scheduler                     Successfully assigned ruby-ex-2-pk9j4 to ip-172-18-6-92.ec2.internal
  Normal   SuccessfulMountVolume   4m                 kubelet, ip-172-18-6-92.ec2.internal  MountVolume.SetUp succeeded for volume "default-token-rq4f5"
  Warning  FailedMount             18s (x2 over 2m)   kubelet, ip-172-18-6-92.ec2.internal  Unable to mount volumes for pod "ruby-ex-2-pk9j4_resize(8d0d6496-f5cc-11e7-92cd-0edfbad2900e)": timeout expired waiting for volumes to attach/mount for pod "resize"/"ruby-ex-2-pk9j4". list of unattached/unmounted volumes=[v1]
  Warning  FileSystemResizeFailed  14s (x11 over 4m)  kubelet, ip-172-18-6-92.ec2.internal  MountVolume.resizeFileSystem failed for volume "pvc-e6a0098e-f5c9-11e7-92cd-0edfbad2900e" (UniqueName: "kubernetes.io/aws-ebs/aws://us-east-1d/vol-0f5f736c2be7a26e6") pod "ruby-ex-2-pk9j4" (UID: "8d0d6496-f5cc-11e7-92cd-0edfbad2900e") : the device /dev/xvdbi is already in use


Actual results:
Resizing the PV fails; the new pod cannot mount the resized volume.

Expected results:
The PV is resized successfully and the pod is running.

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:

Comment 1 Hemant Kumar 2018-01-24 20:29:13 UTC
The root cause of this problem: when a new pod gets scheduled to the same node where the old pod was running, and this happens quickly enough, the device is never unmounted from the node. The only thing that gets unmounted and removed is the bind mount; the global device path remains mounted on the node.

The resize operation tries to run fsck prior to resizing (because many environments force us to run fsck regardless), and fsck cannot be run while the device is mounted.

I am pushing a simple fix that performs an online resize, skipping fsck, when the device is mounted. That should fix this problem. A workaround is to scale the deployment to 0 and then scale it back up.
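
The workaround described above can be sketched as follows, using the dc name from the reproducer in this report:

```shell
# Scale the deployment to zero so the pod terminates and the EBS
# volume is fully unmounted and detached from the node.
oc scale dc/ruby-ex --replicas=0

# Wait for the old pod to terminate and the volume to detach, then
# scale back up; the fresh mount lets the filesystem resize succeed.
oc scale dc/ruby-ex --replicas=1
```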

Comment 2 Hemant Kumar 2018-01-25 06:02:35 UTC
Opened a PR to fix this in upstream - https://github.com/kubernetes/kubernetes/pull/58794

Comment 3 Hemant Kumar 2018-02-02 21:36:39 UTC
https://github.com/openshift/origin/pull/18421

Comment 5 Chao Yang 2018-02-22 05:51:45 UTC
Verified as passing on
oc v3.9.0-0.47.0
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-12-213.ec2.internal:8443
openshift v3.9.0-0.47.0
kubernetes v1.9.1+a0ce1bc657

Comment 8 errata-xmlrpc 2018-03-28 14:18:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489

