Description of problem:
Resizing a PV fails for an AWS EBS volume.

Version-Release number of selected component (if applicable):
oc v3.9.0-0.15.0
kubernetes v1.9.0-beta1
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://ip-172-18-14-74.ec2.internal:8443
openshift v3.9.0-0.15.0
kubernetes v1.9.0-beta1

How reproducible:
Always

Steps to Reproduce:
1. Create a pod with replicas=1
oc new-app centos/ruby-22-centos7~https://github.com/openshift/ruby-ex.git
2. Create a dynamic PVC (a sketch of this PVC is included after this report)
oc create -f https://raw.githubusercontent.com/chao007/v3-testfiles/master/persistent-volumes/ebs/dynamic-provisioning/pvc.yaml
3. Add the PVC to the pod
oc volume dc/ruby-ex --add --type=persistentVolumeClaim --mount-path=/opt1 --name=v1 --claim-name=ebsc
4. Edit the PVC to resize the PV
oc edit pvc ebsc, and change the requested storage from 4Gi to 8Gi
5. Check that the PV reports the requested size
-bash-4.2# oc get pv pvc-e6a0098e-f5c9-11e7-92cd-0edfbad2900e -o yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    kubernetes.io/createdby: aws-ebs-dynamic-provisioner
    pv.kubernetes.io/bound-by-controller: "yes"
    pv.kubernetes.io/provisioned-by: kubernetes.io/aws-ebs
  creationTimestamp: 2018-01-10T05:48:37Z
  labels:
    failure-domain.beta.kubernetes.io/region: us-east-1
    failure-domain.beta.kubernetes.io/zone: us-east-1d
  name: pvc-e6a0098e-f5c9-11e7-92cd-0edfbad2900e
  resourceVersion: "25358"
  selfLink: /api/v1/persistentvolumes/pvc-e6a0098e-f5c9-11e7-92cd-0edfbad2900e
  uid: e73b122a-f5c9-11e7-92cd-0edfbad2900e
spec:
  accessModes:
  - ReadWriteOnce
  awsElasticBlockStore:
    fsType: ext4
    volumeID: aws://us-east-1d/vol-0f5f736c2be7a26e6
  capacity:
    storage: 8Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: ebsc
    namespace: resize
    resourceVersion: "23764"
    uid: e6a0098e-f5c9-11e7-92cd-0edfbad2900e
  persistentVolumeReclaimPolicy: Delete
  storageClassName: foo
status:
  phase: Bound
6. Delete the pod
7. The new pod stays in ContainerCreating

Events:
  Type     Reason                  Age                From                                  Message
  ----     ------                  ----               ----                                  -------
  Normal   Scheduled               4m                 default-scheduler                     Successfully assigned ruby-ex-2-pk9j4 to ip-172-18-6-92.ec2.internal
  Normal   SuccessfulMountVolume   4m                 kubelet, ip-172-18-6-92.ec2.internal  MountVolume.SetUp succeeded for volume "default-token-rq4f5"
  Warning  FailedMount             18s (x2 over 2m)   kubelet, ip-172-18-6-92.ec2.internal  Unable to mount volumes for pod "ruby-ex-2-pk9j4_resize(8d0d6496-f5cc-11e7-92cd-0edfbad2900e)": timeout expired waiting for volumes to attach/mount for pod "resize"/"ruby-ex-2-pk9j4". list of unattached/unmounted volumes=[v1]
  Warning  FileSystemResizeFailed  14s (x11 over 4m)  kubelet, ip-172-18-6-92.ec2.internal  MountVolume.resizeFileSystem failed for volume "pvc-e6a0098e-f5c9-11e7-92cd-0edfbad2900e" (UniqueName: "kubernetes.io/aws-ebs/aws://us-east-1d/vol-0f5f736c2be7a26e6") pod "ruby-ex-2-pk9j4" (UID: "8d0d6496-f5cc-11e7-92cd-0edfbad2900e") : the device /dev/xvdbi is already in use

Actual results:
The PV resize fails and the new pod never leaves ContainerCreating.

Expected results:
The PV is resized and the pod is running.

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:
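For reference, the PVC from step 2 can also be created inline. This is only a sketch of what the linked pvc.yaml plausibly contains, reconstructed from the PV dump above (claim name ebsc, namespace resize, storageClassName foo, initial request 4Gi); the linked file remains authoritative. It also assumes the "foo" StorageClass has allowVolumeExpansion: true and that the ExpandPersistentVolumes feature gate is on, since otherwise the resize in step 4 would not be admitted.

# Sketch only: reconstructed from the PV dump, not the actual contents of the linked pvc.yaml
oc create -n resize -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebsc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi
  storageClassName: foo
EOF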
The root cause of this problem is that when the new pod gets scheduled to the same node where the old pod was running, and this happens quickly enough, the device is never unmounted from the node. The only thing that gets unmounted and removed is the bind mount; the global device mount path still remains on the node. The resize operation, however, tries to run fsck before resizing (because many environments force us to run fsck regardless), and fsck cannot be run while the device is mounted. I am pushing a simple fix that performs an online resize, without running fsck, when the device is mounted. That should fix this problem. A workaround is to scale the deployment down to 0 and then scale it back up (see the commands and sketch below).
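With the resources from this report (project "resize", deployment config ruby-ex), the workaround looks like this:

# Scale down so the pod is deleted and the volume is finally unmounted and detached
oc scale dc/ruby-ex -n resize --replicas=0
# Scale back up; the next mount finds the device unmounted, so fsck and the resize can proceed
oc scale dc/ruby-ex -n resize --replicas=1

And a rough shell equivalent of the decision the fix makes, not the actual kubelet code, assuming an ext4 filesystem (as in the PV dump) and the device name from the FileSystemResizeFailed event:

DEV=/dev/xvdbi   # illustrative; taken from the event in the report above

if findmnt --source "$DEV" >/dev/null; then
    # Device is still mounted (the case hit by this bug): grow ext4 online and skip fsck
    resize2fs "$DEV"
else
    # Device is not mounted: safe to fsck first, then resize offline
    e2fsck -fy "$DEV" && resize2fs "$DEV"
fi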
Opened a PR to fix this upstream: https://github.com/kubernetes/kubernetes/pull/58794
https://github.com/openshift/origin/pull/18421
Verified; this passes on:
oc v3.9.0-0.47.0
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://ip-172-18-12-213.ec2.internal:8443
openshift v3.9.0-0.47.0
kubernetes v1.9.1+a0ce1bc657
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0489