Bug 1988371 - AWS EBS: Mounting XFS volume clone or restored snapshot to same node failed
Summary: AWS EBS: Mounting XFS volume clone or restored snapshot to same node failed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.9.0
Assignee: Jan Safranek
QA Contact: Wei Duan
URL:
Whiteboard:
Depends On: 1965155
Blocks:
 
Reported: 2021-07-30 12:09 UTC by Jan Safranek
Modified: 2021-10-18 17:43 UTC
CC List: 4 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of: 1965155
Environment:
Last Closed: 2021-10-18 17:43:22 UTC
Target Upstream Version:
Embargoed:




Links:
  GitHub: openshift/aws-ebs-csi-driver pull 190 (last updated 2021-08-18 13:40:18 UTC)
  Red Hat Product Errata: RHSA-2021:3759 (last updated 2021-10-18 17:43:43 UTC)

Description Jan Safranek 2021-07-30 12:09:22 UTC
+++ This bug was initially created as a clone of Bug #1965155 +++

Description of problem:
Mounting an XFS volume restored from a snapshot to the same node fails.
Upstream issue here: https://github.com/container-storage-interface/spec/issues/482

Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-05-25-223219 

How reproducible:
Always

Steps to Reproduce:
1. Create a PVC/pod using the StorageClass below:
oc get sc foo -o yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
  creationTimestamp: "2021-05-27T01:55:45Z"
  name: foo
  resourceVersion: "536907"
  uid: 8d77991b-6b96-4995-b044-8b232168712e
parameters:
  fsType: xfs
provisioner: ebs.csi.aws.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
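
For reference, a minimal PVC/pod pair matching step 1 could look like the following; the PVC name pvc1 and the mount path /mnt/local come from the outputs elsewhere in this report, while the pod name, image and requested size are illustrative assumptions:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc1              # source PVC referenced by the snapshot below
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: foo    # the xfs StorageClass shown above
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: pod1               # illustrative name
spec:
  containers:
    - name: app
      image: registry.access.redhat.com/ubi8/ubi   # any image works for the reproduction
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: data
          mountPath: /mnt/local
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: pvc1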

2. Create a VolumeSnapshot:
oc get volumesnapshot
NAME                  READYTOUSE   SOURCEPVC   SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS   SNAPSHOTCONTENT                                    CREATIONTIME   AGE
new-snapshot-test-1   true         pvc1                                1Gi           gcp-snap-2      snapcontent-55a1ee74-6791-4506-aae7-50a093b8cb20   45m            45m
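
The VolumeSnapshot shown above would have been created with a manifest along these lines (a sketch reconstructed from the listed names; the VolumeSnapshotClass definition itself is not included in this report):

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: new-snapshot-test-1
spec:
  volumeSnapshotClassName: gcp-snap-2   # class name taken from the listing above
  source:
    persistentVolumeClaimName: pvc1     # snapshot of the source PVC from step 1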

3. Create a restored PVC/pod and make sure the pod is scheduled to the same node.
Events:
  Type     Reason                  Age              From                     Message
  ----     ------                  ----             ----                     -------
  Normal   Scheduled               13s              default-scheduler        Successfully assigned test/pod2 to ip-10-0-169-208.us-east-2.compute.internal
  Normal   SuccessfulAttachVolume  11s              attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-b9076058-77c4-4496-a9be-22b7bfcddae7"
  Warning  FailedMount             2s (x4 over 5s)  kubelet                  MountVolume.MountDevice failed for volume "pvc-b9076058-77c4-4496-a9be-22b7bfcddae7" : rpc error: code = Internal desc = could not format "/dev/nvme3n1" and mount it at "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-b9076058-77c4-4496-a9be-22b7bfcddae7/globalmount": mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t xfs -o defaults /dev/nvme3n1 /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-b9076058-77c4-4496-a9be-22b7bfcddae7/globalmount
Output: mount: /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-b9076058-77c4-4496-a9be-22b7bfcddae7/globalmount: wrong fs type, bad option, bad superblock on /dev/nvme3n1, missing codepage or helper program, or other error.
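
For reference, the restored PVC in step 3 points at the snapshot through a dataSource stanza; a minimal sketch with an illustrative PVC name (the pod consuming it must then land on the same node as the original pod, e.g. via a nodeSelector, to reproduce the failure above):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-restore        # illustrative name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: foo
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: new-snapshot-test-1   # snapshot created in step 2
  resources:
    requests:
      storage: 1Gi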


Actual results:
The restored pod fails because the XFS volume could not be mounted.

Expected results:
The restored pod should be running.
Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:

--- Additional comment from Jan Safranek on 2021-06-03 14:13:15 UTC ---

This also affects cloned volumes. Since the root cause (and the fix) is the same, let's track them in a single bug.
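
For the clone case the reproduction is the same except that the dataSource kind is PersistentVolumeClaim; a sketch with an illustrative name:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-clone          # illustrative name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: foo
  dataSource:
    kind: PersistentVolumeClaim   # core API group, so no apiGroup field
    name: pvc1                    # clone of the source PVC
  resources:
    requests:
      storage: 1Gi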

--- Additional comment from Jan Safranek on 2021-06-03 14:14:11 UTC ---

Comment 1 Jan Safranek 2021-07-30 12:10:13 UTC
The AWS EBS CSI driver has been fixed in upstream release 1.2.0: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/pull/913
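
The underlying problem (per the CSI spec issue linked in the description) is that an XFS filesystem restored from a snapshot or clone keeps the source filesystem's UUID, and XFS refuses to mount two filesystems with the same UUID on one node. The upstream fix makes the driver mount such volumes with the nouuid option, which is visible in the verified mount output in comment 3. A rough manual equivalent on the node, with an illustrative target directory, would be:

# mount the restored/cloned device, skipping the XFS duplicate-UUID check
mount -t xfs -o defaults,nouuid /dev/nvme3n1 /mnt/restored

# or regenerate the filesystem UUID before a normal mount
xfs_admin -U generate /dev/nvme3n1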

Comment 3 Wei Duan 2021-08-20 09:06:45 UTC
Verified pass on 4.9.0-0.nightly-2021-08-19-184748

$ oc get pod -o wide
NAME          READY   STATUS    RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
mypod-ori     1/1     Running   0          20m     10.129.2.22   ip-10-0-217-234.us-east-2.compute.internal   <none>           <none>
mypod-res     1/1     Running   0          6m36s   10.129.2.29   ip-10-0-217-234.us-east-2.compute.internal   <none>           <none>
mypod-res-1   1/1     Running   0          4m45s   10.129.2.33   ip-10-0-217-234.us-east-2.compute.internal   <none>           <none>
mypod-res-2   1/1     Running   0          4m44s   10.129.2.34   ip-10-0-217-234.us-east-2.compute.internal   <none>           <none>

$ oc rsh mypod-res-2
sh-4.4# /mnt/local/hello 
Hello OpenShift Storage
sh-4.4# mount | grep local
/dev/nvme4n1 on /mnt/local type xfs (rw,relatime,seclabel,nouuid,attr2,inode64,logbufs=8,logbsize=32k,noquota)

Comment 6 errata-xmlrpc 2021-10-18 17:43:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

