Bug 1988373

Summary: GCE PD: Mounting XFS volume clone or restored snapshot to same node failed
Product: OpenShift Container Platform Reporter: Jan Safranek <jsafrane>
Component: Storage    Assignee: Jan Safranek <jsafrane>
Storage sub component: Kubernetes External Components    QA Contact: Wei Duan <wduan>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: unspecified CC: aos-bugs, chaoyang, jsafrane, wduan
Version: 4.8   
Target Milestone: ---   
Target Release: 4.9.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:    Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: 1965155
: 2052955 (view as bug list) Environment:
Last Closed: 2021-10-18 17:43:22 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1965155    
Bug Blocks: 2052955    

Description Jan Safranek 2021-07-30 12:13:43 UTC
+++ This bug was initially created as a clone of Bug #1965155 +++

Description of problem:
Mounting an XFS volume restored from a snapshot to the same node fails.
Upstream issue here: https://github.com/container-storage-interface/spec/issues/482

Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-05-25-223219 

How reproducible:
Always

Steps to Reproduce:
1. Create a PVC/pod using the storage class below:
oc get sc foo -o yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
  creationTimestamp: "2021-05-27T01:55:45Z"
  name: foo
  resourceVersion: "536907"
  uid: 8d77991b-6b96-4995-b044-8b232168712e
parameters:
  fsType: xfs
provisioner: ebs.csi.aws.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

2. Create a volume snapshot:
oc get volumesnapshot
NAME                  READYTOUSE   SOURCEPVC   SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS   SNAPSHOTCONTENT                                    CREATIONTIME   AGE
new-snapshot-test-1   true         pvc1                                1Gi           gcp-snap-2      snapcontent-55a1ee74-6791-4506-aae7-50a093b8cb20   45m            45m
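
For reference, a snapshot matching the listing above can be created with a manifest along these lines (a sketch assuming the standard snapshot.storage.k8s.io/v1 API; the names and snapshot class are taken from the `oc get volumesnapshot` output):

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: new-snapshot-test-1
spec:
  volumeSnapshotClassName: gcp-snap-2
  source:
    persistentVolumeClaimName: pvc1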

3. Create the restored PVC/pod. Make sure the pod is scheduled to the same node.
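
A restored PVC referencing the snapshot can look like the following (a sketch; the PVC name is hypothetical, while the storage class, snapshot name, and 1Gi restore size come from the steps above):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-restore   # hypothetical name
spec:
  storageClassName: foo
  dataSource:
    name: new-snapshot-test-1
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi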
Events:
  Type     Reason                  Age              From                     Message
  ----     ------                  ----             ----                     -------
  Normal   Scheduled               13s              default-scheduler        Successfully assigned test/pod2 to ip-10-0-169-208.us-east-2.compute.internal
  Normal   SuccessfulAttachVolume  11s              attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-b9076058-77c4-4496-a9be-22b7bfcddae7"
  Warning  FailedMount             2s (x4 over 5s)  kubelet                  MountVolume.MountDevice failed for volume "pvc-b9076058-77c4-4496-a9be-22b7bfcddae7" : rpc error: code = Internal desc = could not format "/dev/nvme3n1" and mount it at "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-b9076058-77c4-4496-a9be-22b7bfcddae7/globalmount": mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t xfs -o defaults /dev/nvme3n1 /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-b9076058-77c4-4496-a9be-22b7bfcddae7/globalmount
Output: mount: /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-b9076058-77c4-4496-a9be-22b7bfcddae7/globalmount: wrong fs type, bad option, bad superblock on /dev/nvme3n1, missing codepage or helper program, or other error.
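
The failure pattern above is the duplicate-UUID problem tracked in the linked CSI spec issue: a cloned or snapshot-restored XFS filesystem carries the same UUID as its source, and XFS refuses to mount a second filesystem whose UUID is already mounted on the node. The usual remedies, sketched below with the device name from the events above (illustrative commands, not runnable outside an affected node), are to mount with `-o nouuid` or to regenerate the UUID:

# Option 1: ignore the duplicate UUID at mount time
mount -t xfs -o nouuid /dev/nvme3n1 /mnt/restored

# Option 2: give the restored filesystem a fresh UUID (run while unmounted)
xfs_admin -U generate /dev/nvme3n1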


Actual results:
The restored pod fails because the XFS volume could not be mounted.

Expected results:
The restored pod should be running.
Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:

--- Additional comment from Jan Safranek on 2021-06-03 14:13:15 UTC ---

This also affects cloned volumes. Since the root cause (and the fix) are the same, let's track them in a single bug.

--- Additional comment from Jan Safranek on 2021-06-03 14:14:11 UTC ---

Comment 2 Jan Safranek 2021-07-30 15:01:43 UTC
Upstream PR: https://github.com/kubernetes/cloud-provider-openstack/pull/1614/files

Comment 3 Jan Safranek 2021-07-30 15:02:24 UTC
Sorry, scratch the previous comment, that PR is for Cinder CSI driver.

Comment 5 Wei Duan 2021-08-20 08:33:03 UTC
Verified pass on 4.9.0-0.nightly-2021-08-19-184748

Comment 8 errata-xmlrpc 2021-10-18 17:43:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759