Bug 1988373

Summary: GCE PD: Mounting XFS volume clone or restored snapshot to same node failed
Product: OpenShift Container Platform Reporter: Jan Safranek <jsafrane>
Component: Storage    Assignee: Jan Safranek <jsafrane>
Storage sub component: Kubernetes External Components    QA Contact: Wei Duan <wduan>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: unspecified CC: aos-bugs, chaoyang, jsafrane, wduan
Version: 4.8   
Target Milestone: ---   
Target Release: 4.9.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:    Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: 1965155
: 2052955 (view as bug list) Environment:
Last Closed: 2021-10-18 17:43:22 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1965155    
Bug Blocks: 2052955    

Description Jan Safranek 2021-07-30 12:13:43 UTC
+++ This bug was initially created as a clone of Bug #1965155 +++

Description of problem:
Mounting an XFS volume restored from a snapshot to the same node fails.
Upstream issue here: https://github.com/container-storage-interface/spec/issues/482

Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-05-25-223219 

How reproducible:
Always

Steps to Reproduce:
1. Create a PVC/pod using the storage class below:
oc get sc foo -o yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
  creationTimestamp: "2021-05-27T01:55:45Z"
  name: foo
  resourceVersion: "536907"
  uid: 8d77991b-6b96-4995-b044-8b232168712e
parameters:
  fsType: xfs
provisioner: ebs.csi.aws.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

2. Create a volume snapshot:
oc get volumesnapshot
NAME                  READYTOUSE   SOURCEPVC   SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS   SNAPSHOTCONTENT                                    CREATIONTIME   AGE
new-snapshot-test-1   true         pvc1                                1Gi           gcp-snap-2      snapcontent-55a1ee74-6791-4506-aae7-50a093b8cb20   45m            45m
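
For reference, a snapshot matching the listing above can be created with a manifest along these lines (a sketch assuming the standard snapshot.storage.k8s.io/v1 API; the names and snapshot class are taken from the `oc get volumesnapshot` output):

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: new-snapshot-test-1
spec:
  volumeSnapshotClassName: gcp-snap-2
  source:
    persistentVolumeClaimName: pvc1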

3. Create the restored PVC/pod. Make sure the pod is scheduled to the same node.
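
A restored PVC referencing the snapshot can look like the following (a sketch; the PVC name is hypothetical, while the storage class, snapshot name, and 1Gi restore size come from the steps above):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-restore   # hypothetical name
spec:
  storageClassName: foo
  dataSource:
    name: new-snapshot-test-1
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi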
Events:
  Type     Reason                  Age              From                     Message
  ----     ------                  ----             ----                     -------
  Normal   Scheduled               13s              default-scheduler        Successfully assigned test/pod2 to ip-10-0-169-208.us-east-2.compute.internal
  Normal   SuccessfulAttachVolume  11s              attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-b9076058-77c4-4496-a9be-22b7bfcddae7"
  Warning  FailedMount             2s (x4 over 5s)  kubelet                  MountVolume.MountDevice failed for volume "pvc-b9076058-77c4-4496-a9be-22b7bfcddae7" : rpc error: code = Internal desc = could not format "/dev/nvme3n1" and mount it at "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-b9076058-77c4-4496-a9be-22b7bfcddae7/globalmount": mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t xfs -o defaults /dev/nvme3n1 /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-b9076058-77c4-4496-a9be-22b7bfcddae7/globalmount
Output: mount: /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-b9076058-77c4-4496-a9be-22b7bfcddae7/globalmount: wrong fs type, bad option, bad superblock on /dev/nvme3n1, missing codepage or helper program, or other error.
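
The failure pattern above is the duplicate-UUID problem tracked in the linked CSI spec issue: a cloned or snapshot-restored XFS filesystem carries the same UUID as its source, and XFS refuses to mount a second filesystem whose UUID is already mounted on the node. The usual remedies, sketched below with the device name from the events above (illustrative commands, not runnable outside an affected node), are to mount with `-o nouuid` or to regenerate the UUID:

# Option 1: ignore the duplicate UUID at mount time
mount -t xfs -o nouuid /dev/nvme3n1 /mnt/restored

# Option 2: give the restored filesystem a fresh UUID (run while unmounted)
xfs_admin -U generate /dev/nvme3n1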


Actual results:
The restored pod fails because the XFS volume could not be mounted.

Expected results:
The restored pod should be running.
Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:

--- Additional comment from Jan Safranek on 2021-06-03 14:13:15 UTC ---

This also affects cloned volumes. Since the root cause (and the fix) are the same, let's track them in a single bug.

--- Additional comment from Jan Safranek on 2021-06-03 14:14:11 UTC ---

Comment 2 Jan Safranek 2021-07-30 15:01:43 UTC
Upstream PR: https://github.com/kubernetes/cloud-provider-openstack/pull/1614/files

Comment 3 Jan Safranek 2021-07-30 15:02:24 UTC
Sorry, scratch the previous comment, that PR is for Cinder CSI driver.

Comment 5 Wei Duan 2021-08-20 08:33:03 UTC
Verified pass on 4.9.0-0.nightly-2021-08-19-184748

Comment 8 errata-xmlrpc 2021-10-18 17:43:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759