Bug 1968253
| Summary: | GCP CSI driver can provision volume with access mode ROX | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Chao Yang <chaoyang> |
| Component: | Storage | Assignee: | Tomas Smetana <tsmetana> |
| Storage sub component: | Storage | QA Contact: | Chao Yang <chaoyang> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | | |
| Priority: | medium | CC: | jsafrane, tsmetana, wduan |
| Version: | 4.8 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.11.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Release Note |
| Doc Text: | The provisioner sidecar now has an argument called `controller-publish-readonly`, which sets the CSI PV spec `readOnly` field based on the PVC access modes. If this flag is set to `true` and the PVC access modes contain only `ROX`, the controller automatically sets the `PersistentVolume.spec.CSIPersistentVolumeSource.readOnly` field to `true`. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-08-10 10:36:25 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
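As described in the Doc Text above, the fix ties the `readOnly` field of the provisioned PV to the access modes requested on the PVC. A minimal sketch of the expected relationship, assuming the provisioner runs with `controller-publish-readonly=true` (object names are illustrative, not taken from this report):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rox-claim
spec:
  accessModes:
  - ReadOnlyMany                  # only ROX requested
  dataSource:                     # an empty ROX volume is not useful; restore from a snapshot
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: some-snapshot
  resources:
    requests:
      storage: 2Gi
  storageClassName: standard-csi
---
# Expected fragment of the dynamically provisioned PV
spec:
  csi:
    driver: pd.csi.storage.gke.io
    readOnly: true                # set automatically because the PVC is ROX-only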
User asked for an empty ReadOnlyMany volume and user got it :-). It's not very useful, but the user may, e.g., restore a snapshot there.
@Chao, can you check the volume is really read-only? The rw mount option is odd, but it can still be attached as read-only. If it's writable we need to fix it.

Hi @jsafrane,
We can write data to this volume:

oc get pvc
NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
myclaim   Bound    pvc-d39ff6af-d4b8-4ad1-a63c-ba307ae2ec5b   2Gi        ROX            standard-csi   9m17s

oc exec pod4 -ti -- bash
[root@pod4 /]# ls /tmp1
lost+found  test

ls -lrt /var/lib/kubelet/pods/14e28cf2-8e11-4ea7-8595-4a3abb21c7e1/volumes/kubernetes.io~csi/vc-d39ff6af-d4b8-4ad1-a63c-ba307ae2ec5b/mount
total 16
drwx------. 2 root root 16384 Jun  9 08:16 lost+found
-rw-r--r--. 1 root root     0 Jun  9 08:16 test

Something in the cluster (kubelet? GCP CSI driver?) "forgets" to mount the ReadOnlyMany volume as read-only. Mustafa, reproduce the issue and check the logs of the CSI driver: how were NodeStage/NodePublish called? Their VolumeCapability.AccessMode should be MULTI_NODE_READER_ONLY, and then the driver should mount the volume as read-only, in theory.
https://github.com/container-storage-interface/spec/blob/486e6bdb2d5d814befb1d11744c39a33842af15f/csi.proto#L427
In addition, if you dynamically provision an empty ReadOnlyMany volume, the CSI driver should not even format the volume with ext4; it should be really read-only and fail mounting it. It should succeed when you restore a snapshot of an already formatted volume as a new PVC.

There is an upstream issue and PR regarding this:
Issue: https://github.com/kubernetes/kubernetes/issues/70505
PR: https://github.com/kubernetes-csi/external-provisioner/pull/469

This should have been fixed with the rebase of the external CSI provisioner in OCP to version 3.0.0: moving manually to MODIFIED.
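For context, the relevant change is an argument on the external-provisioner sidecar. A minimal sketch of how the sidecar container could be configured (the image name and the other arguments are illustrative assumptions, not the actual OCP deployment):

# Fragment of a CSI controller Deployment (illustrative)
containers:
- name: csi-provisioner
  image: registry.example.com/csi-external-provisioner:v3.0.0   # illustrative image reference
  args:
  - --csi-address=/csi/csi.sock
  - --controller-publish-readonly=true   # mark ROX-only PVs as readOnly in their CSI source
  volumeMounts:
  - name: socket-dir
    mountPath: /csi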
Failed on:

oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.10.0-0.nightly-2021-09-23-210724 True False 3h5m Cluster version is 4.10.0-0.nightly-2021-09-23-210724
oc describe pv
Name: pvc-97960186-529f-44a7-b887-ee13703f4395
Labels: <none>
Annotations: pv.kubernetes.io/provisioned-by: pd.csi.storage.gke.io
Finalizers: [kubernetes.io/pv-protection external-attacher/pd-csi-storage-gke-io]
StorageClass: standard-csi
Status: Bound
Claim: default/myclaim1
Reclaim Policy: Delete
Access Modes: ROX
VolumeMode: Filesystem
Capacity: 2Gi
Node Affinity:
Required Terms:
Term 0: topology.gke.io/zone in [us-central1-c]
Message:
Source:
Type: CSI (a Container Storage Interface (CSI) volume source)
Driver: pd.csi.storage.gke.io
FSType: ext4
VolumeHandle: projects/openshift-qe/zones/us-central1-c/disks/pvc-97960186-529f-44a7-b887-ee13703f4395
ReadOnly: false
VolumeAttributes: storage.kubernetes.io/csiProvisionerIdentity=1632713817264-8081-pd.csi.storage.gke.io
Events: <none>
/dev/sdb on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-97960186-529f-44a7-b887-ee13703f4395/globalmount type ext4 (rw,relatime,seclabel)
/dev/sdb on /var/lib/kubelet/pods/e447caa8-bd4b-48ed-9e04-cfc022e0568d/volumes/kubernetes.io~csi/pvc-97960186-529f-44a7-b887-ee13703f4395/mount type ext4 (rw,relatime,seclabel)
---
It is correct when trying to provision and mount a read-only (ro) volume directly; formatting the empty volume fails as expected:
Warning FailedMount 16s (x7 over 51s) kubelet MountVolume.MountDevice failed for volume "pvc-e4269d3a-3a2d-4920-8917-a30c0a0773e7" : rpc error: code = Internal desc = Failed to format and mount device from ("/dev/disk/by-id/google-pvc-e4269d3a-3a2d-4920-8917-a30c0a0773e7") to ("/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-e4269d3a-3a2d-4920-8917-a30c0a0773e7/globalmount") with fstype ("ext4") and options ([]): format of disk "/dev/disk/by-id/google-pvc-e4269d3a-3a2d-4920-8917-a30c0a0773e7" failed: type:("ext4") target:("/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-e4269d3a-3a2d-4920-8917-a30c0a0773e7/globalmount") options:("defaults") errcode:(exit status 1) output:(mke2fs 1.45.6 (20-Mar-2020)
/dev/disk/by-id/google-pvc-e4269d3a-3a2d-4920-8917-a30c0a0773e7: Read-only file system while setting up superblock
)
---
1.Create rwo pvc/pod
2.Create snapshotclass
3.Create volumesnapshot
oc get volumesnapshot
NAME READYTOUSE SOURCEPVC SOURCESNAPSHOTCONTENT RESTORESIZE SNAPSHOTCLASS SNAPSHOTCONTENT CREATIONTIME AGE
new-snapshot-test-1 true myclaim1 1Gi gcp-snap-2 snapcontent-320c7f7e-d651-47e7-a448-faaafa88b60b 3h10m 3h10m
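For reference, a minimal sketch of the snapshot objects that would produce the output above (the class name `gcp-snap-2` and the source PVC `myclaim1` are taken from the output; the API version and `deletionPolicy` are assumptions, not the actual manifests used):

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: gcp-snap-2
driver: pd.csi.storage.gke.io
deletionPolicy: Delete            # assumption; not shown in the report
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: new-snapshot-test-1
spec:
  volumeSnapshotClassName: gcp-snap-2
  source:
    persistentVolumeClaimName: myclaim1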
4.Create restore pvc with rox
oc get pvc/pvc1-restore -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations:
pv.kubernetes.io/bind-completed: "yes"
pv.kubernetes.io/bound-by-controller: "yes"
volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
volume.kubernetes.io/selected-node: qe-chao-bug-4hclb-worker-c-gzlvv.c.openshift-qe.internal
creationTimestamp: "2021-11-16T06:42:24Z"
finalizers:
- kubernetes.io/pvc-protection
name: pvc1-restore
namespace: test1
resourceVersion: "86025"
uid: d06b7fe6-87ef-4d5f-8866-841416c66c3e
spec:
accessModes:
- ReadOnlyMany
dataSource:
apiGroup: snapshot.storage.k8s.io
kind: VolumeSnapshot
name: new-snapshot-test-1
resources:
requests:
storage: 1Gi
storageClassName: standard-csi
volumeMode: Filesystem
volumeName: pvc-d06b7fe6-87ef-4d5f-8866-841416c66c3e
status:
accessModes:
- ReadOnlyMany
capacity:
storage: 1Gi
phase: Bound
5.oc get pods
NAME READY STATUS RESTARTS AGE
pod-restore 0/1 CreateContainerError 0 177m
pod1 1/1 Running 0 3h10m
oc describe pods/pod-restore
Warning FailedMount 3m18s kubelet Unable to attach or mount volumes: unmounted volumes=[aws1], unattached volumes=[aws1 kube-api-access-rj8fz]: timed out waiting for the condition
Warning FailedMount 64s (x2 over 5m34s) kubelet Unable to attach or mount volumes: unmounted volumes=[aws1], unattached volumes=[kube-api-access-rj8fz aws1]: timed out waiting for the condition
Warning FailedMount 63s (x11 over 7m21s) kubelet MountVolume.MountDevice failed for volume "pvc-d06b7fe6-87ef-4d5f-8866-841416c66c3e" : rpc error: code = Internal desc = Failed to format and mount device from ("/dev/disk/by-id/google-pvc-d06b7fe6-87ef-4d5f-8866-841416c66c3e") to ("/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d06b7fe6-87ef-4d5f-8866-841416c66c3e/globalmount") with fstype ("ext4") and options ([]): mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t ext4 -o defaults /dev/disk/by-id/google-pvc-d06b7fe6-87ef-4d5f-8866-841416c66c3e /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d06b7fe6-87ef-4d5f-8866-841416c66c3e/globalmount
Output: mount: /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d06b7fe6-87ef-4d5f-8866-841416c66c3e/globalmount: cannot mount /dev/sdd read-only.
6.Tried mounting manually on the node with the `noload` option (which skips ext4 journal replay, so the mount needs no write access); the volume can be mounted:
mount -o ro,noload /dev/disk/by-id/google-pvc-d06b7fe6-87ef-4d5f-8866-841416c66c3e /mnt/test/
ls /mnt/test/
lost+found test
oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.10.0-0.nightly-2021-11-12-161948 True False 6h29m Cluster version is 4.10.0-0.nightly-2021-11-12-161948
I will try to reproduce: do you have a spec for the pod-restore? I think it also needs to request a read-only mount for this thing to work. I'm not sure the CSI-to-Kubernetes volume mode mapping is complete and correct: I also filed https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/issues/872 upstream and will keep tinkering...

This requires a rebase to the upstream driver v1.4.0; we have 1.3.4 in 4.10.

It is OK that a ROX volume without a data source could not be provisioned:

Warning  ProvisioningFailed  2s (x5 over 17s)  pd.csi.storage.gke.io_qe-chaoyang66-gvpz2-master-0.c.openshift-qe.internal_9e3b8511-f5ed-4d40-b7d3-4cc18a0140ab  failed to provision volume with StorageClass "standard-csi": rpc error: code = InvalidArgument desc = VolumeContentSource must be provided when AccessMode is set to read only
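Regarding the question above about the pod-restore spec: a minimal sketch of a pod that explicitly requests the restored ROX PVC read-only (the image is taken from the events later in this report; the rest of the spec is an assumption, not the actual pod used):

apiVersion: v1
kind: Pod
metadata:
  name: pod-restore
spec:
  containers:
  - name: busybox
    image: gcr.io/google_containers/busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: data
      mountPath: /mnt/test
      readOnly: true              # mount the volume read-only inside the container
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: pvc1-restore
      readOnly: true              # ask for a read-only attach/mount of the PVC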
1.Create pvc/pod
2.Write some data into the mounted volume
oc exec pod1 -- ls -lrt /tmp1
total 4
-r--r--r--. 1 root root 13 Jun 8 11:10 test
oc exec pod1 -- ls -lrt / | grep tmp1
dr--r--r--. 2 root root 4096 Jun 8 11:11 tmp1
3.Create volumesnapshot
4.Create restored pvc
oc get pvc pvc2-restore -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations:
pv.kubernetes.io/bind-completed: "yes"
pv.kubernetes.io/bound-by-controller: "yes"
volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
volume.kubernetes.io/selected-node: evakhoni-85461-2r9t4-worker-a-jmqtm.c.openshift-qe.internal
volume.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
creationTimestamp: "2022-06-08T11:15:22Z"
finalizers:
- kubernetes.io/pvc-protection
name: pvc2-restore
namespace: default
resourceVersion: "101081"
uid: 02762e9f-58a2-41c2-925c-478b933884a7
spec:
accessModes:
- ReadOnlyMany
dataSource:
apiGroup: snapshot.storage.k8s.io
kind: VolumeSnapshot
name: new-snapshot-test-1
resources:
requests:
storage: 2Gi
storageClassName: standard-csi
volumeMode: Filesystem
volumeName: pvc-02762e9f-58a2-41c2-925c-478b933884a7
status:
accessModes:
- ReadOnlyMany
capacity:
storage: 2Gi
phase: Bound
5.Create the pod, but the container is in an error state:
pod2 0/1 CreateContainerError 0 7m11s
oc describe pod2
Warning Failed 5m13s (x12 over 7m13s) kubelet Error: relabel failed /var/lib/kubelet/pods/a101adb3-4505-4375-835d-17b178ef7a01/volumes/kubernetes.io~csi/pvc-02762e9f-58a2-41c2-925c-478b933884a7/mount: lsetxattr /var/lib/kubelet/pods/a101adb3-4505-4375-835d-17b178ef7a01/volumes/kubernetes.io~csi/pvc-02762e9f-58a2-41c2-925c-478b933884a7/mount: read-only file system
@tsmetana can you help to check it?
Hello. This is what I got on 4.11.0-0.ci-2022-06-06-185917:
Restored PVC:
$ oc get pvc -o yaml
apiVersion: v1
items:
- apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations:
pv.kubernetes.io/bind-completed: "yes"
pv.kubernetes.io/bound-by-controller: "yes"
volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
volume.kubernetes.io/selected-node: ci-ln-xkgf6w2-72292-j2hkc-worker-a-z94nq
volume.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
creationTimestamp: "2022-06-08T13:44:20Z"
finalizers:
- kubernetes.io/pvc-protection
name: pvc1-restore
namespace: default
resourceVersion: "33700"
uid: 915ba939-95b7-4f0c-970e-a4487068113a
spec:
accessModes:
- ReadOnlyMany
dataSource:
apiGroup: snapshot.storage.k8s.io
kind: VolumeSnapshot
name: mysnap-1
dataSourceRef:
apiGroup: snapshot.storage.k8s.io
kind: VolumeSnapshot
name: mysnap-1
resources:
requests:
storage: 1Gi
storageClassName: standard-csi
volumeMode: Filesystem
volumeName: pvc-915ba939-95b7-4f0c-970e-a4487068113a
status:
accessModes:
- ReadOnlyMany
capacity:
storage: 1Gi
phase: Bound
kind: List
metadata:
resourceVersion: ""
selfLink: ""
The events from the pod using the PVC:
$ oc describe pod pod-restore
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 16s default-scheduler Successfully assigned default/pod-restore to ci-ln-xkgf6w2-72292-j2hkc-worker-a-z94nq by ci-ln-xkgf6w2-72292-j2hkc-master-0
Normal SuccessfulAttachVolume 7s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-915ba939-95b7-4f0c-970e-a4487068113a"
Warning FileSystemResizeFailed 6s kubelet MountVolume.NodeExpandVolume failed for volume "pvc-915ba939-95b7-4f0c-970e-a4487068113a" requested read-only file system
Normal AddedInterface 4s multus Add eth0 [10.131.0.20/23] from openshift-sdn
Normal Pulling 4s kubelet Pulling image "gcr.io/google_containers/busybox"
Normal Pulled 3s kubelet Successfully pulled image "gcr.io/google_containers/busybox" in 225.537279ms
Normal Created 3s kubelet Created container busybox
Normal Started 3s kubelet Started container busybox
The pod started just fine, it seems. It's true that I can't do anything with the volume mounted to the pod ("Permission denied"), possibly because the relabeling did not happen, so even though the original bug looks to be fixed, the ROX feature is still somewhat useless in the general case.
Your PVC is missing the dataSourceRef in spec, which looks suspicious. Was the VolumeSnapshot ReadyToUse when you tried to create the volume from it and use it in the pod?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2022:5069
Description of problem:
GCP CSI driver provisioned a volume with ROX; when checked from the worker, the mount options are rw,relatime,seclabel.

Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-06-03-221810

How reproducible:
Always

Steps to Reproduce:
1.oc describe pvc/pvc3
Name:          pvc3
Namespace:     openshift-cluster-csi-drivers
StorageClass:  standard-csi
Status:        Bound
Volume:        pvc-e75afa13-25d0-4bc1-9fe1-93260cc7c20d
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
               volume.kubernetes.io/selected-node: chaoyang64-flgbm-worker-a-xn27m.c.openshift-qe.internal
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      2Gi
Access Modes:  ROX
VolumeMode:    Filesystem
Used By:       pod3
Events:
  Type    Reason                 Age                From                                                                                                          Message
  ----    ------                 ----               ----                                                                                                          -------
  Normal  WaitForFirstConsumer   19m (x3 over 20m)  persistentvolume-controller                                                                                   waiting for first consumer to be created before binding
  Normal  ExternalProvisioning   19m (x2 over 19m)  persistentvolume-controller                                                                                   waiting for a volume to be created, either by external provisioner "pd.csi.storage.gke.io" or manually created by system administrator
  Normal  Provisioning           19m                pd.csi.storage.gke.io_chaoyang64-flgbm-master-0.c.openshift-qe.internal_80871dfe-86ba-4881-9584-63c36274a831  External provisioner is provisioning volume for claim "openshift-cluster-csi-drivers/pvc3"
  Normal  ProvisioningSucceeded  19m                pd.csi.storage.gke.io_chaoyang64-flgbm-master-0.c.openshift-qe.internal_80871dfe-86ba-4881-9584-63c36274a831  Successfully provisioned volume pvc-e75afa13-25d0-4bc1-9fe1-93260cc7c20d

2.oc describe pv/pvc-e75afa13-25d0-4bc1-9fe1-93260cc7c20d
Name:              pvc-e75afa13-25d0-4bc1-9fe1-93260cc7c20d
Labels:            <none>
Annotations:       pv.kubernetes.io/provisioned-by: pd.csi.storage.gke.io
Finalizers:        [kubernetes.io/pv-protection external-attacher/pd-csi-storage-gke-io]
StorageClass:      standard-csi
Status:            Bound
Claim:             openshift-cluster-csi-drivers/pvc3
Reclaim Policy:    Delete
Access Modes:      ROX
VolumeMode:        Filesystem
Capacity:          2Gi
Node Affinity:
  Required Terms:
    Term 0:        topology.gke.io/zone in [us-central1-a]
Message:
Source:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            pd.csi.storage.gke.io
    FSType:            ext4
    VolumeHandle:      projects/openshift-qe/zones/us-central1-a/disks/pvc-e75afa13-25d0-4bc1-9fe1-93260cc7c20d
    ReadOnly:          false
    VolumeAttributes:  storage.kubernetes.io/csiProvisionerIdentity=1622779448600-8081-pd.csi.storage.gke.io
Events:                <none>

3.mount | grep pvc-e75afa13-25d0-4bc1-9fe1-93260cc7c20d
/dev/sdf on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-e75afa13-25d0-4bc1-9fe1-93260cc7c20d/globalmount type ext4 (rw,relatime,seclabel)
/dev/sdf on /var/lib/kubelet/pods/c50dc036-5634-4548-bebe-3e9f89598d26/volumes/kubernetes.io~csi/pvc-e75afa13-25d0-4bc1-9fe1-93260cc7c20d/mount type ext4 (rw,relatime,seclabel)

Actual results:
GCP CSI driver provisioned the volume with ROX, and it is mounted read-write.

Expected results:
GCP CSI driver should not provision a writable volume with ROX.

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:
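For reference, a minimal sketch of the kind of PVC that reproduces this (name, namespace, size, and storage class are taken from the description above; the exact manifest used is not included in the report):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc3
  namespace: openshift-cluster-csi-drivers
spec:
  accessModes:
  - ReadOnlyMany                  # ROX, the access mode this bug is about
  resources:
    requests:
      storage: 2Gi
  storageClassName: standard-csi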