Bug 2099214
| Summary: | Permission denied error while writing IO to ceph-rbd - fs - RWO based PVC | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Anant Malhotra <anamalho> |
| Component: | csi-driver | Assignee: | Humble Chirammal <hchiramm> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | krishnaram Karthick <kramdoss> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.10 | CC: | hchiramm, madam, muagarwa, ocs-bugs, odf-bz-bot, tdesala, ypadia |
| Target Milestone: | --- | Flags: | ndevos: needinfo? (tdesala) |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-10-04 02:22:09 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Anant Malhotra
2022-06-20 10:29:41 UTC
Looks like the extra edits you are making in the pod spec via the changes below (i.e. trying to use non-privileged/completely restricted pod execution) are causing this:

https://github.com/red-hat-storage/ocs-ci/pull/5943/files#diff-c3679703e785f8a8ae14abbe4b97f354fc00aab50332d5592a9750e929d51d55R8
https://github.com/red-hat-storage/ocs-ci/pull/5943/files#diff-c3679703e785f8a8ae14abbe4b97f354fc00aab50332d5592a9750e929d51d55R24

If you are running restricted or non-privileged pods in the OCP setup, the SCC etc. has to be configured correctly. Can you try the longevity test without those changes in the pod yaml?

Also, as a second thing, please add the fsGroup* settings in the pod yaml and give it a try. An example snippet can be found here: https://bugzilla.redhat.com/show_bug.cgi?id=1988284#c2. The important/required part is the addition of `fsGroup` and `fsGroupChangePolicy` matching the `runAsUser`; you can leave out `seLinuxOptions` though.

[...]
securityContext:
  fsGroup: 1000510000
  fsGroupChangePolicy: OnRootMismatch
  runAsUser: 1000510000
...

Looks like a CI issue, moving to 4.12 while we work on the RCA.

After updating the yaml with the fsGroup and fsGroupChangePolicy params, IO is running completely without any error on the PVC of storage class cephrbd, access mode RWO and volume mode FS. IO is also running fine on all the other PVCs.
Updated yaml ->
---
apiVersion: v1
kind: Pod
metadata:
  name: perf-pod
  namespace: default
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 1000
    fsGroupChangePolicy: OnRootMismatch
  containers:
    - name: performance
      image: quay.io/ocsci/perf:latest
      imagePullPolicy: IfNotPresent
      command: ['/bin/sh']
      stdin: true
      tty: true
      volumeMounts:
        - name: mypvc
          mountPath: /mnt
      securityContext:
        allowPrivilegeEscalation: false
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
        fsGroupChangePolicy: OnRootMismatch
        capabilities:
          drop:
            - ALL
        seccompProfile:
          type: RuntimeDefault
  volumes:
    - name: mypvc
      persistentVolumeClaim:
        claimName: pvc
        readOnly: false
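
A quick way to confirm that the fsGroup settings took effect is to check the ownership of the mount point from inside the pod (this assumes the perf-pod name and default namespace from the spec above; adjust for the actual environment):

    # expect supplemental group 1000 in the id output and /mnt group-owned by gid 1000
    oc exec -n default perf-pod -- id
    oc exec -n default perf-pod -- ls -ld /mnt
    # a successful write confirms the permission denied error is gone
    oc exec -n default perf-pod -- touch /mnt/write-test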
@Humble,
Even without the below fsGroup entries in the securityContext, we are able to write IO to all other supported PVC types successfully without any issues/errors. The permission denied error is observed only with this specific volume type: ceph-rbd RWO. What could be the reason we are seeing this issue only on ceph-rbd RWO?

[...]
securityContext:
  fsGroup: 1000510000
  fsGroupChangePolicy: OnRootMismatch
  runAsUser: 1000510000
...
(In reply to Prasad Desala from comment #5)
> @Humble,
>
> Even without the below fsgroup entries in the securityContext, we are able
> to write IO on all other supported PVC types successfully without any
> issues/errors. The permission denied error is observed only with this
> specific volume: ceph-rbd-RWO. What could be the reason why we are seeing
> this issue only on ceph-rbd-RWO?

Because this RBD volume has a filesystem on top, it is required to check for issues with the filesystem as well. If the volume (or the RBD connection to Ceph) had problems, that could cause the filesystem to become read-only. You would need to inspect the kernel logs from the time the issue occurred. Moving the Pod to another node may also show details about a corrupt filesystem (an mkfs execution in the csi-rbdplugin logs on the new node).

Logs of the node where the problem happened do not seem to be available, or at least I am not able to find them linked in this BZ. Steps to reproduce this (getting a volume into this error state) in another environment or with another volume would help.

Please reopen when we have enough data to move ahead.
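
When reopening with the requested data, something along these lines could be used to gather it (node and pod names are placeholders; on ODF the csi-rbdplugin pods normally run in the openshift-storage namespace):

    # kernel messages from the node that hit the error, looking for RBD/filesystem/read-only events
    oc debug node/<node-name> -- chroot /host dmesg -T | grep -iE 'rbd|ext4|xfs|read-only'
    # csi-rbdplugin logs on the node the Pod moved to, looking for an unexpected mkfs run
    oc logs -n openshift-storage <csi-rbdplugin-pod> -c csi-rbdplugin | grep -i mkfs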