Bug 2053156 - Avoid worldwide permission mode setting at time of nodestage of CephFS share
Summary: Avoid worldwide permission mode setting at time of nodestage of CephFS share
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: csi-driver
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ODF 4.10.0
Assignee: Humble Chirammal
QA Contact: Rachael
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-02-10 15:53 UTC by Humble Chirammal
Modified: 2023-08-09 16:37 UTC (History)
8 users

Fixed In Version: 4.10.0-160
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-04-13 18:53:05 UTC
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github ceph ceph-csi pull 2847 0 None Merged cephfs: dont set explicit permissions on the volume 2022-02-10 15:55:23 UTC
Github red-hat-storage ceph-csi pull 79 0 None open Bug 2053156: cephfs: dont set explicit permissions on the volume 2022-02-14 06:23:50 UTC
Red Hat Product Errata RHSA-2022:1372 0 None None None 2022-04-13 18:53:18 UTC

Description Humble Chirammal 2022-02-10 15:53:43 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

At present, Ceph CSI sets 0777 permissions at the time of staging a CephFS share, which is not the correct thing to do. CSI should leave permission validation and adjustment to the CO/kubelet, based on the fsGroupChangePolicy in place.
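For illustration, a minimal Pod sketch (all names here are hypothetical) of the mechanism the driver should defer to: kubelet applies the pod's fsGroup according to fsGroupChangePolicy, instead of the driver forcing 0777 at NodeStage:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cephfs-demo                # hypothetical name
spec:
  securityContext:
    fsGroup: 1000
    # Kubelet recursively adjusts ownership/permissions of the volume only
    # when the volume root does not already match fsGroup.
    fsGroupChangePolicy: OnRootMismatch
  containers:
    - name: app
      image: registry.example.com/app:latest   # hypothetical image
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: cephfs-pvc                  # hypothetical PVC name
```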


Version of all relevant components (if applicable):

ODF 4.10

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?


Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1.
2.
3.


Actual results:


Expected results:


Additional info:

Comment 2 Humble Chirammal 2022-02-11 08:15:23 UTC
Karthick, Rachael, can you help us get the QE ack on this?

Comment 3 krishnaram Karthick 2022-02-11 09:26:45 UTC
Humble - could you please update us on what additional tests would be needed to cover the validation?

Comment 4 Humble Chirammal 2022-02-11 11:33:49 UTC
As far as verification goes, I believe we are good as long as all the existing operations and CI tests pass.

Comment 7 Humble Chirammal 2022-02-11 11:44:08 UTC
Rook PR to change the default policy in light of the above bug fix in Ceph CSI: https://github.com/rook/rook/pull/9729
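For context, the kind of setting that Rook PR adjusts can be sketched as a CSIDriver object (the driver name shown is an assumption and varies per install). With fsGroupPolicy: File, kubelet is allowed to manage the volume's permissions per the pod's fsGroupChangePolicy:

```yaml
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: openshift-storage.cephfs.csi.ceph.com   # assumed name; set by Rook/ODF
spec:
  # "File" tells kubelet it may change permissions and ownership of the
  # volume to match the pod's fsGroup, honoring fsGroupChangePolicy.
  fsGroupPolicy: File
  attachRequired: false
  podInfoOnMount: false
```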

Comment 11 Humble Chirammal 2022-03-07 07:49:13 UTC
It looks like even in release 4.9 the default was not changed to None and has been kept as "OnRootMismatch". If that's the case, no extra verification is required from this bugzilla report's point of view.

Comment 12 Humble Chirammal 2022-03-07 07:50:36 UTC
(In reply to Humble Chirammal from comment #11)
> It looks like even in release 4.9 the default was not changed to None and
> has been kept as "OnRootMismatch". If that's the case, no extra
> verification is required from this bugzilla report's point of view.

Please discard the above; it was meant for https://bugzilla.redhat.com/show_bug.cgi?id=2059248.

Comment 13 Humble Chirammal 2022-03-10 11:53:12 UTC
Looking further into this (on Rachael++'s test setup), it seems that even though the NodeStage chmod to 777 has been avoided, we are still calling go-ceph with 777 mode when subvolumes are created. With that in place, the NodeStage change becomes a no-op, and it needs to be corrected too to complete the fix.

https://github.com/ceph/ceph-csi/blob/devel/internal/cephfs/core/volume.go#L234
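To illustrate why the subvolume mode matters even with the NodeStage chmod removed: a forced 0777 really does land on disk, since chmod is not filtered by the process umask. A minimal, runnable Go sketch (the path is a hypothetical stand-in for the staging target; this is not the driver code):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// forceWorldWritable mimics what the CephFS node plugin used to do at
// NodeStage: chmod the staging directory to 0777. Chmod is not masked by
// the process umask, so the directory ends up fully world-writable.
func forceWorldWritable() (os.FileMode, error) {
	stagingPath := filepath.Join(os.TempDir(), "cephfs-staging-demo")
	if err := os.MkdirAll(stagingPath, 0o755); err != nil {
		return 0, err
	}
	defer os.RemoveAll(stagingPath)

	if err := os.Chmod(stagingPath, 0o777); err != nil {
		return 0, err
	}
	fi, err := os.Stat(stagingPath)
	if err != nil {
		return 0, err
	}
	return fi.Mode().Perm(), nil
}

func main() {
	perm, err := forceWorldWritable()
	if err != nil {
		panic(err)
	}
	fmt.Printf("forced mode: %v\n", perm) // prints: forced mode: -rwxrwxrwx
}
```

The fix referenced above instead leaves the mode to the cluster defaults so kubelet's fsGroup handling stays meaningful.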

Comment 14 Humble Chirammal 2022-03-10 14:54:04 UTC
(In reply to Humble Chirammal from comment #13)
> Looking further into this (on Rachael++'s test setup), it seems that even
> though the NodeStage chmod to 777 has been avoided, we are still calling
> go-ceph with 777 mode when subvolumes are created. With that in place, the
> NodeStage change becomes a no-op, and it needs to be corrected too to
> complete the fix.
>
> https://github.com/ceph/ceph-csi/blob/devel/internal/cephfs/core/volume.go#L234

This bug report is about the Ceph CSI driver's interception at the time of node staging to change permissions, and that has already been fixed.
The scenario mentioned in my comment above is a bit different, and it may not be a good idea to mix it with this bug.
With that in mind, I am flipping the status back to ON_QA.

Comment 16 Humble Chirammal 2022-03-15 07:49:56 UTC
Thanks a lot, Rachael, for following up with the tests across the different ODF clusters based on the discussions!

Comment 18 errata-xmlrpc 2022-04-13 18:53:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.10.0 enhancement, security & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1372

