Bug 2053156

Summary: Avoid world-wide (0777) permission mode setting at NodeStage time of a CephFS share
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Humble Chirammal <hchiramm>
Component: csi-driver    Assignee: Humble Chirammal <hchiramm>
Status: CLOSED ERRATA QA Contact: Rachael <rgeorge>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 4.10    CC: ebenahar, kramdoss, madam, mmuench, muagarwa, ocs-bugs, odf-bz-bot, rgeorge
Target Milestone: ---   
Target Release: ODF 4.10.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 4.10.0-160 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-04-13 18:53:05 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Humble Chirammal 2022-02-10 15:53:43 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

At present, Ceph CSI sets 0777 permissions at the time of staging a CephFS share, which is not the correct thing to do. The CSI driver should leave permission validation and adjustment to the CO/kubelet, based on the fsGroup change policy in place.
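
For context, the mechanism the fix defers to is the pod-level fsGroup handling done by the kubelet. A minimal illustrative pod spec (all names, images, and claim names here are placeholders, not taken from this report):

```yaml
# Illustrative only: with the CSI driver no longer chmod'ing the share to
# 0777 at NodeStage, the kubelet applies group ownership and permissions
# according to the pod's securityContext instead.
apiVersion: v1
kind: Pod
metadata:
  name: cephfs-app            # hypothetical name
spec:
  securityContext:
    fsGroup: 2000
    # "OnRootMismatch" recursively adjusts ownership/permissions only when
    # the volume root's group does not already match fsGroup; "Always"
    # does it on every mount.
    fsGroupChangePolicy: "OnRootMismatch"
  containers:
    - name: app
      image: registry.example.com/app:latest   # placeholder image
      volumeMounts:
        - name: data
          mountPath: /var/lib/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: cephfs-pvc                  # hypothetical PVC name
```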


Version of all relevant components (if applicable):

ODF 4.10

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?


Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1.
2.
3.


Actual results:


Expected results:


Additional info:

Comment 2 Humble Chirammal 2022-02-11 08:15:23 UTC
Karthick, Rachael, can you help get QE ack on this?

Comment 3 krishnaram Karthick 2022-02-11 09:26:45 UTC
Humble - could you please update us on what additional tests would be needed to cover the validation?

Comment 4 Humble Chirammal 2022-02-11 11:33:49 UTC
As far as verification goes, I believe we are good as long as all the existing operations and CI tests pass.

Comment 7 Humble Chirammal 2022-02-11 11:44:08 UTC
Rook PR to change the default policy in light of the above bug fix in Ceph CSI: https://github.com/rook/rook/pull/9729

Comment 11 Humble Chirammal 2022-03-07 07:49:13 UTC
It looks like even in release 4.9 the default was not changed to None and was kept as "OnRootMismatch". If that's the case, no extra verification is required from this bugzilla report's point of view.

Comment 12 Humble Chirammal 2022-03-07 07:50:36 UTC
(In reply to Humble Chirammal from comment #11)
> It looks like even in release 4.9 the default was not changed to None and
> was kept as "OnRootMismatch". If that's the case, no extra verification is
> required from this bugzilla report's point of view.

Discard above, it was meant for https://bugzilla.redhat.com/show_bug.cgi?id=2059248.

Comment 13 Humble Chirammal 2022-03-10 11:53:12 UTC
Looking further into this (on Rachael++'s test setup), it seems that even though the NodeStage chmod to 777 has been avoided, we still connect to go-ceph with mode 777 when subvolumes are created. With that in place, the NodeStage change becomes a no-op, and this needs to be corrected too to complete the fix.

https://github.com/ceph/ceph-csi/blob/devel/internal/cephfs/core/volume.go#L234

Comment 14 Humble Chirammal 2022-03-10 14:54:04 UTC
(In reply to Humble Chirammal from comment #13)
> Looking further into this (on Rachael++'s test setup), it seems that even
> though the NodeStage chmod to 777 has been avoided, we still connect to
> go-ceph with mode 777 when subvolumes are created. With that in place, the
> NodeStage change becomes a no-op, and this needs to be corrected too to
> complete the fix.
> 
> https://github.com/ceph/ceph-csi/blob/devel/internal/cephfs/core/volume.go#L234

This bug report is about the Ceph CSI driver changing permissions at node staging time, and that has been fixed already.
The scenario mentioned in my comment above is a bit different, and it may not be a good idea to mix it with this bug.
With that thought process, I am flipping the status back to ON_QA.

Comment 16 Humble Chirammal 2022-03-15 07:49:56 UTC
Thanks a lot, Rachael, for running the tests across the different ODF clusters based on the discussions!

Comment 18 errata-xmlrpc 2022-04-13 18:53:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.10.0 enhancement, security & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1372