Bug 1515907
Summary: | "Unable to mount volume" for volume containing large number of files | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Carsten Lichy-Bittendorf <clichybi> |
Component: | Storage | Assignee: | aos-storage-staff <aos-storage-staff> |
Storage sub component: | Storage | QA Contact: | Wei Duan <wduan> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | low | ||
Priority: | low | CC: | aos-bugs, aos-storage-staff, bmilne, chaoyang, clichybi, ekuric, erich, fshaikh, hekumar, jlee, jsafrane, Mathias.Merscher, pdwyer, rhowe, srangana, tidawson |
Version: | 3.6.1 | ||
Target Milestone: | --- | ||
Target Release: | 4.5.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Known Issue |
Doc Text: |
Cause:
Whenever a pod mounts a volume with the FSGroup SecurityContext set, the GID ownership must be updated recursively for all files on the volume.
Consequence:
The ownership change takes time, and for volumes with a very large number of files it may mean the pod takes a long time to start.
Workaround (if any):
No workaround is known yet.
Result:
Pods using volumes with a large number of files and the FSGroup SecurityContext setting may take a very long time to start.
|
Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2020-07-13 17:11:03 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Carsten Lichy-Bittendorf
2017-11-21 15:14:34 UTC
The proposal PR: https://github.com/kubernetes/community/pull/1717

It doesn't seem to get much attention. One of the reasons is that while we may be able to work around the problem with recursive chown-ing, there is a similar issue with SELinux relabelling, which is done by the container runtime (docker) and over which we have no control. That means the proposal is still not a complete remedy for the problem. A tweakable timeout with exponential backoff or something similar is the only thing that might mitigate the issue.

I ran some tests to be sure: the pod events show the timeout messages while the volume files are being chowned, but once this is done the volume mount succeeds and the pod starts...

I understand this is really inconvenient; however, it's good to point out that the proposal from comment #15 would also mean the user would have to wait for some other (init) container to do the work (albeit asynchronously). I can try to add some more events ("Still changing file ownership, please wait") which would at least keep the user informed about what is going on. But a generic solution that would not traverse the fs and still make sure the files have proper ownership and labels, without having to wait... I simply have no idea how I would do that.

There might be no generic solution. So we should at least go for:
- enhance the logging to give good pointers on where the time gets consumed
- enhance the documentation to explain to our customers that this can happen and how to tune around it
m2c

Kubernetes PR: https://github.com/kubernetes/kubernetes/pull/61550

*** Bug 1761938 has been marked as a duplicate of this bug. ***

*** Bug 1725275 has been marked as a duplicate of this bug. ***

We're tracking this issue in our JIRA, https://jira.coreos.com/browse/STOR-267. It requires an API change and must go through the alpha/beta/GA process upstream. For the time being, we do not have a really useful workaround; the best is not to use fsGroup in pods that use volumes with a large number of files.

Good news: we have the Kubernetes enhancement merged: https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/20200120-skip-permission-change.md
Bad news: it will take some time to implement, as it probably needs to go through the alpha/beta stages.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409
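For reference, a minimal sketch of the pod configuration that triggers the behavior described in the Doc Text above: setting `securityContext.fsGroup` makes the kubelet recursively change the group ownership (and permissions) of every file on the mounted volume before the containers start. The pod, image, and claim names below are illustrative and not taken from the reporter's environment.

```yaml
# Illustrative pod spec: fsGroup is what triggers the recursive ownership change
# described in this bug. All object names here are made up for the example.
apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-example
spec:
  securityContext:
    fsGroup: 2000        # kubelet recursively chgrp/chmods every file on the volume to GID 2000
  containers:
  - name: app
    image: registry.access.redhat.com/ubi8/ubi-minimal
    command: ["sleep", "3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: big-volume-claim   # a claim holding a very large number of files makes startup slow
```

On a volume with millions of small files this recursive walk is what produces the "Unable to mount volume" timeout events; omitting fsGroup (the interim advice above) avoids the walk entirely.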
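And a sketch of how the merged skip-permission-change enhancement is expected to be used once implemented: a pod-level `fsGroupChangePolicy` of `OnRootMismatch` lets the kubelet skip the recursive ownership change when the volume's root already has the expected GID. The field name comes from that KEP; whether it is honored depends on the cluster's Kubernetes version, feature gates, and volume type, and the object names are again illustrative.

```yaml
# Sketch of the mitigation described in the skip-permission-change KEP linked above;
# availability depends on the Kubernetes version and feature gates of the cluster.
apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-onrootmismatch-example
spec:
  securityContext:
    fsGroup: 2000
    fsGroupChangePolicy: OnRootMismatch   # skip the recursive walk if the volume root already matches; "Always" keeps the old behavior
  containers:
  - name: app
    image: registry.access.redhat.com/ubi8/ubi-minimal
    command: ["sleep", "3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: big-volume-claim
```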