Bug 1503906

Summary: fsGroup (chown+chmod) causes time out if storage contains lots of files
Product: OpenShift Container Platform
Reporter: Kenjiro Nakayama <knakayam>
Component: RFE
Assignee: Eric Paris <eparis>
Status: CLOSED WONTFIX
QA Contact: Xiaoli Tian <xtian>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.3.1
CC: aos-bugs, aos-storage-staff, bchilds, clichybi, eparis, erich, fche, jokerman, mmccomas, mmclane, rkant
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-06-12 12:49:24 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Kenjiro Nakayama 2017-10-19 03:12:46 UTC
Description of problem:

- Related to bz#1459106: when a volume contains a lot of files, applying fsGroup (a recursive chown+chmod) takes so long that the volume mount times out.
- podAttachAndMountTimeout is hard-coded to 2 min [1] and cannot be configured.

Version-Release number of selected component (if applicable):
- OCP 3.x (we hit this on 3.3)

Steps to Reproduce:
1. Follow the steps in the description of bz#1459106. Reproducing the fsGroup issue may additionally require a large number of small files on the volume.

Actual results:
- The volume mount timed out after 2m.

  FirstSeen  LastSeen  Count  From                    SubobjectPath  Type     Reason       Message
  ---------  --------  -----  ----                    -------------  ----     ------       -------
  2m         2m        1      {default-scheduler }                   Normal   Scheduled    Successfully assigned <NODE_NAME>
  14s        14s       1      {kubelet <NODE_NAME>}                  Warning  FailedMount  Unable to mount volumes for pod "jenkins-xxx(xxx)": timeout expired waiting for volumes to attach/mount for pod "jenkins-xxx"/"xxx-xx". list of unattached/unmounted volumes=[jenkins-xxx]
  14s        14s       1      {kubelet <NODE_NAME>}                  Warning  FailedSync   Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "jenkins-xxx"/"xxx-xx". list of unattached/unmounted volumes=[jenkins-data]

- The OpenShift node log contains a very large number of chown/chmod entries.

Expected results:
- podAttachAndMountTimeout should be configurable so that it can be increased.

Additional info:
- The workaround is to remove fsGroup from the DC (DeploymentConfig) [2].


[1] https://github.com/openshift/origin/blob/release-1.3/vendor/k8s.io/kubernetes/pkg/kubelet/volumemanager/volume_manager.go#L70
[2] https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/volume/volume_linux.go#L35-L38

Comment 2 Kenjiro Nakayama 2017-12-28 00:41:46 UTC
I don't think this is an RFE ticket. Without a configurable timeout value, if a user's volume contains a lot of files, fsGroup (chown+chmod) causes a timeout and the pod cannot start.

Comment 7 Kirsten Newcomer 2019-06-12 11:59:07 UTC
With the introduction of OpenShift 4, Red Hat has delivered or roadmapped a substantial number of features based on feedback by our customers.  Many of the enhancements encompass specific RFEs which have been requested, or deliver a comparable solution to a customer problem, rendering an RFE redundant.

This bz (RFE) has been identified as a feature request not yet planned or scheduled for an OpenShift release and is being closed. 

If this feature is still an active request that needs to be tracked, Red Hat Support can assist in filing a request in the new JIRA RFE system, as well as provide you with updates as the RFE progresses within our planning processes. Please open a new support case: https://access.redhat.com/support/cases/#/case/new

As the new Jira RFE system is not yet public, Red Hat Support can help answer your questions about your RFEs via the same support case system.