[Bug details work in progress] -- (collecting data around timings and generating a reproducer)

Description of problem:

When a network-attached PVC holds tens of thousands of files, the time required to attach the PVC to a pod grows significantly with the number of files. This can lead to pod initialization timeouts.

On OpenShift.io, it is common for the IDE (che) workspaces to use tens of thousands of files. With volumes containing <10k files, the che pod starts successfully with no additional pod start parameters. With >=30k files, the che pod is unable to start because the mount time causes an init timeout.

While mounting gluster-subvol, we worked with the storage team to observe the operations performed on a volume during attachment. There appears to be a recursive ownership change that happens on every attach/mount event.

Version-Release number of selected component (if applicable):

$ oc version
oc v3.9.14
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://api.starter-us-east-2.openshift.com:443
openshift v3.9.14
kubernetes v1.9.1+a0ce1bc657

How reproducible:

Every time.

Steps to Reproduce:
1. Start a pod (without a DC) with an attached PVC containing 35,000 files.

Actual results:

Pod initialization times out.

Expected results:

Pod initialization succeeds.

Additional info:

It looks like the section of code that applies permissions to PVCs is here [1].

When a deployment config is used to start a pod, it looks like the replication controller/deploy pods allow recovery during longer pod initialization times, leading to more successful spin-ups.

[1] https://github.com/kubernetes/kubernetes/blob/692b34825f4e505b403c063270d1e007ee139ea8/pkg/volume/volume_linux.go#L35-L91
*** This bug has been marked as a duplicate of bug 1459106 ***