Bug 1597867 - [WIP] PVCs with large numbers of files take significant time to attach and/or cause pod init to timeout
Keywords:
Status: CLOSED DUPLICATE of bug 1459106
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Containers
Version: 3.x
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Mrunal Patel
QA Contact: DeShuai Ma
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-07-03 18:02 UTC by Mike McLane
Modified: 2018-07-03 19:06 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-03 19:06:59 UTC
Target Upstream Version:
Embargoed:



Description Mike McLane 2018-07-03 18:02:17 UTC
[Bug details work in progress] -- (collecting data around timings and generating a reproducer)

Description of problem:
When a network-attached PVC contains tens of thousands of files, the time required to attach the PVC to a pod grows significantly with the number of files present. This can lead to pod initialization timeouts.

In the OpenShift.io use case, it is common for IDE (Che) workspaces to contain tens of thousands of files. For volumes with <10k files, the Che pod starts successfully with no additional pod start parameters. With >=30k files, the Che pod is unable to start because the mount time causes an init timeout.

While mounting gluster-subvol volumes, we worked with the storage team to observe the operations performed on a volume during attachment. There appears to be a recursive ownership change on every attach/mount event.

Version-Release number of selected component (if applicable):

$ oc version
oc v3.9.14
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://api.starter-us-east-2.openshift.com:443
openshift v3.9.14
kubernetes v1.9.1+a0ce1bc657


How reproducible:
Every time.

Steps to Reproduce:
1. Start a pod (without a DC) with an attached PVC containing 35,000 files.

Actual results:
Pod initialization times out.

Expected results:
Pod initialization will succeed

Additional info:

It looks like the section of container code that applies permissions to PVCs is here [1]. When a deployment config is used to start a pod, it looks like the replication controller/deploy pods allow recovery from longer pod initialization times, leading to more successful spin-ups.

[1] https://github.com/kubernetes/kubernetes/blob/692b34825f4e505b403c063270d1e007ee139ea8/pkg/volume/volume_linux.go#L35-L91

Comment 1 Mrunal Patel 2018-07-03 19:06:59 UTC

*** This bug has been marked as a duplicate of bug 1459106 ***

