Bug 1839933
| Summary: | Pods with PVCs attached takes long time to start | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Humble Chirammal <hchiramm> |
| Component: | Storage | Assignee: | Jan Safranek <jsafrane> |
| Storage sub component: | Kubernetes | QA Contact: | Wei Duan <wduan> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | aos-bugs, bniver, chaoyang, ebenahar, ekuric, gmeno, jsafrane, kramdoss, madam, mrajanna, muagarwa, ocs-bugs, ratamir, rperiyas, sostapov, ykaul |
| Version: | 4.4 | Keywords: | Performance, Regression |
| Target Milestone: | --- | | |
| Target Release: | 4.6.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1836198 | | |
| : | 1854311 | Environment: | |
| Last Closed: | 2020-10-27 16:01:02 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1836198, 1854311, 1870193 | | |

Comment 1
Humble Chirammal
2020-05-26 04:23:15 UTC
Waiting for upstream PR: https://github.com/kubernetes/kubernetes/pull/91307

(In reply to Jan Safranek from comment #4)
> And of course, once the PR is available, we're going to backport it to all
> the way to 4.4

Thanks Jan for the update. The upstream PR is in good shape apart from the missing tests. Hopefully it will get there soon.

Apart from that, thinking about a solution, or about avoiding the possibility of getting into this situation, this is what I could come up with. That said, the Ceph CSI driver does not make use of the CONTROLLER PUBLISH and UNPUBLISH calls; these capabilities are not exposed by the driver. However, given the development history of CSI and the Ceph CSI plugin, we ***were*** making use of the external-attacher sidecar until now. Upstream, however, has a `skip attach` feature that is part of the CSIDriver object. Considering this (the CSIDriver functionality, actually) is GA as of "1.18", my proposal (https://github.com/ceph/ceph-csi/issues/1106) is to get rid of the 'external-attacher' completely in the Ceph CSI driver implementation by making use of this field. However, this needs extensive testing in the driver, etc., which we will take up with the upstream v3.0.0 release.

Test result - copied from bz#183698
--snip--
I tested with https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.6.0-0.nightly/release/4.6.0-0.nightly-2020-06-30-020342 and OCS v4.4 - quay.io/rhceph-dev/ocs-olm-operator:latest-stable-4.4 and I see an improvement compared to the state in comment #1 of this BZ. With OCS v4.6, pods start at the same speed as with OCP v4.3. From the beginning, OCS was not the problematic side, as OCP v4.4 / OCP v4.5 + OCS 4.4 / OCS v4.5 / OCS v4.3 were problematic. Now, with OCP v4.6 + OCS v4.4, the result is satisfying; pods are starting fine.
Start times:
first batch of 1000 pods with PVCs: 10m 17s
second batch of 1000 pods with PVCs: 10m 27s
third batch of 1000 pods with PVCs: 10m 25s

In this test, 500 pods with PVCs per OCP node were scheduled.
--/snip--

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
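
For reference, the `skip attach` proposal discussed earlier in this bug relies on the `attachRequired` field of the CSIDriver object. The following is only a minimal sketch of what such an object might look like, rendered as JSON from Python. The helper name `csidriver_manifest` is hypothetical, and the driver name `rbd.csi.ceph.com` is an assumption (a given deployment may register its driver under a different name); neither comes from this bug report.

```python
import json


def csidriver_manifest(driver_name: str, attach_required: bool = False) -> dict:
    """Build a CSIDriver object as a plain dict.

    With attachRequired set to False, Kubernetes skips the attach step for
    volumes of this driver, so no external-attacher sidecar is needed.
    """
    return {
        # storage.k8s.io/v1 is the GA API version for CSIDriver (Kubernetes 1.18+).
        "apiVersion": "storage.k8s.io/v1",
        "kind": "CSIDriver",
        # Assumption: the object name must match the name the CSI driver registers with.
        "metadata": {"name": driver_name},
        "spec": {"attachRequired": attach_required},
    }


if __name__ == "__main__":
    # Hypothetical driver name, used here for illustration only.
    print(json.dumps(csidriver_manifest("rbd.csi.ceph.com"), indent=2))
```

The printed JSON could be piped to `kubectl apply -f -`, since kubectl accepts JSON as well as YAML. Whether dropping the attacher is safe for a particular driver version still needs the testing mentioned in the proposal.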