Bug 1794142 - Pod startup time in OpenShift 4 has increased
Summary: Pod startup time in OpenShift 4 has increased
Status: CLOSED DUPLICATE of bug 1805774
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.4.0
Assignee: Douglas Smith
QA Contact: zhaozhanqi
Depends On:
Reported: 2020-01-22 18:42 UTC by Alan Field
Modified: 2021-04-05 17:47 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2020-03-02 12:21:44 UTC
Target Upstream Version:

Attachments (Terms of Use)
cri-o logs (554.56 KB, text/plain)
2020-02-07 20:56 UTC, Urvashi Mohnani

System ID Private Priority Status Summary Last Updated
Github openshift multus-cni pull 43 0 None closed Bug 1794142: Fixes unnecessary wait when readinessindicatorfile is not present 2020-03-09 19:18:06 UTC

Description Alan Field 2020-01-22 18:42:03 UTC
Description of problem:

This issue was found while investigating this JIRA, reported by the serverless team:


Details of the testing are there, but a bare pod takes almost 7 seconds to become ready, and a pod in a deployment takes more like 8 or 9 seconds. Scale from zero on GKE happens in about 2 seconds, so this is a big problem for OpenShift Serverless.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

Actual results:

Expected results:

Additional info:

Comment 1 Maciej Szulik 2020-01-27 17:29:32 UTC
From the log snippet in the jira:

Dec 17 03:28:30 ip-10-0-174-45 hyperkube[2277]: I1217 03:28:30.132130    2277 manager.go:1011] Added container: "/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod4b71d398_207d_11ea_8028_0a1263433394.slice/crio-a40b6ac715335fcd3227fb2e5c84f14ae52cd319d703f8718873cecfba2f81a8.scope" (aliases: [k8s_POD_test-selector-74fc74c8d8-xl4b7_bbrowning-sandbox_4b71d398-207d-11ea-8028-0a1263433394_0 a40b6ac715335fcd3227fb2e5c84f14ae52cd319d703f8718873cecfba2f81a8], namespace: "crio")
Dec 17 03:28:37 ip-10-0-174-45 crio[1889]: 2019-12-17T03:28:37Z [verbose] Add: bbrowning-sandbox:test-selector-74fc74c8d8-xl4b7:openshift-sdn:eth0 {"cniVersion":"0.3.1","interfaces":[{"name":"eth0","sandbox":"/proc/3285126/ns/net"}],"ips":[{"version":"4","interface":0,"address":""}],"routes":[{"dst":"","gw":""},{"dst":""},{"dst":""}],"dns":{}}

There is a gap of about 7 seconds between these two entries (03:28:30 to 03:28:37). This looks like an issue with either the kubelet or the runtime. Moving to the node team for investigation.

Comment 2 Ryan Phillips 2020-01-27 17:32:04 UTC
Which release are you testing with? RHCOS version?

Comment 3 Alan Field 2020-01-27 18:49:52 UTC
SRVKS-379 was discovered against OCP 4.2.10, but I do not know the exact version of RHCOS.

Comment 5 Urvashi Mohnani 2020-02-07 20:56:12 UTC
Created attachment 1661789 [details]
cri-o logs

Comment 6 Douglas Smith 2020-02-10 19:00:32 UTC
I've identified the root cause of the introduced delay, and have created a number of PRs to address it.
The core issue is that support for the "readinessindicatorfile" option introduces a delay every time this option is not used. I believe this option was used in previous OpenShift versions and removed in a recent version. I'll continue to assess whether other previous versions will need a fix in the z-stream.

Upstream PR @ https://github.com/intel/multus-cni/pull/439

Downstream master PR @ https://github.com/openshift/multus-cni/pull/43
Downstream 4.4 PR @ https://github.com/openshift/multus-cni/pull/44
Downstream 4.3 PR @ https://github.com/openshift/multus-cni/pull/45

Comment 9 W. Trevor King 2021-04-05 17:47:56 UTC
Removing UpgradeBlocker from this older bug, to remove it from the suspect queue described in [1]. If you feel this bug still needs to be a suspect, please add the keyword again.

[1]: https://github.com/openshift/enhancements/pull/475
