Description of problem: This issue was discovered as part of a JIRA filed by the serverless team: https://issues.redhat.com/browse/SRVKS-379 Details of the testing are there; in short, a bare pod takes almost 7 seconds to become ready, and a pod in a deployment takes more like 8 or 9 seconds. Scale-from-zero on GKE happens in about 2 seconds, so this is a big problem for OpenShift Serverless. Version-Release number of selected component (if applicable): OCP 4.2.10 (exact RHCOS version unknown; see below).
From the log snippet in the JIRA:

Dec 17 03:28:30 ip-10-0-174-45 hyperkube[2277]: I1217 03:28:30.132130 2277 manager.go:1011] Added container: "/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod4b71d398_207d_11ea_8028_0a1263433394.slice/crio-a40b6ac715335fcd3227fb2e5c84f14ae52cd319d703f8718873cecfba2f81a8.scope" (aliases: [k8s_POD_test-selector-74fc74c8d8-xl4b7_bbrowning-sandbox_4b71d398-207d-11ea-8028-0a1263433394_0 a40b6ac715335fcd3227fb2e5c84f14ae52cd319d703f8718873cecfba2f81a8], namespace: "crio")

Dec 17 03:28:37 ip-10-0-174-45 crio[1889]: 2019-12-17T03:28:37Z [verbose] Add: bbrowning-sandbox:test-selector-74fc74c8d8-xl4b7:openshift-sdn:eth0 {"cniVersion":"0.3.1","interfaces":[{"name":"eth0","sandbox":"/proc/3285126/ns/net"}],"ips":[{"version":"4","interface":0,"address":"10.128.2.78/23"}],"routes":[{"dst":"0.0.0.0/0","gw":"10.128.2.1"},{"dst":"224.0.0.0/4"},{"dst":"10.128.0.0/14"}],"dns":{}}

This looks like an issue with either the kubelet or the runtime. Moving to the node team for investigation.
Which release are you testing with? RHCOS version?
SRVKS-379 was discovered against OCP 4.2.10, but I do not know the exact RHCOS version.
Created attachment 1661789 [details] cri-o logs
I've identified the root cause of the introduced delay and have created a number of PRs to address it. The core issue is that the code path handling the "readinessindicatorfile" option introduces a delay even when the option is not set. I believe this option was used in previous OpenShift versions and removed in a recent one. I'll continue to assess whether other previous versions will need a fix in the z-stream.

Upstream PR @ https://github.com/intel/multus-cni/pull/439
Downstream master PR @ https://github.com/openshift/multus-cni/pull/43
Downstream 4.4 PR @ https://github.com/openshift/multus-cni/pull/44
Downstream 4.3 PR @ https://github.com/openshift/multus-cni/pull/45
Removing the UpgradeBlocker keyword from this older bug to take it out of the suspect queue described in [1]. If you feel this bug still needs to be a suspect, please add the keyword again. [1]: https://github.com/openshift/enhancements/pull/475