Description of problem:
Pod sandbox creation gets stuck for approximately 17 minutes and then fails. A subsequent sandbox creation request from the kubelet for the same pod succeeds. During this period, other pod sandboxes are created without any issue.

This issue has led to the failure of some CI jobs, in particular the following:
1. https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-e2e-azure-ovn/1518876947986255872
2. https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-e2e-azure-ovn/1521829895582257152
3. https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-e2e-azure-ovn/1533907006589505536

The CRI-O logs of the node on which the pod is scheduled do not provide much detail about what is causing the 17-minute delay. In all of the above cases, host network is set to true for the pod, so the delay is not caused by the CNI plugin not being ready.

Version-Release number of selected component (if applicable):

How reproducible:
Not able to reproduce the issue. The failures were found in CI jobs.

Steps to Reproduce:
1.
2.
3.

Actual results:
Pod sandbox creation gets stuck for 17 minutes and then fails with one of the following errors:
1. Kubelet may be retrying requests that are timing out in CRI-O due to system load: context deadline exceeded: error reserving pod name
2. Failed to create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded

Expected results:
Pod sandbox creation should not get stuck.

Additional info:
Slack thread regarding the issue: https://coreos.slack.com/archives/CK1AE4ZCK/p1655287375267959
Thanks, I'll close this bug now since it seems to be either already fixed or not reproducible. Let's reconsider the case once we find a similar issue.
Would adding some log messages by default, rather than through CRI-O debug logs, be of help in the future if such issues arise again?
Luckily, we recently added such log messages in 4.11, and we intend to backport them. They're available by default at the info level.
Sounds good then.