1830018 – ocp4.4: Pod Error: context deadline exceeded

Summary: ocp4.4: Pod Error: context deadline exceeded

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	RHCOS
Sub Component:
Version:	4.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	4.6.0
Assignee:	Colin Walters
QA Contact:	Michael Nguyen
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-04-30 17:48 UTC by Hongkai Liu
Modified:	2020-06-17 17:36 UTC
CC List:	10 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-06-17 17:36:12 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1829651	0	urgent	CLOSED	[4.4] MachineConfig CRD does not define all fields which users might have set in 4.3	2021-04-05 17:24:42 UTC

Description Hongkai Liu 2020-04-30 17:48:04 UTC

Description of problem:

oc get machine -n openshift-machine-api build01-9hdwj-worker-us-east-1b-m5d4x-w4fp2 -o wide
NAME                                          PHASE     TYPE          REGION      ZONE         AGE   NODE                           PROVIDERID                              STATE
build01-9hdwj-worker-us-east-1b-m5d4x-w4fp2   Running   m5d.4xlarge   us-east-1   us-east-1b   15d   ip-10-0-146-117.ec2.internal   aws:///us-east-1b/i-0890eb78de6644a83   running

oc get node ip-10-0-146-117.ec2.internal -o wide
NAME                           STATUS   ROLES    AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                CONTAINER-RUNTIME
ip-10-0-146-117.ec2.internal   Ready    worker   15d   v1.17.1   10.0.146.117   <none>        Red Hat Enterprise Linux CoreOS 44.81.202004260825-0 (Ootpa)   4.18.0-147.8.1.el8_1.x86_64   cri-o://1.17.4-8.dev.rhaos4.4.git5f5c5e4.el8

This is an m5d.4xlarge worker node from the CI build cluster.

oc get clusterversions.config.openshift.io
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0     True        False         9h      Cluster version is 4.4.0

Several pods on this node show the error "Error: context deadline exceeded" in their pod descriptions.
Sometimes a retry succeeds and the pod is eventually up and running.

I would like to confirm that this is expected behavior from kubelet and CRI-O rather than a bug.

I will attach more files later.

Comment 1 Peter Hunt 2020-04-30 17:51:48 UTC

AFAICT this is expected. This is kubelet and crio saying "we are taking a long time to create pods/containers!". If the pods eventually reconcile and become ready, then this is okay. If they don't, the node may be overcommitted.
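
One way to check whether the pods on the node eventually reconcile is sketched below. This is only an illustration, not something taken from the cluster in this report: `not_ready_pods` is a hypothetical helper name, and it assumes the default column layout of `oc get pods --all-namespaces --no-headers` (NAMESPACE, NAME, READY, STATUS, ...).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of oc): reads `oc get pods --all-namespaces --no-headers`
# output on stdin and prints namespace/name for every pod whose STATUS column
# is neither Running nor Completed.
not_ready_pods() {
  awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2 }'
}

# Usage against a live cluster (node name taken from the description above):
#   oc get pods --all-namespaces --no-headers \
#     --field-selector spec.nodeName=ip-10-0-146-117.ec2.internal | not_ready_pods
#
# If the same pods keep showing up here, compare requests against capacity
# in the "Allocated resources" section of:
#   oc describe node ip-10-0-146-117.ec2.internal
```

An empty result over repeated runs suggests the pods reconciled. A list that never shrinks points toward the overcommitted-node case.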

Comment 7 Micah Abbott 2020-05-14 15:36:11 UTC

I think this was fixed with https://github.com/openshift/release/pull/8715?

Comment 10 Colin Walters 2020-06-17 17:36:12 UTC

I believe this is obsolete; CI isn't using this configuration anymore. Please reopen if that's not correct.
