1822353 – Under heavy object creation see error: "failed to set annotation on pod ... not found"

Keywords:
Status:	CLOSED DUPLICATE of bug 1820737
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.3.z
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	4.5.0
Assignee:	Aniket Bhat
QA Contact:	zhaozhanqi
Docs Contact:
URL:
Whiteboard:	aos-scalability-43
Depends On:
Blocks:

Reported:	2020-04-08 19:49 UTC by agopi
Modified:	2020-08-04 14:03 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-04-20 22:01:10 UTC
Target Upstream Version:
Embargoed:

Attachments
rss_usage_ovnkube_apiserver (70.72 KB, image/png) 2020-04-08 19:49 UTC, agopi

Description agopi 2020-04-08 19:49:46 UTC

Created attachment 1677357 [details]
rss_usage_ovnkube_apiserver

Description of problem:

When creating a large number of objects across 1000 namespaces on a cluster with 100 worker nodes, pods fail with the following error:

Warning  FailedCreatePodSandBox  25m  kubelet, ip-10-0-191-46.us-west-2.compute.internal  Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_deploymentconfig0-1-deploy_masterverticalovn1000proj100nodes700_b602b62f-4fc4-44fe-a606-201ab1300a57_0(a33d1aeee61f8ccf6ade592dc50d9465b517486fd3bf7448dbdbd04d4ae5bfe5): Multus: [masterverticalovn1000proj100nodes700/deploymentconfig0-1-deploy]: error adding container to network "ovn-kubernetes": delegateAdd: error invoking DelegateAdd - "ovn-k8s-cni-overlay": error in getting result from AddNetwork: CNI request failed with status 400: '[masterverticalovn1000proj100nodes700/deploymentconfig0-1-deploy] failed to get pod annotation: timed out waiting for the condition


A few moments later we can see the ovnkube-master hitting readiness-probe issues:

 Warning  Unhealthy  23m (x10 over 5d11h)  kubelet, ip-10-0-128-5.us-west-2.compute.internal  Readiness probe failed: command timed out  


Over time, the ovnkube-master container in the ovnkube-master pod consumes a large amount of memory and eventually hits OOM. Soon afterwards the API server does the same (as shown in the attached graph).




Version-Release number of selected component (if applicable): 4.3.9


Additional info:

Comment 1 Aniket Bhat 2020-04-20 22:01:10 UTC


*** This bug has been marked as a duplicate of bug 1820737 ***
