Bug 1822353

Summary:

Under heavy object creation see error: "failed to set annotation on pod ... not found"

Product:

OpenShift Container Platform

Reporter:

agopi

Component:

Networking

Assignee:

Aniket Bhat <anbhat>

Networking sub component:

ovn-kubernetes

QA Contact:

zhaozhanqi <zzhao>

Status:

CLOSED DUPLICATE

Docs Contact:

Severity:

high

Priority:

unspecified

CC:

anbhat, dblack, jtaleric, nelluri, rsevilla

Version:

4.3.z

Keywords:

Performance

Target Milestone:

---

Target Release:

4.5.0

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

aos-scalability-43

Fixed In Version:

Doc Type:

If docs needed, set a value

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2020-04-20 22:01:10 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
rss_usage_ovnkube_apiserver	none

Description agopi 2020-04-08 19:49:46 UTC

Created attachment 1677357 [details]
rss_usage_ovnkube_apiserver


Description of problem:

When creating a large number of objects across 1000 namespaces on a cluster with 100 worker nodes, we see the following error:

Warning  FailedCreatePodSandBox  25m  kubelet, ip-10-0-191-46.us-west-2.compute.internal  Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_deploymentconfig0-1-deploy_masterverticalovn1000proj100nodes700_b602b62f-4fc4-44fe-a606-201ab1300a57_0(a33d1aeee61f8ccf6ade592dc50d9465b517486fd3bf7448dbdbd04d4ae5bfe5): Multus: [masterverticalovn1000proj100nodes700/deploymentconfig0-1-deploy]: error adding container to network "ovn-kubernetes": delegateAdd: error invoking DelegateAdd - "ovn-k8s-cni-overlay": error in getting result from AddNetwork: CNI request failed with status 400: '[masterverticalovn1000proj100nodes700/deploymentconfig0-1-deploy] failed to get pod annotation: timed out waiting for the condition


A few moments later we can see the ovnkube-master hitting readiness-probe issues:

 Warning  Unhealthy  23m (x10 over 5d11h)  kubelet, ip-10-0-128-5.us-west-2.compute.internal  Readiness probe failed: command timed out  


Over time, the ovnkube-master container in the ovnkube-master pod consumes more and more memory until it is OOM-killed. Soon afterwards the API server does the same (as shown in the attached graph).




Version-Release number of selected component (if applicable): 4.3.9


Additional info:

Comment 1 Aniket Bhat 2020-04-20 22:01:10 UTC


*** This bug has been marked as a duplicate of bug 1820737 ***