Bug 1822353 - Under heavy object creation see error: "failed to set annotation on pod ... not found"
Keywords:
Status: CLOSED DUPLICATE of bug 1820737
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.3.z
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.5.0
Assignee: Aniket Bhat
QA Contact: zhaozhanqi
URL:
Whiteboard: aos-scalability-43
Depends On:
Blocks:
Reported: 2020-04-08 19:49 UTC by agopi
Modified: 2020-08-04 14:03 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-04-20 22:01:10 UTC
Target Upstream Version:
Embargoed:


Attachments
rss_usage_ovnkube_apiserver (70.72 KB, image/png)
2020-04-08 19:49 UTC, agopi

Description agopi 2020-04-08 19:49:46 UTC
Created attachment 1677357
rss_usage_ovnkube_apiserver

Description of problem:

When creating a large number of objects across 1000 namespaces on a cluster with 100 worker nodes, we see the following error:

Warning  FailedCreatePodSandBox  25m  kubelet, ip-10-0-191-46.us-west-2.compute.internal  Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_deploymentconfig0-1-deploy_masterverticalovn1000proj100nodes700_b602b62f-4fc4-44fe-a606-201ab1300a57_0(a33d1aeee61f8ccf6ade592dc50d9465b517486fd3bf7448dbdbd04d4ae5bfe5): Multus: [masterverticalovn1000proj100nodes700/deploymentconfig0-1-deploy]: error adding container to network "ovn-kubernetes": delegateAdd: error invoking DelegateAdd - "ovn-k8s-cni-overlay": error in getting result from AddNetwork: CNI request failed with status 400: '[masterverticalovn1000proj100nodes700/deploymentconfig0-1-deploy] failed to get pod annotation: timed out waiting for the condition
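The CNI error above ends in "timed out waiting for the condition", which matches the timeout error message of the Kubernetes `k8s.io/apimachinery/pkg/util/wait` package. The following is a minimal sketch of that poll-with-timeout pattern, written in Python for brevity. It assumes (this is not stated in the report) that the node-side CNI handler polls the pod's annotations until ovnkube-master writes the pod's network annotation. Under that assumption, an overloaded master that is slow to write the annotation shows up on the node as a sandbox-creation failure. The annotation key `example/pod-network`, the interval, the timeout and all helper names are hypothetical.

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for a poll timeout; the message mirrors the log text."""

    def __init__(self) -> None:
        super().__init__("timed out waiting for the condition")


def poll_until(condition, interval, timeout, clock=time.monotonic, sleep=time.sleep):
    """Call condition() every `interval` seconds until it returns a truthy
    value, or raise WaitTimeout once `timeout` seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        result = condition()
        if result:
            return result
        if clock() >= deadline:
            raise WaitTimeout()
        sleep(interval)


def wait_for_pod_annotation(get_annotations, key, interval=0.5, timeout=30.0,
                            clock=time.monotonic, sleep=time.sleep):
    """Hypothetical node-side wait: poll the pod's annotations until the
    master has written `key`, then return its value."""
    try:
        return poll_until(lambda: get_annotations().get(key),
                          interval, timeout, clock, sleep)
    except WaitTimeout as err:
        raise RuntimeError(f"failed to get pod annotation: {err}") from err


class FakeClock:
    """Simulated time so the example runs instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


def make_pod(clock, write_delay):
    """Return a get_annotations() callable for a pod whose annotation the
    simulated master writes after `write_delay` seconds (None = never)."""
    def get_annotations():
        if write_delay is not None and clock.time() >= write_delay:
            return {"example/pod-network": "10.128.0.5/23"}
        return {}
    return get_annotations
```

With a responsive master, for example `make_pod(clock, 2.0)`, the wait returns after about 2 simulated seconds. With a master that never writes the annotation (`make_pod(clock, None)`), it raises `RuntimeError("failed to get pod annotation: timed out waiting for the condition")` after the 30-second deadline. In this model, a longer timeout only delays the failure if the master stays backed up.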


A few moments later, the ovnkube-master pod starts failing its readiness probe:

 Warning  Unhealthy  23m (x10 over 5d11h)  kubelet, ip-10-0-128-5.us-west-2.compute.internal  Readiness probe failed: command timed out  


Over time, the ovnkube-master container in the ovnkube-master pod consumes more and more memory until it is OOM-killed. Soon afterwards, the API server does the same (as shown in the attached graph).




Version-Release number of selected component (if applicable): 4.3.9


Additional info:

Comment 1 Aniket Bhat 2020-04-20 22:01:10 UTC

*** This bug has been marked as a duplicate of bug 1820737 ***

