Bug 1997072

Summary: [scale] [4.9z] failed to get pod annotation: timed out waiting for annotations
Product: OpenShift Container Platform Reporter: Dustin Black <dblack>
Component: Networking Assignee: Tim Rozet <trozet>
Networking sub component: ovn-kubernetes QA Contact: Mike Fiedler <mifiedle>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: high CC: aconstan, astoycos, bbennett, dblack, dcbw, jlema, jtaleric, juzhao, kkulkarn, mifiedle, rbrattai, rravaiol, smalleni, trozet, zzhao
Version: 4.7 Keywords: FastFix
Target Milestone: ---   
Target Release: 4.9.z   
Hardware: All   
OS: All   
Whiteboard: perfscale-ovn
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1959352
: 2014332 (view as bug list) Environment:
Last Closed: 2021-11-16 06:24:16 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1959352, 2076201    
Bug Blocks: 2014332    

Comment 1 Tim Rozet 2021-08-25 14:34:31 UTC
Hey Dustin, I see you cloned this from 1959352, is this meant to target 4.8z?

Comment 2 Dustin Black 2021-09-03 14:37:13 UTC
(In reply to Tim Rozet from comment #1)
> Hey Dustin, I see you cloned this from 1959352, is this meant to target 4.8z?

BZ 1959352 was reported on 4.8, but we also experienced this problem on 4.7 and we need to ensure there is a backport tracker for that stream.

Comment 3 Tim Rozet 2021-09-10 16:27:30 UTC
I updated the original bug to list 4.7 as an affected version. I will use this BZ to target the 4.9z backport (and will create more for 4.8z and 4.7z when the time comes).

Comment 4 Patryk Diak 2021-09-17 09:27:47 UTC
*** Bug 2003543 has been marked as a duplicate of this bug. ***

Comment 8 Mike Fiedler 2021-10-19 13:37:43 UTC
Verified with a 120-node cluster-bot cluster built from this PR and the node-density workload. All pods reached the Running state, and there were no FailedCreatePodSandBox events with the message "failed to get pod annotation: timed out waiting for annotations".

Comment 9 Mike Fiedler 2021-10-19 17:34:07 UTC
Apologies, the cluster used for comment 8 was misconfigured as SDN. I re-tested the same scenario (120-node cluster-bot cluster built from this PR, node-density workload) and received many (over 50K) annotation timeout events. I will attach a must-gather.

node-density-010fc5c2-b861-47d3-b9c1-4bb4bb0cbdfb   3m7s        Warning   FailedCreatePodSandBox    pod/node-density-25413                               Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_node-density-25413_node-density-010fc5c2-b861-47d3-b9c1-4bb4bb0cbdfb_567442f4-cf2d-4e6a-910c-e6ee59547e4a_0(7b816ff190ffd374bf23ffdad4e86c1737f36696bd1d97632768ff83255cdfb1): error adding pod node-density-010fc5c2-b861-47d3-b9c1-4bb4bb0cbdfb_node-density-25413 to CNI network "multus-cni-network": [node-density-010fc5c2-b861-47d3-b9c1-4bb4bb0cbdfb/node-density-25413:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[node-density-010fc5c2-b861-47d3-b9c1-4bb4bb0cbdfb/node-density-25413 7b816ff190ffd374bf23ffdad4e86c1737f36696bd1d97632768ff83255cdfb1] [node-density-010fc5c2-b861-47d3-b9c1-4bb4bb0cbdfb/node-density-25413 7b816ff190ffd374bf23ffdad4e86c1737f36696bd1d97632768ff83255cdfb1] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded
node-density-010fc5c2-b861-47d3-b9c1-4bb4bb0cbdfb   112s        Warning   FailedCreatePodSandBox    pod/node-density-25413                               Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_node-density-25413_node-density-010fc5c2-b861-47d3-b9c1-4bb4bb0cbdfb_567442f4-cf2d-4e6a-910c-e6ee59547e4a_0(aa9ba638fd5c8422177855ded1012512267fb61003a02113154e7a4de1bfea98): error adding pod node-density-010fc5c2-b861-47d3-b9c1-4bb4bb0cbdfb_node-density-25413 to CNI network "multus-cni-network": [node-density-010fc5c2-b861-47d3-b9c1-4bb4bb0cbdfb/node-density-25413:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[node-density-010fc5c2-b861-47d3-b9c1-4bb4bb0cbdfb/node-density-25413 aa9ba638fd5c8422177855ded1012512267fb61003a02113154e7a4de1bfea98] [node-density-010fc5c2-b861-47d3-b9c1-4bb4bb0cbdfb/node-density-25413 aa9ba638fd5c8422177855ded1012512267fb61003a02113154e7a4de1bfea98] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded
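
For anyone reproducing this, the annotation timeout events can be tallied per pod with a short script. This is only a rough sketch, not the tooling used for the counts in this comment: it assumes the event listing has been saved with one event per line in the column layout shown above, and the function name count_annotation_timeouts is hypothetical.

```python
import re
from collections import Counter

# Message fragment taken from the events quoted in this comment.
TIMEOUT_MSG = "failed to get pod annotation: timed out waiting for annotations"

# Event reason as it appears in the REASON column of the listing.
REASON = "FailedCreatePodSandBox"

# Matches the OBJECT column, e.g. "pod/node-density-25413".
POD_RE = re.compile(r"\bpod/(\S+)")


def count_annotation_timeouts(lines):
    """Return a Counter mapping pod name -> number of annotation timeout events.

    Assumes each element of `lines` is one complete event line.
    """
    counts = Counter()
    for line in lines:
        # Skip events that are not sandbox failures caused by the timeout.
        if REASON not in line or TIMEOUT_MSG not in line:
            continue
        match = POD_RE.search(line)
        counts[match.group(1) if match else "<unknown>"] += 1
    return counts
```

Passing an open file of saved events to the function and summing the returned values gives the total number of timeout events; most_common() on the result lists the pods that retried most often.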

Comment 11 Mike Fiedler 2021-10-19 20:22:08 UTC
Verified I do not see the issue on 4.10.0-0.nightly-2021-10-16-173656.  I will try testing the 4.8 version of the patch next.

Comment 13 Mike Fiedler 2021-11-04 20:53:58 UTC
Re-tested on a 4.9 cluster built from https://github.com/openshift/ovn-kubernetes/pull/778 using the workload from https://bugzilla.redhat.com/show_bug.cgi?id=2014332#c6. No annotation timeouts.

Comment 16 Mike Fiedler 2021-11-05 11:49:43 UTC
Verified in 4.9.0-0.nightly-2021-11-04-235332 on AWS using the workload in https://bugzilla.redhat.com/show_bug.cgi?id=2014332#c6

Comment 19 errata-xmlrpc 2021-11-16 06:24:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.7 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4579