Bug 1997072 - [scale] [4.9z] failed to get pod annotation: timed out waiting for annotations
Summary: [scale] [4.9z] failed to get pod annotation: timed out waiting for annotations
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.7
Hardware: All
OS: All
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.9.z
Assignee: Tim Rozet
QA Contact: Mike Fiedler
URL:
Whiteboard: perfscale-ovn
Duplicates: 2003543
Depends On: 1959352 2076201
Blocks: 2014332
Reported: 2021-08-24 11:23 UTC by Dustin Black
Modified: 2022-04-18 08:04 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1959352
Clones: 2014332
Environment:
Last Closed: 2021-11-16 06:24:16 UTC
Target Upstream Version:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift ovn-kubernetes pull 778 | 0 | None | open | Bug 1997072: [4.9] phase 2 scale improvements | 2021-10-01 19:46:19 UTC
Red Hat Product Errata | RHBA-2021:4579 | 0 | None | None | None | 2021-11-16 06:24:38 UTC

Comment 1 Tim Rozet 2021-08-25 14:34:31 UTC
Hey Dustin, I see you cloned this from 1959352, is this meant to target 4.8z?

Comment 2 Dustin Black 2021-09-03 14:37:13 UTC
(In reply to Tim Rozet from comment #1)
> Hey Dustin, I see you cloned this from 1959352, is this meant to target 4.8z?

BZ 1959352 was reported on 4.8, but we also experienced this problem on 4.7 and we need to ensure there is a backport tracker for that stream.

Comment 3 Tim Rozet 2021-09-10 16:27:30 UTC
I updated the original bug so that its affected version is 4.7. I will use this BZ to target the 4.9z backport (and will create more for 4.8z and 4.7z when the time comes).

Comment 4 Patryk Diak 2021-09-17 09:27:47 UTC
*** Bug 2003543 has been marked as a duplicate of this bug. ***

Comment 8 Mike Fiedler 2021-10-19 13:37:43 UTC
Verified with a 120-node cluster-bot cluster built from this PR and the node-density workload. All pods reach the Running state, and there are no FailedCreatePodSandBox events with the reason "failed to get pod annotation: timed out waiting for annotations".

Comment 9 Mike Fiedler 2021-10-19 17:34:07 UTC
Apologies, the cluster was misconfigured as SDN for comment 8. I re-tested the same scenario (120-node cluster-bot cluster built from this PR, node-density workload) and received many (over 50K) annotation timeout events. I will attach a must-gather.

node-density-010fc5c2-b861-47d3-b9c1-4bb4bb0cbdfb   3m7s   Warning   FailedCreatePodSandBox   pod/node-density-25413   Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_node-density-25413_node-density-010fc5c2-b861-47d3-b9c1-4bb4bb0cbdfb_567442f4-cf2d-4e6a-910c-e6ee59547e4a_0(7b816ff190ffd374bf23ffdad4e86c1737f36696bd1d97632768ff83255cdfb1): error adding pod node-density-010fc5c2-b861-47d3-b9c1-4bb4bb0cbdfb_node-density-25413 to CNI network "multus-cni-network": [node-density-010fc5c2-b861-47d3-b9c1-4bb4bb0cbdfb/node-density-25413:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[node-density-010fc5c2-b861-47d3-b9c1-4bb4bb0cbdfb/node-density-25413 7b816ff190ffd374bf23ffdad4e86c1737f36696bd1d97632768ff83255cdfb1] [node-density-010fc5c2-b861-47d3-b9c1-4bb4bb0cbdfb/node-density-25413 7b816ff190ffd374bf23ffdad4e86c1737f36696bd1d97632768ff83255cdfb1] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded
node-density-010fc5c2-b861-47d3-b9c1-4bb4bb0cbdfb   112s   Warning   FailedCreatePodSandBox   pod/node-density-25413   Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_node-density-25413_node-density-010fc5c2-b861-47d3-b9c1-4bb4bb0cbdfb_567442f4-cf2d-4e6a-910c-e6ee59547e4a_0(aa9ba638fd5c8422177855ded1012512267fb61003a02113154e7a4de1bfea98): error adding pod node-density-010fc5c2-b861-47d3-b9c1-4bb4bb0cbdfb_node-density-25413 to CNI network "multus-cni-network": [node-density-010fc5c2-b861-47d3-b9c1-4bb4bb0cbdfb/node-density-25413:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[node-density-010fc5c2-b861-47d3-b9c1-4bb4bb0cbdfb/node-density-25413 aa9ba638fd5c8422177855ded1012512267fb61003a02113154e7a4de1bfea98] [node-density-010fc5c2-b861-47d3-b9c1-4bb4bb0cbdfb/node-density-25413 aa9ba638fd5c8422177855ded1012512267fb61003a02113154e7a4de1bfea98] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded
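
For anyone repeating this check, counting the timeout events in a saved events listing could look roughly like the sketch below. It is not the tooling used for the verification in this bug. It assumes the events were captured as plain text with one event per line (for example, unwrapped `oc get events -A` output), and the function name `count_annotation_timeouts` is made up for illustration.

```python
import re
from collections import Counter

# Error text copied from the events quoted in this bug.
TIMEOUT_MARKER = "failed to get pod annotation: timed out waiting for annotations"
# Matches the "pod/<name>" object column of an events listing (assumed layout).
POD_RE = re.compile(r"\bpod/(\S+)")


def count_annotation_timeouts(events_text: str) -> Counter:
    """Count FailedCreatePodSandBox annotation-timeout events per pod.

    Hypothetical helper: expects one event per line of plain text.
    """
    counts = Counter()
    for line in events_text.splitlines():
        # Only sandbox-creation failures caused by the annotation timeout count.
        if "FailedCreatePodSandBox" not in line or TIMEOUT_MARKER not in line:
            continue
        match = POD_RE.search(line)
        counts[match.group(1) if match else "<unknown>"] += 1
    return counts
```

Calling `sum(count_annotation_timeouts(text).values())` on the captured text gives the total number of matching events, and `len(...)` of the result gives the number of distinct pods affected.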

Comment 11 Mike Fiedler 2021-10-19 20:22:08 UTC
Verified I do not see the issue on 4.10.0-0.nightly-2021-10-16-173656.  I will try testing the 4.8 version of the patch next.

Comment 13 Mike Fiedler 2021-11-04 20:53:58 UTC
Re-tested on a 4.9 cluster built from https://github.com/openshift/ovn-kubernetes/pull/778 using the workload from https://bugzilla.redhat.com/show_bug.cgi?id=2014332#c6. No annotation timeouts.

Comment 16 Mike Fiedler 2021-11-05 11:49:43 UTC
Verified in 4.9.0-0.nightly-2021-11-04-235332 on AWS using the workload in https://bugzilla.redhat.com/show_bug.cgi?id=2014332#c6

Comment 19 errata-xmlrpc 2021-11-16 06:24:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.7 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4579

