Bug 2081732 - "[sig-node] static pods should start after being created" doesn't capture "... because static pod is ready" event
Keywords:
Status: VERIFIED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Test Framework
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: low
Target Milestone: ---
Target Release: 4.11.0
Assignee: Ken Zhang
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-05-04 13:46 UTC by Riccardo Ravaioli
Modified: 2022-06-13 15:23 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift origin pull 27143 | 0 | None | open | Bug 2081732: Dump debug message whenever static pod test fails | 2022-05-19 18:43:29 UTC
Github | openshift origin pull 27160 | 0 | None | open | Bug 2081732: Static pod test error | 2022-05-23 16:28:10 UTC
Github | openshift origin pull 27195 | 0 | None | open | Bug 2081732: remove incorrect namespace check in static pod test | 2022-06-02 12:58:30 UTC

Description Riccardo Ravaioli 2022-05-04 13:46:36 UTC
Description of problem:

The test "[sig-node] static pods should start after being created" parses events and looks for static pods that failed to be created, searching for events like:

"static pod lifecycle failure - static pod: \"openshift-kube-scheduler\" in namespace: \"openshift-kube-scheduler\" for revision: 7 on node: \"ci-op-j83m46vy-5cb9e-xs2c9-master-0\" didn't show up, waited: 2m30s"

Since these static pods might take a little longer to come up, it then looks for events like:

"Updated node \"ci-op-j83m46vy-5cb9e-xs2c9-master-0\" from revision 0 to 7 because static pod is ready"
... and in this case, the test shouldn't fail.
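
The matching described above can be sketched roughly as follows. This is an illustrative Python sketch only, not the actual test code in openshift/origin. The names Failure, FAILURE_RE, READY_RE and unresolved_failures are hypothetical, and two simplifications are assumed: a failure counts as resolved when a "static pod is ready" event exists for the same node and revision, and event ordering is ignored.

```python
import re
from typing import List, NamedTuple


class Failure(NamedTuple):
    """One 'static pod lifecycle failure' event (hypothetical helper type)."""
    pod: str
    namespace: str
    revision: int
    node: str


# A double quote, optionally backslash-escaped as it appears in JSON dumps.
_Q = r'\\?"'
_NAME = r'[^"\\]+'

# Patterns modelled on the two event messages quoted in this report.
FAILURE_RE = re.compile(
    r'static pod lifecycle failure - static pod: ' + _Q + '(?P<pod>' + _NAME + ')' + _Q
    + r' in namespace: ' + _Q + '(?P<namespace>' + _NAME + ')' + _Q
    + r' for revision: (?P<revision>\d+)'
    + r' on node: ' + _Q + '(?P<node>' + _NAME + ')' + _Q
    + r" didn't show up"
)
READY_RE = re.compile(
    r'Updated node ' + _Q + '(?P<node>' + _NAME + ')' + _Q
    + r' from revision \d+ to (?P<revision>\d+) because static pod is ready'
)


def unresolved_failures(messages: List[str]) -> List[Failure]:
    """Return failures with no 'static pod is ready' event for the same
    node and revision. Assumption: matching on (node, revision) is enough."""
    failures: List[Failure] = []
    ready = set()
    for msg in messages:
        m = FAILURE_RE.search(msg)
        if m:
            failures.append(Failure(m['pod'], m['namespace'],
                                    int(m['revision']), m['node']))
            continue
        m = READY_RE.search(msg)
        if m:
            ready.add((m['node'], int(m['revision'])))
    return [f for f in failures if (f.node, f.revision) not in ready]
```

Under these assumptions, feeding the two example messages above to unresolved_failures yields an empty list, while the failure message alone yields one entry. If the "ready" event never reaches the list of parsed events, the failure stays unresolved, which is the symptom reported here.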

I looked at two failed runs of this test case. In both, it seems the "... because static pod is ready" event is never captured, so the test fails when it should not.
I provided a detailed analysis here: https://issues.redhat.com/browse/SDN-2994

Version-Release number of selected component (if applicable):
4.11

How reproducible:
I found and analyzed failed runs through testgrid: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.11-informing#periodic-ci-openshift-release-master-ci-4.11-e2e-azure-ovn


Expected results:
The test shouldn't fail

Comment 4 Ken Zhang 2022-06-13 15:23:12 UTC
Test has been fixed.

