Bug 2106216

Summary: Cluster upgrade.[sig-network-edge] Verify DNS availability during and after upgrade success
Product: OpenShift Container Platform
Reporter: W. Trevor King <wking>
Component: Networking
Networking sub component: DNS
Assignee: Suleyman Akbas <sakbas>
QA Contact: Hongan Li <hongli>
CC: mmasters, sakbas, sippy
Status: CLOSED INSUFFICIENT_DATA
Severity: medium
Priority: medium
Version: 4.11
Keywords: Upgrades
Target Release: 4.12.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2022-08-31 10:37:44 UTC
Type: Bug

Description W. Trevor King 2022-07-12 04:39:00 UTC
Cluster upgrade.[sig-network-edge] Verify DNS availability during and after upgrade success

is failing frequently in CI, see:
https://sippy.ci.openshift.org/sippy-ng/tests/4.12/analysis?test=Cluster%20upgrade.%5Bsig-network-edge%5D%20Verify%20DNS%20availability%20during%20and%20after%20upgrade%20success

and:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?type=junit&maxAge=24h&search=success+rate+is+less+than+99.+on+the+node' | grep 'failures match' | sort
periodic-ci-openshift-multiarch-master-nightly-4.12-upgrade-from-stable-4.11-ocp-e2e-aws-arm64 (all) - 3 runs, 33% failed, 100% of failures match = 33% impact
periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-from-stable-4.10-e2e-aws-sdn-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-nightly-4.11-e2e-aws-upgrade (all) - 20 runs, 20% failed, 25% of failures match = 5% impact
periodic-ci-openshift-release-master-nightly-4.12-upgrade-from-stable-4.11-e2e-aws-sdn-upgrade (all) - 4 runs, 50% failed, 50% of failures match = 25% impact
pull-ci-openshift-cluster-monitoring-operator-master-e2e-agnostic-upgrade (all) - 6 runs, 50% failed, 33% of failures match = 17% impact
release-openshift-origin-installer-e2e-aws-upgrade (all) - 2 runs, 50% failed, 100% of failures match = 50% impact
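The "impact" column in the search output above can be cross-checked from the other columns. A minimal sketch, assuming impact is the number of matching failures divided by the total number of runs, with the counts recovered from the rounded percentages (this is an assumption about how search.ci computes it, not its actual implementation; `recompute_impact` and `LINE_RE` are hypothetical names). Under that assumption the recomputed figure agrees with all seven lines above.

```python
import re

# Matches summary lines such as:
# "<job> (all) - 20 runs, 20% failed, 25% of failures match = 5% impact"
LINE_RE = re.compile(
    r"^(?P<job>\S+) \(all\) - (?P<runs>\d+) runs, (?P<failed>\d+)% failed, "
    r"(?P<match>\d+)% of failures match = (?P<impact>\d+)% impact$"
)


def recompute_impact(line):
    """Return (job, reported_impact, recomputed_impact) for one summary line.

    Assumption (hypothetical): impact = matching failures / total runs, where
    the failure and match counts are recovered from the rounded percentages.
    """
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    runs = int(m["runs"])
    failed = round(runs * int(m["failed"]) / 100)      # number of failed runs
    matching = round(failed * int(m["match"]) / 100)   # failures matching the search
    impact = round(100 * matching / runs)              # matching failures per run, in %
    return m["job"], int(m["impact"]), impact
```

For example, the pull-ci-openshift-cluster-monitoring-operator line works out to 3 failed runs out of 6, 1 of which matches, so 1/6 of runs, which rounds to 17%.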

This looks like it impacts both 4.11 and 4.12, although [1] still looks healthy for this test case.  The 4.11.0-rc.1 to rc.2 update [2] hit this:

fail [github.com/openshift/origin/test/e2e/upgrade/dns/dns.go:138]: Unexpected error:
    <*errors.errorString | 0xc0021e5200>: {
        s: "success rate is less than 99% on the node ip-10-0-157-24.us-east-2.compute.internal: [98.39]",

which sounds like a failure for:

  disruption_tests: [sig-network-edge] Verify DNS availability during and after upgrade success

even though it was reported under:

  : [sig-arch][Feature:ClusterUpgrade] Cluster should remain functional during upgrade [Disruptive] [Serial]

and [3] shows some failures too, although without much history yet.
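To make the failure message concrete: the test appears to compare a per-node DNS success rate against a 99% threshold and report the rate when it falls short. The sketch below only illustrates that check as suggested by the message text; it is not the code at test/e2e/upgrade/dns/dns.go, and `NodeProbes`, `check_dns_availability`, and the probe counts in the usage note are hypothetical.

```python
from dataclasses import dataclass

# Threshold taken from the "less than 99%" wording of the failure message.
THRESHOLD_PERCENT = 99.0


@dataclass
class NodeProbes:
    """Hypothetical per-node tally of DNS lookups made during the upgrade."""
    node: str
    succeeded: int
    attempted: int

    def success_rate(self) -> float:
        # Percentage of successful lookups, rounded to two decimals (e.g. 98.39).
        if self.attempted == 0:
            return 0.0  # treat a node with no probes as fully unavailable
        return round(100.0 * self.succeeded / self.attempted, 2)


def check_dns_availability(probes):
    """Return an error string for the first node below threshold, else None."""
    for p in probes:
        rate = p.success_rate()
        if rate < THRESHOLD_PERCENT:
            return f"success rate is less than 99% on the node {p.node}: [{rate}]"
    return None
```

With made-up counts, a node answering 9839 of 10000 lookups yields a message ending in "[98.39]", while a node answering 995 of 1000 (99.5%) passes.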

[1]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.11-informing#periodic-ci-openshift-release-master-nightly-4.11-e2e-aws-upgrade
[2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/1546606843319554048
[3]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.12-informing#periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-upgrade

Comment 1 Miciah Dashiel Butler Masters 2022-07-12 14:19:00 UTC
Right now, Sippy shows that the success rate has dropped from 98.9% to 98.4%.  This is a relatively new test, so we don't have much historical data to judge how significant that drop is.  So I'm marking this BZ blocker-, and the team can prioritize it after Shift Week.