Bug 1886499

Summary: [sig-api-machinery] Kubernetes APIs remain available suddenly started failing on azure
Product: OpenShift Container Platform
Reporter: David Eads <deads>
Component: kube-apiserver
Assignee: Maru Newby <mnewby>
Status: CLOSED NOTABUG
QA Contact: Ke Wang <kewang>
Severity: high
Docs Contact:
Priority: high
Version: 4.6
CC: aos-bugs, mfojtik, sttts, wlewis, xxia
Target Milestone: ---
Target Release: 4.7.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment: [sig-api-machinery] Kubernetes APIs remain available
Last Closed: 2020-11-17 13:51:29 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description David Eads 2020-10-08 15:27:47 UTC
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.6-informing#release-openshift-origin-installer-e2e-azure-upgrade-4.6 shows a steep drop in the pass rate for `[sig-api-machinery] Kubernetes APIs remain available`, starting around 9/30.

We went from a 72% pass rate to a 56% pass rate. 4.5-to-4.6 upgrades also dropped, from a 71% pass rate to 66%.

There is a clear change in behavior. Something regressed in the last 7-10 days, around 9/30.

Comment 1 David Eads 2020-10-08 19:39:52 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1882750 started failing much more often on Azure at nearly the same time.

Comment 2 Stefan Schimanski 2020-10-09 10:28:50 UTC
In the last 11 runs there were only 2 failures, an 82% pass rate (9 of 11).

Comment 4 Maru Newby 2020-10-23 21:17:18 UTC
I'm seeing flakes rather than failures. I'm waiting to see how the pending fixes affect the flake rate.

Comment 5 Maru Newby 2020-11-13 17:57:54 UTC
I’m adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 6 David Eads 2020-11-17 13:51:29 UTC
TestGrid was broken when this was written. It has since been fixed.