Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1942169

Summary: Azure: API becomes unavailable during e2e run
Product: OpenShift Container Platform
Component: kube-apiserver
Version: 4.8
Status: CLOSED INSUFFICIENT_DATA
Severity: unspecified
Priority: low
Reporter: Michael Gugino <mgugino>
Assignee: Lukasz Szaszkiewicz <lszaszki>
QA Contact: Ke Wang <kewang>
CC: aos-bugs, mfojtik, xxia
Flags: mfojtik: needinfo?
Hardware: Unspecified
OS: Unspecified
Whiteboard: LifecycleStale
Last Closed: 2021-11-05 08:42:55 UTC
Type: Bug

Description Michael Gugino 2021-03-23 19:24:56 UTC
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-azure-serial-4.8/1372002364097040384


This problem was somewhat hard to find.

Here, we can see kube-apiserver-operator lost leader election:

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-azure-serial-4.8/1372002364097040384/artifacts/e2e-azure-serial/pods/openshift-kube-apiserver-operator_kube-apiserver-operator-84ffbd7975-cg6h8_kube-apiserver-operator_previous.log

E0317 02:19:38.335035       1 leaderelection.go:325] error retrieving resource lock openshift-kube-apiserver-operator/kube-apiserver-operator-lock: Get "https://172.30.0.1:443/api/v1/namespaces/openshift-kube-apiserver-operator/configmaps/kube-apiserver-operator-lock?timeout=35s": context deadline exceeded
I0317 02:19:38.335133       1 leaderelection.go:278] failed to renew lease openshift-kube-apiserver-operator/kube-apiserver-operator-lock: timed out waiting for the condition
E0317 02:19:38.335218       1 leaderelection.go:301] Failed to release lock: resource name may not be empty
W0317 02:19:38.335317       1 leaderelection.go:75] leader election lost


machine-api containers also lost leader election just prior to that:

I0317 02:17:52.625155       1 leaderelection.go:278] failed to renew lease openshift-machine-api/cluster-api-provider-machineset-leader: timed out waiting for the condition
2021/03/17 02:17:52 leader election lost

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-azure-serial-4.8/1372002364097040384/artifacts/e2e-azure-serial/pods/openshift-machine-api_machine-api-controllers-c79779b99-h98r5_machine-healthcheck-controller_previous.log


These are just a couple of examples of components that lost leader election.
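
To line up lease losses like these across many pod logs, a small script can pull every "failed to renew lease" line out of klog-formatted output and sort the results by time. The following is only a sketch: the function name lease_losses is hypothetical, the regular expressions are fitted to the log lines quoted in this report, and the year is passed in as an assumption because the klog header carries only month and day.

```python
import re
from datetime import datetime

# klog header: severity letter, MMDD, HH:MM:SS.micros, thread id, "file:line]", message.
KLOG_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ "
    r"(?P<src>[^\]]+)\] (?P<msg>.*)$"
)
# Message text as it appears in the excerpts quoted in this report.
LEASE_RE = re.compile(r"failed to renew lease (?P<lock>\S+):")


def lease_losses(lines, year=2021):
    """Return (timestamp, lock) pairs for lease-renewal failures, oldest first.

    The year is an assumption supplied by the caller; klog does not log it.
    """
    events = []
    for line in lines:
        header = KLOG_RE.match(line)
        if header is None:
            continue  # not a klog-formatted line
        lost = LEASE_RE.search(header["msg"])
        if lost is None:
            continue  # klog line, but not a lease-renewal failure
        stamp = datetime.strptime(
            f"{year}-{header['month']}-{header['day']} {header['time']}",
            "%Y-%m-%d %H:%M:%S.%f",
        )
        events.append((stamp, lost["lock"]))
    return sorted(events)


# The two lease-renewal failures quoted above, in the order they appear here.
SAMPLE_LINES = [
    "I0317 02:19:38.335133       1 leaderelection.go:278] failed to renew lease "
    "openshift-kube-apiserver-operator/kube-apiserver-operator-lock: "
    "timed out waiting for the condition",
    "I0317 02:17:52.625155       1 leaderelection.go:278] failed to renew lease "
    "openshift-machine-api/cluster-api-provider-machineset-leader: "
    "timed out waiting for the condition",
]

for stamp, lock in lease_losses(SAMPLE_LINES):
    print(stamp.isoformat(), lock)
```

Run against the two lines quoted in this report, the sketch lists the machine-api lock first, roughly 106 seconds before the kube-apiserver-operator lock. Lines that are not in klog format, such as "2021/03/17 02:17:52 leader election lost", are skipped rather than parsed.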

Here, the ingress operator complains about 'cache failed to sync':

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-azure-serial-4.8/1372002364097040384/artifacts/e2e-azure-serial/pods/openshift-ingress-operator_ingress-operator-5b9f949b4-qzq2z_ingress-operator_previous.log

Comment 1 Lukasz Szaszkiewicz 2021-04-09 14:48:14 UTC
Sorry, I haven't looked into this BZ yet. I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 2 Michal Fojtik 2021-05-09 15:14:31 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Keywords if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 3 Lukasz Szaszkiewicz 2021-05-10 08:20:16 UTC
Sorry, I haven't looked into this BZ yet. I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 4 Michal Fojtik 2021-05-10 09:14:42 UTC
The LifecycleStale keyword was removed because the bug got commented on recently.
The bug assignee was notified.

Comment 5 Lukasz Szaszkiewicz 2021-05-24 07:39:07 UTC
I’m adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 6 Michal Fojtik 2021-06-09 10:03:39 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Keywords if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 7 Lukasz Szaszkiewicz 2021-06-11 11:57:59 UTC
I’m adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 8 Lukasz Szaszkiewicz 2021-07-05 12:29:21 UTC
I’m adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 9 Lukasz Szaszkiewicz 2021-09-03 13:40:58 UTC
I’m adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 10 Lukasz Szaszkiewicz 2021-11-05 08:42:55 UTC
I'm closing this BZ. I didn't have time to look over the logs, and now they are gone.