Bug 1979916 - kube-apiserver constantly receiving signals to terminate after a fresh install, but still keeps serving
Summary: kube-apiserver constantly receiving signals to terminate after a fresh instal...
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Stefan Schimanski
QA Contact: Ke Wang
Whiteboard: EmergencyRequest
Depends On:
Reported: 2021-07-07 12:01 UTC by Udi Kalifon
Modified: 2021-07-12 18:33 UTC (History)
4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2021-07-12 11:17:54 UTC
Target Upstream Version:


Description Udi Kalifon 2021-07-07 12:01:43 UTC
Description of problem:
I installed a fresh new cluster with the assisted installer: 3 masters + 3 workers + LSO + OCS

When I logged into the console I could see many events firing, and they kept firing rapidly (several per minute) in a seemingly infinite loop:

Received signal to terminate, becoming unready, but keeping serving

Clicking on the event leads to a page with the pod details, but when I open the "Events" tab, no events are streamed there for some reason. From the CLI as well, I can't see these events in any namespace or in the output of "oc describe" on the pod.

Together with this event there are also many others (and they repeat for all 3 masters):

All pending requests processed

All pre-shutdown hooks have been finished

Server has stopped listening

The minimal shutdown duration of 1m10s finished

Stopping container registry-server

Successfully pulled image "registry.redhat.io/redhat/redhat-operator-index:v4.8" in 3.09180991s

Started container registry-server

Created container registry-server

Pulling image "registry.redhat.io/redhat/redhat-operator-index:v4.8"

Add eth0 [] from openshift-sdn

Liveness probe failed: command timed out
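
For context, the termination-related messages above come from the apiserver's graceful shutdown flow: on a termination signal the server first fails its readiness check while it keeps serving, waits out a fixed minimal shutdown duration so load balancers can drain it, and only then stops listening and finishes in-flight requests. Below is a minimal, hypothetical Go sketch of that sequence — the function name and exact event ordering are illustrative assumptions, not the real kube-apiserver code:

```go
package main

import (
	"fmt"
	"time"
)

// gracefulTerminationEvents is an illustrative sketch (not actual
// kube-apiserver code) of the event sequence quoted in this bug.
// On SIGTERM the server becomes unready but keeps serving, sleeps for
// the minimal shutdown duration (1m10s in the reported events), and
// only then stops listening and drains pending requests.
func gracefulTerminationEvents(shutdownDelay time.Duration) []string {
	events := []string{
		"Received signal to terminate, becoming unready, but keeping serving",
	}
	// /readyz starts failing here, while traffic is still accepted.
	time.Sleep(shutdownDelay)
	events = append(events,
		"All pre-shutdown hooks have been finished",
		fmt.Sprintf("The minimal shutdown duration of %s finished", shutdownDelay),
		"Server has stopped listening",
		"All pending requests processed",
	)
	return events
}

func main() {
	for _, e := range gracefulTerminationEvents(10 * time.Millisecond) {
		fmt.Println(e)
	}
}
```

Seeing this sequence repeat forever, as reported here, would mean something keeps sending termination signals to the kube-apiserver containers rather than the shutdown flow itself misbehaving.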

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Install a cluster with the assisted installer: 3 masters + 3 workers + LSO + OCS
2. Log in to the openshift console (GUI)

Actual results:
Many events firing. The events are only visible in the GUI; I wasn't able to see them anywhere in the CLI.

Additional info:
I made sure that OCS is healthy, and I also deployed a sample application; everything seems to work fine. It's just that all these events keep firing and I don't know what's causing them.

Comment 2 Michal Fojtik 2021-07-07 12:13:28 UTC

This BZ claims that this bug is of urgent severity and priority. Note that urgent priority means you have just declared an emergency within engineering.
Engineers are asked to stop whatever they are doing, including putting important release work on hold, potentially risking the OCP release, while working on this case.

Be prepared to have a good justification ready, and make sure that your own and engineering management are aware of this and have approved it. Urgent bugs are very expensive and have maximal management visibility.

NOTE: This bug was assigned to an engineering manager with the severity reset to *unspecified* until the emergency is vetted and confirmed. Please do not manually override the severity.

Comment 3 Stefan Schimanski 2021-07-12 11:17:54 UTC
`oc get events -n <namespace>` shows events on the CLI.

Please reopen if that does not work, component `oc`.

Comment 4 Udi Kalifon 2021-07-12 18:33:48 UTC
I managed to track down some of the events, which are coming from openshift-marketplace. I reported it here: https://bugzilla.redhat.com/show_bug.cgi?id=1981532
