Bug 1979916 - kube-apiserver constantly receiving signals to terminate after a fresh install, but still keeps serving
Summary: kube-apiserver constantly receiving signals to terminate after a fresh instal...
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Stefan Schimanski
QA Contact: Ke Wang
Whiteboard: EmergencyRequest
Depends On:
Reported: 2021-07-07 12:01 UTC by Udi Kalifon
Modified: 2021-07-12 18:33 UTC (History)
4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2021-07-12 11:17:54 UTC
Target Upstream Version:


Description Udi Kalifon 2021-07-07 12:01:43 UTC
Description of problem:
I installed a fresh new cluster with the assisted installer: 3 masters + 3 workers + LSO + OCS

When I logged into the console I could see many events firing, and they kept firing rapidly (several per minute) in a seemingly infinite loop:

Received signal to terminate, becoming unready, but keeping serving

Clicking on the event leads to a page with the pod details, but when I open the "Events" tab, no events are streamed there for some reason. From the CLI as well, I can't see these events in any namespace or in the output of "oc describe" on the pod.

Together with this event there are also many others (and they repeat for all 3 masters):

All pending requests processed

All pre-shutdown hooks have been finished

Server has stopped listening

The minimal shutdown duration of 1m10s finished

Stopping container registry-server

Successfully pulled image "registry.redhat.io/redhat/redhat-operator-index:v4.8" in 3.09180991s

Started container registry-server

Created container registry-server

Pulling image "registry.redhat.io/redhat/redhat-operator-index:v4.8"

Add eth0 [] from openshift-sdn

Liveness probe failed: command timed out
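
For context, the termination-related messages above come from the apiserver's graceful shutdown flow: on a termination signal the server first fails its readiness check while it keeps serving, waits out a fixed minimal shutdown duration so load balancers can drain it, and only then stops listening and finishes in-flight requests. Below is a minimal, hypothetical Go sketch of that sequence — the function name and exact event ordering are illustrative assumptions, not the real kube-apiserver code:

```go
package main

import (
	"fmt"
	"time"
)

// gracefulTerminationEvents is an illustrative sketch (not actual
// kube-apiserver code) of the event sequence quoted in this bug.
// On SIGTERM the server becomes unready but keeps serving, sleeps for
// the minimal shutdown duration (1m10s in the reported events), and
// only then stops listening and drains pending requests.
func gracefulTerminationEvents(shutdownDelay time.Duration) []string {
	events := []string{
		"Received signal to terminate, becoming unready, but keeping serving",
	}
	// /readyz starts failing here, while traffic is still accepted.
	time.Sleep(shutdownDelay)
	events = append(events,
		"All pre-shutdown hooks have been finished",
		fmt.Sprintf("The minimal shutdown duration of %s finished", shutdownDelay),
		"Server has stopped listening",
		"All pending requests processed",
	)
	return events
}

func main() {
	for _, e := range gracefulTerminationEvents(10 * time.Millisecond) {
		fmt.Println(e)
	}
}
```

Seeing this sequence repeat forever, as reported here, would mean something keeps sending termination signals to the kube-apiserver containers rather than the shutdown flow itself misbehaving.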

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Install a cluster with the assisted installer: 3 masters + 3 workers + LSO + OCS
2. Log in to the openshift console (GUI)

Actual results:
Many events firing. The events are only visible in the GUI; I wasn't able to see them anywhere in the CLI.

Additional info:
I made sure that OCS is healthy, and I also deployed a sample application; everything seems to work fine. It's just that all these events keep firing and I don't know what's causing them.

Comment 2 Michal Fojtik 2021-07-07 12:13:28 UTC

This BZ claims that this bug is of urgent severity and priority. Note that urgent priority means you have just declared an emergency within engineering.
Engineers are asked to stop whatever they are doing, including putting important release work on hold, potentially risking the OCP release, while working on this case.

Be prepared to have a good justification ready, and make sure that your own and engineering management are aware of this and have approved it. Urgent bugs are very expensive and have maximal management visibility.

NOTE: This bug was assigned to an engineering manager with the severity reset to *unspecified* until the emergency is vetted and confirmed. Please do not manually override the severity.

Comment 3 Stefan Schimanski 2021-07-12 11:17:54 UTC
`oc get events -n <namespace>` shows events on the CLI.

Please reopen if that does not work, component `oc`.

Comment 4 Udi Kalifon 2021-07-12 18:33:48 UTC
I managed to track down some of the events, which are coming from openshift-marketplace. I reported it here: https://bugzilla.redhat.com/show_bug.cgi?id=1981532
