Bug 1722835 - Kube-scheduler broken on upgrade to 4.1.2
Summary: Kube-scheduler broken on upgrade to 4.1.2
Keywords:
Status: CLOSED DUPLICATE of bug 1721566
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Master
Version: 4.1.z
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: ravig
QA Contact: Xingxing Xia
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-06-21 12:41 UTC by Naveen Malik
Modified: 2019-06-24 17:28 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-24 17:28:04 UTC
Target Upstream Version:
Embargoed:


Attachments
clusterversion (2.54 KB, text/plain)
2019-06-21 12:42 UTC, Naveen Malik
rolebinding (603 bytes, text/plain)
2019-06-21 12:42 UTC, Naveen Malik
role (463 bytes, text/plain)
2019-06-21 12:42 UTC, Naveen Malik
who-can (3.46 KB, text/plain)
2019-06-21 12:43 UTC, Naveen Malik
pod logs (52.91 KB, text/plain)
2019-06-21 12:44 UTC, Naveen Malik

Description Naveen Malik 2019-06-21 12:41:32 UTC
Description of problem:
Cluster upgraded from 4.1.0-rc.7 through to 4.1.2
Cluster reports upgrade is progressing with status "Unable to apply 4.1.2: the cluster operator kube-scheduler is degraded"
Review of the kube-scheduler pods indicates RBAC issues, though the user in question does appear to have the permissions that are reported as missing.

This is on a long-running cluster that was provisioned on 4.1.0-rc.7 on 2019-05-30. We did not observe any issues with another cluster managed in the same way and upgraded from 4.1.0-rc.7 through to 4.1.2 on the same schedule.
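
For reference, the degraded status quoted above is what the clusterversion object reports (see the clusterversion attachment); assuming standard cluster-admin access, it can be read with:

$ oc get clusterversion                      # summary of the in-progress upgrade
$ oc get clusterversion version -o yaml      # full status, as captured in the attachment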

Version-Release number of selected component (if applicable):
OCP 4.1.2

How reproducible:
One cluster upgrade

Steps to Reproduce:
1. Provision OCP 4.1.0-rc.7
2. Upgrade to 4.1.0-rc.9
3. Upgrade to 4.1.0
4. Upgrade to 4.1.2

Actual results:
Unable to complete upgrade to 4.1.2


Expected results:
Kube-scheduler in a good state after the upgrade.


Additional info:
See attachments for logs and CRs. Happy to provide more as needed.

Comment 1 Naveen Malik 2019-06-21 12:42:07 UTC
Created attachment 1583178 [details]
clusterversion

Comment 2 Naveen Malik 2019-06-21 12:42:26 UTC
Created attachment 1583179 [details]
rolebinding

Comment 3 Naveen Malik 2019-06-21 12:42:41 UTC
Created attachment 1583180 [details]
role

Comment 4 Naveen Malik 2019-06-21 12:43:10 UTC
Created attachment 1583181 [details]
who-can

Comment 5 Naveen Malik 2019-06-21 12:44:00 UTC
Created attachment 1583182 [details]
pod logs

I picked the configmap access in the openshift-kube-scheduler namespace to dig into, hence the other RBAC-related attachments.
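
For reference, the check behind the who-can attachment can be reproduced with the built-in RBAC query command; the verb and resource below are an assumption based on the configmap access mentioned above:

$ oc adm policy who-can get configmaps -n openshift-kube-scheduler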

Comment 6 David Eads 2019-06-24 13:04:13 UTC
Can you provide the output archive from `oc adm must-gather`?  It will include additional operator-related information for us to debug.
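
For anyone following along, a typical invocation (the destination directory is arbitrary) looks like:

$ oc adm must-gather --dest-dir=/tmp/must-gather
$ tar czf must-gather.tar.gz -C /tmp must-gather

and the resulting archive can then be attached to this bug.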

Comment 7 ravig 2019-06-24 14:24:31 UTC
I think the underlying issue here is that the kube-scheduler is not able to communicate with the api-server.

`Failed to watch *v1.PersistentVolumeClaim: Get https://localhost:6443/api/v1/persistentvolumeclaims?resourceVersion=9463143&timeout=8m1s&timeoutSeconds=481&watch=true: dial tcp [::1]:6443: connect: connection refused`

Do you have logs from the scheduler pods on the remaining 2 master nodes? Yes, you can get that information from `oc adm must-gather`, as David mentioned.
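
For reference, the per-master scheduler logs can also be pulled directly (the pod name below is a placeholder; the static pod names include the node name):

$ oc get pods -n openshift-kube-scheduler -o wide
$ oc logs -n openshift-kube-scheduler openshift-kube-scheduler-<master-node-name>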

Comment 9 David Eads 2019-06-24 17:28:04 UTC
Thanks for the update.  Based on this we can check clusteroperator/kube-scheduler and the kubescheduler.operator.openshift.io/cluster, and we see it's a dupe of https://bugzilla.redhat.com/show_bug.cgi?id=1721566.
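
For reference, those two resources can be inspected with:

$ oc get clusteroperator kube-scheduler -o yaml
$ oc get kubescheduler.operator.openshift.io cluster -o yaml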

*** This bug has been marked as a duplicate of bug 1721566 ***

