Bug 1722835 - Kube-scheduler broken on upgrade to 4.1.2
Summary: Kube-scheduler broken on upgrade to 4.1.2
Keywords:
Status: CLOSED DUPLICATE of bug 1721566
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Master
Version: 4.1.z
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: ravig
QA Contact: Xingxing Xia
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-06-21 12:41 UTC by Naveen Malik
Modified: 2019-06-24 17:28 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-24 17:28:04 UTC
Target Upstream Version:
Embargoed:


Attachments
clusterversion (2.54 KB, text/plain)
2019-06-21 12:42 UTC, Naveen Malik
rolebinding (603 bytes, text/plain)
2019-06-21 12:42 UTC, Naveen Malik
role (463 bytes, text/plain)
2019-06-21 12:42 UTC, Naveen Malik
who-can (3.46 KB, text/plain)
2019-06-21 12:43 UTC, Naveen Malik
pod logs (52.91 KB, text/plain)
2019-06-21 12:44 UTC, Naveen Malik

Description Naveen Malik 2019-06-21 12:41:32 UTC
Description of problem:
Cluster upgraded from 4.1.0-rc.7 through to 4.1.2
Cluster reports upgrade is progressing with status "Unable to apply 4.1.2: the cluster operator kube-scheduler is degraded"
Review of the kube-scheduler pods indicates RBAC issues, though the user in question does appear to have the permissions that are reported as missing.

This is on a long-running cluster that was provisioned on 4.1.0-rc.7 on 2019-05-30. We did not observe any issues with another cluster managed in the same way and upgraded from 4.1.0-rc.7 through to 4.1.2 on the same schedule.
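
For reference, the degraded status quoted above is what the clusterversion object reports (see the clusterversion attachment); assuming standard cluster-admin access, it can be read with:

$ oc get clusterversion                      # summary of the in-progress upgrade
$ oc get clusterversion version -o yaml      # full status, as captured in the attachment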

Version-Release number of selected component (if applicable):
OCP 4.1.2

How reproducible:
One cluster upgrade

Steps to Reproduce:
1. Provision OCP 4.1.0-rc.7
2. Upgrade to 4.1.0-rc.9
3. Upgrade to 4.1.0
4. Upgrade to 4.1.2

Actual results:
Unable to complete upgrade to 4.1.2


Expected results:
Kube-scheduler in a good state after the upgrade.


Additional info:
See attachments for logs and CRs. Happy to provide more as needed.

Comment 1 Naveen Malik 2019-06-21 12:42:07 UTC
Created attachment 1583178 [details]
clusterversion

Comment 2 Naveen Malik 2019-06-21 12:42:26 UTC
Created attachment 1583179 [details]
rolebinding

Comment 3 Naveen Malik 2019-06-21 12:42:41 UTC
Created attachment 1583180 [details]
role

Comment 4 Naveen Malik 2019-06-21 12:43:10 UTC
Created attachment 1583181 [details]
who-can

Comment 5 Naveen Malik 2019-06-21 12:44:00 UTC
Created attachment 1583182 [details]
pod logs

I picked the configmap access in the openshift-kube-scheduler namespace to dig into, hence the other RBAC-related attachments.
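
For reference, the check behind the who-can attachment can be reproduced with the built-in RBAC query command; the verb and resource below are an assumption based on the configmap access mentioned above:

$ oc adm policy who-can get configmaps -n openshift-kube-scheduler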

Comment 6 David Eads 2019-06-24 13:04:13 UTC
Can you provide the output archive from `oc adm must-gather`?  It will include additional operator-related information for us to debug.
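
For anyone following along, a typical invocation (the destination directory is arbitrary) looks like:

$ oc adm must-gather --dest-dir=/tmp/must-gather
$ tar czf must-gather.tar.gz -C /tmp must-gather

and the resulting archive can then be attached to this bug.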

Comment 7 ravig 2019-06-24 14:24:31 UTC
I think the underlying issue here is that the kube-scheduler is not able to communicate with the api-server.

`Failed to watch *v1.PersistentVolumeClaim: Get https://localhost:6443/api/v1/persistentvolumeclaims?resourceVersion=9463143&timeout=8m1s&timeoutSeconds=481&watch=true: dial tcp [::1]:6443: connect: connection refused`

Do you have logs from the scheduler pods on the remaining 2 master nodes? Yes, you can get that information from `oc adm must-gather`, as David mentioned.
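
For reference, the per-master scheduler logs can also be pulled directly (the pod name below is a placeholder; the static pod names include the node name):

$ oc get pods -n openshift-kube-scheduler -o wide
$ oc logs -n openshift-kube-scheduler openshift-kube-scheduler-<master-node-name>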

Comment 9 David Eads 2019-06-24 17:28:04 UTC
Thanks for the update.  Based on this we can check clusteroperator/kube-scheduler and the kubescheduler.operator.openshift.io/cluster, and we see it's a dupe of https://bugzilla.redhat.com/show_bug.cgi?id=1721566.
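
For reference, those two resources can be inspected with:

$ oc get clusteroperator kube-scheduler -o yaml
$ oc get kubescheduler.operator.openshift.io cluster -o yaml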

*** This bug has been marked as a duplicate of bug 1721566 ***

