Bug 1736795 - Remove scheduler-policy then recover name to original "" in scheduler/cluster still result in scheduler pod CrashLoopBackOff
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-scheduler
Version: 4.2.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.2.0
Assignee: Mike Dame
QA Contact: Xingxing Xia
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-08-02 03:36 UTC by ge liu
Modified: 2019-10-16 06:34 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:34:29 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-kube-scheduler-operator pull 164 | 0 | None | closed | Bug 1736795: Add check for removal of policy configmap | 2020-04-23 14:31:22 UTC
Red Hat Product Errata | RHBA-2019:2922 | 0 | None | None | None | 2019-10-16 06:34:39 UTC

Comment 1 Xingxing Xia 2019-08-02 05:58:29 UTC
oc edit kubescheduler cluster
spec:
  logLevel: ""
  managementState: Managed
  observedConfig:
    algorithmSource:
      policy:
        configMap:
          name: policy-configmap
          namespace: openshift-kube-scheduler

Removing observedConfig, then saving and exiting, lets the scheduler pod recover to Running. However, per the description above, the user customizes the policy via `oc edit scheduler cluster`, so the user should also be able to un-customize it via `oc edit scheduler cluster` by setting name: "".

Comment 2 Mike Dame 2019-08-08 13:28:40 UTC
I've started diagnosing this in https://github.com/openshift/cluster-kube-scheduler-operator/pull/164

Comment 3 Mike Dame 2019-08-08 16:30:19 UTC
There are a couple of issues here, as I described in my PR:
1. kube-scheduler-operator doesn't act on an empty Spec.Policy.Name field; it only checks for the field and discards it if it is empty.
2. The generic operator client in library-go currently has a bug: it doesn't account for fields being removed from the observed config. This is a big cause of our problem but, because of issue 1, this bug is not an exact duplicate of that one.

We can fix issue 1, but the fix won't have any effect until issue 2 is resolved, so until then this bug depends on that one.
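
To illustrate how the two issues interact, here is a minimal Python sketch. It is a simplified model only, not the operator's or library-go's actual code (both are written in Go); the function names `observe_policy`, `apply_additive` and `apply_replacing` are hypothetical, and the additive merge is an assumption about how "not accounting for removed fields" could behave.

```python
import copy


def observe_policy(policy_name, namespace="openshift-kube-scheduler"):
    """Model of a fixed issue 1: an empty name is acted on, not ignored.

    An empty name yields an empty observed config, meaning "no custom policy".
    """
    if not policy_name:
        return {}
    return {
        "algorithmSource": {
            "policy": {
                "configMap": {"name": policy_name, "namespace": namespace}
            }
        }
    }


def apply_additive(existing, observed):
    """Assumed model of issue 2: keys absent from `observed` are never removed."""
    result = copy.deepcopy(existing)
    for key, value in observed.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = apply_additive(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def apply_replacing(existing, observed):
    """Model of the desired behavior: the new observed config replaces the old."""
    return copy.deepcopy(observed)


stored = observe_policy("policy-configmap")  # policy was customized earlier
cleared = observe_policy("")                 # user sets name back to ""

# Even with issue 1 fixed, an additive update keeps the stale configMap reference.
print(apply_additive(stored, cleared))
# Replacing the stored config drops the reference.
print(apply_replacing(stored, cleared))
```

In this model, fixing only the observer leaves the stale configMap reference in place, which matches the point that fixing issue 1 has no effect until issue 2 is resolved.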

Comment 7 Mike Dame 2019-08-19 13:53:52 UTC
Now that https://bugzilla.redhat.com/show_bug.cgi?id=1738432 has been verified, this should also be verified.

Comment 8 Xingxing Xia 2019-08-23 04:15:30 UTC
Verified in 4.2.0-0.nightly-2019-08-22-043819: the name in scheduler/cluster can be changed to "", and the pods terminate and return to Running.

Comment 9 errata-xmlrpc 2019-10-16 06:34:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

