Bug 1902019 - when podTopologySpreadConstraint strategy is enabled for descheduler it throws error
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-scheduler
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: Jan Chaloupka
QA Contact: RamaKasturi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-11-26 15:25 UTC by RamaKasturi
Modified: 2021-02-24 15:36 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-24 15:36:28 UTC
Target Upstream Version:
Embargoed:


Links
System                   ID                                                     Private   Priority   Status   Summary                                     Last Updated
Github                   openshift cluster-kube-descheduler-operator pull 155   0         None       closed   bug 1902019: CSV: fix rbac for namespaces   2021-01-08 08:52:32 UTC
Red Hat Product Errata   RHSA-2020:5633                                         0         None       None     None                                        2021-02-24 15:36:52 UTC

Description RamaKasturi 2020-11-26 15:25:10 UTC
Description of problem:
When the podTopologySpreadConstraint strategy is enabled for the descheduler, it logs the error below:

E1126 15:21:10.337549       1 topologyspreadconstraint.go:106] "Couldn't list namespaces" err="namespaces is forbidden: User \"system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler\" cannot list resource \"namespaces\" in API group \"\" at the cluster scope"

Version-Release number of selected component (if applicable):
[knarra@knarra verification-tests]$ oc get csv -n openshift-kube-descheduler-operator
NAME                                                   DISPLAY                     VERSION                 REPLACES   PHASE
clusterkubedescheduleroperator.4.7.0-202011240542.p0   Kube Descheduler Operator   4.7.0-202011240542.p0              Succeeded


How reproducible:
Always

Steps to Reproduce:
1. Install the latest 4.7 cluster.
2. Create policy.cfg as shown below:
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemovePodsViolatingTopologySpreadConstraint":
     enabled: true

3. Create the configmap using the command "oc create configmap --from-file=policy.cfg descheduler-policy -n openshift-kube-descheduler-operator".
4. Check the cluster pod logs to see if the strategy is enabled.
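
The steps above can be collected into a small shell sketch. The function names write_policy and reproduce are hypothetical, and "deployment/cluster" is an assumption inferred from the pod name cluster-<hash> seen in this report; the sketch also assumes `oc` is already logged in to the cluster.

```shell
#!/bin/sh
# Sketch of the reproduction steps; not a definitive test script.
NS=openshift-kube-descheduler-operator

# Step 2: write a policy file with only the topology spread strategy enabled.
write_policy() {
  cat > "${1:-policy.cfg}" <<'EOF'
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemovePodsViolatingTopologySpreadConstraint":
     enabled: true
EOF
}

# Steps 3 and 4: create the configmap, then read the descheduler pod logs.
# "deployment/cluster" is an assumed name, inferred from the pod name.
reproduce() {
  write_policy policy.cfg &&
  oc create configmap --from-file=policy.cfg descheduler-policy -n "$NS" &&
  oc logs deployment/cluster -n "$NS"
}
```

Calling reproduce on an affected build should print the log containing the "Couldn't list namespaces" error.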

Actual results:
The cluster pod log shows the error below:
E1126 15:21:10.337549       1 topologyspreadconstraint.go:106] "Couldn't list namespaces" err="namespaces is forbidden: User \"system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler\" cannot list resource \"namespaces\" in API group \"\" at the cluster scope"
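
The missing permission can also be checked directly, without waiting for a descheduler run. This is a minimal sketch using `oc auth can-i` with impersonation; the function name can_list_namespaces is hypothetical, and the caller is assumed to have permission to impersonate service accounts.

```shell
# Service account named in the error message above.
SA=system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler

# Prints "yes" or "no" for listing namespaces at the cluster scope as that account.
# --all-namespaces makes the check cluster-wide rather than tied to the current project.
can_list_namespaces() {
  oc auth can-i list namespaces --all-namespaces --as="$SA"
}
```

On a build affected by this bug the expected answer is "no", matching the "forbidden" message in the log.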


Expected results:
The cluster pod logs should not show any error when podTopologySpreadConstraint is enabled.

Additional info:
With the RemovePodsViolatingInterPodAntiAffinity strategy enabled instead, no such errors appear.

[knarra@knarra verification-tests]$ oc logs -f cluster-75986fc4cd-pc6v4 -n openshift-kube-descheduler-operator
E1126 15:19:37.026755       1 server.go:50] "failed to validate server configuration" err="unsupported log format: "
I1126 15:19:37.131627       1 node.go:46] "Node lister returned empty list, now fetch directly"
I1126 15:19:37.219802       1 pod_antiaffinity.go:72] "Processing node" node="ip-10-0-151-196.us-east-2.compute.internal"
I1126 15:19:37.424290       1 pod_antiaffinity.go:72] "Processing node" node="ip-10-0-155-5.us-east-2.compute.internal"
I1126 15:19:37.525074       1 pod_antiaffinity.go:72] "Processing node" node="ip-10-0-165-213.us-east-2.compute.internal"
I1126 15:19:37.723996       1 pod_antiaffinity.go:72] "Processing node" node="ip-10-0-181-80.us-east-2.compute.internal"
I1126 15:19:37.824916       1 pod_antiaffinity.go:72] "Processing node" node="ip-10-0-193-210.us-east-2.compute.internal"
I1126 15:19:38.020387       1 pod_antiaffinity.go:72] "Processing node" node="ip-10-0-212-237.us-east-2.compute.internal"

Comment 2 RamaKasturi 2020-11-27 12:03:52 UTC
The latest image is not available yet; I will verify the bug once it is.

Comment 3 RamaKasturi 2020-12-01 15:34:26 UTC
Verified with the payload and CSV below; the reported error no longer appears when podTopologySpreadConstraint is enabled.

[knarra@knarra openshift-client-linux-4.7.0-0.nightly-2020-11-30-172451]$ ./oc get csv
NAME                                                   DISPLAY                     VERSION                 REPLACES   PHASE
clusterkubedescheduleroperator.4.7.0-202011261728.p0   Kube Descheduler Operator   4.7.0-202011261728.p0              Succeeded

[knarra@knarra openshift-client-linux-4.7.0-0.nightly-2020-11-30-172451]$ ./oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-11-30-172451   True        False         5h9m    Cluster version is 4.7.0-0.nightly-2020-11-30-172451


[knarra@knarra openshift-client-linux-4.7.0-0.nightly-2020-11-30-172451]$ ./oc get configmap descheduler-policy -o yaml
apiVersion: v1
data:
  policy.cfg: |
    apiVersion: "descheduler/v1alpha1"
    kind: "DeschedulerPolicy"
    strategies:
      "RemovePodsViolatingTopologySpreadConstraint":
         enabled: true
kind: ConfigMap
metadata:
  creationTimestamp: "2020-12-01T13:40:04Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        .: {}
        f:policy.cfg: {}
    manager: kubectl-create
    operation: Update
    time: "2020-12-01T13:40:04Z"
  name: descheduler-policy
  namespace: openshift-kube-descheduler-operator
  resourceVersion: "97759"
  selfLink: /api/v1/namespaces/openshift-kube-descheduler-operator/configmaps/descheduler-policy
  uid: 08ac817a-0778-4a67-b6c5-d986c4ef032b
[knarra@knarra openshift-client-linux-4.7.0-0.nightly-2020-11-30-172451]$ ./oc logs -f cluster-7fdcffcd8b-nscjf
E1201 13:45:16.817571       1 server.go:50] "failed to validate server configuration" err="unsupported log format: "
I1201 13:45:16.921881       1 node.go:46] "Node lister returned empty list, now fetch directly"
I1201 13:46:27.524646       1 node.go:46] "Node lister returned empty list, now fetch directly"
I1201 13:47:38.127092       1 node.go:46] "Node lister returned empty list, now fetch directly"
I1201 13:48:48.730865       1 node.go:46] "Node lister returned empty list, now fetch directly"
I1201 13:49:59.334811       1 node.go:46] "Node lister returned empty list, now fetch directly"
I1201 13:51:09.939033       1 node.go:46] "Node lister returned empty list, now fetch directly"

Based on the above, moving the bug to the verified state.
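
The check made by eye in the log above can be sketched as a small filter. The function name has_namespace_rbac_error is hypothetical; it reads log text from stdin and looks only for the error string quoted in this report.

```shell
# Succeeds (exit 0) when stdin contains the namespace RBAC error from this report.
has_namespace_rbac_error() {
  grep -q 'namespaces is forbidden'
}

# Usage against a live cluster (pod name taken from the verification run above):
#   oc logs cluster-7fdcffcd8b-nscjf | has_namespace_rbac_error && echo "still failing"
```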

Comment 6 errata-xmlrpc 2021-02-24 15:36:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

