Description of problem:
When the strategy changes, descheduler-operator does not update the configmap; it deletes the configmap outright, and about 8 minutes later the configmap is generated again.

Version-Release number of selected component (if applicable):
oc v4.0.0-0.95.0
kubernetes v1.11.0+8afe8f3cf9
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://ip-172-18-7-155.ec2.internal:8443
openshift v4.0.0-0.95.0
kubernetes v1.11.0+8afe8f3cf9

How reproducible:
always

Steps to Reproduce:
1. Download the code from GitHub: https://github.com/openshift/descheduler-operator

2. Deploy descheduler-operator as follows:
# oc create -f deploy/namespace.yaml
# oc project openshift-descheduler-operator
# oc create -f deploy/crds/descheduler_v1alpha1_descheduler_crd.yaml
# oc create -f deploy/service_account.yaml
# oc create -f deploy/rbac.yaml
# oc create -f deploy/operator.yaml
# oc create -f deploy/crds/descheduler_v1alpha1_descheduler_cr.yaml

where descheduler_v1alpha1_descheduler_cr.yaml is:

apiVersion: descheduler.io/v1alpha1
kind: Descheduler
metadata:
  name: example-descheduler-1
spec:
  schedule: "*/1 * * * ?"
  strategies:
    - name: "lownodeutilization"
      params:
        - name: "cputhreshold"
          value: "10"
        - name: "memorythreshold"
          value: "20"
        - name: "memorytargetthreshold"
          value: "40"
    - name: "duplicates"

3. Check the configmap:
# oc describe cm example-descheduler-1
Name:         example-descheduler-1
Namespace:    openshift-descheduler-operator
Labels:       <none>
Annotations:  <none>

Data
====
policy.yaml:
----
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "LowNodeUtilization":
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        thresholds:
          cpu: 10
          memory: 20
        targetThresholds:
          memory: 40
        numberOfNodes: 0
  "RemoveDuplicates":
    enabled: true

Events:  <none>

4. Modify the strategy of the descheduler:
# oc edit deschedulers.descheduler.io example-descheduler-1
with a change like:

apiVersion: descheduler.io/v1alpha1
kind: Descheduler
metadata:
  name: example-descheduler-1
spec:
  schedule: "*/1 * * * ?"
  strategies:
    - name: "lownodeutilization"
      params:
        - name: "cputhreshold"
          value: "10"
        - name: "memorythreshold"
          value: "20"
        - name: "memorytargetthreshold"
          value: "40"          (change 40 to 30)
    - name: "duplicates"       (delete this line)

5. Check whether the configmap is updated.

Actual results:
5. No cm found.
# oc get cm
No resources found.

Expected results:
5. The cm is updated correctly.

Additional info:
The descheduler-operator logs show:
# oc logs descheduler-operator-965cb8f7f-kdkk8
...
strategies:
  "LowNodeUtilization":
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        thresholds:
          cpu: 10
          memory: 20
        targetThresholds:
          memory: 30
        numberOfNodes: 0
2018/12/12 12:23:26 Strategy mismatch in configmap. Delete it
2018/12/12 12:23:26 Inside generated descheduler job

About 8 minutes later, the up-to-date cm is generated again. The cm should be updated immediately when the strategy changes.
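For contrast with the delete-then-recreate behavior described above, here is a minimal sketch of an update-in-place reconcile using controller-runtime's CreateOrUpdate helper. The function name and wiring are hypothetical, not taken from descheduler-operator's actual code:

package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// reconcilePolicyConfigMap is a hypothetical helper (not the operator's
// actual code) showing update-in-place: when the generated policy changes,
// the configmap is rewritten rather than deleted and recreated later.
func reconcilePolicyConfigMap(ctx context.Context, c client.Client, namespace, name, policy string) error {
	cm := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
	}
	// CreateOrUpdate fetches the object, applies the mutate function, and
	// issues a Create or an Update as needed, so the configmap never
	// disappears out from under a pod that mounts it.
	_, err := controllerutil.CreateOrUpdate(ctx, c, cm, func() error {
		if cm.Data == nil {
			cm.Data = map[string]string{}
		}
		cm.Data["policy.yaml"] = policy
		return nil
	})
	return err
}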
By default, we are not running aggressive reconcile loops. I can make the reconcile more frequent, but I believe that would become too aggressive.
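For context on the "8 minutes later" behavior: if nothing requeues the object after a change, the change only converges on the next periodic resync of the watch cache. A rough illustration, assuming a controller-runtime based manager (the operator's actual wiring, and the exact field location across controller-runtime versions, may differ):

package main

import (
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	// SyncPeriod controls how often every watched object is re-reconciled
	// even when no event fires. A long period means a missed or mishandled
	// change takes minutes to converge; a short one makes the loop more
	// aggressive, as noted above.
	syncPeriod := 8 * time.Minute // illustrative value only
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		SyncPeriod: &syncPeriod,
	})
	if err != nil {
		panic(err)
	}
	_ = mgr // controllers would be registered with mgr here
}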
The deletion of the configmap also blocks the start of the descheduler job pod. That is more serious than the reconcile loop becoming too aggressive. @ravig

[root@ip-172-18-12-194 ~]# oc get pod
NAME                                     READY   STATUS              RESTARTS   AGE
descheduler-operator-965cb8f7f-jb49v     1/1     Running             0          31m
example-descheduler-1-1545032400-5546r   0/1     ContainerCreating   0          12m
example-descheduler-1-1545032520-zp5qd   0/1     ContainerCreating   0          10m

# oc describe pod example-descheduler-1-1545032400-5546r
...
Events:
  Type     Reason       Age               From                                   Message
  ----     ------       ----              ----                                   -------
  Normal   Scheduled    1m                default-scheduler                      Successfully assigned openshift-descheduler-operator/example-descheduler-1-1545032880-258km to ip-172-18-7-162.ec2.internal
  Warning  FailedMount  21s (x8 over 1m)  kubelet, ip-172-18-7-162.ec2.internal  MountVolume.SetUp failed for volume "policy-volume" : configmaps "example-descheduler-1" not found
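The FailedMount event follows from how the job pod consumes the policy: the configmap is mounted as a volume, and kubelet retries the mount until the configmap exists. Below is a sketch of such a volume definition using the upstream Kubernetes API types; the names are taken from the event above, and the Optional field is shown only to illustrate the knob that decides whether a missing configmap blocks pod startup:

package sketch

import corev1 "k8s.io/api/core/v1"

// policyVolume builds a configmap-backed volume like the "policy-volume"
// in the event above. While the referenced configmap does not exist,
// kubelet fails MountVolume.SetUp and the pod stays in ContainerCreating.
func policyVolume(cmName string) corev1.Volume {
	optional := false // kubelet blocks pod startup until the configmap exists
	return corev1.Volume{
		Name: "policy-volume",
		VolumeSource: corev1.VolumeSource{
			ConfigMap: &corev1.ConfigMapVolumeSource{
				LocalObjectReference: corev1.LocalObjectReference{Name: cmName},
				Optional:             &optional,
			},
		},
	}
}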
https://github.com/openshift/descheduler-operator/pull/37

The above PR should have fixed it.
Verified!

Version info:
oc v4.0.0-alpha.0+85a0623-808
kubernetes v1.11.0+85a0623
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://wsun-qe-api.origin-ci-int-aws.dev.rhcloud.com:6443
kubernetes v1.11.0+85a0623
This problem reproduces in a recent version, but with a different symptom. Please check it, @ravig.

Version info:
oc v4.0.0-0.123.0
kubernetes v1.11.0+4d56dbaf21
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://ip-172-18-5-72.ec2.internal:8443
openshift v4.0.0-0.123.0
kubernetes v1.11.0+4d56dbaf21

Symptom: updating the configmap appears to succeed immediately, but after about 8 minutes a new configmap is regenerated that reverts to the pre-update content. The descheduler-operator pod log also shows the old policy strategy.

logs:
2019/01/04 06:57:36 Creating descheduler job
2019/01/04 06:57:36 Validating descheduler flags
2019/01/04 06:57:36 Creating a new cron job openshift-descheduler-operator/example-descheduler-1
================================================================================================= (time after update)
2019/01/04 07:13:19 Reconciling Descheduler openshift-descheduler-operator/example-descheduler-1
2019/01/04 07:13:19 cputhreshold 10
2019/01/04 07:13:19 memorythreshold 20
2019/01/04 07:13:19 memorytargetthreshold 30
2019/01/04 07:13:19 apiVersion: "descheduler/v1alpha1" kind: "DeschedulerPolicy" strategies: "nodeaffinity": enabled: true, apiVersion: "descheduler/v1alpha1" kind: "DeschedulerPolicy" strategies: "LowNodeUtilization": enabled: true params: nodeResourceUtilizationThresholds: thresholds: cpu: 10 memory: 20 targetThresholds: memory: 30 numberOfNodes: 0
2019/01/04 07:13:19 Strategy mismatch in configmap. Delete it
2019/01/04 07:13:20 Validating descheduler flags
2019/01/04 07:13:20 Flags mismatch for descheduler. Delete cronjob
2019/01/04 07:13:20 Reconciling Descheduler openshift-descheduler-operator/example-descheduler-1
2019/01/04 07:13:20 Creating config map
2019/01/04 07:13:20 cputhreshold 10
2019/01/04 07:13:20 memorythreshold 20
2019/01/04 07:13:20 memorytargetthreshold 30
2019/01/04 07:13:20 "LowNodeUtilization": enabled: true params: nodeResourceUtilizationThresholds: thresholds: cpu: 10 memory: 20 targetThresholds: memory: 30 numberOfNodes: 0
2019/01/04 07:13:20 Creating a new configmap openshift-descheduler-operator/example-descheduler-1
2019/01/04 07:13:20 Validating descheduler flags
2019/01/04 07:13:20 Flags mismatch for descheduler. Delete cronjob
2019/01/04 07:13:20 Error while deleting cronjob
2019/01/04 07:13:21 Reconciling Descheduler openshift-descheduler-operator/example-descheduler-1
2019/01/04 07:13:21 cputhreshold 10
2019/01/04 07:13:21 memorythreshold 20
2019/01/04 07:13:21 memorytargetthreshold 30
2019/01/04 07:13:21 apiVersion: "descheduler/v1alpha1" kind: "DeschedulerPolicy" strategies: "LowNodeUtilization": enabled: true params: nodeResourceUtilizationThresholds: thresholds: cpu: 10 memory: 20 targetThresholds: memory: 30 numberOfNodes: 0, apiVersion: "descheduler/v1alpha1" kind: "DeschedulerPolicy" strategies: "LowNodeUtilization": enabled: true params: nodeResourceUtilizationThresholds: thresholds: cpu: 10 memory: 20 targetThresholds: memory: 30 numberOfNodes: 0
2019/01/04 07:13:21 Creating descheduler job
2019/01/04 07:13:21 Validating descheduler flags
2019/01/04 07:13:21 Creating a new cron job openshift-descheduler-operator/example-descheduler-1
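The "Strategy mismatch in configmap. Delete it" line suggests the operator treats the CR as the source of truth: it regenerates the policy from the CR and replaces any configmap whose contents differ, so a manual edit to the configmap is reverted on the next resync. A hypothetical sketch of that comparison (not the operator's actual code):

package sketch

import (
	"log"

	corev1 "k8s.io/api/core/v1"
)

// syncPolicy compares the policy generated from the Descheduler CR against
// what the configmap currently holds. Any drift -- including a manual
// `oc edit cm` -- is overwritten, so strategy changes must be made on the
// CR itself. Hypothetical helper, not the operator's actual code.
func syncPolicy(existing *corev1.ConfigMap, generatedPolicy string) bool {
	if existing.Data["policy.yaml"] == generatedPolicy {
		return false // already in sync, nothing to do
	}
	log.Printf("policy mismatch in configmap %s/%s; rewriting from CR",
		existing.Namespace, existing.Name)
	if existing.Data == nil {
		existing.Data = map[string]string{}
	}
	existing.Data["policy.yaml"] = generatedPolicy
	return true // caller should issue an Update
}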
@ravig, I think my steps were not appropriate: I updated the cm directly rather than via the CR, so the cm was never really updated. Sorry for the inconvenience. May I change the bug status to "verified"?
No problem, please go ahead and modify the status.
Verified!

Version info:
oc v4.0.0-0.130.0
kubernetes v1.11.0+f67f40dbad
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://ip-172-18-10-163.ec2.internal:8443
openshift v4.0.0-0.130.0
kubernetes v1.11.0+f67f40dbad
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758