Bug 1658582
| Summary: | When strategy changes, descheduler-operator can not update configmap in time | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | MinLi <minmli> |
| Component: | Node | Assignee: | ravig <rgudimet> |
| Status: | CLOSED ERRATA | QA Contact: | Xiaoli Tian <xtian> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.1.0 | CC: | aos-bugs, jokerman, minmli, mmccomas, rgudimet |
| Target Milestone: | --- | | |
| Target Release: | 4.1.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-06-04 10:41:14 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description (MinLi, 2018-12-12 13:04:36 UTC)
By default, we are not making aggressive reconcile loops. I can make them more frequent, but I believe that would become too aggressive.

The deletion of the configmap also blocks the start of the descheduler job pod, which is more serious than being too aggressive. @ravig

```
[root@ip-172-18-12-194 ~]# oc get pod
NAME                                     READY   STATUS              RESTARTS   AGE
descheduler-operator-965cb8f7f-jb49v     1/1     Running             0          31m
example-descheduler-1-1545032400-5546r   0/1     ContainerCreating   0          12m
example-descheduler-1-1545032520-zp5qd   0/1     ContainerCreating   0          10m

# oc describe pod example-descheduler-1-1545032400-5546r
.....
Events:
  Type     Reason       Age               From                                    Message
  ----     ------       ----              ----                                    -------
  Normal   Scheduled    1m                default-scheduler                       Successfully assigned openshift-descheduler-operator/example-descheduler-1-1545032880-258km to ip-172-18-7-162.ec2.internal
  Warning  FailedMount  21s (x8 over 1m)  kubelet, ip-172-18-7-162.ec2.internal   MountVolume.SetUp failed for volume "policy-volume" : configmaps "example-descheduler-1" not found
```

https://github.com/openshift/descheduler-operator/pull/37

The above PR should have fixed it.

Verified!

Version info:
```
oc v4.0.0-alpha.0+85a0623-808
kubernetes v1.11.0+85a0623
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://wsun-qe-api.origin-ci-int-aws.dev.rhcloud.com:6443
kubernetes v1.11.0+85a0623
```

This problem reproduces in a recent version, but with different symptoms. Please check it, @ravig.

Version info:
```
oc v4.0.0-0.123.0
kubernetes v1.11.0+4d56dbaf21
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-5-72.ec2.internal:8443
openshift v4.0.0-0.123.0
kubernetes v1.11.0+4d56dbaf21
```

Symptom: when the configmap is updated, the update appears to succeed immediately. After about 8 minutes, a new configmap is regenerated and it reverts to the pre-update content. The descheduler-operator pod log also shows the old policy strategy.

Logs:
```
2019/01/04 06:57:36 Creating descheduler job
2019/01/04 06:57:36 Validating descheduler flags
2019/01/04 06:57:36 Creating a new cron job openshift-descheduler-operator/example-descheduler-1
=================================================================================================(time after update)
2019/01/04 07:13:19 Reconciling Descheduler openshift-descheduler-operator/example-descheduler-1
2019/01/04 07:13:19 cputhreshold 10
2019/01/04 07:13:19 memorythreshold 20
2019/01/04 07:13:19 memorytargetthreshold 30
2019/01/04 07:13:19 apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "nodeaffinity":
    enabled: true, apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "LowNodeUtilization":
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        thresholds:
          cpu: 10
          memory: 20
        targetThresholds:
          memory: 30
        numberOfNodes: 0
2019/01/04 07:13:19 Strategy mismatch in configmap. Delete it
2019/01/04 07:13:20 Validating descheduler flags
2019/01/04 07:13:20 Flags mismatch for descheduler. Delete cronjob
2019/01/04 07:13:20 Reconciling Descheduler openshift-descheduler-operator/example-descheduler-1
2019/01/04 07:13:20 Creating config map
2019/01/04 07:13:20 cputhreshold 10
2019/01/04 07:13:20 memorythreshold 20
2019/01/04 07:13:20 memorytargetthreshold 30
2019/01/04 07:13:20 "LowNodeUtilization":
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        thresholds:
          cpu: 10
          memory: 20
        targetThresholds:
          memory: 30
        numberOfNodes: 0
2019/01/04 07:13:20 Creating a new configmap openshift-descheduler-operator/example-descheduler-1
2019/01/04 07:13:20 Validating descheduler flags
2019/01/04 07:13:20 Flags mismatch for descheduler. Delete cronjob
2019/01/04 07:13:20 Error while deleting cronjob
2019/01/04 07:13:21 Reconciling Descheduler openshift-descheduler-operator/example-descheduler-1
2019/01/04 07:13:21 cputhreshold 10
2019/01/04 07:13:21 memorythreshold 20
2019/01/04 07:13:21 memorytargetthreshold 30
2019/01/04 07:13:21 apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "LowNodeUtilization":
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        thresholds:
          cpu: 10
          memory: 20
        targetThresholds:
          memory: 30
        numberOfNodes: 0, apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "LowNodeUtilization":
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        thresholds:
          cpu: 10
          memory: 20
        targetThresholds:
          memory: 30
        numberOfNodes: 0
2019/01/04 07:13:21 Creating descheduler job
2019/01/04 07:13:21 Validating descheduler flags
2019/01/04 07:13:21 Creating a new cron job openshift-descheduler-operator/example-descheduler-1
```

@ravig, I think my steps were not appropriate. I updated the configmap directly rather than through the CR, so the configmap did not actually update successfully. Really sorry for the inconvenience. May I change the bug status to "verified"?

No problem, please go ahead and modify the status.
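For reference, strategy changes are meant to go through the Descheduler custom resource; the operator then regenerates the policy configmap and cronjob on its next reconcile. A rough sketch of such an edit is below. The `apiVersion` and the exact `spec` field names are assumptions inferred from the parameter names visible in the operator log (`cputhreshold`, `memorythreshold`, `memorytargetthreshold`), not a confirmed schema.

```yaml
# Hypothetical Descheduler CR edit; field names are illustrative, not the operator's documented API.
apiVersion: descheduler.io/v1alpha1          # assumed group/version
kind: Descheduler
metadata:
  name: example-descheduler-1
  namespace: openshift-descheduler-operator
spec:
  strategies:
    - name: lownodeutilization               # change strategies here, not in the generated configmap
      params:
        - name: cputhreshold
          value: "10"
        - name: memorythreshold
          value: "20"
        - name: memorytargetthreshold
          value: "30"
```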
Verified!

Version info:
```
oc v4.0.0-0.130.0
kubernetes v1.11.0+f67f40dbad
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-10-163.ec2.internal:8443
openshift v4.0.0-0.130.0
kubernetes v1.11.0+f67f40dbad
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758
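The verification comments above do not list the exact commands used. A plausible re-check, assuming the resource kind and names that appear in the operator log (and assuming the CRD exposes a `descheduler` resource to oc), would be:

```
# Hypothetical verification flow; resource names are taken from the report, the resource type is assumed.
oc -n openshift-descheduler-operator edit descheduler example-descheduler-1
# ...change the strategy in spec, then confirm the operator regenerates the policy configmap:
oc -n openshift-descheduler-operator get configmap example-descheduler-1 -o yaml
# ...and that the next cron job pod starts instead of hanging in ContainerCreating on "policy-volume":
oc -n openshift-descheduler-operator get pods
```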