Description of problem:
When the RemoveDuplicates strategy is enabled, the descheduler evicts the pods it considers duplicates. The pods are then recreated, the descheduler still sees them as duplicates, and it evicts them again, so the cycle repeats on every run. As a result, the number of evicted pods reported in the metrics for the RemoveDuplicates strategy keeps growing, even though the cluster does not have that many pods, which is confusing.

Version-Release number of selected components (if applicable):
Clusterkubedescheduleroperator.4.8.0-202104142158.p0

How reproducible:
Always

Steps to reproduce:
1. Install the 4.8 descheduler.
2. Enable the TopologyAndDuplicates profile.
3. Create pods using the command: oc create dc hello --image=<image_name>
4. Edit the dc and change the replicas to 12.
5. Change the descheduling interval to 60 seconds.
6. Wait for one descheduler run to happen, then open the Prometheus UI and select descheduler_pods_evicted.

Actual Results:
The graph of descheduler_pods_evicted keeps increasing forever for the RemoveDuplicates strategy.

Expected Results:
Need to determine the right way to show the evicted pod count for the RemoveDuplicates strategy.
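To make the graph easier to interpret, here is a minimal sketch of how per-run eviction counts could be derived from a cumulative metric, assuming descheduler_pods_evicted behaves like a monotonically increasing counter sampled once per descheduler run. The helper name perRunDeltas and the sample values are hypothetical and are not taken from this cluster. In Prometheus itself, query functions such as increase() serve a similar purpose.

```go
package main

import "fmt"

// perRunDeltas is a hypothetical helper: it converts cumulative counter
// samples (one sample per descheduler run) into the number of evictions
// that happened between consecutive samples.
// A drop in value is treated as a counter reset (for example, after the
// descheduler pod restarts), in which case the new value is the delta.
func perRunDeltas(samples []float64) []float64 {
	if len(samples) < 2 {
		return nil
	}
	deltas := make([]float64, 0, len(samples)-1)
	for i := 1; i < len(samples); i++ {
		d := samples[i] - samples[i-1]
		if d < 0 {
			d = samples[i] // counter was reset
		}
		deltas = append(deltas, d)
	}
	return deltas
}

func main() {
	// Made-up samples: the cumulative value grows by 4 on every run,
	// which is what an eviction loop would look like on the graph.
	samples := []float64{4, 8, 12, 16}
	fmt.Println(perRunDeltas(samples))
}
```

If the per-run delta never drops to zero, the same pods are most likely being evicted again on every run, which matches the loop described above.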
Will not be fixed in 4.8 timeframe
Upstream PR under review
Verification is currently blocked by the bug [1]. Once that bug moves to ON_QA, I will verify this bug as well.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1970828
Moving the bug to the verified state, as I see that the RemoveDuplicates strategy works fine and the metrics are shown correctly in the Prometheus UI.

[knarra@knarra ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-06-14-145150   True        False         132m    Cluster version is 4.8.0-0.nightly-2021-06-14-145150

Steps followed to verify:
=================================
1) Install a 4.8 cluster.
2) Enable the TopologyAndDuplicates profile.
3) Create pods using the command: oc create dc hello --image=<image_name>
4) Edit the dc and change the replicas to 12.
5) Change the descheduling interval to 60 seconds.
6) Wait for one descheduler run to happen, then open the Prometheus UI and select descheduler_pods_evicted.

The metrics show the correct number of evicted pods. Tested the same with a deployment and a replicaset, and both worked fine.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438