Description of problem:
The descheduler should not evict pods that use local storage through a PVC.

Version-Release number of selected component (if applicable):
[root@dhcp-140-138 ~]# oc get csv
NAME                                                   DISPLAY                     VERSION                 REPLACES   PHASE
clusterkubedescheduleroperator.4.7.0-202012260223.p0   Kube Descheduler Operator   4.7.0-202012260223.p0              Succeeded

How reproducible:
Always

Steps to Reproduce:
1) On one of the worker nodes, run:
# sudo mkdir /mnt/data

2) Create an index.html in /mnt/data with the content below:
Hello from Kubernetes storage

3) Create a hostPath persistent volume using the following:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: task-pv-volume
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data"

4) Verify the PV was created:
# oc get pv
NAME             CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
task-pv-volume   10Gi       RWO           Retain          Available           manual                  4s

5) Create a persistent volume claim using the following:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: task-pv-claim
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi

6) Verify the PVC is bound:
# oc get pv task-pv-volume
NAME             CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS   CLAIM                   STORAGECLASS   REASON   AGE
task-pv-volume   10Gi       RWO           Retain          Bound    default/task-pv-claim   manual                  2m

7) Create a replication controller that uses this PVC:
[ramakasturinarra@dhcp35-60 ocp_files]$ cat rc.yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: rcex
spec:
  replicas: 3
  selector:
    app: sise
  template:
    metadata:
      name: somename
      labels:
        app: sise
    spec:
      containers:
      - name: sise
        image: quay.io/openshifttest/hello-openshift@sha256:aaea76ff622d2f8bcb32e538e7b3cd0ef6d291953f3e7c9f556c1ba5baf47e2e
        ports:
        - containerPort: 9876
        volumeMounts:
        - mountPath: /tmp
          name: task-pv-storage
      volumes:
      - name: task-pv-storage
        persistentVolumeClaim:
          claimName: task-pv-claim

Actual results:
[zhouying@dhcp-140-138 ~]$ oc get po -o wide
rcex-22pxc   1/1   Running   0   20m   10.128.2.173   yinzhou-1230-2gh6m-compute-1   <none>   <none>
rcex-cgbw2   1/1   Running   0   48s   10.128.2.212   yinzhou-1230-2gh6m-compute-1   <none>   <none>
rcex-tcz28   1/1   Running   0   48s   10.128.2.213   yinzhou-1230-2gh6m-compute-1   <none>   <none>

The descheduler logs show it always evicts the pods belonging to the RC that uses the local storage:
I1231 06:11:04.120738       1 evictions.go:117] "Evicted pod" pod="zhouy/rcex-g9t7c" reason=" (RemoveDuplicatePods)"
I1231 06:11:04.121133       1 event.go:291] "Event occurred" object="zhouy/rcex-g9t7c" kind="Pod" apiVersion="v1" type="Normal" reason="Descheduled" message="pod evicted by sigs.k8s.io/descheduler (RemoveDuplicatePods)"

Expected results:
A PVC bound to a hostPath PV is also a type of local storage, so the descheduler should not evict this type of pod.

Additional info:
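For context on why only the PVC path is affected: as I understand the upstream evictability check, a pod whose volume uses hostPath or emptyDir directly is already treated as a local-storage pod and skipped by default, while the same storage reached through a PVC is not. A minimal sketch of the direct form, reusing the path from step 1 (a hypothetical pod snippet, not part of the reproducer):

```
# Pod-level volume the descheduler already recognizes as local storage.
# Reaching the same path through a PVC (as in rc.yaml above) is what
# slips past the check and triggers this bug.
volumes:
- name: task-pv-storage
  hostPath:
    path: /mnt/data
```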
Upstream issue for this: https://github.com/kubernetes-sigs/descheduler/issues/96
Upstream PR: https://github.com/kubernetes-sigs/descheduler/pull/481
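For reference, the upstream PR adds a top-level `ignorePvcPods` flag to DeschedulerPolicy. A minimal policy enabling it, consistent with the configmap data shown in the later comments on this bug, would look like:

```
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
# When true, pods that mount a PersistentVolumeClaim are exempt from eviction.
ignorePvcPods: true
strategies:
  RemoveDuplicates:
    enabled: true
```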
Opened https://github.com/openshift/descheduler/pull/53 to add this to our descheduler fork. However, for consistency this is opt-in (with the default behavior still being to evict PVC pods). So we will need to update our descheduler operator to either provide this option or enable it by default. Jan/Maciej, what do you think? I would be fine just exposing it as a bool in the CRD since it is a component-level setting that would affect all profiles. Taking the opinionated approach of enabling it by default is a break from the current behavior.
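To illustrate the opt-in shape being discussed, a KubeDescheduler CR with such a bool might look like the sketch below. The field name `evictPvcPods` is hypothetical, purely for illustration; the operator would translate it into `ignorePvcPods` in the generated policy:

```
apiVersion: operator.openshift.io/v1
kind: KubeDescheduler
metadata:
  name: cluster
  namespace: openshift-kube-descheduler-operator
spec:
  deschedulingIntervalSeconds: 3600
  profiles:
  - AffinityAndTaints
  # hypothetical component-level bool; would apply across all profiles
  evictPvcPods: false
```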
The less configuration, the better. Let's make it true by default; it's easier to change it to false or make it configurable later than to expose the option from the start.
Ah right, I was thinking we had already GA'd the descheduler with the current behavior, but 4.7 is not released yet. In that case we can make it true by default.
Still seeing the issue with the CSV below; will try again on Monday.

[knarra@knarra ~]$ oc get csv -n openshift-kube-descheduler-operator
NAME                                                   DISPLAY                     VERSION                 REPLACES   PHASE
clusterkubedescheduleroperator.4.7.0-202101281146.p0   Kube Descheduler Operator   4.7.0-202101281146.p0              Succeeded

I0129 12:38:46.859836       1 duplicates.go:189] "Average occurrence per node" node="ip-10-0-152-104.us-east-2.compute.internal" ownerKey={namespace:knarra kind:ReplicationController name:rcex imagesHash:quay.io/openshifttest/hello-openshift@sha256:aaea76ff622d2f8bcb32e538e7b3cd0ef6d291953f3e7c9f556c1ba5baf47e2e} avg=1
I0129 12:38:46.872496       1 evictions.go:117] "Evicted pod" pod="knarra/rcex-p4c75" reason=" (RemoveDuplicatePods)"
I0129 12:38:46.882118       1 evictions.go:117] "Evicted pod" pod="knarra/rcex-vljvz" reason=" (RemoveDuplicatePods)"
Could you share the output for `oc get -o yaml cm/cluster` too? That will tell us if the default setting is being properly set in the operator
The shared configmap data shows it's being set:

```
[knarra@knarra ~]$ oc get -o yaml cm/cluster -n openshift-kube-descheduler-operator
apiVersion: v1
data:
  policy.yaml: |
    apiVersion: descheduler/v1alpha1
    ignorePvcPods: true
    kind: DeschedulerPolicy
...
```

The reason this isn't working yet is that https://github.com/openshift/descheduler/pull/53 (which pulls the actual descheduler change into our fork) never got linked to this bug. That's been updated, and once it merges and a new descheduler build happens we should re-verify.
Verified the bug with the payload below and did not see the issue happening.

[knarra@knarra ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-02-01-232332   True        False         146m    Cluster version is 4.7.0-0.nightly-2021-02-01-232332

[knarra@knarra ~]$ oc get csv -n openshift-kube-descheduler-operator
NAME                                                   DISPLAY                     VERSION                 REPLACES   PHASE
clusterkubedescheduleroperator.4.7.0-202101300133.p0   Kube Descheduler Operator   4.7.0-202101300133.p0              Succeeded

[knarra@knarra ~]$ oc get -o yaml cm/cluster -n openshift-kube-descheduler-operator
apiVersion: v1
data:
  policy.yaml: |
    apiVersion: descheduler/v1alpha1
    ignorePvcPods: true
    kind: DeschedulerPolicy
    strategies:
      RemoveDuplicates:
        enabled: true
        params:
          includeSoftConstraints: false
          namespaces:
            exclude:

[knarra@knarra ~]$ oc get pods -o wide
NAME         READY   STATUS    RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES
rcex-5c7px   1/1     Running   0          77m   10.131.0.58   ip-10-0-160-222.us-east-2.compute.internal   <none>           <none>
rcex-lznkn   1/1     Running   0          77m   10.129.2.55   ip-10-0-205-212.us-east-2.compute.internal   <none>           <none>
rcex-wqvpm   1/1     Running   0          77m   10.128.2.40   ip-10-0-146-72.us-east-2.compute.internal    <none>           <none>

[knarra@knarra ~]$ oc logs cluster-69866c5699-8jsxt -n openshift-kube-descheduler-operator | grep "Evicted"
[knarra@knarra ~]$

Based on the above, moving the bug to verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633