Description of problem:
The descheduler job should print a log line for each pod it evicts; currently there is no log output when evicting pods.

Version-Release number of selected component (if applicable):
oc v4.0.0-0.95.0
kubernetes v1.11.0+8afe8f3cf9
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://ip-172-18-7-155.ec2.internal:8443
openshift v4.0.0-0.95.0
kubernetes v1.11.0+8afe8f3cf9

How reproducible:
Always

Steps to Reproduce:
1. Download the code from GitHub: https://github.com/openshift/descheduler-operator
2. Deploy the descheduler-operator as follows:
# oc create -f deploy/namespace.yaml
# oc project openshift-descheduler-operator
# oc create -f deploy/crds/descheduler_v1alpha1_descheduler_crd.yaml
# oc create -f deploy/service_account.yaml
# oc create -f deploy/rbac.yaml
# oc create -f deploy/operator.yaml
# oc create -f deploy/crds/descheduler_v1alpha1_descheduler_cr.yaml

where descheduler_v1alpha1_descheduler_cr.yaml looks like:

apiVersion: descheduler.io/v1alpha1
kind: Descheduler
metadata:
  name: example-descheduler-1
spec:
  schedule: "*/1 * * * ?"
  strategies:
    - name: "duplicates"

3. Create a deployment config and replication controller:
# oc run hello --image=openshift/hello-openshift:latest --replicas=6
4. Check whether the duplicate pods have been evicted:
# oc get pod
NAME                                     READY     STATUS              RESTARTS   AGE
descheduler-operator-965cb8f7f-kdkk8     1/1       Running             0          27m
example-descheduler-1-1544617080-88jb2   0/1       Completed           0          2m
example-descheduler-1-1544617140-r4cbb   0/1       Completed           0          1m
example-descheduler-1-1544617200-zj9th   0/1       Completed           0          12s
hello-1-2xsc5                            1/1       Running             0          8m
hello-1-6jptk                            0/1       ContainerCreating   0          7s
hello-1-f2qm4                            0/1       Terminating         0          1m
hello-1-gv2cv                            0/1       ContainerCreating   0          7s
hello-1-jvj67                            0/1       ContainerCreating   0          7s
hello-1-jwxrp                            0/1       ContainerCreating   0          7s
hello-1-qrw5j                            0/1       Terminating         0          1m
hello-1-rdphn                            0/1       ContainerCreating   0          7s
hello-1-rlhn2                            0/1       Terminating         0          1m
5. Check the descheduler job log for the eviction of the duplicate pods.

Actual results:
5. There is no log of evicted pods. (But from the result of step 4, we can confirm the descheduler evicted the duplicate pods.)
# oc logs example-descheduler-1-1544617200-zj9th

Expected results:
5. A log of evicted pods is printed, just like in the 3.11 version:
I0110 05:50:42.087277       1 reflector.go:202] Starting reflector *v1.Node (1h0m0s) from github.com/kubernetes-incubator/descheduler/pkg/descheduler/node/node.go:83
I0110 05:50:42.087391       1 reflector.go:240] Listing and watching *v1.Node from github.com/kubernetes-incubator/descheduler/pkg/descheduler/node/node.go:83
I0110 05:50:42.187377       1 duplicates.go:49] Processing node: "qe-wjiang-node-registry-router-1"
I0110 05:50:42.256960       1 duplicates.go:53] "ReplicationController/hello-1"
I0110 05:50:42.264316       1 duplicates.go:61] Evicted pod: "hello-1-d6dvf" (<nil>)
I0110 05:50:42.274861       1 duplicates.go:61] Evicted pod: "hello-1-f997x" (<nil>)
I0110 05:50:42.284345       1 duplicates.go:61] Evicted pod: "hello-1-fdtj5" (<nil>)
I0110 05:50:42.297100       1 duplicates.go:61] Evicted pod: "hello-1-lq752" (<nil>)
I0110 05:50:42.336787       1 duplicates.go:61] Evicted pod: "hello-1-rmdk5" (<nil>)

Additional info:
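For reference, the expected 3.11 output above comes from the descheduler's eviction path. A minimal sketch of that logging pattern, assuming a client-go of that era (policy/v1beta1 eviction API) and glog; the function and variable names are illustrative, not the descheduler's actual code:

package example

import (
	"github.com/golang/glog"

	v1 "k8s.io/api/core/v1"
	policy "k8s.io/api/policy/v1beta1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// evictAndLog asks the API server to evict a pod and logs the result in the
// same style as the 3.11 duplicates.go output shown above.
func evictAndLog(client kubernetes.Interface, pod *v1.Pod) error {
	eviction := &policy.Eviction{
		ObjectMeta: metav1.ObjectMeta{Name: pod.Name, Namespace: pod.Namespace},
	}
	if err := client.PolicyV1beta1().Evictions(pod.Namespace).Evict(eviction); err != nil {
		glog.Errorf("Error evicting pod %q: %v", pod.Name, err)
		return err
	}
	// This is the kind of line that is absent from the 4.0 job log:
	glog.V(1).Infof("Evicted pod: %q", pod.Name)
	return nil
}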
I have a couple of questions here:

- In 3.11, can you provide the descheduler-cronjob description? Our docs explicitly tell users to run the descheduler as a job: https://docs.openshift.com/container-platform/3.11/admin_guide/scheduling/descheduler.html (unrelated question)
- When run on 3.11, was the log level increased? (You can find that from the description again.)
(In reply to ravig from comment #1)
> I have a couple of questions here
>
> - In 3.11, can you provide the descheduler-cronjob description? Our docs
> explicitly tell users to run the descheduler as a job:
> https://docs.openshift.com/container-platform/3.11/admin_guide/scheduling/descheduler.html
> (unrelated question)

Yeah, in 3.11 the cronjob generates jobs that run the descheduler at a fixed period. We create the cronjob by passing parameters to the Ansible job as follows:

openshift_ansible_vars:
  openshift_descheduler_install: true
  openshift_descheduler_strategies_dict: "{'remove_duplicates': True, 'remove_pods_violating_inter_pod_anti_affinity': True, 'low_node_utilization': True}"
  openshift_descheduler_cronjob_node_selector: ""
  openshift_descheduler_cronjob_schedule: "*/2 * * * *"

descheduler-cronjob description:

[root@qe-minmli-gcemaster-etcd-1 ~]# oc get cronjob descheduler-cronjob -o yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  creationTimestamp: 2018-12-18T02:18:33Z
  name: descheduler-cronjob
  namespace: openshift-descheduler
  resourceVersion: "46315"
  selfLink: /apis/batch/v1beta1/namespaces/openshift-descheduler/cronjobs/descheduler-cronjob
  uid: 380feecd-026b-11e9-84f0-42010af00003
spec:
  concurrencyPolicy: Allow
  failedJobsHistoryLimit: 1
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      template:
        metadata:
          annotations:
            scheduler.alpha.kubernetes.io/critical-pod: ""
          creationTimestamp: null
          name: descheduler-cronjob
        spec:
          containers:
          - args:
            - --policy-config-file=/policy-dir/policy.yaml
            - --v=5
            - --dry-run
            command:
            - /bin/descheduler
            image: registry.reg-aws.openshift.com:443/openshift3/ose-descheduler:v3.11
            imagePullPolicy: IfNotPresent
            name: descheduler
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            volumeMounts:
            - mountPath: /policy-dir
              name: policy-volume
          dnsPolicy: ClusterFirst
          priorityClassName: system-cluster-critical
          restartPolicy: Never
          schedulerName: default-scheduler
          securityContext: {}
          serviceAccount: descheduler-sa
          serviceAccountName: descheduler-sa
          terminationGracePeriodSeconds: 30
          volumes:
          - configMap:
              defaultMode: 420
              name: descheduler-policy-configmap
            name: policy-volume
  schedule: '*/2 * * * *'
  successfulJobsHistoryLimit: 3
  suspend: false
status:
  lastScheduleTime: 2018-12-18T07:02:00Z

configmap description:

[root@qe-minmli-gcemaster-etcd-1 ~]# oc get cm descheduler-policy-configmap -o yaml
apiVersion: v1
data:
  policy.yaml: |
    apiVersion: "descheduler/v1alpha1"
    kind: "DeschedulerPolicy"
    strategies:
      "RemoveDuplicates":
        enabled: true
      "RemovePodsViolatingInterPodAntiAffinity":
        enabled: true
      "LowNodeUtilization":
        enabled: true
        params:
          nodeResourceUtilizationThresholds:
            thresholds:
              "cpu": 40
              "memory": 40
              "pods": 40
            targetThresholds:
              "cpu": 70
              "memory": 70
              "pods": 70
kind: ConfigMap
metadata:
  creationTimestamp: 2018-12-18T02:18:30Z
  name: descheduler-policy-configmap
  namespace: openshift-descheduler
  resourceVersion: "4390"
  selfLink: /api/v1/namespaces/openshift-descheduler/configmaps/descheduler-policy-configmap
  uid: 36175f3f-026b-11e9-84f0-42010af00003

> - When run on 3.11, was the log level increased? (You can find that from
> the description again.)

No, DEBUG_LOGLEVEL=5 (/etc/origin/master/master.env)

Version-Release number of selected component:
oc v3.11.57
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://qe-minmli-gcemaster-etcd-1:8443
openshift v3.11.57
kubernetes v1.11.0+d4cacc0
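(For reference: the --v=5 in the container args above is a glog verbosity flag. A minimal sketch of how a Go binary wires it, assuming the standard github.com/golang/glog package; this is illustrative, not the descheduler's actual main:)

package main

import (
	"flag"

	"github.com/golang/glog"
)

func main() {
	// glog registers -v (and -logtostderr, -vmodule, ...) on the default
	// FlagSet in its init(); flag.Parse picks up --v=5 from the args.
	flag.Parse()

	glog.V(1).Infof("Evicted pod: %q", "hello-1-d6dvf") // emitted when --v >= 1
	glog.V(5).Infof("fine-grained trace")               // emitted only when --v >= 5
	glog.Flush()
}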
Thanks for the output, MinLi. Actually, the following section of the job config is causing the increased logging:

- args:
  - --policy-config-file=/policy-dir/policy.yaml
  - --v=5

While I don't mind increasing the logging level, I have an in-flight PR, https://github.com/openshift/descheduler-operator/pull/36, through which you can pass the --v (or log-level) option to the descheduler binary from the CR.

So, either way, we need to update the docs to show that we are creating a cronjob instead of a job. Once the PR merges, we should be able to add a flag which shows the log values.
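Conceptually, the operator would then translate a CR-supplied level into the container args. A hypothetical sketch of that translation (the function name, parameters, and wiring are made up for illustration; see PR #36 for the real change):

package example

import "fmt"

// deschedulerArgs builds the descheduler container's argument list from the
// policy file path and a CR-supplied verbosity level (hypothetical names).
func deschedulerArgs(policyPath string, logLevel int) []string {
	args := []string{"--policy-config-file=" + policyPath}
	if logLevel > 0 {
		args = append(args, fmt.Sprintf("--v=%d", logLevel))
	}
	return args
}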
https://github.com/openshift/descheduler-operator/pull/38

This PR should have fixed the issue.
Verified!

Version info:
oc v4.0.0-alpha.0+85a0623-808
kubernetes v1.11.0+85a0623
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://wsun-qe-api.origin-ci-int-aws.dev.rhcloud.com:6443
kubernetes v1.11.0+85a0623
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758