Bug 1666649

Summary: [descheduler-operator] The descheduler job pod does not use downstream images
Product: OpenShift Container Platform
Reporter: MinLi <minmli>
Component: Node
Assignee: ravig <rgudimet>
Status: CLOSED ERRATA
QA Contact: MinLi <minmli>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 4.1.0
CC: aos-bugs, jhou, jodavis, jokerman, minmli, mmccomas, rgudimet, sjenning, sponnaga, tbielawa, wsun
Target Milestone: ---
Target Release: 4.1.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-06-04 10:41:55 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description MinLi 2019-01-16 09:50:13 UTC
Description of problem:
When the descheduler operator is used to create a descheduler, the descheduler does not use the downstream image (quay.io/openshift-release-dev/ocp-v4.0-art-dev:XXX).

Version-Release number of selected component (if applicable):
[core@ip-10-0-26-190 ~]$ oc get clusterversion                                            
NAME      VERSION     AVAILABLE   PROGRESSING   SINCE     STATUS
version   4.0.0-0.1   True        False         6h        Cluster version is 4.0.0-0.1

[core@ip-10-0-26-190 ~]$ oc version 
oc v4.0.0-0.125.0
kubernetes v1.11.0+406fc897d8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://qe-jialiu3-api.qe.devcluster.openshift.com:6443
kubernetes v1.11.0+c69f926354


How reproducible:
always 

Steps to Reproduce:
1. Deploy the descheduler-operator and create a descheduler (refer to: https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-21205)
Note: in operator.yaml, replace the value of the "image" field with "quay.io/openshift-release-dev/ocp-v4.0-art-dev:v4.0.0-0.139.0.0-ose-descheduler-operator"

2. Check the descheduler job pod's image:
[core@ip-10-0-26-190 ~]$ oc get pod example-descheduler-1-1547630580-zdxs6 -o yaml | grep -i image
    image: registry.svc.ci.openshift.org/openshift/origin-v4.0:descheduler
    imageID: registry.svc.ci.openshift.org/openshift/origin-v4.0@sha256:27e026b56615259eda5903f398dc3f2472a926242cc2c7bc84923066ed6da545


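Actual results:
In step 2, the pod runs registry.svc.ci.openshift.org/openshift/origin-v4.0:descheduler, as shown in the output above.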

Expected results:
In step 2, the pods should use the downstream image built from RHEL:
quay.io/openshift-release-dev/ocp-v4.0-art-dev:v4.0.0-0.139.0.0-ose-descheduler
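
The check in step 2 can be scripted. The following is a minimal sketch, assuming that downstream images all live under quay.io/openshift-release-dev/ as in this report. The function name is_downstream_image is hypothetical, and the pod name in the usage comment is the one from step 2.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the image reference points at the
# downstream registry path used in this report (an assumption, not an
# official rule for identifying downstream images).
is_downstream_image() {
    case "$1" in
        quay.io/openshift-release-dev/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage against a live cluster (pod name taken from step 2):
#   image=$(oc get pod example-descheduler-1-1547630580-zdxs6 \
#       -o jsonpath='{.spec.containers[0].image}')
#   is_downstream_image "$image" && echo "downstream" || echo "not downstream: $image"
```

Reading the image through jsonpath instead of grep avoids also matching the imageID and imagePullPolicy lines.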

Additional info:

Comment 1 ravig 2019-01-23 19:39:47 UTC
Hi MinLi,

In the first step, you're modifying the container image of the descheduler-operator, whereas the job is created for the descheduler, so there will be a mismatch. Having said that, the descheduler operator needs to provide a way to set the image of the descheduler it manages. I have created a PR for it: https://github.com/openshift/descheduler-operator/pull/41/files

With this change, we need to set the image in the CR instead of updating the deployment; updating the deployment only changes the descheduler-operator's image location.

This is how the CR would look:

apiVersion: descheduler.io/v1alpha1
kind: Descheduler
metadata:
  name: descheduler-cr
spec:
  schedule: "*/1 * * * ?"
  image: "quay.io/openshift/origin-descheduler:latest" #Please note that this field needs to be updated for descheduler, not descheduler-operator
  strategies:
    - name: "lownodeutilization"
      params:
       - name: "cputhreshold"
         value: "10"
       - name: "memorythreshold"
         value: "20"
       - name: "memorytargetthreshold"
         value: "30"


Once the PR gets lgtm'ed and merged, we shouldn't see any problem with it.

Comment 2 MinLi 2019-01-24 02:38:14 UTC
@ravig, I understand what you mean, thx~

The reason I modified the descheduler-operator image in step 1 is that the descheduler-operator should also use the downstream image, according to "Beta Images for OCP 4.0" (https://docs.google.com/spreadsheets/d/1n6lEgEPjs7rtSK9WEGXb1Z4oVse7MM2Q8jrGpsR4AFY/edit#gid=1869224058). FYI~

Comment 3 ravig 2019-01-24 22:14:33 UTC
MinLi,

The PR got merged. Can you please verify the fix suggested above for updating images of both descheduler-operator and descheduler?

Comment 5 Jianwei Hou 2019-01-29 07:37:03 UTC
@rgudimet Is there a guideline or readme which image is proper to use? Should the image name be ose-descheduler? For images ready to be released to customers, they are built from RHEL. Most origin-* images are built from CentOS.

Comment 6 MinLi 2019-01-29 07:49:45 UTC
@ravig, after updating the descheduler image as you mentioned, the problem still reproduces.

env info:
[core@ip-10-0-15-55 ~]$ oc version 
oc v4.0.0-0.147.0
kubernetes v1.11.0+dde478551e
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://qe-jialiu2-api.qe.devcluster.openshift.com:6443
kubernetes v1.11.0+8868a98a7b

[core@ip-10-0-15-55 ~]$ oc get clusterversion 
NAME      VERSION     AVAILABLE   PROGRESSING   SINCE     STATUS
version   4.0.0-0.2   True        False         1d        Cluster version is 4.0.0-0.2

Comment 7 ravig 2019-01-30 14:20:04 UTC
@MinLi, I am not sure why are you looking at clusterversion? Any specific reason for this? Shouldn't we check for descheduler deployment?

> Is there a guideline or readme which image is proper to use? Should the image name be ose-descheduler?

I can update the readme in repo but I am not sure about the image. It is configurable as such, we can choose whatever the image we want it to be(as long as we have access to registry).

Comment 8 Seth Jennings 2019-01-30 19:48:16 UTC
MinLi,

Looking at the test in Polarion, you need to specify spec.image in the `Kind: Descheduler` you create to override the default descheduler image of `quay.io/openshift/origin-descheduler:latest`
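
A sketch of the override described here, assuming the CR layout from comment 1 and the downstream descheduler image named in the description's expected results. render_descheduler_cr is a hypothetical helper, and only one strategy parameter is kept for brevity.

```shell
#!/bin/sh
# Hypothetical helper: print a Descheduler CR whose spec.image is the
# descheduler image passed as $1 (not the descheduler-operator image).
# The CR fields are copied from comment 1; nothing else is assumed.
render_descheduler_cr() {
    cat <<EOF
apiVersion: descheduler.io/v1alpha1
kind: Descheduler
metadata:
  name: descheduler-cr
spec:
  schedule: "*/1 * * * ?"
  image: "$1"
  strategies:
    - name: "lownodeutilization"
      params:
        - name: "cputhreshold"
          value: "10"
EOF
}

# Usage against a live cluster:
#   render_descheduler_cr \
#       "quay.io/openshift-release-dev/ocp-v4.0-art-dev:v4.0.0-0.139.0.0-ose-descheduler" \
#       | oc apply -f -
```

Setting spec.image to the default value quay.io/openshift/origin-descheduler:latest would not demonstrate the override, so the usage line passes the downstream image instead.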

Comment 9 MinLi 2019-01-31 08:29:54 UTC
(In reply to ravig from comment #7)
> @MinLi, I am not sure why are you looking at clusterversion? Any specific
> reason for this? Shouldn't we check for descheduler deployment?

@ravig, QE needs to attach env info when verifying any bug; it's a mandatory requirement.
I set spec.image to "quay.io/openshift/origin-descheduler:latest" in the "kind: Descheduler" CR file, but when I check the descheduler job pod's image, it is the same as before:

    $ oc get pod example-descheduler-1-XXX -o yaml | grep -i image
    image: registry.svc.ci.openshift.org/openshift/origin-v4.0:descheduler
    imageID: registry.svc.ci.openshift.org/openshift/origin-v4.0@sha256:27e026b56615259eda5903f398dc3f2472a926242cc2c7bc84923066ed6da545



> > Is there a guideline or readme which image is proper to use? Should the image name be ose-descheduler?
> 
> I can update the readme in repo but I am not sure about the image. It is
> configurable as such, we can choose whatever the image we want it to be(as
> long as we have access to registry).

@ravig, as I understand it, users can configure the images of the descheduler and descheduler-operator according to the release image list? If so, you would need to update the image in the repo's readme every time we release an OCP version? Is this what you mean?

Comment 10 MinLi 2019-01-31 08:33:16 UTC
(In reply to Seth Jennings from comment #8)
> MinLi,
> 
> Looking at the test in Polarion, you need to specify spec.image in the
> `Kind: Descheduler` you create to override the default descheduler image of
> `quay.io/openshift/origin-descheduler:latest`

Seth Jennings, I did set spec.image to "quay.io/openshift/origin-descheduler:latest" in the "kind: Descheduler" CR file, but the descheduler job pod's image is the same as before; please refer to Comment 9.

Comment 12 MinLi 2019-01-31 09:13:07 UTC
(In reply to MinLi from comment #10)
> (In reply to Seth Jennings from comment #8)
> > MinLi,
> > 
> > Looking at the test in Polarion, you need to specify spec.image in the
> > `Kind: Descheduler` you create to override the default descheduler image of
> > `quay.io/openshift/origin-descheduler:latest`
> 
>  Seth Jennings, I do specify spec.image to
> "quay.io/openshift/origin-descheduler:latest" in "kind: Descheduler" CR
> file, but descheduler job pod's image is as the same as before, pls refer to
> Comment 9.

Sorry, please see the "UP-TO-DATE reply" in Comment 11.

Comment 18 MinLi 2019-02-14 07:37:24 UTC
verified! 
image:
ose-descheduler-operator: quay.io/openshift-release-dev/ocp-v4.0-art-dev:v4.0.0-0.171.0.0-ose-descheduler-operator
ose-descheduler: quay.io/openshift-release-dev/ocp-v4.0-art-dev:v4.0.0-0.171.0.0-ose-descheduler

version info:
[core@ip-10-0-19-185 ~]$ oc version 
oc v4.0.0-0.170.0
kubernetes v1.12.4+45dbe929fa
features: Basic-Auth GSSAPI Kerberos SPNEGO

[root@localhost lyman]# oc get clusterversion 
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE     STATUS
version   4.0.0-0.nightly-2019-02-13-204401   True        False         4h54m     Error while reconciling 4.0.0-0.nightly-2019-02-13-204401: the cluster operator monitoring is failing

Comment 21 errata-xmlrpc 2019-06-04 10:41:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758