Bug 1677232

Summary: oauth-proxy and grafana use origin images, which may cause CreateContainerError on OCP; they should use OCP images
Product: OpenShift Container Platform Reporter: Junqi Zhao <juzhao>
Component: Monitoring Assignee: Frederic Branczyk <fbranczy>
Status: CLOSED ERRATA QA Contact: Junqi Zhao <juzhao>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.1.0 CC: mloibl, surbania
Target Milestone: --- Keywords: Regression
Target Release: 4.1.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-06-04 10:44:00 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Junqi Zhao 2019-02-14 10:24:18 UTC
Description of problem:
Cloned from https://jira.coreos.com/browse/MON-553

$ oc get pod -n openshift-monitoring | grep -v Running
NAME                                           READY   STATUS                 RESTARTS   AGE
alertmanager-main-0                            2/3     CreateContainerError   0          81m
grafana-78765ddcc7-7fm4w                       0/2     CreateContainerError   0          81m
prometheus-k8s-0                               5/6     CreateContainerError   1          81m
prometheus-k8s-1                               5/6     CreateContainerError   1          81m

$ oc -n openshift-monitoring describe pod grafana-78765ddcc7-7fm4w
Events:
  Type     Reason     Age                From                                               Message
  ----     ------     ----               ----                                               -------
  Normal   Scheduled  105s               default-scheduler                                  Successfully assigned openshift-monitoring/grafana-78765ddcc7-7fm4w to ip-10-0-155-5.us-east-2.compute.internal
  Normal   Pulling    96s                kubelet, ip-10-0-155-5.us-east-2.compute.internal  pulling image "quay.io/openshift/origin-grafana:latest"
  Normal   Pulled     71s                kubelet, ip-10-0-155-5.us-east-2.compute.internal  Successfully pulled image "quay.io/openshift/origin-grafana:latest"
  Normal   Created    71s                kubelet, ip-10-0-155-5.us-east-2.compute.internal  Created container
  Normal   Started    71s                kubelet, ip-10-0-155-5.us-east-2.compute.internal  Started container
  Normal   Pulled     15s (x6 over 70s)  kubelet, ip-10-0-155-5.us-east-2.compute.internal  Successfully pulled image "quay.io/openshift/origin-oauth-proxy:latest"
  Warning  Failed     15s (x6 over 70s)  kubelet, ip-10-0-155-5.us-east-2.compute.internal  Error: Manifest does not match provided manifest digest sha256:1048478235200fd07addb714dbd729fa67535ac615b120e33f21a5a1cab94890
  Normal   Pulling    1s (x7 over 71s)   kubelet, ip-10-0-155-5.us-east-2.compute.internal  pulling image "quay.io/openshift/origin-oauth-proxy:latest"


# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE     STATUS
version   4.0.0-0.nightly-2019-02-13-204401   True        False         3h7m      Cluster version is 4.0.0-0.nightly-2019-02-13-204401

Extracting the payload shows that grafana and oauth-proxy use origin images:
# cat 0000_70_cluster-monitoring-operator_04-deployment.yaml
      containers:
      - image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e134998078f81af760461e0ee4792d63ceb35f09ecb6a9d6227c9d6ab99a4e1a
        name: cluster-monitoring-operator
        args:
        - "-namespace=openshift-monitoring"
        - "-configmap=cluster-monitoring-config"
        - "-logtostderr=true"
        - "-v=4"
        - "-images=prometheus-operator=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:467fc27e6c80a0dd5ff4cb4d904c14e8f70fe23e5697fd804f5c99eb0e68e1c6"
        - "-images=prometheus-config-reloader=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:254d66deb1210d77e43555ac30caf54486549c1808b1a5644299c37d8a215697"
        - "-images=configmap-reloader=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:186f88f94e7c5c9404319db8d9e1f44da6b41ab58b8e8d624cdd84d222da7ba7"
        - "-images=prometheus=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:779dee1092415be6594e90337cd5c44272fc3109caa6db4afaca0f86fcf3478b"
        - "-images=alertmanager=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:54cd1a925aaa43267eddc3132b1e8eb17be12e3918351736ea45f61d64b93721"
        - "-images=grafana=quay.io/openshift/origin-grafana:latest"
        - "-images=oauth-proxy=quay.io/openshift/origin-oauth-proxy:latest"
        - "-images=node-exporter=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cabb123af36d1d9f6fffacea1dcbc6980a591804e544c0b584a3b2be9d808660"
        - "-images=kube-state-metrics=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e8d485110f5af3b0f3fd9e73c921170d1662ac7c0fa8052775f69bdedde795df"
        - "-images=kube-rbac-proxy=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6575e307a6106e5c589128ac118d473d4af53d89bb89fe2c4ba9748b0fd986c2"
        - "-images=telemeter-client=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8c01269ff766c2722f02babbfa46760bc7f48d8b652121019eaf38018293b6fb"
        - "-images=prom-label-proxy=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c0bf4c44490a97b898332fbb0333b58b91068985ff50672502f33046e99e9ead"
        - "-images=k8s-prometheus-adapter=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:47bb1178fa706ebc8dc42c00ce821140170cefa2f367007f3169a23df1c8f767"
********************************************************************************
The digest of quay.io/openshift/origin-oauth-proxy:latest is 1048478235200fd07addb714dbd729fa67535ac615b120e33f21a5a1cab94890, the same digest reported by `oc -n openshift-monitoring describe pod grafana-78765ddcc7-7fm4w`, as the pull below shows:
# docker pull quay.io/openshift/origin-oauth-proxy:latest
Trying to pull repository quay.io/openshift/origin-oauth-proxy ... 
latest: Pulling from quay.io/openshift/origin-oauth-proxy
a02a4930cb5d: Pull complete 
6d1570ccea24: Pull complete 
51662e510b97: Pull complete 
Digest: sha256:1048478235200fd07addb714dbd729fa67535ac615b120e33f21a5a1cab94890
Status: Downloaded newer image for quay.io/openshift/origin-oauth-proxy:latest

As shown below, the oauth-proxy digest in the release payload is not the same as the digest of origin-oauth-proxy:latest, and this mismatch caused the error:
# oc adm release info --pullspecs "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-02-13-204401" | grep oauth-proxy
  oauth-proxy                                   quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ddf6342e42f3ef5202dfcb32f7adaadf937d96d8c4f0fd8a5657954c5f19965d

oauth-proxy and grafana should use OCP images, like the other images.
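
The digest comparison above can be sketched in a few lines of Python. This is only an illustration, not part of the original report: the helper name extract_digest is hypothetical, and the two input strings are copied from the `docker pull` and `oc adm release info` output quoted in this description.

```python
import re

# A sha256 digest is the literal prefix followed by 64 lowercase hex characters.
DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")


def extract_digest(text):
    """Return the first sha256 digest found in text, or None (hypothetical helper)."""
    match = DIGEST_RE.search(text)
    return match.group(0) if match else None


# Lines copied from the command output quoted in this report.
pulled_line = "Digest: sha256:1048478235200fd07addb714dbd729fa67535ac615b120e33f21a5a1cab94890"
payload_line = (
    "oauth-proxy  quay.io/openshift-release-dev/ocp-v4.0-art-dev"
    "@sha256:ddf6342e42f3ef5202dfcb32f7adaadf937d96d8c4f0fd8a5657954c5f19965d"
)

tag_digest = extract_digest(pulled_line)
payload_digest = extract_digest(payload_line)
digests_match = tag_digest == payload_digest

print("tag digest:    ", tag_digest)
print("payload digest:", payload_digest)
print("match" if digests_match else "mismatch")
```

For the values in this report the two digests differ, which is consistent with the conclusion that the floating origin tag does not correspond to the image in the release payload.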

Version-Release number of selected component (if applicable):


How reproducible:
Sometimes

Steps to Reproduce:
1. Delete a node on which prometheus-k8s/alertmanager is running, and let the monitoring pods reschedule

Actual results:
Pods are in CreateContainerError status

Expected results:
oauth-proxy and grafana should use OCP images, like the other images

Additional info:

Comment 1 Frederic Branczyk 2019-02-22 16:58:37 UTC
As of https://github.com/openshift/cluster-monitoring-operator/pull/259 this should be fixed.

Comment 3 Junqi Zhao 2019-02-26 01:27:53 UTC
Checked 0000_70_cluster-monitoring-operator_04-deployment.yaml in the payload; oauth-proxy and grafana now use OCP images:
        - "-images=prometheus-operator=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:95a1b9562f4377026a659feb9d025af659761e48be1a68b286c761883caf9f2b"
        - "-images=prometheus-config-reloader=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:551fc7a7ea1bddab0646b9dac22e04b6597489d0788ae6053f9f6dfc715f0b69"
        - "-images=configmap-reloader=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2c0d8efc722097f4476c9d62d19d54a9061fc7168a432da523c3cda12d144680"
        - "-images=prometheus=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cd698743e1478cf580e2e75749b2c10f7824ffddcee0469dccfaa2f58df2f008"
        - "-images=alertmanager=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7147815b797ba173b494c4d8f468f5bad5231a3f54bdb02960b5cd8f78514d43"
        - "-images=grafana=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:51166898d56ed1beacb24bffb4224bd58eedf1d4109fcf74491c2730844724ad"
        - "-images=oauth-proxy=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c0dbc2a74a1573bb7a9032ba3ba31fb14c9d575a5483e3cd86baa602568ffc25"
        - "-images=node-exporter=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8d3e99a8bc1759788e11dbe9588b6b2cb4976cb2d92f3dc9c6bfadafd549a98c"
        - "-images=kube-state-metrics=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:39c22c66cce390d747d116752c1dcc85ff5a9e70587a0649d43b46f017e62948"
        - "-images=kube-rbac-proxy=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3a85cf9c98cba46dd4a8cd90c45232fc92c6b3fe3d8d5c3c2efc0084ea3f048f"
        - "-images=telemeter-client=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cdee0fbdfb7f64ee81a543bdad68e77fcaa85f9c4e296ab6cfcb18fe05f12632"
        - "-images=prom-label-proxy=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bceec206dc589dd7d4b499b271027d0f5a06687901db3da3076faeab355b1ee9"
        - "-images=k8s-prometheus-adapter=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f4ae9936c3d4c7ae1854497275e7596109bc60362811cb4771e3392df6b680c5"

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-02-25-194625   True        False         18m     Cluster version is 4.0.0-0.nightly-2019-02-25-194625

RHCOS build: 47.330
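
The payload check in this comment can be approximated with a short script. The sketch below is an assumption-laden illustration rather than the verification actually performed: the function names parse_image_args and unpinned are hypothetical, and it treats "pinned by @sha256 digest" as a stand-in for "uses the OCP image". The sample arguments are a subset of the `-images=` entries quoted in this bug, before and after the fix.

```python
def parse_image_args(args):
    """Map component name -> pullspec from "-images=name=pullspec" arguments."""
    images = {}
    for arg in args:
        if not arg.startswith("-images="):
            continue  # skip unrelated flags such as -v=4
        name, _, pullspec = arg[len("-images="):].partition("=")
        images[name] = pullspec
    return images


def unpinned(images):
    """Return sorted component names whose pullspec has no @sha256: digest."""
    return sorted(name for name, spec in images.items() if "@sha256:" not in spec)


REPO = "quay.io/openshift-release-dev/ocp-v4.0-art-dev"

# Subset of the arguments from the 2019-02-13 payload (before the fix).
before_args = [
    "-v=4",
    "-images=alertmanager=" + REPO + "@sha256:54cd1a925aaa43267eddc3132b1e8eb17be12e3918351736ea45f61d64b93721",
    "-images=grafana=quay.io/openshift/origin-grafana:latest",
    "-images=oauth-proxy=quay.io/openshift/origin-oauth-proxy:latest",
]

# Subset of the arguments from the 2019-02-25 payload (after the fix).
after_args = [
    "-images=grafana=" + REPO + "@sha256:51166898d56ed1beacb24bffb4224bd58eedf1d4109fcf74491c2730844724ad",
    "-images=oauth-proxy=" + REPO + "@sha256:c0dbc2a74a1573bb7a9032ba3ba31fb14c9d575a5483e3cd86baa602568ffc25",
]

before_unpinned = unpinned(parse_image_args(before_args))
after_unpinned = unpinned(parse_image_args(after_args))
print("before:", before_unpinned)
print("after: ", after_unpinned)
```

With these inputs, the earlier payload reports grafana and oauth-proxy as using floating tags, and the later payload reports none.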

Comment 6 errata-xmlrpc 2019-06-04 10:44:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758