Bug 1659183
| Summary: | Director deployed OCP 3.11: prometheus-k8s-0 pod fails to start due to nonexistent image - Failed to pull image "192.168.24.1:8787/openshift3/prometheus:v3.11.51-2": rpc error: code = Unknown desc = Error: image openshift3/prometheus:v3.11.51-2 not found | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Marius Cornea <mcornea> |
| Component: | openstack-tripleo-common | Assignee: | Martin André <m.andre> |
| Status: | CLOSED ERRATA | QA Contact: | Marius Cornea <mcornea> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 14.0 (Rocky) | CC: | dbecker, lmarsh, ltomasbo, m.andre, mburns, morazi, pgrist, psahoo, slinaber |
| Target Milestone: | z2 | Keywords: | Triaged, ZStream |
| Target Release: | 14.0 (Rocky) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | openstack-tripleo-common-9.4.1-0.20190119050434.261de49.el7ost | Doc Type: | Known Issue |
| Doc Text: | (see below, after this table) | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-04-30 17:51:15 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Doc Text (Known Issue):

Director and openshift-ansible have different expectations regarding image tags. When importing remote container images locally, director converts the generic tag into one that uniquely identifies the image, based on the `version` and `release` labels from the image metadata. Openshift-ansible, however, relies on a single `openshift_image_tag` variable for all OpenShift image tags, which makes it impossible to specify tags for individual images. As a result, deployment of OCP via director fails when the floating v3.11 tag in the remote container image registry points to images with inconsistent `release` or `version` labels in their metadata.

Workaround: from the undercloud, import the mismatched images prior to deploying OpenShift, setting the tag so that it is consistent across all OpenShift images:

    skopeo --tls-verify=false copy docker://registry.access.redhat.com/openshift3/prometheus:v3.11.51-1 docker://192.168.24.1:8787/openshift3/prometheus:v3.11.51-2

Result: deployment of OpenShift from director completes without the missing image.

---

On the undercloud we have the v3.11.51-1 tag for the prometheus image:

    (undercloud) [stack@undercloud-0 ~]$ docker images | grep prometheus
    192.168.24.1:8787/openshift3/ose-prometheus-operator          v3.11.51-2   d24ce5e6f296   9 days ago   582 MB
    192.168.24.1:8787/openshift3/ose-prometheus-config-reloader   v3.11.51-2   e8c3cd83fd6e   9 days ago   510 MB
    192.168.24.1:8787/openshift3/prometheus                       v3.11.51-1   c99a294ec062   9 days ago   283 MB
    192.168.24.1:8787/openshift3/prometheus-node-exporter         v3.11.51-2   52f02f543117   9 days ago   225 MB
    192.168.24.1:8787/openshift3/prometheus-alertmanager          v3.11.51-2   cf43629ba0d3   9 days ago   236 MB

---

There is currently no way in openshift-ansible to use a different tag for the prometheus image; all the images must have the same tag, corresponding to the value of `openshift_image_tag`. We either need to bump the tag for the prometheus image in the registry [1] from v3.11.51-1 to v3.11.51-2 to match the other OpenShift images, or, as a workaround, stop setting "tag_from_label" in ContainerImagePrepare. However, I would be cautious with that workaround, as it is going to interfere with updates for the haproxy and keepalived services managed by tripleo. A better workaround is to re-tag prometheus in the local registry:

    $ docker tag 192.168.24.1:8787/openshift3/prometheus:v3.11.51-1 192.168.24.1:8787/openshift3/prometheus:v3.11.51-2
    $ docker push 192.168.24.1:8787/openshift3/prometheus:v3.11.51-2

[1] https://access.redhat.com/containers/?tab=tags#/registry.access.redhat.com/openshift3/prometheus

---

Adding a workaround that worked for me before triggering the overcloud deploy:

    docker pull registry.access.redhat.com/openshift3/prometheus:v3.11.51-1
    skopeo --tls-verify=false copy docker://registry.access.redhat.com/openshift3/prometheus:v3.11.51-1 docker://192.168.24.1:8787/openshift3/prometheus:v3.11.51-2

---

(In reply to Marius Cornea from comment #3) Small correction: only the skopeo command is needed as the workaround; the `docker pull` is not required:

    skopeo --tls-verify=false copy docker://registry.access.redhat.com/openshift3/prometheus:v3.11.51-1 docker://192.168.24.1:8787/openshift3/prometheus:v3.11.51-2

---

*** Bug 1680523 has been marked as a duplicate of this bug. ***

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0878
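For anyone hitting a similar failure on other image releases, a pre-deployment consistency check along these lines (a sketch, assuming `skopeo` and `jq` are available on the undercloud; the image list is illustrative) would surface such label mismatches before openshift-ansible tries to pull:

    # Print the '{version}-{release}' tag that director's tag_from_label
    # setting would derive for each image behind the floating v3.11 tag;
    # an entry that differs from the rest is the mismatch described above.
    for img in prometheus prometheus-alertmanager prometheus-node-exporter \
               ose-prometheus-operator ose-prometheus-config-reloader; do
      echo -n "openshift3/$img: "
      skopeo inspect docker://registry.access.redhat.com/openshift3/$img:v3.11 \
        | jq -r '.Labels.version + "-" + .Labels.release'
    done

Any image whose printed tag does not match the tag used for the rest would need to be re-imported with a consistent tag, as in the skopeo copy workaround above.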
Description of problem:

Director deployed OCP 3.11: the prometheus-k8s-0 pod fails to start due to a nonexistent image:

    Failed to pull image "192.168.24.1:8787/openshift3/prometheus:v3.11.51-2": rpc error: code = Unknown desc = Error: image openshift3/prometheus:v3.11.51-2 not found

Version-Release number of selected component (if applicable):

    openstack-tripleo-common-9.4.1-0.20181012010884.el7ost.noarch
    openstack-tripleo-heat-templates-9.0.1-0.20181013060904.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy the overcloud OCP with tag_from_label: '{version}-{release}' in containers-prepare-parameter.yaml (see the sketch below).
2. Check the pod status on one of the master nodes.

Actual results:

    [root@openshift-master-0 heat-admin]# oc get pods --all-namespaces | grep -v Running
    NAMESPACE              NAME               READY   STATUS             RESTARTS   AGE
    openshift-monitoring   prometheus-k8s-0   3/4     ImagePullBackOff   0          1h

Expected results:
All pods are running.
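A minimal sketch of the relevant fragment of containers-prepare-parameter.yaml for step 1 above; the namespace and push_destination values are illustrative assumptions, not the exact environment file from this deployment:

    # Illustrative fragment of containers-prepare-parameter.yaml (field values
    # are assumptions for this sketch, not the file used in this report).
    parameter_defaults:
      ContainerImagePrepare:
      - push_destination: true    # mirror images into the undercloud registry (192.168.24.1:8787)
        set:
          namespace: registry.access.redhat.com/openshift3
        # Replace the floating tag (e.g. v3.11) with one derived from each
        # image's own metadata labels, e.g. v3.11.51-2. The bug occurs when
        # one image's version/release labels differ from the rest.
        tag_from_label: '{version}-{release}'

With push_destination enabled, the prepare step mirrors each image into the undercloud registry under the tag derived from its labels, which is why a lone v3.11.51-1 prometheus image leaves a hole at v3.11.51-2.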
Additional info:

    [root@openshift-master-0 heat-admin]# oc describe pods prometheus-k8s-0 --namespace openshift-monitoring
    Name:               prometheus-k8s-0
    Namespace:          openshift-monitoring
    Priority:           0
    PriorityClassName:  <none>
    Node:               openshift-infra-1/172.17.1.15
    Start Time:         Thu, 13 Dec 2018 12:10:35 -0500
    Labels:             app=prometheus
                        controller-revision-hash=prometheus-k8s-85dbf9b49
                        prometheus=k8s
                        statefulset.kubernetes.io/pod-name=prometheus-k8s-0
    Annotations:        openshift.io/scc=restricted
    Status:             Pending
    IP:                 10.128.2.3
    Controlled By:      StatefulSet/prometheus-k8s
    Containers:
      prometheus:
        Container ID:
        Image:          192.168.24.1:8787/openshift3/prometheus:v3.11.51-2
        Image ID:
        Port:           <none>
        Host Port:      <none>
        Args:
          --web.console.templates=/etc/prometheus/consoles
          --web.console.libraries=/etc/prometheus/console_libraries
          --config.file=/etc/prometheus/config_out/prometheus.env.yaml
          --storage.tsdb.path=/prometheus
          --storage.tsdb.retention=15d
          --web.enable-lifecycle
          --storage.tsdb.no-lockfile
          --web.external-url=https://prometheus-k8s-openshift-monitoring.apps.openshift.localdomain/
          --web.route-prefix=/
          --web.listen-address=127.0.0.1:9090
        State:          Waiting
          Reason:       ImagePullBackOff
        Ready:          False
        Restart Count:  0
        Environment:    <none>
        Mounts:
          /etc/prometheus/config_out from config-out (ro)
          /etc/prometheus/rules/prometheus-k8s-rulefiles-0 from prometheus-k8s-rulefiles-0 (rw)
          /etc/prometheus/secrets/prometheus-k8s-htpasswd from secret-prometheus-k8s-htpasswd (ro)
          /etc/prometheus/secrets/prometheus-k8s-proxy from secret-prometheus-k8s-proxy (ro)
          /etc/prometheus/secrets/prometheus-k8s-tls from secret-prometheus-k8s-tls (ro)
          /prometheus from prometheus-k8s-db (rw)
          /var/run/secrets/kubernetes.io/serviceaccount from prometheus-k8s-token-dxh9n (ro)
      prometheus-config-reloader:
        Container ID:   docker://9319f9bf097126539c2824504c72c028990e209f60d1322c487488dfa262ad57
        Image:          192.168.24.1:8787/openshift3/ose-prometheus-config-reloader:v3.11.51-2
        Image ID:       docker-pullable://192.168.24.1:8787/openshift3/ose-prometheus-config-reloader@sha256:f84f7ba5ad7e0a580937a1bec773011b2f15dd5508bfc38eda52732ffadb61a1
        Port:           <none>
        Host Port:      <none>
        Command:
          /bin/prometheus-config-reloader
        Args:
          --log-format=logfmt
          --reload-url=http://localhost:9090/-/reload
          --config-file=/etc/prometheus/config/prometheus.yaml
          --config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml
        State:          Running
          Started:      Thu, 13 Dec 2018 12:10:47 -0500
        Ready:          True
        Restart Count:  0
        Limits:
          cpu:     10m
          memory:  50Mi
        Requests:
          cpu:     10m
          memory:  50Mi
        Environment:
          POD_NAME:  prometheus-k8s-0 (v1:metadata.name)
        Mounts:
          /etc/prometheus/config from config (rw)
          /etc/prometheus/config_out from config-out (rw)
          /var/run/secrets/kubernetes.io/serviceaccount from prometheus-k8s-token-dxh9n (ro)
      prometheus-proxy:
        Container ID:   docker://c54446ae210d5cee517c8f1e817b259c0a7c3b44595b3176c660bcefbb8602d7
        Image:          192.168.24.1:8787/openshift3/oauth-proxy:v3.11.51-2
        Image ID:       docker-pullable://192.168.24.1:8787/openshift3/oauth-proxy@sha256:c7da086516ddb13e986af396882f2ce771ab5892eefd16c514ebd0785b0f0370
        Port:           9091/TCP
        Host Port:      0/TCP
        Args:
          -provider=openshift
          -https-address=:9091
          -http-address=
          -email-domain=*
          -upstream=http://localhost:9090
          -htpasswd-file=/etc/proxy/htpasswd/auth
          -openshift-service-account=prometheus-k8s
          -openshift-sar={"resource": "namespaces", "verb": "get"}
          -openshift-delegate-urls={"/": {"resource": "namespaces", "verb": "get"}}
          -tls-cert=/etc/tls/private/tls.crt
          -tls-key=/etc/tls/private/tls.key
          -client-secret-file=/var/run/secrets/kubernetes.io/serviceaccount/token
          -cookie-secret-file=/etc/proxy/secrets/session_secret
          -openshift-ca=/etc/pki/tls/cert.pem
          -openshift-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          -skip-auth-regex=^/metrics
        State:          Running
          Started:      Thu, 13 Dec 2018 12:10:48 -0500
        Ready:          True
        Restart Count:  0
        Environment:    <none>
        Mounts:
          /etc/proxy/htpasswd from secret-prometheus-k8s-htpasswd (rw)
          /etc/proxy/secrets from secret-prometheus-k8s-proxy (rw)
          /etc/tls/private from secret-prometheus-k8s-tls (rw)
          /var/run/secrets/kubernetes.io/serviceaccount from prometheus-k8s-token-dxh9n (ro)
      rules-configmap-reloader:
        Container ID:   docker://02b2f7dde9557b4db5af6e61d645ba148315048153aabd0767b2a90100697e6f
        Image:          192.168.24.1:8787/openshift3/ose-configmap-reloader:v3.11.51-2
        Image ID:       docker-pullable://192.168.24.1:8787/openshift3/ose-configmap-reloader@sha256:3e2f688074eae0671f71cfb1561307b18c1d638d4657a431872022cc045b0c9d
        Port:           <none>
        Host Port:      <none>
        Args:
          --webhook-url=http://localhost:9090/-/reload
          --volume-dir=/etc/prometheus/rules/prometheus-k8s-rulefiles-0
        State:          Running
          Started:      Thu, 13 Dec 2018 12:10:54 -0500
        Ready:          True
        Restart Count:  0
        Limits:
          cpu:     5m
          memory:  10Mi
        Requests:
          cpu:     5m
          memory:  10Mi
        Environment:    <none>
        Mounts:
          /etc/prometheus/rules/prometheus-k8s-rulefiles-0 from prometheus-k8s-rulefiles-0 (rw)
          /var/run/secrets/kubernetes.io/serviceaccount from prometheus-k8s-token-dxh9n (ro)
    Conditions:
      Type              Status
      Initialized       True
      Ready             False
      ContainersReady   False
      PodScheduled      True
    Volumes:
      config:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  prometheus-k8s
        Optional:    false
      config-out:
        Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
        Medium:
      prometheus-k8s-rulefiles-0:
        Type:      ConfigMap (a volume populated by a ConfigMap)
        Name:      prometheus-k8s-rulefiles-0
        Optional:  false
      secret-prometheus-k8s-tls:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  prometheus-k8s-tls
        Optional:    false
      secret-prometheus-k8s-proxy:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  prometheus-k8s-proxy
        Optional:    false
      secret-prometheus-k8s-htpasswd:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  prometheus-k8s-htpasswd
        Optional:    false
      prometheus-k8s-db:
        Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
        Medium:
      prometheus-k8s-token-dxh9n:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  prometheus-k8s-token-dxh9n
        Optional:    false
    QoS Class:       Burstable
    Node-Selectors:  node-role.kubernetes.io/infra=true
    Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule
    Events:
      Type     Reason     Age                From                         Message
      ----     ------     ----               ----                         -------
      Normal   Scheduled  1h                 default-scheduler            Successfully assigned openshift-monitoring/prometheus-k8s-0 to openshift-infra-1
      Normal   Pulling    1h                 kubelet, openshift-infra-1   pulling image "192.168.24.1:8787/openshift3/ose-prometheus-config-reloader:v3.11.51-2"
      Normal   Pulled     1h                 kubelet, openshift-infra-1   Successfully pulled image "192.168.24.1:8787/openshift3/ose-prometheus-config-reloader:v3.11.51-2"
      Normal   Created    1h                 kubelet, openshift-infra-1   Created container
      Normal   Started    1h                 kubelet, openshift-infra-1   Started container
      Normal   Pulling    1h                 kubelet, openshift-infra-1   pulling image "192.168.24.1:8787/openshift3/oauth-proxy:v3.11.51-2"
      Normal   Created    1h                 kubelet, openshift-infra-1   Created container
      Normal   Pulling    1h                 kubelet, openshift-infra-1   pulling image "192.168.24.1:8787/openshift3/ose-configmap-reloader:v3.11.51-2"
      Normal   Started    1h                 kubelet, openshift-infra-1   Started container
      Normal   Pulled     1h                 kubelet, openshift-infra-1   Successfully pulled image "192.168.24.1:8787/openshift3/oauth-proxy:v3.11.51-2"
      Normal   Pulled     1h                 kubelet, openshift-infra-1   Successfully pulled image "192.168.24.1:8787/openshift3/ose-configmap-reloader:v3.11.51-2"
      Normal   Created    1h                 kubelet, openshift-infra-1   Created container
      Normal   Started    1h                 kubelet, openshift-infra-1   Started container
      Normal   BackOff    1h (x2 over 1h)    kubelet, openshift-infra-1   Back-off pulling image "192.168.24.1:8787/openshift3/prometheus:v3.11.51-2"
      Warning  Failed     1h (x3 over 1h)    kubelet, openshift-infra-1   Failed to pull image "192.168.24.1:8787/openshift3/prometheus:v3.11.51-2": rpc error: code = Unknown desc = Error: image openshift3/prometheus:v3.11.51-2 not found
      Normal   Pulling    1h (x3 over 1h)    kubelet, openshift-infra-1   pulling image "192.168.24.1:8787/openshift3/prometheus:v3.11.51-2"
      Warning  Failed     1h (x3 over 1h)    kubelet, openshift-infra-1   Error: ErrImagePull
      Warning  Failed     4s (x433 over 1h)  kubelet, openshift-infra-1   Error: ImagePullBackOff
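After applying the skopeo copy workaround, a quick sanity check (a sketch; `jq` on the undercloud is an assumption) confirms that the tag the pod references now resolves in the local registry:

    # A successful inspect here (instead of a not-found error) means the
    # image reference in the prometheus-k8s-0 pod spec can now be pulled.
    skopeo inspect --tls-verify=false \
      docker://192.168.24.1:8787/openshift3/prometheus:v3.11.51-2 \
      | jq '{Digest, version: .Labels.version, release: .Labels.release}'

Once the image resolves, the kubelet's next pull retry (or deleting the prometheus-k8s-0 pod to force one) should bring the pod to Running.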