Description of problem:
Director deployed OCP 3.11: prometheus-k8s-0 pod fails to start due to a nonexistent image:

  Failed to pull image "192.168.24.1:8787/openshift3/prometheus:v3.11.51-2": rpc error: code = Unknown desc = Error: image openshift3/prometheus:v3.11.51-2 not found

Version-Release number of selected component (if applicable):
openstack-tripleo-common-9.4.1-0.20181012010884.el7ost.noarch
openstack-tripleo-heat-templates-9.0.1-0.20181013060904.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy the overcloud OCP with tag_from_label: '{version}-{release}' in containers-prepare-parameter.yaml
2. Check pod status on one of the master nodes

Actual results:
[root@openshift-master-0 heat-admin]# oc get pods --all-namespaces | grep -v Running
NAMESPACE              NAME               READY   STATUS             RESTARTS   AGE
openshift-monitoring   prometheus-k8s-0   3/4     ImagePullBackOff   0          1h

Expected results:
All pods are running

Additional info:
[root@openshift-master-0 heat-admin]# oc describe pods prometheus-k8s-0 --namespace openshift-monitoring
Name:               prometheus-k8s-0
Namespace:          openshift-monitoring
Priority:           0
PriorityClassName:  <none>
Node:               openshift-infra-1/172.17.1.15
Start Time:         Thu, 13 Dec 2018 12:10:35 -0500
Labels:             app=prometheus
                    controller-revision-hash=prometheus-k8s-85dbf9b49
                    prometheus=k8s
                    statefulset.kubernetes.io/pod-name=prometheus-k8s-0
Annotations:        openshift.io/scc=restricted
Status:             Pending
IP:                 10.128.2.3
Controlled By:      StatefulSet/prometheus-k8s
Containers:
  prometheus:
    Container ID:
    Image:         192.168.24.1:8787/openshift3/prometheus:v3.11.51-2
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Args:
      --web.console.templates=/etc/prometheus/consoles
      --web.console.libraries=/etc/prometheus/console_libraries
      --config.file=/etc/prometheus/config_out/prometheus.env.yaml
      --storage.tsdb.path=/prometheus
      --storage.tsdb.retention=15d
      --web.enable-lifecycle
      --storage.tsdb.no-lockfile
      --web.external-url=https://prometheus-k8s-openshift-monitoring.apps.openshift.localdomain/
      --web.route-prefix=/
      --web.listen-address=127.0.0.1:9090
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/prometheus/config_out from config-out (ro)
      /etc/prometheus/rules/prometheus-k8s-rulefiles-0 from prometheus-k8s-rulefiles-0 (rw)
      /etc/prometheus/secrets/prometheus-k8s-htpasswd from secret-prometheus-k8s-htpasswd (ro)
      /etc/prometheus/secrets/prometheus-k8s-proxy from secret-prometheus-k8s-proxy (ro)
      /etc/prometheus/secrets/prometheus-k8s-tls from secret-prometheus-k8s-tls (ro)
      /prometheus from prometheus-k8s-db (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-k8s-token-dxh9n (ro)
  prometheus-config-reloader:
    Container ID:  docker://9319f9bf097126539c2824504c72c028990e209f60d1322c487488dfa262ad57
    Image:         192.168.24.1:8787/openshift3/ose-prometheus-config-reloader:v3.11.51-2
    Image ID:      docker-pullable://192.168.24.1:8787/openshift3/ose-prometheus-config-reloader@sha256:f84f7ba5ad7e0a580937a1bec773011b2f15dd5508bfc38eda52732ffadb61a1
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/prometheus-config-reloader
    Args:
      --log-format=logfmt
      --reload-url=http://localhost:9090/-/reload
      --config-file=/etc/prometheus/config/prometheus.yaml
      --config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml
    State:          Running
      Started:      Thu, 13 Dec 2018 12:10:47 -0500
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     10m
      memory:  50Mi
    Requests:
      cpu:     10m
      memory:  50Mi
    Environment:
      POD_NAME:  prometheus-k8s-0 (v1:metadata.name)
    Mounts:
      /etc/prometheus/config from config (rw)
      /etc/prometheus/config_out from config-out (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-k8s-token-dxh9n (ro)
  prometheus-proxy:
    Container ID:  docker://c54446ae210d5cee517c8f1e817b259c0a7c3b44595b3176c660bcefbb8602d7
    Image:         192.168.24.1:8787/openshift3/oauth-proxy:v3.11.51-2
    Image ID:      docker-pullable://192.168.24.1:8787/openshift3/oauth-proxy@sha256:c7da086516ddb13e986af396882f2ce771ab5892eefd16c514ebd0785b0f0370
    Port:          9091/TCP
    Host Port:     0/TCP
    Args:
      -provider=openshift
      -https-address=:9091
      -http-address=
      -email-domain=*
      -upstream=http://localhost:9090
      -htpasswd-file=/etc/proxy/htpasswd/auth
      -openshift-service-account=prometheus-k8s
      -openshift-sar={"resource": "namespaces", "verb": "get"}
      -openshift-delegate-urls={"/": {"resource": "namespaces", "verb": "get"}}
      -tls-cert=/etc/tls/private/tls.crt
      -tls-key=/etc/tls/private/tls.key
      -client-secret-file=/var/run/secrets/kubernetes.io/serviceaccount/token
      -cookie-secret-file=/etc/proxy/secrets/session_secret
      -openshift-ca=/etc/pki/tls/cert.pem
      -openshift-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      -skip-auth-regex=^/metrics
    State:          Running
      Started:      Thu, 13 Dec 2018 12:10:48 -0500
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/proxy/htpasswd from secret-prometheus-k8s-htpasswd (rw)
      /etc/proxy/secrets from secret-prometheus-k8s-proxy (rw)
      /etc/tls/private from secret-prometheus-k8s-tls (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-k8s-token-dxh9n (ro)
  rules-configmap-reloader:
    Container ID:  docker://02b2f7dde9557b4db5af6e61d645ba148315048153aabd0767b2a90100697e6f
    Image:         192.168.24.1:8787/openshift3/ose-configmap-reloader:v3.11.51-2
    Image ID:      docker-pullable://192.168.24.1:8787/openshift3/ose-configmap-reloader@sha256:3e2f688074eae0671f71cfb1561307b18c1d638d4657a431872022cc045b0c9d
    Port:          <none>
    Host Port:     <none>
    Args:
      --webhook-url=http://localhost:9090/-/reload
      --volume-dir=/etc/prometheus/rules/prometheus-k8s-rulefiles-0
    State:          Running
      Started:      Thu, 13 Dec 2018 12:10:54 -0500
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     5m
      memory:  10Mi
    Requests:
      cpu:     5m
      memory:  10Mi
    Environment:  <none>
    Mounts:
      /etc/prometheus/rules/prometheus-k8s-rulefiles-0 from prometheus-k8s-rulefiles-0 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-k8s-token-dxh9n (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-k8s
    Optional:    false
  config-out:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  prometheus-k8s-rulefiles-0:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      prometheus-k8s-rulefiles-0
    Optional:  false
  secret-prometheus-k8s-tls:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-k8s-tls
    Optional:    false
  secret-prometheus-k8s-proxy:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-k8s-proxy
    Optional:    false
  secret-prometheus-k8s-htpasswd:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-k8s-htpasswd
    Optional:    false
  prometheus-k8s-db:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  prometheus-k8s-token-dxh9n:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-k8s-token-dxh9n
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  node-role.kubernetes.io/infra=true
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule
Events:
  Type     Reason     Age                From                        Message
  ----     ------     ----               ----                        -------
  Normal   Scheduled  1h                 default-scheduler           Successfully assigned openshift-monitoring/prometheus-k8s-0 to openshift-infra-1
  Normal   Pulling    1h                 kubelet, openshift-infra-1  pulling image "192.168.24.1:8787/openshift3/ose-prometheus-config-reloader:v3.11.51-2"
  Normal   Pulled     1h                 kubelet, openshift-infra-1  Successfully pulled image "192.168.24.1:8787/openshift3/ose-prometheus-config-reloader:v3.11.51-2"
  Normal   Created    1h                 kubelet, openshift-infra-1  Created container
  Normal   Started    1h                 kubelet, openshift-infra-1  Started container
  Normal   Pulling    1h                 kubelet, openshift-infra-1  pulling image "192.168.24.1:8787/openshift3/oauth-proxy:v3.11.51-2"
  Normal   Created    1h                 kubelet, openshift-infra-1  Created container
  Normal   Pulling    1h                 kubelet, openshift-infra-1  pulling image "192.168.24.1:8787/openshift3/ose-configmap-reloader:v3.11.51-2"
  Normal   Started    1h                 kubelet, openshift-infra-1  Started container
  Normal   Pulled     1h                 kubelet, openshift-infra-1  Successfully pulled image "192.168.24.1:8787/openshift3/oauth-proxy:v3.11.51-2"
  Normal   Pulled     1h                 kubelet, openshift-infra-1  Successfully pulled image "192.168.24.1:8787/openshift3/ose-configmap-reloader:v3.11.51-2"
  Normal   Created    1h                 kubelet, openshift-infra-1  Created container
  Normal   Started    1h                 kubelet, openshift-infra-1  Started container
  Normal   BackOff    1h (x2 over 1h)    kubelet, openshift-infra-1  Back-off pulling image "192.168.24.1:8787/openshift3/prometheus:v3.11.51-2"
  Warning  Failed     1h (x3 over 1h)    kubelet, openshift-infra-1  Failed to pull image "192.168.24.1:8787/openshift3/prometheus:v3.11.51-2": rpc error: code = Unknown desc = Error: image openshift3/prometheus:v3.11.51-2 not found
  Normal   Pulling    1h (x3 over 1h)    kubelet, openshift-infra-1  pulling image "192.168.24.1:8787/openshift3/prometheus:v3.11.51-2"
  Warning  Failed     1h (x3 over 1h)    kubelet, openshift-infra-1  Error: ErrImagePull
  Warning  Failed     4s (x433 over 1h)  kubelet, openshift-infra-1  Error: ImagePullBackOff
On the undercloud we have the v3.11.51-1 tag for the prometheus image:

(undercloud) [stack@undercloud-0 ~]$ docker images | grep prometheus
192.168.24.1:8787/openshift3/ose-prometheus-operator          v3.11.51-2   d24ce5e6f296   9 days ago   582 MB
192.168.24.1:8787/openshift3/ose-prometheus-config-reloader   v3.11.51-2   e8c3cd83fd6e   9 days ago   510 MB
192.168.24.1:8787/openshift3/prometheus                       v3.11.51-1   c99a294ec062   9 days ago   283 MB
192.168.24.1:8787/openshift3/prometheus-node-exporter         v3.11.51-2   52f02f543117   9 days ago   225 MB
192.168.24.1:8787/openshift3/prometheus-alertmanager          v3.11.51-2   cf43629ba0d3   9 days ago   236 MB
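A mismatch like the one above can be spotted mechanically. A minimal sketch, using the listing from this comment embedded as data (in practice the pairs would come from `docker images`; the "expected" tag is simply taken as the one most images carry):

```shell
# Image-name / tag pairs from the undercloud, as shown above.
listing='192.168.24.1:8787/openshift3/ose-prometheus-operator v3.11.51-2
192.168.24.1:8787/openshift3/ose-prometheus-config-reloader v3.11.51-2
192.168.24.1:8787/openshift3/prometheus v3.11.51-1
192.168.24.1:8787/openshift3/prometheus-node-exporter v3.11.51-2
192.168.24.1:8787/openshift3/prometheus-alertmanager v3.11.51-2'

# The expected tag is the most frequent one in the listing.
expected=$(printf '%s\n' "$listing" | awk '{print $2}' | sort | uniq -c | sort -rn | awk 'NR==1 {print $2}')

# Print any image whose tag deviates from the expected tag.
printf '%s\n' "$listing" | awk -v want="$expected" '$2 != want {print $1 " is tagged " $2 ", expected " want}'
```

Run against this listing, it flags only openshift3/prometheus (tagged v3.11.51-1 while the rest are v3.11.51-2).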
There is currently no way in openshift-ansible to use a different tag for the prometheus image: all the images must carry the same tag, corresponding to the value of openshift_image_tag. We either need to bump the tag for the prometheus image in the registry [1] from v3.11.51-1 to v3.11.51-2 to match the other openshift images, or, as a workaround, stop setting "tag_from_label" in ContainerImagePrepare. However, I'd be cautious with that workaround, as it is going to interfere with updates of the haproxy and keepalived services managed by tripleo. A better workaround is to re-tag the prometheus image in the local registry:

$ docker tag 192.168.24.1:8787/openshift3/prometheus:v3.11.51-1 192.168.24.1:8787/openshift3/prometheus:v3.11.51-2
$ docker push 192.168.24.1:8787/openshift3/prometheus:v3.11.51-2

[1] https://access.redhat.com/containers/?tab=tags#/registry.access.redhat.com/openshift3/prometheus
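If several images ever lag behind, the re-tag/push pair can be generated per image rather than typed by hand. A sketch under stated assumptions: the registry address and the prometheus tag pair come from this report, the second list entry is just a sample of an already-correct image, and the loop only prints the docker commands instead of running them:

```shell
registry=192.168.24.1:8787
expected=v3.11.51-2

# image:current-tag pairs found in the local registry (sample data)
images='openshift3/prometheus:v3.11.51-1
openshift3/prometheus-alertmanager:v3.11.51-2'

# Emit the docker commands needed to bring each lagging tag up to date.
while IFS=: read -r name tag; do
    if [ "$tag" != "$expected" ]; then
        echo "docker tag $registry/$name:$tag $registry/$name:$expected"
        echo "docker push $registry/$name:$expected"
    fi
done <<EOF
$images
EOF
```

For the data above this prints exactly the two commands from the workaround; piping the output to `sh` would execute them.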
Adding a workaround that worked for me before triggering the overcloud deploy:

docker pull registry.access.redhat.com/openshift3/prometheus:v3.11.51-1
skopeo --tls-verify=false copy docker://registry.access.redhat.com/openshift3/prometheus:v3.11.51-1 docker://192.168.24.1:8787/openshift3/prometheus:v3.11.51-2
(In reply to Marius Cornea from comment #3)
> Adding a workaround that worked for me before triggering the overcloud
> deploy:
>
> docker pull registry.access.redhat.com/openshift3/prometheus:v3.11.51-1
> skopeo --tls-verify=false copy
> docker://registry.access.redhat.com/openshift3/prometheus:v3.11.51-1
> docker://192.168.24.1:8787/openshift3/prometheus:v3.11.51-2

Small correction: only the skopeo command is needed as the workaround:

skopeo --tls-verify=false copy docker://registry.access.redhat.com/openshift3/prometheus:v3.11.51-1 docker://192.168.24.1:8787/openshift3/prometheus:v3.11.51-2
*** Bug 1680523 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0878