Description of problem:
After deploying Prometheus, the prometheus-node-exporter pods are in ImagePullBackOff status.

# oc get po
NAME                             READY     STATUS             RESTARTS   AGE
prometheus-0                     6/6       Running            0          1h
prometheus-node-exporter-fqpwf   0/1       ImagePullBackOff   0          1h
prometheus-node-exporter-jzfvb   0/1       ImagePullBackOff   0          1h

Describing the prometheus-node-exporter pods shows:

  Warning  Failed   1h (x4 over 1h)    kubelet, 172.16.120.101  Failed to pull image "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/prometheus-node-exporter:v0.15.2": rpc error: code = Unknown desc = error parsing HTTP 404 response body: invalid character '<' looking for beginning of value: "<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0//EN\">\n<html><head>\n<title>404 Not Found</title>\n</head><body>\n<h1>Not Found</h1>\n<p>The requested URL /pulp/docker/v2/redhat-openshift3-prometheus-node-exporter/manifests/v0.15.2 was not found on this server.</p>\n</body></html>\n"
  Warning  Failed   1h (x4 over 1h)    kubelet, 172.16.120.101  Error: ErrImagePull
  Warning  Failed   6m (x410 over 1h)  kubelet, 172.16.120.101  Error: ImagePullBackOff
  Normal   BackOff  1m (x431 over 1h)  kubelet, 172.16.120.101  Back-off pulling image "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/prometheus-node-exporter:v0.15.2"

# docker pull registry.reg-aws.openshift.com:443/openshift3/prometheus-node-exporter:v0.15.2
Trying to pull repository registry.reg-aws.openshift.com:443/openshift3/prometheus-node-exporter ...
manifest for registry.reg-aws.openshift.com:443/openshift3/prometheus-node-exporter:v0.15.2 not found

There is no v0.15.2 image, and I am not sure whether prometheus-node-exporter should have a v3.9 tag like the other Prometheus images. The tags that do exist:

# curl -X GET -k brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/v1/repositories/openshift3/prometheus-node-exporter/tags | python -m json.tool
{
    "0.14.0": "44f72f459d624931baace1ccfc50ff81f69cdba4bcaad18e5f87f92507571d01",
    "0.14.0-1": "44f72f459d624931baace1ccfc50ff81f69cdba4bcaad18e5f87f92507571d01",
    "0.15.2": "2590ebd50e53e243ae57ecaca44a9f4410ba7d468622eec649bebbd953949acf",
    "0.15.2-1": "2590ebd50e53e243ae57ecaca44a9f4410ba7d468622eec649bebbd953949acf",
    "latest": "2590ebd50e53e243ae57ecaca44a9f4410ba7d468622eec649bebbd953949acf",
    "rhaos-3.7-rhel-7-docker-candidate-21542-20170906205309": "44f72f459d624931baace1ccfc50ff81f69cdba4bcaad18e5f87f92507571d01",
    "rhaos-3.9-rhel-7-docker-candidate-76868-20180131230619": "2590ebd50e53e243ae57ecaca44a9f4410ba7d468622eec649bebbd953949acf"
}

Version-Release number of selected component (if applicable):
# openshift version
openshift v3.9.1
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.16

How reproducible:
Always

Steps to Reproduce:
1. Deploy Prometheus.

Actual results:
The prometheus-node-exporter pods are in ImagePullBackOff status.

Expected results:
The prometheus-node-exporter pods should be healthy.

Additional info:
# Deploy prometheus
openshift_prometheus_state=present
openshift_prometheus_node_selector={'role': 'node'}
openshift_prometheus_image_prefix=brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/
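The failure comes down to tag naming: the pods request `:v0.15.2`, while the published tag list only contains `0.15.2` (no leading `v`). The sketch below is a hypothetical helper, not part of openshift-ansible or any registry tooling. It assumes the tag list has already been fetched; the tag names are copied from the curl output above with the digests omitted. It checks a requested tag and reports near misses that differ only by a leading `v`.

```python
# Tag names published for openshift3/prometheus-node-exporter, copied from the
# curl output above (image digests omitted; only the names matter here).
published_tags = {
    "0.14.0",
    "0.14.0-1",
    "0.15.2",
    "0.15.2-1",
    "latest",
    "rhaos-3.7-rhel-7-docker-candidate-21542-20170906205309",
    "rhaos-3.9-rhel-7-docker-candidate-76868-20180131230619",
}


def check_tag(requested: str, tags: set[str]) -> tuple[bool, list[str]]:
    """Hypothetical helper: return (found, near_misses) for a requested tag.

    A near miss is a published tag that differs from the request only by a
    leading "v", e.g. "v0.15.2" requested vs "0.15.2" published.
    """
    if requested in tags:
        return True, []
    bare = requested[1:] if requested.startswith("v") else requested
    candidates = {bare, "v" + bare}
    near = sorted(t for t in tags if t in candidates and t != requested)
    return False, near


found, near = check_tag("v0.15.2", published_tags)
print(found, near)  # False ['0.15.2']
```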
We also need to make the installer pull the v3.7 image tag by default.
(In reply to Johnny Liu from comment #1)
> Also we need make installer to pull v3.7 image as default image tag.

Sorry, s/v3.7/v3.9/
Just so I'm clear: is the priority issue the "v0.15.2" (not available) vs. "0.15.2" (available) tag?
Here's the proper fix for openshift-ansible: https://github.com/openshift/openshift-ansible/pull/7325

It is pending internal tagging by pgier.
Removing the TestBlocker keyword; the issue is fixed. prometheus-node-exporter has also been pushed to the reg-aws repo:

image: registry.reg-aws.openshift.com:443/openshift3/prometheus-node-exporter:v0.15.2

Please change the status to ON_QA.
Set to VERIFIED as per Comment 14
Setting to VERIFIED, since this issue is about prometheus-node-exporter not having a v0.15.2 tag. Whether prometheus-node-exporter should have a v3.9 tag will be considered in the future; see comment 17. I will open a defect about the missing v3.9 tag if we decide prometheus-node-exporter should have one.
(In reply to Junqi Zhao from comment #18)
> Will open v3.9 tag is missign if we decide prometheus-node-exporter should
> have v3.9 tag

Change to: Will open a defect about the missing v3.9 tag if we decide prometheus-node-exporter should have a v3.9 tag.
prometheus-node-exporter has a v3.9 tag now:

"v3.9": "fe144ca99bbe32e539fe8869293500ff4dabdf7c43b498c0c05586f87c166b37",
"v3.9.10": "d923e17649b8ba65515193b64f8e0fe15b23b5a7b1ea0acede1397509ac0a094",
"v3.9.10-1": "d923e17649b8ba65515193b64f8e0fe15b23b5a7b1ea0acede1397509ac0a094",
"v3.9.10.20180315.142724": "d923e17649b8ba65515193b64f8e0fe15b23b5a7b1ea0acede1397509ac0a094",
"v3.9.11": "54799bb6d52c7852b69d20c9982b78b004c3e7cfe2f37b462dcae645b94b30c6",
"v3.9.11-1": "54799bb6d52c7852b69d20c9982b78b004c3e7cfe2f37b462dcae645b94b30c6",
"v3.9.11.20180315.181300": "54799bb6d52c7852b69d20c9982b78b004c3e7cfe2f37b462dcae645b94b30c6",
"v3.9.12": "fe144ca99bbe32e539fe8869293500ff4dabdf7c43b498c0c05586f87c166b37",
"v3.9.12-1": "fe144ca99bbe32e539fe8869293500ff4dabdf7c43b498c0c05586f87c166b37",
"v3.9.12.20180319.095352": "fe144ca99bbe32e539fe8869293500ff4dabdf7c43b498c0c05586f87c166b37",
"v3.9.8": "7ddf429234a77e029845dc5a6407df33e7980efd3ebb8512d59a73b2fa9358e4",
"v3.9.8-1": "7ddf429234a77e029845dc5a6407df33e7980efd3ebb8512d59a73b2fa9358e4",
"v3.9.8.20180313.172024": "7ddf429234a77e029845dc5a6407df33e7980efd3ebb8512d59a73b2fa9358e4",
"v3.9.9": "ca3c74c435f1b05251ccc6654e7cda0d7cc1acc53d39d18fdc62f6bd22c0108c",
"v3.9.9-1": "ca3c74c435f1b05251ccc6654e7cda0d7cc1acc53d39d18fdc62f6bd22c0108c",
"v3.9.9.20180314.185428": "ca3c74c435f1b05251ccc6654e7cda0d7cc1acc53d39d18fdc62f6bd22c0108c"
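In this listing the floating `v3.9` tag has the same image digest as `v3.9.12` and its `-1` and timestamped variants. A small sketch that finds which versioned tags a floating tag currently aliases by comparing digests follows. `aliases_of` is a hypothetical helper, and the dictionary is an excerpt of the tag output above, not the full list.

```python
# Excerpt of the v3.9-era tags listed above: tag name -> image digest.
TAGS = {
    "v3.9": "fe144ca99bbe32e539fe8869293500ff4dabdf7c43b498c0c05586f87c166b37",
    "v3.9.10": "d923e17649b8ba65515193b64f8e0fe15b23b5a7b1ea0acede1397509ac0a094",
    "v3.9.11": "54799bb6d52c7852b69d20c9982b78b004c3e7cfe2f37b462dcae645b94b30c6",
    "v3.9.12": "fe144ca99bbe32e539fe8869293500ff4dabdf7c43b498c0c05586f87c166b37",
    "v3.9.12-1": "fe144ca99bbe32e539fe8869293500ff4dabdf7c43b498c0c05586f87c166b37",
    "v3.9.12.20180319.095352": "fe144ca99bbe32e539fe8869293500ff4dabdf7c43b498c0c05586f87c166b37",
}


def aliases_of(tag: str, tags: dict[str, str]) -> list[str]:
    """Hypothetical helper: other tags that share `tag`'s image digest."""
    digest = tags[tag]
    return sorted(t for t, d in tags.items() if d == digest and t != tag)


print(aliases_of("v3.9", TAGS))
# ['v3.9.12', 'v3.9.12-1', 'v3.9.12.20180319.095352']
```

Because `v3.9` is a moving tag, the result reflects the registry state at the time of this listing only.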
Since this is still present in the 3.9(.14) GA code, the workaround is to add this line to the /etc/ansible/hosts file:

openshift_prometheus_node_exporter_image_version=v3.9
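The failing pull in the bug description suggests how the image reference is assembled: the `openshift_prometheus_image_prefix` value, then `prometheus-node-exporter`, then `:<version>`. Assuming the installer composes the name that way (inferred from the failing image name, not checked against the openshift-ansible role), the sketch below shows what the inventory workaround changes. `node_exporter_image` is a hypothetical helper.

```python
# Assumption: the installer builds the image reference as
# <openshift_prometheus_image_prefix>prometheus-node-exporter:<image_version>.
# This mirrors the failing image name in the bug description; it is not
# copied from the openshift-ansible role itself.
PREFIX = "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/"


def node_exporter_image(prefix: str, version: str) -> str:
    """Hypothetical helper: compose the node-exporter image reference."""
    return f"{prefix}prometheus-node-exporter:{version}"


# Tag requested in the bug description (not published in the registry):
print(node_exporter_image(PREFIX, "v0.15.2"))
# With openshift_prometheus_node_exporter_image_version=v3.9 in the inventory:
print(node_exporter_image(PREFIX, "v3.9"))
```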
The fix from comment #5 was backported to the 3.9 branch: https://github.com/openshift/openshift-ansible/pull/7673

This means you shouldn't need to work around the issue by explicitly setting a tag; the installer should pick the release tag by default. The backport didn't make it in time for GA, but I assume it will be part of a future update.
The fix for this is in openshift-ansible-3.9.27-1 and later.