Bug 1549936 - prometheus-node-exporter pods are in ImagePullBackOff status, don't have v0.15.2 image
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Hawkular
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.9.z
Assignee: Aaron Weitekamp
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-02-28 05:50 UTC by Junqi Zhao
Modified: 2018-05-30 07:45 UTC
CC List: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-29 21:42:51 UTC
Target Upstream Version:
Embargoed:



Description Junqi Zhao 2018-02-28 05:50:06 UTC
Description of problem:
After deploying Prometheus, the prometheus-node-exporter pods are in ImagePullBackOff status
# oc get po
NAME                             READY     STATUS             RESTARTS   AGE
prometheus-0                     6/6       Running            0          1h
prometheus-node-exporter-fqpwf   0/1       ImagePullBackOff   0          1h
prometheus-node-exporter-jzfvb   0/1       ImagePullBackOff   0          1h

Describing the prometheus-node-exporter pods shows:
  Warning  Failed                 1h (x4 over 1h)    kubelet, 172.16.120.101  Failed to pull image "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/prometheus-node-exporter:v0.15.2": rpc error: code = Unknown desc = error parsing HTTP 404 response body: invalid character '<' looking for beginning of value: "<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0//EN\">\n<html><head>\n<title>404 Not Found</title>\n</head><body>\n<h1>Not Found</h1>\n<p>The requested URL /pulp/docker/v2/redhat-openshift3-prometheus-node-exporter/manifests/v0.15.2 was not found on this server.</p>\n</body></html>\n"
  Warning  Failed                 1h (x4 over 1h)    kubelet, 172.16.120.101  Error: ErrImagePull
  Warning  Failed                 6m (x410 over 1h)  kubelet, 172.16.120.101  Error: ImagePullBackOff
  Normal   BackOff                1m (x431 over 1h)  kubelet, 172.16.120.101  Back-off pulling image "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/prometheus-node-exporter:v0.15.2"

# docker pull registry.reg-aws.openshift.com:443/openshift3/prometheus-node-exporter:v0.15.2
Trying to pull repository registry.reg-aws.openshift.com:443/openshift3/prometheus-node-exporter ... 
manifest for registry.reg-aws.openshift.com:443/openshift3/prometheus-node-exporter:v0.15.2 not found

The v0.15.2 image is not available, and it is not clear whether prometheus-node-exporter should have a v3.9 tag like the other Prometheus images.
# curl -X GET -k brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/v1/repositories/openshift3/prometheus-node-exporter/tags | python -m json.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   646  100   646    0     0   5146      0 --:--:-- --:--:-- --:--:--  5168
{
    "0.14.0": "44f72f459d624931baace1ccfc50ff81f69cdba4bcaad18e5f87f92507571d01",
    "0.14.0-1": "44f72f459d624931baace1ccfc50ff81f69cdba4bcaad18e5f87f92507571d01",
    "0.15.2": "2590ebd50e53e243ae57ecaca44a9f4410ba7d468622eec649bebbd953949acf",
    "0.15.2-1": "2590ebd50e53e243ae57ecaca44a9f4410ba7d468622eec649bebbd953949acf",
    "latest": "2590ebd50e53e243ae57ecaca44a9f4410ba7d468622eec649bebbd953949acf",
    "rhaos-3.7-rhel-7-docker-candidate-21542-20170906205309": "44f72f459d624931baace1ccfc50ff81f69cdba4bcaad18e5f87f92507571d01",
    "rhaos-3.9-rhel-7-docker-candidate-76868-20180131230619": "2590ebd50e53e243ae57ecaca44a9f4410ba7d468622eec649bebbd953949acf"
}
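The root of the failure is visible in the tag listing above: the kubelet is told to pull tag "v0.15.2", but the repository only publishes "0.15.2" (no "v" prefix). A minimal sketch of that mismatch, using the tags from the curl output (the variable names are illustrative, not from the installer code):

```python
# Tags actually published for the image, taken from the curl output above
available_tags = {"0.14.0", "0.14.0-1", "0.15.2", "0.15.2-1", "latest"}

# Tag the installer tells the kubelet to pull
requested = "v0.15.2"

# The pull 404s because the "v"-prefixed tag was never pushed;
# stripping the prefix finds the tag that does exist.
assert requested not in available_tags
assert requested.lstrip("v") in available_tags
print("published tag:", requested.lstrip("v"))
```

This is why the manifest lookup returns 404 rather than an auth or network error: the repository exists, only the exact tag is missing.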
Version-Release number of selected component (if applicable):
# openshift version
openshift v3.9.1
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.16

How reproducible:
Always

Steps to Reproduce:
1. Deploy prometheus
2.
3.

Actual results:
prometheus-node-exporter pods are in ImagePullBackOff status

Expected results:
prometheus-node-exporter pods should be healthy

Additional info:
# Deploy prometheus
openshift_prometheus_state=present
openshift_prometheus_node_selector={'role': 'node'}
openshift_prometheus_image_prefix=brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/

Comment 1 Johnny Liu 2018-02-28 08:21:44 UTC
Also we need make installer to pull v3.7 image as default image tag.

Comment 2 Johnny Liu 2018-02-28 08:39:24 UTC
(In reply to Johnny Liu from comment #1)
> Also we need make installer to pull v3.7 image as default image tag.

Sorry, s/v3.7/v3.9/

Comment 4 Aaron Weitekamp 2018-02-28 15:49:45 UTC
So I'm clear, the priority issue is the "v0.15.2" (not available) vs "0.15.2" (available) tag?

Comment 5 Aaron Weitekamp 2018-02-28 16:35:06 UTC
Here's the proper fix for openshift-ansible: https://github.com/openshift/openshift-ansible/pull/7325

Pending internal tagging by pgier

Comment 14 Junqi Zhao 2018-03-06 04:39:37 UTC
Removed the TestBlocker keyword; the issue is fixed.
prometheus-node-exporter has also been pushed to the reg-aws repo.

image: registry.reg-aws.openshift.com:443/openshift3/prometheus-node-exporter:v0.15.2

Please change to ON_QA

Comment 15 Junqi Zhao 2018-03-06 07:34:55 UTC
Set to VERIFIED as per Comment 14

Comment 18 Junqi Zhao 2018-03-08 01:44:33 UTC
Set to VERIFIED, since this issue is about prometheus-node-exporter not having a v0.15.2 tag. Whether prometheus-node-exporter should have a v3.9 tag will be considered in the future, see Comment 17.

Will open v3.9 tag is missign if we decide prometheus-node-exporter should have v3.9 tag

Comment 19 Junqi Zhao 2018-03-08 01:45:48 UTC
(In reply to Junqi Zhao from comment #18)

> Will open v3.9 tag is missign if we decide prometheus-node-exporter should
> have v3.9 tag

change to

Will open a defect about v3.9 tag is missing if we decide prometheus-node-exporter should have v3.9 tag

Comment 20 Junqi Zhao 2018-03-20 03:00:45 UTC
prometheus-node-exporter has a v3.9 tag now:

    "v3.9": "fe144ca99bbe32e539fe8869293500ff4dabdf7c43b498c0c05586f87c166b37",
    "v3.9.10": "d923e17649b8ba65515193b64f8e0fe15b23b5a7b1ea0acede1397509ac0a094",
    "v3.9.10-1": "d923e17649b8ba65515193b64f8e0fe15b23b5a7b1ea0acede1397509ac0a094",
    "v3.9.10.20180315.142724": "d923e17649b8ba65515193b64f8e0fe15b23b5a7b1ea0acede1397509ac0a094",
    "v3.9.11": "54799bb6d52c7852b69d20c9982b78b004c3e7cfe2f37b462dcae645b94b30c6",
    "v3.9.11-1": "54799bb6d52c7852b69d20c9982b78b004c3e7cfe2f37b462dcae645b94b30c6",
    "v3.9.11.20180315.181300": "54799bb6d52c7852b69d20c9982b78b004c3e7cfe2f37b462dcae645b94b30c6",
    "v3.9.12": "fe144ca99bbe32e539fe8869293500ff4dabdf7c43b498c0c05586f87c166b37",
    "v3.9.12-1": "fe144ca99bbe32e539fe8869293500ff4dabdf7c43b498c0c05586f87c166b37",
    "v3.9.12.20180319.095352": "fe144ca99bbe32e539fe8869293500ff4dabdf7c43b498c0c05586f87c166b37",
    "v3.9.8": "7ddf429234a77e029845dc5a6407df33e7980efd3ebb8512d59a73b2fa9358e4",
    "v3.9.8-1": "7ddf429234a77e029845dc5a6407df33e7980efd3ebb8512d59a73b2fa9358e4",
    "v3.9.8.20180313.172024": "7ddf429234a77e029845dc5a6407df33e7980efd3ebb8512d59a73b2fa9358e4",
    "v3.9.9": "ca3c74c435f1b05251ccc6654e7cda0d7cc1acc53d39d18fdc62f6bd22c0108c",
    "v3.9.9-1": "ca3c74c435f1b05251ccc6654e7cda0d7cc1acc53d39d18fdc62f6bd22c0108c",
    "v3.9.9.20180314.185428": "ca3c74c435f1b05251ccc6654e7cda0d7cc1acc53d39d18fdc62f6bd22c0108c"

Comment 21 Wolfgang Kulhanek 2018-03-29 14:12:57 UTC
Since this is still in the 3.9(.14) GA code, the workaround is to add this line to the /etc/ansible/hosts file:

openshift_prometheus_node_exporter_image_version=v3.9

Comment 22 Josep 'Pep' Turro Mauri 2018-04-06 07:46:07 UTC
The fix from comment #5 was backported to the 3.9 branch:

https://github.com/openshift/openshift-ansible/pull/7673

which means you shouldn't need to work around it by explicitly setting a tag; it will pick the release tag by default.

The backport didn't make it in time for GA, but I assume it will be part of a future update.
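The gist of the fix, per comments 5, 21 and 22: derive the node-exporter image tag from the OpenShift release instead of the upstream exporter version, so the default pull reference matches what is published. A hedged sketch of that logic (the function name, parameters, and registry host are illustrative, not the actual openshift-ansible code):

```python
def node_exporter_image(prefix, openshift_release, override=None):
    """Build the image reference the installer will pull.

    Before the fix the default tag was the upstream exporter version
    ("v0.15.2"), which was never published downstream; the fix makes the
    default follow the OpenShift release tag, which the inventory variable
    openshift_prometheus_node_exporter_image_version can still override.
    """
    tag = override if override else "v" + openshift_release
    return prefix + "prometheus-node-exporter:" + tag

# With no override, the release tag is used by default, which matches
# the manual workaround from comment 21.
print(node_exporter_image("registry.example.com/openshift3/", "3.9"))
```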

Comment 25 Scott Dodson 2018-05-29 21:42:51 UTC
The fix for this is in openshift-ansible-3.9.27-1 and later.

