Bug 1460615 - Fake error reported by validate_deployment_artifacts in metrics deployer
Status: CLOSED WONTFIX
Product: OpenShift Container Platform
Classification: Red Hat
Component: Metrics
Version: 3.4.1
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Assigned To: Matt Wringe
QA Contact: Junqi Zhao
Depends On:
Blocks:
Reported: 2017-06-12 04:24 EDT by Xia Zhao
Modified: 2017-06-20 15:29 EDT
CC List: 1 user

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-06-20 15:29:27 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Xia Zhao 2017-06-12 04:24:55 EDT
Description of problem:
Deployed metrics 3.4.1 on OCP. After the deployment process finished successfully, the deployer pod failed with this fake error:
========================
--- validate_deployment_artifacts ---
======== RETRY =========
validate_deployment_artifacts: 
Pod hawkular-metrics-f2ta4 from ReplicationController hawkular-metrics is in a Pending state.
This is most often due to waiting for the container image to pull and should eventually resolve.
  * * * * 
Pod heapster-5413s from ReplicationController heapster is in a Pending state.
This is most often due to waiting for the container image to pull and should eventually resolve.
  * * * * 
Will retry in 5 seconds.
========================
--- validate_deployment_artifacts ---
======== ERROR =========
validate_deployment_artifacts: 
Pod hawkular-metrics-f2ta4 from ReplicationController hawkular-metrics is in a Pending state.
This is most often due to waiting for the container image to pull and should eventually resolve.
  * * * * 
Pod heapster-5413s from ReplicationController heapster specified an image that cannot be pulled.
ERROR: This is most often due to the image name being wrong or the docker registry being unavailable.
Ensure that you used the correct IMAGE_PREFIX and IMAGE_VERSION with the deployment.
There was an event for this pod with the following message:
Failed to pull image "registry.access.stage.redhat.com/openshift3/ose-metrics-heapster:3.4.1": image pull failed for registry.access.stage.redhat.com/openshift3/ose-metrics-heapster:3.4.1, this may be because there are no credentials on this request.  details: (net/http: request canceled)
  * * * * 
========================
--- validate_deployed_project ---

VALIDATION FAILED

Checked on the node that the images were actually pulled and exist:
#  docker images | grep metrics | awk '{print $1"    "$3}' |awk -F'/' '{print $2"/"$3}'
openshift3/ose-metrics-hawkular-metrics    09006edb7fb1
openshift3/ose-metrics-cassandra    a8a1940f570e
openshift3/ose-metrics-heapster    93fd4e9bb041
openshift3/ose-metrics-deployer    887570eb899d


Version-Release number of selected component (if applicable):
openshift3/ose-metrics-deployer    887570eb899d

# openshift version
openshift v3.4.1.33
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

How reproducible:
Always

Steps to Reproduce:
1. Deploy metrics 3.4.1 on OCP.
2. After the deployment process finishes successfully, check the deployer pod status.

Actual results:
The deployer pod failed:
$ oc get po
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-kiqht   1/1       Running   0          13m
hawkular-metrics-f2ta4       1/1       Running   0          13m
heapster-5413s               1/1       Running   0          13m
metrics-deployer-pb4ql       0/1       Error     0          14m

Expected results:
The deployer pod should succeed.

Additional info:
Comment 1 Xia Zhao 2017-06-12 05:19:22 EDT
The issue also reproduces with this 3.3.1 metrics deployer image:
openshift3/ose-metrics-deployer        3.3.1               02b35ef44560

# openshift version
openshift v3.3.1.35
kubernetes v1.3.0+52492b4
etcd 2.3.0+git


The error message is very similar:

========================
--- validate_deployment_artifacts ---
======== RETRY =========
validate_deployment_artifacts: 
Pod hawkular-metrics-hdqzr from ReplicationController hawkular-metrics is in a Pending state.
This is most often due to waiting for the container image to pull and should eventually resolve.
  * * * * 
Pod heapster-c5f8t from ReplicationController heapster is in a Pending state.
This is most often due to waiting for the container image to pull and should eventually resolve.
  * * * * 
Will retry in 5 seconds.
========================
--- validate_deployment_artifacts ---
======== ERROR =========
validate_deployment_artifacts: 
Pod hawkular-metrics-hdqzr from ReplicationController hawkular-metrics is in a Pending state.
This is most often due to waiting for the container image to pull and should eventually resolve.
  * * * * 
Pod heapster-c5f8t from ReplicationController heapster specified an image that cannot be pulled.
ERROR: This is most often due to the image name being wrong or the docker registry being unavailable.
Ensure that you used the correct IMAGE_PREFIX and IMAGE_VERSION with the deployment.
There was an event for this pod with the following message:
Failed to pull image "registry.access.stage.redhat.com/openshift3/ose-metrics-heapster:3.3.1": image pull failed for registry.access.stage.redhat.com/openshift3/ose-metrics-heapster:3.3.1, this may be because there are no credentials on this request.  details: (net/http: request canceled)
  * * * * 
========================
--- validate_deployed_project ---

VALIDATION FAILED
Comment 2 Junqi Zhao 2017-06-19 05:32:50 EDT
Tested again. It may be that pulling the images took a long time, so the deployer pod changed to Error status. If you pull the images first and then deploy metrics, the issue does not occur.
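The workaround from Comment 2 (pull the images before deploying) can be sketched as a small shell function run on each node. This is a sketch under assumptions: it assumes image references are composed as ${IMAGE_PREFIX}<component>:${IMAGE_VERSION}, which matches the heapster reference in the failed-pull event, and the function name prepull_metrics_images and its "dry-run" argument are hypothetical, not part of the metrics deployer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pre-pull the metrics images so the deployer's
# validation step does not race against slow image pulls.
# Assumption: images are named ${prefix}<component>:${version}.
# $1: image prefix, $2: image version, $3: "dry-run" to only print commands.
prepull_metrics_images() {
  local prefix="$1" version="$2" mode="${3:-run}"
  local component image
  for component in metrics-hawkular-metrics metrics-cassandra metrics-heapster; do
    image="${prefix}${component}:${version}"
    if [[ "$mode" == "dry-run" ]]; then
      # Only show what would be pulled.
      echo "docker pull ${image}"
    else
      # Stop at the first image that cannot be pulled.
      docker pull "${image}" || return 1
    fi
  done
}
```

For example, on each node before running the deployer: `prepull_metrics_images registry.access.stage.redhat.com/openshift3/ose- 3.4.1`. Passing `dry-run` as the third argument prints the commands without contacting the registry.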
Comment 3 Matt Wringe 2017-06-20 15:29:27 EDT
The deployer is acting as expected.

It's OpenShift which is throwing that error: https://github.com/openshift/origin-metrics/blob/master/deployer/scripts/validate.sh#L311

If we encounter an error from OpenShift, we treat it as a sign that something is wrong. The deployer does not wait to see whether that error will eventually resolve itself.

This also doesn't affect the functionality of metrics. Metrics will still function as expected, even if the validation script fails.
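The behaviour described in Comment 3 (retry while a pod is Pending, fail immediately once an image pull error event appears) can be illustrated with a minimal shell sketch. It is not the code from validate.sh: classify_pod and validate_with_retries are hypothetical names, and matching on the "Failed to pull image" event text is an assumption based on the message in the log.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a retry-with-fail-fast validation rule.
# $1: pod phase (e.g. Pending, Running); $2: latest event message (may be empty).
# Prints "ok", "retry" or "error".
classify_pod() {
  local phase="$1" event="${2:-}"
  if [[ "$event" == *"Failed to pull image"* ]]; then
    echo error   # an image pull failure event fails validation at once
  elif [[ "$phase" == "Pending" ]]; then
    echo retry   # probably still pulling; check again later
  else
    echo ok
  fi
}

# Re-run a check until it prints "ok" (success) or "error" (failure),
# sleeping between "retry" results.
# $1: max attempts, $2: delay in seconds, rest: command printing ok/retry/error.
validate_with_retries() {
  local attempts="$1" delay="$2"; shift 2
  local i result
  for ((i = 1; i <= attempts; i++)); do
    result="$("$@")"
    case "$result" in
      ok) return 0 ;;
      error) return 1 ;;   # fail fast: do not wait for the error to clear
    esac
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  return 1   # still pending after all attempts
}
```

With this rule, a transient pull failure such as "net/http: request canceled" fails validation on the first pass that sees it, even if the pull is later retried and the pod reaches Running, which matches the `oc get po` output in the description.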
