Bug 1367690

Summary: metrics deployer mode=refresh fails with 'validating the internal hawkular-metrics certificate against the route destination CA'
Product: OpenShift Container Platform
Reporter: Peng Li <penli>
Component: Hawkular
Assignee: Matt Wringe <mwringe>
Status: CLOSED CURRENTRELEASE
QA Contact: chunchen <chunchen>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 3.3.0
CC: aos-bugs, tdawson, wsun
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-09-27 09:44:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
sample log for metrics deployer log (Flags: none)

Description Peng Li 2016-08-17 08:57:48 UTC
Created attachment 1191496
sample log for metrics deployer log

Description of problem:
metrics deployer mode=refresh fails with 'validating the internal hawkular-metrics certificate against the route destination CA'
 
Version-Release number of selected component (if applicable):
[peng@dhcp-0-123-nay-redhat-com 33]$ oc version
oc v3.3.0.19
kubernetes v1.3.0+507d3a7
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://host-8-172-83.host.centralci.eng.rdu2.redhat.com:8443
openshift v3.3.0.21
kubernetes v1.3.0+507d3a7

metrics-deployer  "3.3.0": "f776b79db884c4b8291722a2cdc845cbc641362b610c11fbed6a866514df4a58",

How reproducible:
sometimes

Steps to Reproduce:
1. Deploy the metrics components in the 'openshift-infra' project with MODE=deploy. [1]
2. Change MODE to 'refresh' and run the deployer again. [2]
3. After the deployer finishes, check the pod status and log. [3]

[1]
oc new-app metrics-deployer-template -p IMAGE_PREFIX=brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/,IMAGE_VERSION=3.3.0,MASTER_URL=https://host-8-172-83.host.centralci.eng.rdu2.redhat.com:8443,HAWKULAR_METRICS_HOSTNAME=hawkular-metrics.0817-4og.qe.rhcloud.com,MODE=deploy,USE_PERSISTENT_STORAGE=false,CASSANDRA_NODES=1,CASSANDRA_PV_SIZE=10,USER_WRITE_ACCESS=false

[peng@dhcp-0-123-nay-redhat-com 33]$ oc get pod
NAME                         READY     STATUS      RESTARTS   AGE
hawkular-cassandra-1-8p7zt   1/1       Running     0          3m
hawkular-metrics-yv2rr       1/1       Running     0          3m
heapster-4b51w               1/1       Running     0          3m
metrics-deployer-yq72d       0/1       Completed   0          3m

[2]
oc new-app metrics-deployer-template -p IMAGE_PREFIX=brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/,IMAGE_VERSION=3.3.0,MASTER_URL=https://host-8-172-83.host.centralci.eng.rdu2.redhat.com:8443,HAWKULAR_METRICS_HOSTNAME=hawkular-metrics.0817-4og.qe.rhcloud.com,MODE=refresh,USE_PERSISTENT_STORAGE=false,CASSANDRA_NODES=1,CASSANDRA_PV_SIZE=10,USER_WRITE_ACCESS=false

[3]
[peng@dhcp-0-123-nay-redhat-com 33]$ oc get pod
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-8fx96   1/1       Running   0          35m
hawkular-metrics-jeyoi       1/1       Running   0          35m
heapster-h9wwd               1/1       Running   0          35m
metrics-deployer-139ii       1/1       Error     0          35m
[peng@dhcp-0-123-nay-redhat-com 33]$ oc logs metrics-deployer-139ii 
(...)
--- validate_deployment_artifacts ---
======== ERROR =========
validate_deployment_artifacts: 
---
There was an error while validating the internal hawkular-metrics certificate against the route destination CA:
stdin: CN = hawkular-metrics
error 20 at 0 depth lookup:unable to get local issuer certificate
This will prevent proper functioning of the route.
========================
--- validate_deployed_project ---

VALIDATION FAILED
(...)
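
For reference, "error 20 ... unable to get local issuer certificate" is the message `openssl verify` gives when the certificate's issuer is not present in the CA file it is checked against. The deployer's own validation code is not shown in this report, so the following is only a minimal sketch of that class of failure, using throwaway CAs and a throwaway certificate. The file names and the CA names `ca-a` and `ca-b` are made up for illustration, and the exit-status behaviour assumes a reasonably recent OpenSSL.

```shell
#!/bin/sh
# Illustration only: throwaway CAs and certificate, not the deployer's real files.
workdir=$(mktemp -d)

# Create two unrelated self-signed CAs (names are hypothetical).
for ca in ca-a ca-b; do
  openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -subj "/CN=$ca" -keyout "$workdir/$ca.key" -out "$workdir/$ca.crt" 2>/dev/null
done

# Issue a certificate with CN=hawkular-metrics, signed by ca-a only.
openssl req -newkey rsa:2048 -nodes -subj "/CN=hawkular-metrics" \
  -keyout "$workdir/hm.key" -out "$workdir/hm.csr" 2>/dev/null
openssl x509 -req -in "$workdir/hm.csr" -days 1 \
  -CA "$workdir/ca-a.crt" -CAkey "$workdir/ca-a.key" -CAcreateserial \
  -out "$workdir/hm.crt" 2>/dev/null

# Checked against the CA that signed it: verification should succeed.
openssl verify -CAfile "$workdir/ca-a.crt" "$workdir/hm.crt" >/dev/null 2>&1
good_rc=$?

# Checked against an unrelated CA: verification should fail (issuer not found).
openssl verify -CAfile "$workdir/ca-b.crt" "$workdir/hm.crt" >/dev/null 2>&1
bad_rc=$?

echo "matching CA exit status:  $good_rc"
echo "unrelated CA exit status: $bad_rc"
rm -rf "$workdir"
```

If the same pattern applies here, the internal hawkular-metrics certificate and the route destination CA would have come from different signers at the time of the check, which would be consistent with the 503 on the route.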


Actual results:
The metrics-deployer-***** pod shows status 'Error', and hawkular-metrics cannot be accessed; requests to it return error 503.

Expected results:
The metrics-deployer-***** pod shows status 'Completed'.

Additional info:

Comment 1 Matt Wringe 2016-08-17 23:09:08 UTC
I have looked into this issue and it's more than just the validator misbehaving. I should have a fix in place tomorrow to resolve this.

Comment 3 Peng Li 2016-08-19 05:56:00 UTC
Bug is verified. Tried 'refresh' mode several times and observed no errors.

[1]
oc new-app metrics-deployer-template -p IMAGE_PREFIX=brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/,IMAGE_VERSION=3.3.0,MASTER_URL=https://host-8-172-83.host.centralci.eng.rdu2.redhat.com:8443,HAWKULAR_METRICS_HOSTNAME=hawkular-metrics.0817-4og.qe.rhcloud.com,MODE=refresh,USE_PERSISTENT_STORAGE=false,CASSANDRA_NODES=1,CASSANDRA_PV_SIZE=10,USER_WRITE_ACCESS=false


[2]
[peng@dhcp-0-123-nay-redhat-com 33]$ oc describe pod metrics-deployer-dqmw6
(...)
Containers:
  deployer:
    Container ID:	docker://22ea8e6346f5460bcab321caa1e4331c8403d0a7eba5722337c2b47035cd6231
    Image:		brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/metrics-deployer:3.3.0
    Image ID:		docker://sha256:d2564383e350e470496628b7e79247f2f2442b768ea5f3d70e37ed5a65208e09
(...)

[3]
[peng@dhcp-0-123-nay-redhat-com 33]$ oc get pod
NAME                         READY     STATUS      RESTARTS   AGE
hawkular-cassandra-1-dnj4n   1/1       Running     0          6m
hawkular-metrics-ok7sv       1/1       Running     0          6m
heapster-0rpm2               1/1       Running     0          6m
metrics-deployer-ltqjc       0/1       Completed   0          7m

[4]
oc logs metrics-deployer-ltqjc
(...)
VALIDATION SUCCEEDED
validate_nodes_accessible: ok
validate_deployment_artifacts: ok
validate_deployed_project:
Success!
(...)
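
Since the issue only reproduces sometimes, repeated refresh runs are easier to check with a small filter over the deployer log. The helper below is a hypothetical sketch, not part of the deployer; it assumes only that the log contains the "VALIDATION FAILED" or "VALIDATION SUCCEEDED" marker lines seen in the logs in this report.

```shell
# Hypothetical helper: reads a deployer log on stdin and prints
# "failed", "succeeded", or "unknown" based on the validation marker.
deployer_validation_result() {
  log=$(cat)
  case "$log" in
    *"VALIDATION FAILED"*)    echo failed ;;
    *"VALIDATION SUCCEEDED"*) echo succeeded ;;
    *)                        echo unknown ;;
  esac
}
```

Usage would look like `oc logs metrics-deployer-ltqjc | deployer_validation_result`, substituting the deployer pod name from `oc get pod`.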

Comment 5 errata-xmlrpc 2016-09-27 09:44:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933

Comment 6 Matt Wringe 2016-09-27 14:30:17 UTC
Did not affect a released version.