Created attachment 1191496
sample log for metrics deployer log

Description of problem:
The metrics deployer in mode=refresh fails with 'validating the internal hawkular-metrics certificate against the route destination CA'.

Version-Release number of selected component (if applicable):
[peng@dhcp-0-123-nay-redhat-com 33]$ oc version
oc v3.3.0.19
kubernetes v1.3.0+507d3a7
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://host-8-172-83.host.centralci.eng.rdu2.redhat.com:8443
openshift v3.3.0.21
kubernetes v1.3.0+507d3a7

metrics-deployer "3.3.0": "f776b79db884c4b8291722a2cdc845cbc641362b610c11fbed6a866514df4a58",

How reproducible:
Sometimes

Steps to Reproduce:
1. Deploy the metrics components in the 'openshift-infra' project in 'deploy' mode. [1]
2. Change the mode to 'refresh' and run the deployer again. [2]
3. After it finishes, check the pod status and the deployer log. [3]

[1]
oc new-app metrics-deployer-template -p IMAGE_PREFIX=brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/,IMAGE_VERSION=3.3.0,MASTER_URL=https://host-8-172-83.host.centralci.eng.rdu2.redhat.com:8443,HAWKULAR_METRICS_HOSTNAME=hawkular-metrics.0817-4og.qe.rhcloud.com,MODE=deploy,USE_PERSISTENT_STORAGE=false,CASSANDRA_NODES=1,CASSANDRA_PV_SIZE=10,USER_WRITE_ACCESS=false

[peng@dhcp-0-123-nay-redhat-com 33]$ oc get pod
NAME                         READY     STATUS      RESTARTS   AGE
hawkular-cassandra-1-8p7zt   1/1       Running     0          3m
hawkular-metrics-yv2rr       1/1       Running     0          3m
heapster-4b51w               1/1       Running     0          3m
metrics-deployer-yq72d       0/1       Completed   0          3m

[2]
oc new-app metrics-deployer-template -p IMAGE_PREFIX=brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/,IMAGE_VERSION=3.3.0,MASTER_URL=https://host-8-172-83.host.centralci.eng.rdu2.redhat.com:8443,HAWKULAR_METRICS_HOSTNAME=hawkular-metrics.0817-4og.qe.rhcloud.com,MODE=refresh,USE_PERSISTENT_STORAGE=false,CASSANDRA_NODES=1,CASSANDRA_PV_SIZE=10,USER_WRITE_ACCESS=false

[3]
[peng@dhcp-0-123-nay-redhat-com 33]$ oc get pod
NAME                         READY     STATUS      RESTARTS   AGE
hawkular-cassandra-1-8fx96   1/1       Running     0          35m
hawkular-metrics-jeyoi       1/1       Running     0          35m
heapster-h9wwd               1/1       Running     0          35m
metrics-deployer-139ii       1/1       Error       0          35m

[peng@dhcp-0-123-nay-redhat-com 33]$ oc logs metrics-deployer-139ii
(...)
--- validate_deployment_artifacts ---
======== ERROR =========
validate_deployment_artifacts:
--- There was an error while validating the internal hawkular-metrics certificate against the route destination CA:
stdin: CN = hawkular-metrics
error 20 at 0 depth lookup:unable to get local issuer certificate

This will prevent proper functioning of the route.
========================
--- validate_deployed_project ---
VALIDATION FAILED
(...)

Actual results:
The metrics-deployer-***** pod shows status 'Error', and hawkular-metrics cannot be accessed: requests to it return HTTP 503.

Expected results:
The metrics-deployer-***** pod shows status 'Completed'.

Additional info:
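For anyone triaging a similar failure: the validation message above has the shape of `openssl verify` output, where "error 20 ... unable to get local issuer certificate" means the issuer of the CN=hawkular-metrics certificate was not found among the CA certificates supplied for the check. The script below is a minimal sketch, assuming `openssl` is installed. It uses throwaway certificates to reproduce that error by verifying a certificate against a CA that did not sign it, and then shows that the same certificate verifies against the CA that did sign it. The file names (`signer-ca`, `route-ca`) and the helper `verify_against_ca` are hypothetical. This is not the deployer's actual validation code.

```shell
#!/bin/sh
# Sketch: reproduce "error 20 ... unable to get local issuer certificate"
# with throwaway certificates. All file names and CNs except
# CN=hawkular-metrics are hypothetical.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# make_ca NAME: create a self-signed CA as NAME.key / NAME.crt
make_ca() {
  openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -subj "/CN=$1" -keyout "$1.key" -out "$1.crt" 2>/dev/null
}

make_ca signer-ca   # CA that actually signs the serving certificate
make_ca route-ca    # unrelated CA, standing in for a mismatched route destination CA

# Serving certificate for CN=hawkular-metrics, signed by signer-ca
openssl req -new -newkey rsa:2048 -nodes -subj "/CN=hawkular-metrics" \
  -keyout hawkular-metrics.key -out hawkular-metrics.csr 2>/dev/null
openssl x509 -req -in hawkular-metrics.csr -CA signer-ca.crt -CAkey signer-ca.key \
  -CAcreateserial -days 1 -out hawkular-metrics.crt 2>/dev/null

# verify_against_ca CA_FILE CERT_FILE (hypothetical helper): print openssl's
# verdict and return 0 only if it reports "OK". The output is matched instead of
# relying on the exit status, which has differed between OpenSSL releases.
verify_against_ca() {
  out=$(openssl verify -CAfile "$1" "$2" 2>&1 || true)
  printf '%s\n' "$out"
  case "$out" in
    *": OK"*) return 0 ;;
    *) return 1 ;;
  esac
}

echo "== against an unrelated CA (expected to fail) =="
if verify_against_ca route-ca.crt hawkular-metrics.crt; then
  echo "unexpected: verification passed"
fi

echo "== against the CA that signed the certificate =="
verify_against_ca signer-ca.crt hawkular-metrics.crt
```

On a live cluster, the same kind of comparison would need the actual route destination CA and the internal serving certificate. Where those are stored depends on the deployer version, so they are not assumed here.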
I have looked into this issue, and it's more than just the validator misbehaving. I should have a fix in place tomorrow to resolve this.
Bug is verified: tried several times using 'refresh' mode, and no error was observed.

[1]
oc new-app metrics-deployer-template -p IMAGE_PREFIX=brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/,IMAGE_VERSION=3.3.0,MASTER_URL=https://host-8-172-83.host.centralci.eng.rdu2.redhat.com:8443,HAWKULAR_METRICS_HOSTNAME=hawkular-metrics.0817-4og.qe.rhcloud.com,MODE=refresh,USE_PERSISTENT_STORAGE=false,CASSANDRA_NODES=1,CASSANDRA_PV_SIZE=10,USER_WRITE_ACCESS=false

[2]
[peng@dhcp-0-123-nay-redhat-com 33]$ oc describe pod metrics-deployer-dqmw6
(...)
Containers:
  deployer:
    Container ID:  docker://22ea8e6346f5460bcab321caa1e4331c8403d0a7eba5722337c2b47035cd6231
    Image:         brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/metrics-deployer:3.3.0
    Image ID:      docker://sha256:d2564383e350e470496628b7e79247f2f2442b768ea5f3d70e37ed5a65208e09
(...)

[3]
[peng@dhcp-0-123-nay-redhat-com 33]$ oc get pod
NAME                         READY     STATUS      RESTARTS   AGE
hawkular-cassandra-1-dnj4n   1/1       Running     0          6m
hawkular-metrics-ok7sv       1/1       Running     0          6m
heapster-0rpm2               1/1       Running     0          6m
metrics-deployer-ltqjc       0/1       Completed   0          7m

[4]
oc logs metrics-deployer-ltqjc
(...)
VALIDATION SUCCEEDED
validate_nodes_accessible: ok
validate_deployment_artifacts: ok
validate_deployed_project: Success!
(...)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1933
Did not affect a released version.