Bug 1469295

Summary: After upgrading to 3.5 hawkular-metrics pod cannot start with error against permissions
Product: OpenShift Container Platform Reporter: Eric Jones <erjones>
Component: Hawkular Assignee: Matt Wringe <mwringe>
Status: CLOSED NOTABUG QA Contact: Junqi Zhao <juzhao>
Severity: high Docs Contact:
Priority: high    
Version: 3.5.0 CC: aos-bugs, erjones, pweil
Target Milestone: ---   
Target Release: 3.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-03 16:09:53 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Eric Jones 2017-07-10 21:44:06 UTC
Description of problem:
Customer upgraded OpenShift cluster from 3.3 to 3.4 and then to 3.5 and tried to deploy metrics.

It deployed, but the hawkular-metrics pod fails with:
Starting Hawkular Metrics
Error: the service account for Hawkular Metrics does not have permission to view resources in this namespace. View permissions are required for Hawkular Metrics to function properly.
Usually this can be resolved by running: oc adm policy add-role-to-user view system:serviceaccount:openshift-infra:hawkular -n openshift-infra


Version-Release number of selected component (if applicable):
Hawkular_metrics
metrics-hawkular-metrics   v3.5                4ede8a0257c8 = 3.5.0-22

Heapster
metrics-heapster           v3.5                56f0e1727405 = 3.5.0-16

Cassandra
metrics-cassandra          v3.5                46585da34fbe = 3.5.0-19

Additional info:
Found BZ 1448462 and tested the commands there (cacert_output), but the output was vastly different from what that bug showed, so I opened this new one.

I will be attaching logs and the cacert_output file shortly.

Comment 2 Matt Wringe 2017-07-11 13:47:41 UTC
From the attached cacert_output, the error is that the OpenShift master endpoint is not accepting the connection and is closing it (Connection reset by peer).

This can mean a few things.

The master API is not available. It could be behind a firewall or not exposed in a way that the Hawkular Metrics pod can access it.

The master API is not available at the expected hostname. By default this is https://kubernetes.default.svc:443, but the system may be set up to use a different internal hostname for it (you can configure a different hostname by specifying the openshift_metrics_master_url property in your inventory file).
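
To help tell the two cases above apart, a rough sketch like the following could be run from inside the Hawkular Metrics pod (or any pod in the same network), assuming Python 3 is available there. The helper names host_port and probe are made up for illustration and are not part of any OpenShift tooling. It only checks plain TCP reachability of the configured master URL; a reset that happens later, during the TLS handshake, would not be caught by it.

```python
import socket
from urllib.parse import urlsplit

# Default master URL mentioned in this bug; replace it with the value of
# openshift_metrics_master_url if the inventory overrides it.
DEFAULT_MASTER_URL = "https://kubernetes.default.svc:443"


def host_port(url):
    """Split a master URL into (host, port); default to 443 for https, 80 otherwise."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def probe(url, timeout=5.0):
    """Attempt a plain TCP connection and return a short description of the outcome."""
    host, port = host_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "reachable"
    except socket.gaierror:
        # The hostname itself is wrong or not resolvable from this pod.
        return "hostname does not resolve"
    except ConnectionRefusedError:
        return "connection refused"
    except ConnectionResetError:
        return "connection reset by peer"
    except TimeoutError:
        # Commonly a sign of a firewall silently dropping packets.
        return "timed out"
    except OSError as exc:
        return f"other error: {exc}"
```

Calling probe(DEFAULT_MASTER_URL) and then probe() with the URL actually set in the inventory should show whether the problem is the hostname (does not resolve) or access to the endpoint (refused, reset, or timed out).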

Comment 7 Matt Wringe 2017-08-03 16:09:53 UTC
I am going to mark this as 'notabug' as it looks like they incorrectly set their metrics URL in their inventory file.