Bug 1504935

Summary: Cannot add Openshift provider 3.6 to CloudForms - Hawkular will not validate
Product: Red Hat CloudForms Management Engine Reporter: Saif Ali <saali>
Component: Providers Assignee: Beni Paskin-Cherniavsky <cben>
Status: CLOSED NOTABUG QA Contact: Dave Johnson <dajohnso>
Severity: medium Docs Contact:
Priority: medium    
Version: 5.8.0 CC: bazulay, cben, fsimonce, gblomqui, jcrumple, jfrey, jhardy, obarenbo
Target Milestone: GA   
Target Release: 5.8.3   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: container
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-10-26 16:07:16 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: Container Management Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1503797    

Description Saif Ali 2017-10-20 19:16:41 UTC
Description of problem:
Cannot add Openshift provider 3.6 to CloudForms - Hawkular will not validate
~~~
OCP URL validates correctly but Hawkular URL will not validate.  Error message is "Credential validation was not successful: 743: unexpected token at ' "
Openshift version is 3.6.
~~~

Version-Release number of selected component (if applicable):
CloudForms 4.2 and 4.5

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 4 Beni Paskin-Cherniavsky 2017-10-23 13:11:38 UTC
Let's see.  These 2 messages repeat >130 times:

MIQ(ManageIQ::Providers::OpenshiftEnterprise::ContainerManager#authentication_check_no_validation) type: [:bearer] for [1000000000001] [NOCP_3.4] Validation failed: error, HTTP status code 401, 401 Unauthorized

MIQ(ManageIQ::Providers::OpenshiftEnterprise::ContainerManager#authentication_check_no_validation) type: [:hawkular] for [1000000000001] [NOCP_3.4] Validation failed: error, 743: unexpected token at '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>The URL you re...

(Some other failures appear just a few times; I assume they are transient or already fixed.)

For the main endpoint, it sounds like the token is wrong.  Does the management-admin service account exist?
Well, you said "OCP URL validates correctly", so I assume that got resolved.

=> For hawkular: Unfortunately, we're not logging enough details, but it sounds like you're not talking to hawkular.
- Are hawkular pods alive?
- Is it exposed correctly through a route?
- Did you use the same hostname as in the route?
- Did you use the port that the openshift router listens on (normally 443)?
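
The `unexpected token at '<!DOCTYPE html` error suggests that whatever answered returned an HTML page where JSON was expected. Below is a minimal shell sketch for checking this by hand, assuming `curl` is available on the machine you run it from. `classify_body` and `probe_hawkular` are hypothetical helper names, and `/hawkular/metrics/status` is assumed to be reachable at the hostname and port you entered in the hawkular tab.

```shell
#!/usr/bin/env bash
# classify_body: hypothetical helper that guesses whether a response body
# is JSON or HTML from its first non-whitespace character.
classify_body() {
  local first
  # Strip all whitespace, then keep the first remaining character.
  first=$(printf '%s' "$1" | tr -d '[:space:]' | cut -c1)
  case "$first" in
    '{'|'[') echo "json" ;;
    '<')     echo "html" ;;
    '')      echo "empty" ;;
    *)       echo "unknown" ;;
  esac
}

# probe_hawkular: hypothetical helper that fetches the status endpoint and
# classifies what came back.
# -k skips certificate verification (for diagnosis only), -s hides progress.
probe_hawkular() {
  local host="$1" port="${2:-443}" body
  body=$(curl -ks --max-time 10 "https://${host}:${port}/hawkular/metrics/status")
  classify_body "$body"
}
```

For example, `probe_hawkular <hostname-from-route>` could be run both from the CFME appliance and from a cluster node. If the appliance gets `html` while the node gets `json`, something between the appliance and the router is probably answering instead of hawkular.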

For further advice, I would need:
- output of `oc get route -n openshift-infra hawkular-metrics -o yaml`
- output of `oc status -n openshift-infra -v`
- screenshot of add provider screen, at least the hawkular tab

Comment 12 Beni Paskin-Cherniavsky 2017-10-24 06:45:37 UTC
    rc/hawkular-metrics created 4 days ago - 0/1 pods

    rc/heapster created 4 days ago - 0/1 pods

  * pod/hawkular-metrics-vq03p has restarted within the last 10 minutes
  * pod/heapster-ggmbh has restarted within the last 10 minutes

Should look at `oc get events -n management-infra` and `oc logs` for these pods to see why they are restarting / not running.
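
A small sketch of how that follow-up could be scripted: it pulls pod names out of the `oc status -v` warnings and prints (does not run) the `oc logs` commands to try. `restarting_pods` and `print_log_commands` are hypothetical helper names, and the namespace is passed in by the caller rather than assumed.

```shell
#!/usr/bin/env bash
# restarting_pods: hypothetical helper. Reads `oc status -v` output on stdin
# and prints the name of each pod reported as recently restarted.
restarting_pods() {
  sed -n 's|^[[:space:]]*\* pod/\([^[:space:]]*\) has restarted.*|\1|p'
}

# print_log_commands: hypothetical helper. For each pod name on stdin, print
# the commands to inspect it in namespace $1.
# --previous shows logs from the container's previous (crashed) run.
print_log_commands() {
  local ns="$1" pod
  while read -r pod; do
    echo "oc logs -n ${ns} ${pod}"
    echo "oc logs -n ${ns} --previous ${pod}"
  done
}
```

Usage might look like `oc status -n openshift-infra -v | restarting_pods | print_log_commands openshift-infra`. The `--previous` logs are usually the more useful ones for a pod that keeps restarting.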

== Creating missing service accounts: ==

On 3.6, the part of the playbook that normally creates them is: https://github.com/openshift/openshift-ansible/blob/release-3.6/roles/openshift_manageiq/tasks/main.yaml

Is this a new openshift cluster?
There was a recent bug in openshift-ansible where the account is not created by default (https://github.com/openshift/openshift-ansible/pull/5809).

- The "official" way to fix this is to rerun openshift-ansible with openshift_use_manageiq=true in the [vars] section of the inventory.
  I don't know how safe that is if the cluster is already in production use; you should consult someone from Openshift.

- https://access.redhat.com/documentation/en-us/red_hat_cloudforms/4.0/html/managing_providers/containers_providers#configuring_service_accounts 
  is out of date on multiple points.

Comment 15 Beni Paskin-Cherniavsky 2017-10-26 16:07:16 UTC
NOTABUG: CFME->hawkular access was blocked by a firewall. Once that was understood, the customer was able to fix it and add the provider.

Opened RFE bug 1506718 to make such situations easier to diagnose.