Description of problem:
Hawkular validation is failing. The logs show "unauthorized" for the provider; however, screenshots provided by the customer show the provider is authorized correctly.

Version-Release number of selected component (if applicable): 5.10.9

How reproducible: always

Steps to Reproduce:
1. Validate OpenShift provider
2. Validate Hawkular Metrics
3.

Actual results: validation fails

Expected results: validation successful

Additional info:
Hi Ryan,

Can you add more details to the description? The main things I'm not clear about are:

a - Can the customer connect to Hawkular using curl commands?
    https://github.com/moolitayer/hawkular-curl-examples/blob/master/hello-world/04-show-metric.sh
b - Did the validation in step 2 of "Steps to Reproduce" succeed or not? More screenshots would help here. From the description I understand that it fails, but from https://bugzilla.redhat.com/show_bug.cgi?id=1789607#c7 I understand that it passes. What am I missing here?
c - Are metrics sometimes collected? A screenshot of the overview page would help.

P.S. If you think of other information that can help, please add it too.

Thanks
Hi Yaacov,

Waiting on a response from the customer. I will update this report once they reply.
We had a call with the customer this morning to better understand what is happening and what their expectations are.

When using "SSL without validation" for both the default and metrics endpoints, there are no issues. As soon as they set either endpoint to "SSL" or "SSL trusting custom CA", authentication fails.

Our documentation states that "SSL without validation" is not recommended [1]. The customer expects to be able to use "SSL trusting custom CA", but this fails, and we were able to reproduce the failure in house by following the steps in the documentation [1].

[1] https://access.redhat.com/documentation/en-us/red_hat_cloudforms/4.7/html/managing_providers/containers_providers#adding_openshift_provider
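For anyone following along, here is a rough Python sketch of what the three endpoint security modes mean at the TLS level. It is an illustration only, not ManageIQ's actual (Ruby) implementation. The `build_context` helper is hypothetical, the mode strings are just the UI labels used above, and it assumes that the custom CA replaces the system trust store rather than adding to it.

```python
import ssl


def build_context(mode, ca_pem=None):
    """Return an ssl.SSLContext approximating one endpoint security mode.

    Hypothetical helper for illustration; the mode strings are UI labels.
    """
    if mode == "SSL without validation":
        ctx = ssl.create_default_context()
        # check_hostname must be turned off before verify_mode can be CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
        return ctx
    if mode == "SSL":
        # Verify the chain against the system trust store and check the hostname.
        return ssl.create_default_context()
    if mode == "SSL trusting custom CA":
        if not ca_pem:
            raise ValueError("a PEM-encoded CA certificate is required")
        # PROTOCOL_TLS_CLIENT enables CERT_REQUIRED and hostname checking,
        # and starts with no trusted CAs loaded.
        ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
        # Assumption: trust only the pasted CA, not the system store.
        ctx.load_verify_locations(cadata=ca_pem)
        return ctx
    raise ValueError("unknown mode: %r" % (mode,))
```

Note that the two validating modes also check the hostname, so a certificate issued for a different domain name fails even when the CA itself is trusted.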
@Ryan, thank you for the new information. It looks like a certificate issue. I do not have experience with the certificate-handling part of the code. Beni, can you help here?
Copying a better explanation of the issue from an internal email:

> We were able to get Hawkular working when using SSL without validation, however, our documentation labels this as not recommended.
> The customer would like to use SSL trusting a CA but this fails and this is what we were able to reproduce in the lab environment.

So far we have failed to reproduce this, although we used a mock service (Yaacov's mohawk) instead of actual Hawkular. The mock was behind a reencrypt route with no TLS cert set on it, so it used the router's default wildcard cert.

- SSL without validation worked.
- SSL trusting custom CA, pasting the correct root CA cert, worked.
- SSL trusting custom CA, pasting an unrelated self-signed CA, gives the error "Credential validation was not successful: SSL_connect returned=1 errno=0 state=error: certificate verify failed (self signed certificate in certificate chain)"

Importantly, I'm getting the same behavior from ManageIQ as from `curl --cacert file.pem https://$DOMAIN:$PORT/hawkular/metrics/status`: with the correct CA both curl and ManageIQ accept the connection, and with the wrong CA both give an error.

=> It's possible I need to use the exact versions of ManageIQ / OpenShift / Hawkular. Ryan, sorry to ask you to stand up a reproducer OpenShift for the 3rd time, but that could help...

=> It's also possible the configuration is wrong. In particular, old OpenShift versions used to serve, out of the box, an unusable cert for Hawkular (wrong domain name), for which no "right CA" would help!

If these are config issues, the previous reproducer could be a *different* issue from the customer's, so I need details on how the Hawkular route is configured in the customer's cluster. Have we tested accessing the customer's Hawkular with `curl --cacert` or something like that? That would show which direction to focus on.

Found an old internal doc on Hawkular certs: https://mojo.redhat.com/docs/DOC-1132146. Reviewing that to come up with more precise questions...
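If curl is not handy on the machine doing the check, the same `--cacert` style test can be sketched in Python. This is only a rough equivalent under stated assumptions: `probe` and `classify` are hypothetical helper names, the outcome labels are made up for this sketch, and the context trusts only the CA file that is passed in.

```python
import ssl
import urllib.error
import urllib.request


def classify(exc):
    """Map an exception raised by urlopen to a coarse outcome label."""
    if isinstance(exc, urllib.error.HTTPError):
        # The TLS handshake succeeded; the server answered with an HTTP error.
        return "tls-ok-http-%d" % exc.code
    # URLError wraps the underlying socket/TLS error in .reason.
    reason = exc.reason if isinstance(exc, urllib.error.URLError) else exc
    if isinstance(reason, ssl.SSLCertVerificationError):
        return "cert-verify-failed"
    if isinstance(reason, ssl.SSLError):
        return "tls-error"
    return "connection-error"


def probe(url, ca_file):
    """Fetch url while trusting only the CA bundle in ca_file."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Raises immediately if ca_file is missing or is not valid PEM.
    ctx.load_verify_locations(cafile=ca_file)
    try:
        with urllib.request.urlopen(url, context=ctx, timeout=10) as resp:
            return "tls-ok-http-%d" % resp.status
    except OSError as exc:  # URLError and ssl.SSLError are OSError subclasses
        return classify(exc)
```

Usage would be something like `probe("https://hawkular.example.com/hawkular/metrics/status", "file.pem")`, where the host name is a placeholder. A "cert-verify-failed" result points at the CA or at the certificate's domain name, while a "tls-ok-http-401" result would mean TLS is fine and the problem is the token.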
Aside: these log messages:

[----] W, [2020-01-03T13:20:18.415210 #3174:44c2cb0] WARN -- : MIQ(ManageIQ::Providers::Openshift::ContainerManager#authentication_check_no_validation) type: ["hawkular"] for [1000000000001] [openshift-produccion] Validation failed: error, <html><head><title>Error</title></head><body>500 - Internal Server Error</body></html>
[----] W, [2020-01-03T13:20:20.464008 #3174:44c33f4] WARN -- : MIQ(ems_container_controller-update): get_hostname_from_routes error: Unauthorized
[----] E, [2020-01-03T13:20:20.464709 #3174:44c33f4] ERROR -- : MIQ(ems_container_controller-update): Route Detection: failure [Unauthorized]

explain that the 500 did not originate in ManageIQ but came from the external service. It's suboptimal that this scenario showed "Error500 - Internal Server Error" to the user without qualification.

Anyway, that error is no longer happening, and it's not the current problem; getting the custom CA working is.
Yesterday I asked on the support case for the additional information I need.