Bug 1465022 - Kibana login loop with Openshift custom CA
Summary: Kibana login loop with Openshift custom CA
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.4.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 3.9.0
Assignee: ewolinet
QA Contact: Xia Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-06-26 12:44 UTC by Ruben Romero Montes
Modified: 2023-09-15 00:02 UTC
CC List: 20 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-07-12 20:19:08 UTC
Target Upstream Version:
Embargoed:




Links
GitHub fabric8io/openshift-auth-proxy pull 21 | Status: closed | Summary: upgraded to node 4.8.3 | Last Updated: 2020-09-24 04:47:34 UTC

Description Ruben Romero Montes 2017-06-26 12:44:36 UTC
Description of problem:
Unable to log in to Kibana when OpenShift was deployed using a provided CA to generate all the certificates.

Version-Release number of selected component (if applicable):
3.4.1

How reproducible:
Always

Steps to Reproduce:
1. Configure the inventory with a custom CA (see the example inventory snippet after these steps).

2. Regenerate the CA and certificates using Ansible:
ansible-playbook playbooks/byo/openshift-cluster/redeploy-openshift-ca.yml

3. Regenerate the certificates using Ansible:
ansible-playbook playbooks/byo/openshift-cluster/redeploy-certificates.yml

4. Deploy logging using default values
5. Access the Kibana page and try to log in
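
Example inventory snippet for step 1 (a sketch based on the redeploy-certificates docs; the paths are placeholders, not the actual values used here):

[OSEv3:vars]
# hypothetical paths; point these at the custom CA certificate and key
openshift_master_ca_certificate={'certfile': '/path/to/custom-ca.crt', 'keyfile': '/path/to/custom-ca.key'}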

Actual results:
Kibana keeps redirecting to the OpenShift login page and doesn't show any error

Expected results:
Access the Kibana Discover page

Additional info:
Comparing with a working environment, I have noticed that after providing the credentials, the following request:
https://kibana.apps.rromeromlogging34.quicklab.pnq2.cee.redhat.com/auth/openshift/callback?code=Cplg-_4vohCLJpmzH5YBT04EjiFm2qbzwyCAGFlteM4&state=

does not include a response cookie like the following:
openshift-auth-proxy-session
  value:    "l8WjW84RkaRhJIYtNk5qVw.TLxYps…BZfgDAe-ciOmvHTdS44BUkK3RDow"
  path:     "/"
  secure:   true
  httpOnly: true

From the kibana-proxy logs (debug enabled):
in callback handler for req path /auth/openshift/callback
10.129.0.1 - - [26/Jun/2017:12:43:53 +0000] "GET /auth/openshift/callback?code=0yddkJVT2bzVanPF7tEzbbyT7b6uygzo7b2i-zFY5DE&state= HTTP/1.1" 302 58 "https://openshift.internal.rromerocerts.quicklab.pnq2.cee.redhat.com/login?then=%2Foauth%2Fauthorize%3Fresponse_type%3Dcode%26redirect_uri%3Dhttps%253A%252F%252Fkibana.apps.rromerocerts.quicklab.pnq2.cee.redhat.com%252Fauth%252Fopenshift%252Fcallback%26scope%3Duser%253Ainfo%2520user%253Acheck-access%2520user%253Alist-projects%26client_id%3Dkibana-proxy" "Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:54.0) Gecko/20100101 Firefox/54.0"
in passport.ensureAuthenticated for req path /
not authenticated by request session.
10.129.0.1 - - [26/Jun/2017:12:43:53 +0000] "GET / HTTP/1.1" 302 0 "https://openshift.internal.rromerocerts.quicklab.pnq2.cee.redhat.com/login?then=%2Foauth%2Fauthorize%3Fresponse_type%3Dcode%26redirect_uri%3Dhttps%253A%252F%252Fkibana.apps.rromerocerts.quicklab.pnq2.cee.redhat.com%252Fauth%252Fopenshift%252Fcallback%26scope%3Duser%253Ainfo%2520user%253Acheck-access%2520user%253Alist-projects%26client_id%3Dkibana-proxy" "Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:54.0) Gecko/20100101 Firefox/54.0"

Comment 2 Peter Portante 2017-06-28 14:44:50 UTC
Reassigning as this is not a logging problem.  The customer does not appear to want to use the recommended method of using separate certificates for external and internal services.

Comment 3 Luke Meyer 2017-06-28 18:24:12 UTC
The kibana-proxy talks to the master at https://kubernetes.default.svc.cluster.local, which is a cluster-internal address resolved by SkyDNS to an internal IP. It uses the /var/run/secrets/kubernetes.io/serviceaccount/ca.crt that Kubernetes supplies to every pod to validate the master at its internal address.
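
For illustration, that same internal call can be reproduced from any pod with the standard service-account files; a minimal sketch:

TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
  -H "Authorization: Bearer $TOKEN" \
  https://kubernetes.default.svc.cluster.local/version

If that curl fails certificate validation, kibana-proxy fails the same way.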

I'm actually still a little confused about why the solution described here doesn't work for the customer:
https://docs.openshift.com/container-platform/3.5/install_config/certificate_customization.html#configuring-custom-certificates

Even if they want to use a single address for the external master API and for the nodes and other infrastructure to talk to, kibana-proxy makes its request against kubernetes.default.svc.cluster.local, so the master should still be able to serve that address separately with the OpenShift-generated internal cert validated by the internal CA. If their master config ends up like the following, I don't see the problem:

servingInfo:
  bindAddress: 0.0.0.0:8443
  bindNetwork: tcp4
  certFile: master.server.crt
  clientCA: ca-bundle.crt
  keyFile: master.server.key
  maxRequestsInFlight: 500
  requestTimeoutSeconds: 3600
  namedCertificates:
  - certFile: custom.crt
    keyFile: custom.key
    names:
    - "master.customer.com"

I'm also surprised there isn't similarly a problem with the router which I believe also watches the master at this internal address to build its routing tables (I could be off though).

But let's suppose we wanted kibana-proxy to make the master request using the same address as everything else. This is possible by reconfiguring kibana-proxy to use a different address and CA.

The address is actually the easiest thing to change; there's a parameter OAP_MASTER_URL in the deployer kibana-proxy template (see https://github.com/openshift/origin-aggregated-logging/blob/release-1.4/deployer/templates/kibana.yaml#L112) which follows the deployer parameter MASTER_URL (see https://github.com/openshift/origin-aggregated-logging/blob/release-1.4/deployer/deployer.yaml#L154).
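
For example (a sketch assuming the standard names logging-kibana and kibana-proxy in the logging project; the URL is a placeholder):

oc set env dc/logging-kibana -n logging -c kibana-proxy OAP_MASTER_URL=https://master.customer.com:8443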

The CA is a bit trickier. I suspect that updating the master-config.yaml to replace ca-bundle.crt with their custom CA chain should work, but I've never tried it (and don't know if there's an ansible variable to achieve this). But that would probably be the most thorough solution if it worked, as other pods, not just kibana-proxy, would then have access to the correct CA.
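
If someone wants to try it, a sketch (untested, as said; the path assumes a default 3.x master, and this appends rather than replaces, to keep existing trust):

cat /path/to/custom-ca.crt >> /etc/origin/master/ca-bundle.crt
systemctl restart atomic-openshift-master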

To update the kibana-proxy specifically, you would need to add a secret (or an entry to an existing secret) to provide the CA bundle. Then you would need to update the environment variable OAP_MASTER_CA_FILE on the kibana-proxy container template (you can see where it is defined now at https://github.com/openshift/origin-aggregated-logging/blob/release-1.4/deployer/templates/kibana.yaml#L121) to point where the secret has mounted it. There is definitely no deployer/ansible parameter for doing this; it would be an after-deployment modification.
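
A sketch of that after-deployment modification (every name here is hypothetical, assuming the standard logging-kibana deployment config):

# create a secret holding the custom CA bundle
oc create secret generic custom-master-ca -n logging --from-file=ca.crt=/path/to/custom-ca.crt
# mount it into the kibana-proxy container
oc set volume dc/logging-kibana -n logging -c kibana-proxy --add --name=custom-master-ca --type=secret --secret-name=custom-master-ca --mount-path=/custom-ca
# point the proxy at the mounted bundle
oc set env dc/logging-kibana -n logging -c kibana-proxy OAP_MASTER_CA_FILE=/custom-ca/ca.crt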

Having done all that, it occurs to me that Elasticsearch will need the exact same treatment, because it too queries the master with the auth token it receives. It is not at all apparent to me how, or whether, configuration there is even possible; it's using kubernetes-client, which seems to have the values hardcoded. https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/Config.java#L110
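
One hedged, untested idea: the linked Config.java also reads Java system properties such as kubernetes.master and kubernetes.certs.ca.file, so if the Elasticsearch image passes Java options through, something like the following might work (URL and path are placeholders):

ES_JAVA_OPTS="-Dkubernetes.master=https://master.customer.com:8443 -Dkubernetes.certs.ca.file=/custom-ca/ca.crt"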

Comment 4 Ruben Romero Montes 2017-06-28 20:31:38 UTC
I have tested with 6.11.0 [1] and I don't have this problem. I also tested tag 0.1.0 using Node.js 4.7.2 [2], but it didn't work.

[1] https://github.com/ruromero/openshift-auth-proxy/tree/update-pkg
[2] https://github.com/fabric8io/openshift-auth-proxy/tree/0.1.0

Comment 7 Jordan Liggitt 2017-07-12 15:42:53 UTC
What inputs were provided to the ansible configuration?

1. Did they override the signing CA?
2. Did they provide a custom CA bundle?
3. Were certificates for internal hostnames regenerated as well?

Comment 9 Ruben Romero Montes 2017-09-11 09:54:33 UTC
(In reply to Jordan Liggitt from comment #7)
> What inputs were provided to the ansible configuration?
> 
> 1. Did they override the signing CA?
> 2. Did they provide a custom CA bundle?
> 3. Were certificates for internal hostnames regenerated as well?

They redeployed certificates using a custom CA.

https://docs.openshift.com/container-platform/3.5/install_config/redeploying_certificates.html#redeploying-new-custom-ca

I honestly don't know the exact reason, but it has been confirmed to be solved after upgrading to Node.js 4.8.1:

https://github.com/fabric8io/openshift-auth-proxy/pull/21

Comment 10 Simo Sorce 2017-09-11 13:59:19 UTC
Not auth related, moving to Kibana (Logging)

Comment 11 Ruben Romero Montes 2017-09-12 14:20:10 UTC
Setting the following env variable on kibana-proxy is enough to avoid the login loop:

NODE_TLS_REJECT_UNAUTHORIZED: 0
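
A sketch of how to apply it (assuming the standard logging-kibana deployment config), strictly as a debugging aid:

oc set env dc/logging-kibana -n logging -c kibana-proxy NODE_TLS_REJECT_UNAUTHORIZED=0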

If I have read the documentation correctly, this value will ignore self-signed certificates.

What I don't understand is why it does not work with the default value of OAP_MASTER_CA_FILE, which is /var/run/secrets/kubernetes.io/serviceaccount/ca.crt:
  https://github.com/fabric8io/openshift-auth-proxy/blob/master/lib/config.js#L77
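
To see what the proxy's Node runtime makes of the master certificate against that CA file, a diagnostic sketch (<kibana-pod> is a placeholder; uses Node 4's callback-style API):

oc exec <kibana-pod> -n logging -c kibana-proxy -- node -e "var fs=require('fs'),https=require('https'); https.get({host:'kubernetes.default.svc.cluster.local', ca:fs.readFileSync('/var/run/secrets/kubernetes.io/serviceaccount/ca.crt')}, function(r){console.log('status:',r.statusCode);}).on('error',function(e){console.log('TLS error:',e.message);});"

An HTTP status (even 401/403) means TLS validated; a "TLS error:" line reproduces the cause of the login loop.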

Thanks in advance for your comments.

Comment 12 Simo Sorce 2017-09-12 18:56:46 UTC
> NODE_TLS_REJECT_UNAUTHORIZED: 0

> If I have read the documentation correctly, this value will ignore self-signed
> certificates.

No! This value will ignore certificate validity entirely, accepting anything.
You should *never* use this option except for some debugging/development reason.

Comment 16 Simo Sorce 2017-09-13 18:07:40 UTC
We clearly need a CI job that deploys and uses an "external" CA, so we can validate these code paths.

Comment 25 Jeff Cantrill 2019-06-21 16:31:47 UTC
@Eric,

Reassigning to you to add any input to #c22.  Please comment as appropriate and reclose if it makes sense.

Comment 26 Jeff Cantrill 2019-07-12 20:19:08 UTC
Reclosing given no activity from the customer, who was redeploying certs 2 months prior.

Comment 27 Robert Bost 2019-08-06 18:07:52 UTC
I think this issue can be chalked up to something related to https://access.redhat.com/solutions/3805841. The customer should probably use openshift_additional_ca to configure trust between oauth-proxy and the k8s API.
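
A hedged inventory example (the path is a placeholder):

[OSEv3:vars]
openshift_additional_ca=/path/to/custom-ca.crt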

Comment 28 Red Hat Bugzilla 2023-09-15 00:02:45 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.

