Bug 1685704

Summary: Need a separate internal trust chain and apiserver name for internal clients on the host network, namely kubelet
Product: OpenShift Container Platform
Component: Node
Version: 4.1.0
Status: CLOSED ERRATA
Severity: high
Priority: high
Target Release: 4.1.0
Target Milestone: ---
Hardware: Unspecified
OS: Unspecified
Reporter: Justin Pierce <jupierce>
Assignee: Seth Jennings <sjenning>
QA Contact: Chuan Yu <chuyu>
CC: adahiya, aos-bugs, deads, decarr, dgoodwin, eparis, gblomqui, hongli, jokerman, mmccomas, mwoodson, nagrawal, nmalik, sanchezl, schoudha, scuppett, sponnaga, sttts, xxia, yinzhou
Keywords: DeliveryBlocker, OpsBlocker
Doc Type: No Doc Update
Type: Bug
Last Closed: 2019-06-04 10:45:04 UTC
Attachments: Listings (flags: none)

Description Justin Pierce 2019-03-05 20:47:14 UTC
Created attachment 1541146 [details]
Listings

Description of problem:

The kube-apiserver entered a failing state after a change was applied to the apiserver/cluster resource.


..snippet of apiserver/cluster..
spec:
  servingCerts:
    namedCertificates:
    - names:
      - api.int-3.online-starter.openshift.com
      servingCertificate:
        name: api-certs

I created secrets/api-certs in openshift-config. The certificate is used for both the router and the master and signed for two wildcards: 'DNS:*.apps.int-3.online-starter.openshift.com,DNS:*.int-3.online-starter.openshift.com'.
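For reference, a minimal sketch of how a wildcard DNS SAN is commonly matched against a hostname, assuming the usual rule that a wildcard stands for exactly one leftmost label. The helper names are illustrative only and are not taken from any OpenShift component. Under that rule, the second wildcard covers the api name listed in namedCertificates, while the *.apps wildcard does not.

```python
def san_matches(pattern: str, hostname: str) -> bool:
    """Return True if a DNS SAN entry covers hostname.

    A wildcard is honored only as the entire leftmost label and
    matches exactly one non-empty label.
    """
    p_labels = pattern.lower().rstrip(".").split(".")
    h_labels = hostname.lower().rstrip(".").split(".")
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] == "*":
        return h_labels[0] != "" and p_labels[1:] == h_labels[1:]
    return p_labels == h_labels


def covered_by(sans, hostname):
    """List the SAN entries that cover hostname."""
    return [s for s in sans if san_matches(s, hostname)]


# SANs and API name as quoted in this report.
SANS = [
    "*.apps.int-3.online-starter.openshift.com",
    "*.int-3.online-starter.openshift.com",
]
API_NAME = "api.int-3.online-starter.openshift.com"

if __name__ == "__main__":
    print(covered_by(SANS, API_NAME))
```

So the certificate is valid for the api name; whether a given client trusts its signer is a separate question.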


Version-Release number of selected component (if applicable):
version   4.0.0-0.alpha-2019-03-04-160136 

Expected results:
kube-apiserver should not be failing and api.<cluster> should start serving using this certificate. 

Additional info:
See listings in attachments.

Comment 1 Stefan Schimanski 2019-03-07 15:24:25 UTC
Can you attach the logs of the kube-apiserver-operator and the relevant installer (installer-61-ip-10-0-163-238.us-east-2.compute.internal in this case)? The former spawns the latter, and the latter copies the custom cert (or it doesn't for some reason).

Also all kube-apiserver-operator events would be helpful.

Comment 3 Luis Sanchez 2019-04-03 01:41:30 UTC
* The user is specifying a custom certificate to be presented by the apiserver when accessed via the load balancer.
* The cluster's kubelets access the apiserver via the load balancer address.
* The cluster's kubelets do not have the needed CA certificates to trust the certificate presented by the apiserver.

Still investigating how the user might configure the trusted CA certs used by the kubelets.

Comment 5 Luis Sanchez 2019-04-03 18:07:34 UTC
We need to be able to configure trusted CAs for Kubelets.

Comment 7 Luis Sanchez 2019-04-05 13:15:21 UTC
The kube-apiserver can be configured successfully to serve a customer-managed certificate. In this case, we have verified that the customer-managed certificate is indeed being served. Unfortunately, there is currently no way to configure the kubelets to trust the customer-managed certificate. As a result, the cluster's nodes go into the 'NotReady' state as their kubelets lose the ability to communicate with the apiserver. We need to be able to configure trusted CAs for kubelets.

Comment 8 David Eads 2019-04-08 12:15:37 UTC
@sjenning more detailed analysis in comment 7

Comment 9 Seth Jennings 2019-04-08 20:55:11 UTC
Ok, this bug requires changes in several areas:

1) the installer needs to create an internal name for the API endpoint (api-int.$clustername.$basedomain) so that the kubelets can connect to the API server using the internal CA they already trust; the API server will use SNI to select the right cert
2) KAS needs to have two apiserver serving certs: one for the external api name, potentially signed by the customer CA, and one for the internal api name, signed by the long-lived self-signed CA the kubelet currently uses
3) the kubelet bootstrap kubeconfig generated by the MCS needs to reference the internal api name
4) installer UPI documentation needs to document the internal name DNS requirement
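A rough model of the idea in items 1 and 2, as a sketch only: the server picks a serving cert by the SNI name the client asked for, and the client accepts it only if it trusts the signer. The hostnames, signer labels, and function names here are hypothetical, and this is not how kube-apiserver implements selection internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServingCert:
    names: tuple   # DNS names this cert is selected for via SNI
    signer: str    # label of the CA that signed it


# Hypothetical names and signer labels, for illustration only.
EXTERNAL = ServingCert(("api.mycluster.example.com",), "customer-ca")
INTERNAL = ServingCert(("api-int.mycluster.example.com",), "internal-ca")


def select_cert(sni_name, named_certs, default):
    """Pick the serving cert whose names include the SNI name."""
    for cert in named_certs:
        if sni_name in cert.names:
            return cert
    return default


def client_trusts(cert, trusted_signers):
    """A client accepts the cert only if it trusts the signer."""
    return cert.signer in trusted_signers


# Assumption: kubelets only carry the internal CA in their trust bundle.
KUBELET_TRUST = {"internal-ca"}

if __name__ == "__main__":
    for name in ("api.mycluster.example.com", "api-int.mycluster.example.com"):
        cert = select_cert(name, [EXTERNAL, INTERNAL], INTERNAL)
        print(name, cert.signer, client_trusts(cert, KUBELET_TRUST))
```

In this model a kubelet dialing the external name gets the customer-signed cert and rejects it, while the same kubelet dialing the api-int name gets a cert it trusts, which is why item 3 points the kubeconfig at the internal name.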

Comment 10 Seth Jennings 2019-04-08 21:24:58 UTC
The installer builds the apiserver URL from the baseDomain provided in the install config:
https://github.com/openshift/installer/blob/91ba0f3fde6e36f06ba3609bee5bddf4ab0f5695/pkg/asset/manifests/utils.go#L36-L37

"cluster" Infrastructure resource is created with apiServerURL in the Status
https://github.com/openshift/installer/blob/91ba0f3fde6e36f06ba3609bee5bddf4ab0f5695/pkg/asset/manifests/infrastructure.go#L49-L49
https://github.com/openshift/api/blob/master/config/v1/types_infrastructure.go#L55-L58

The --apiserver-url that is fed into the bootstrap MCS -> ignition config -> /etc/kubernetes/kubeconfig -> kubelet bootstrap -> /var/lib/kubelet/kubeconfig comes from there.

Thus, in order to change the apiserver URL the kubelets use, we need to change the installer code in the first link (s/api/api-int/).
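The naming change can be sketched as follows. This is an illustration in Python, not the installer's Go code; the function name and the example cluster values are hypothetical, and the default port 6443 is an assumption.

```python
def api_server_urls(cluster_name, base_domain, port=6443):
    """Build the external and internal API URLs for a cluster.

    The internal URL differs only in the 'api-int' host label.
    The port default is an assumption for this sketch.
    """
    cluster_domain = f"{cluster_name}.{base_domain}"
    return {
        "external": f"https://api.{cluster_domain}:{port}",
        "internal": f"https://api-int.{cluster_domain}:{port}",
    }


if __name__ == "__main__":
    urls = api_server_urls("mycluster", "example.com")
    print(urls["external"])
    print(urls["internal"])
```

The kubelet bootstrap kubeconfig would then carry the internal URL, while external clients keep using the api name.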

Comment 11 David Eads 2019-04-08 21:58:06 UTC
And the installer needs to produce a *separate* serving cert signed for that name for the kube-apiserver.

Comment 12 Eric Paris 2019-04-10 18:33:36 UTC
Why couldn't the installer create a single cert good for both api. and api-int. ? If a customer provided their own cert for api. it would be up to the apiserver to use that one instead of the one provided by the installer.  Is the apiserver not capable of that?

Comment 13 Seth Jennings 2019-04-10 19:07:09 UTC
The single cert would have to be signed by the customer CA, if so configured.  Then we end up with the same issue we have now: the kubelet doesn't trust the customer CA.

The advantage of two certs for two different names is they can have different signing CAs; customer CA can sign the external one and kube-ca can remain the CA for the internal one.  Unfortunately, I think David is saying that kube-ca can't sign for this new internal cert for some reason which is unclear to me.

Comment 14 Eric Paris 2019-04-10 20:06:44 UTC
My suggestion would be that WE would ship with 1 cert good for "api" and "api-int". But a customer could add a second cert, which would be used instead of our single cert, for the "api" name.

Comment 15 Abhinav Dahiya 2019-04-10 23:15:36 UTC
(In reply to Justin Pierce from comment #0)
> Created attachment 1541146 [details]
> Listings
> 
> Description of problem:
> 
> kube-apiserver entered failing state after applying change to
> apiserver/cluster resources 
> 
> 
> ..snippet of apiserver/cluster..
> spec:
>   servingCerts:
>     namedCertificates:
>     - names:
>       - api.int-3.online-starter.openshift.com
>       servingCertificate:
>         name: api-certs
> 
> I created secrets/api-certs in openshift-config. The certificate is used for
> both the router and the master and signed for two wildcards:
> 'DNS:*.apps.int-3.online-starter.openshift.com,DNS:*.int-3.online-starter.
> openshift.com'.

I get that users will need to update the certificate used by the router, as that serves applications. Do you have a reason why you want to update the api.cluster_domain serving certificate? It is not user-facing in terms of applications and is only used by k8s clients, which already require the certificate-authority to trust the api server. _just looking for more information on the motivation_

previously 3.x was served through api endpoint, but that has also moved to its own route.


Comment 16 Abhinav Dahiya 2019-04-10 23:21:30 UTC
(In reply to Abhinav Dahiya from comment #15)

EDIT:  previously in 3.x console was served through the api endpoint, but that has also moved to its own route.


Comment 17 Eric Paris 2019-04-10 23:58:12 UTC
The oc client is not the only thing that talks to the apiserver. When you go to the console, the console gives you the HTML, JavaScript, and CSS, but the actual data shown comes from your browser talking directly to the apiserver. Customers write their own code that talks to the api server. Think of every single customer ever who uses their own Jenkins instance to deploy containers to OpenShift. The code they write may not be "oc". Lots of things other than oc talk to the apiserver.

And those things expect the apiserver to have a real trusted certificate. For example, you shouldn't have to click accept on exceptions just to get the console to run. You should be able to sign a certificate, using either a public CA or your corp CA, and have the api server use a key that is trusted outside of the cluster.

Comment 20 Seth Jennings 2019-04-15 19:28:16 UTC
Sending to Master.

While the kubelet is affected by this, there is no change needed to the kubelet or anything managed by the Pod team.  I've been mostly a middle man on this bug.  Transferring it to David, who is doing almost all the work.

Comment 21 David Eads 2019-04-16 12:37:56 UTC
I think all the master team items are completed.  The remaining item is updating the kubelet configuration in the installer, which I think the pod team owns.

Comment 23 Seth Jennings 2019-04-16 20:39:04 UTC
should be the last part for this:
https://github.com/openshift/installer/pull/1633

Comment 24 Seth Jennings 2019-04-22 14:02:51 UTC
tests are green on the PR.  just waiting for beta4 to be cut before merging this.

Comment 29 errata-xmlrpc 2019-06-04 10:45:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758