Bug 1349144
Summary: | Router provides an invalid, expired certificate by default | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Matt Wringe <mwringe> | |
Component: | Networking | Assignee: | Phil Cameron <pcameron> | |
Networking sub component: | router | QA Contact: | zhaozhanqi <zzhao> | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | medium | |||
Priority: | medium | CC: | aos-bugs, bbennett, bmeng, ccoleman, eparis, erjones, jkaur, tdawson, vlaad | |
Version: | 3.1.0 | Keywords: | Reopened | |
Target Milestone: | --- | |||
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | If docs needed, set a value | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1410757 | Environment: | ||
Last Closed: | 2016-09-27 09:38:45 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1410757 |
Description
Matt Wringe
2016-06-22 19:15:05 UTC
Going forward, we should be generating a cert signed by the openshift CA and injecting it into the router image as a secret. If we aren't, then we should. For the older versions, we should just make sure whatever cert we are shipping is regenerated at a newer version.

@Ben, How can we correct this in an existing environment where this is causing a problem?

It appears that the committed router certificate is expired. On netdev22 (running OSE 3.2.0.44) I ran:

# openssl s_client -showcerts -connect 172.30.113.75:443 -cert default_pub_keys.pem
CONNECTED(00000003)
depth=0 C = --, ST = SomeState, L = SomeCity, O = SomeOrganization, OU = SomeOrganizationalUnit, CN = ip-10-35-63-192, emailAddress = root@ip-10-35-63-192
verify error:num=18:self signed certificate
verify return:1
depth=0 C = --, ST = SomeState, L = SomeCity, O = SomeOrganization, OU = SomeOrganizationalUnit, CN = ip-10-35-63-192, emailAddress = root@ip-10-35-63-192
verify error:num=10:certificate has expired
notAfter=Feb 24 05:56:46 2015 GMT
verify return:1
depth=0 C = --, ST = SomeState, L = SomeCity, O = SomeOrganization, OU = SomeOrganizationalUnit, CN = ip-10-35-63-192, emailAddress = root@ip-10-35-63-192
notAfter=Feb 24 05:56:46 2015 GMT
verify return:1

Where default_pub_keys.pem is from the cloned openshift/origin: images/router/haproxy/conf/default_pub_keys.pem

The file was committed here:

$ git log conf/default_pub_keys.pem
commit b9f27c30b5b90a80cd267b7d9d558a778b56bd98
Author: Clayton Coleman <ccoleman>
Date: Sat Feb 13 17:56:04 2016 -0500

    Simplify the router and ipfailover images

    Squash down to use origin as the base image instead of their originals.

So it appears that an expired certificate has been committed. Next up is finding out where the file came from? Maybe Clayton can help.

Generated new cert. openshift/origin PR9648

(In reply to Eric Jones from comment #4)
> @Ben,
>
> How can we correct this in an existing environment where this is causing a
> problem?
Really, you shouldn't be using the default certificate from the image. It is advisable to provide the cert with the --default-cert arg to the router as described at https://docs.openshift.com/enterprise/3.2/install_config/install/deploy_router.html#using-wildcard-certificates

Do we want to close this as WONTFIX, NOTABUG, ...? If not, what needs to be done? This has been stalled for a while now.

There are two openshift PRs for this at present: 9648, the original (I made a mistake and pushed the branch to openshift), and PR9719 (based on my fork). Each has some discussion on this. The drift seems to be: don't supply any default certificates.

For my sanity, the PRs are:
https://github.com/openshift/origin/pull/9719 (wrong branch, good discussion)
https://github.com/openshift/origin/pull/9648 (right branch, no discussion)

This has been hanging for a long time. Who needs to be involved in the decision? If we want a cert, merge PR9719; otherwise let's figure out what to do. Please comment on this if you want to be in the conversation.

We are still discussing how to handle this.

We have come to a decision on what to do:

For old releases (pre 3.1) we do nothing. Just point users to existing docs on how to work with certs.

For 3.1/3.2/newer releases the admin can generate and manage the certs.

1. To create the cert:
   oadm ca create-server-cert --cert=router.crt --key=router.key --hostnames=default.router.hostname [ ... openshift CA args ... ]
   cat router.crt router.key [../path/to/openshift.ca] > router.pem

2a. To add to a new router:
   oadm router --default-cert=router.pem ...

2b. To add to an existing router:
   oc create secret tls router-certs --cert=router.pem --key=router.key
   oc volumes dc/router --add --secret-name=router-certs --mount-path=/etc/pki/tls/private --name=server-certificate
   oc set env dc/router DEFAULT_CERTIFICATE_PATH=/etc/pki/tls/private/tls.crt

Decided to support comment #13 above.
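Step 1 of the procedure in comment #13 joins the router certificate, its key, and optionally the CA certificate into one PEM file with `cat`. A minimal Python sketch of that concatenation follows, for anyone scripting the step. The function name `build_router_pem` and its parameters are hypothetical, not part of oadm or oc, and unlike plain `cat` the sketch adds a trailing newline to any input that lacks one so the PEM blocks cannot run together.

```python
from pathlib import Path


def build_router_pem(cert_path, key_path, out_path, ca_path=None):
    """Concatenate cert, key and (optionally) CA cert into one PEM bundle.

    Mirrors `cat router.crt router.key [ca.crt] > router.pem`.
    Hypothetical helper for illustration only.
    """
    inputs = [cert_path, key_path]
    if ca_path is not None:
        inputs.append(ca_path)

    chunks = []
    for path in inputs:
        text = Path(path).read_text()
        # Assumption: normalise the trailing newline so that the next
        # "-----BEGIN ..." marker always starts on its own line.
        if not text.endswith("\n"):
            text += "\n"
        chunks.append(text)

    Path(out_path).write_text("".join(chunks))
    return out_path
```

Calling `build_router_pem("router.crt", "router.key", "router.pem", ca_path="/etc/origin/master/ca.crt")` would produce the file that is then passed to `oadm router --default-cert=router.pem`.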
There are some changes to the instructions:

Make the router cert
# mkdir cert ; cd cert
# CA=/etc/origin/master
# oadm ca create-server-cert --signer-cert=$CA/ca.crt --signer-key=$CA/ca.key --signer-serial=$CA/ca.serial.txt --hostnames=default.router.host --cert=router.crt --key=router.key
# cat router.crt router.key $CA/ca.crt > router.pem

Create a secret
# oc create secret generic router-certs --from-file=router.pem
# oc get secret router-certs -o yaml

When deploying a new router use 2.a in comment #13

Modify the router to access the secret:
# oc volumes dc/router --add --secret-name=router-certs --mount-path=/etc/pki/tls/private --name=server-certificate
# oc set env dc/router DEFAULT_CERTIFICATE_PATH=/etc/pki/tls/private/router.pem

The above modifies the dc/router:
# oc edit dc/router
. . .
        - name: DEFAULT_CERTIFICATE_PATH
          value: /etc/pki/tls/private/router.pem
. . .
        volumeMounts:
        - mountPath: /etc/pki/tls/private
          name: server-certificate
. . .
      volumes:
      - name: server-certificate
        secret:
          secretName: router-certs

The haproxy-config.template is modified to reference DEFAULT_CERTIFICATE_PATH for certificate selection.

The template changes, when ready, will be pushed to openshift/origin.
oadm changes to create the default router cert, when ready, will be pushed to openshift/origin.

Created origin PR10345

The current shipped default cert (pem) is deleted from the build. In its place "oadm router ..." always provides a default cert. The administrator can provide the default cert via the --default-cert <cert>.pem option. When this is not done, an annotation is added to the router service that will generate a secret containing the default cert. As with the provided default cert, a volume and volume mount are added to the router container. When the container starts up, the default cert is checked. If it doesn't exist, one is created from the secret's tls.crt and tls.key files.
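The startup behaviour described above (check for the default cert, and if it is missing build it from the secret's tls.crt and tls.key) can be sketched roughly as below. This is an illustration under stated assumptions, not the router's actual code: the function name `ensure_default_cert` is hypothetical, and the certificate-then-key order is assumed from the `cat router.crt router.key` step used for the admin-supplied PEM.

```python
from pathlib import Path


def ensure_default_cert(default_cert_path, secret_dir):
    """Create the combined default cert if it does not exist yet.

    Returns True when a new file was written, False when an existing
    default cert was left untouched. Hypothetical helper for illustration.
    """
    target = Path(default_cert_path)
    if target.exists():
        # An admin-supplied (or previously generated) cert is already there.
        return False

    secret = Path(secret_dir)
    crt = (secret / "tls.crt").read_text()
    key = (secret / "tls.key").read_text()

    # Assumption: certificate first, then private key, matching the
    # concatenated PEM layout used for --default-cert.
    target.write_text(crt + key)
    return True
```

If the secret was never generated, the two reads fail with FileNotFoundError, which loosely corresponds to the situation where the router cannot start because its certificate material is missing.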
The haproxy template always references {{.DefaultCertificate}}

Commit pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/3d0976340c9bd1a32b894d86042696057745db90
Router provides an invalid, expired certificate

We previously shipped a default certificate to be used if the admin did not specify their own when deploying the router. Over time the certificate expired and we continued to ship the expired certificate.

This fix updates the expired cert. The replacement cert permits routers deployed in v3.2 or earlier to continue to work after an upgrade.

Because the default cert is in the router image, the same certificate could end up being used across multiple different deployments.

In v3.3 and beyond, when a router is deployed it ends up with its own default certificate whether or not one is supplied by the admin. When the admin does not provide the default cert the automated certificate generation for services is used to generate a secret containing a default certificate. Unlike the user-passed default certificate (which must include the concatenated private key PEM format), this secret contains a key and a crt as separate files. The router combines them before use.

Fixes bug 1349144

Signed-off-by: Phil Cameron <pcameron>

Tested with latest OSE build v3.3.0.24

The router cannot be created without --default-cert specified, because the secret <router-name>-certs is not created automatically and therefore cannot be mounted to the pod.

Found logs from node:

Aug 23 04:01:16 ip-172-18-3-67.ec2.internal atomic-openshift-node[2915]: E0823 04:01:16.251947 2915 kubelet.go:1945] Unable to mount volumes for pod "router-1-jgouh_default(408be495-6906-11e6-956b-0e7a4bcc54df)": timeout expired waiting for volumes to attach/mount for pod "router-1-jgouh"/"default". list of unattached/unmounted volumes=[server-certificate]; skipping pod
Aug 23 04:01:16 ip-172-18-3-67.ec2.internal atomic-openshift-node[2915]: E0823 04:01:16.251973 2915 pod_workers.go:183] Error syncing pod 408be495-6906-11e6-956b-0e7a4bcc54df, skipping: timeout expired waiting for volumes to attach/mount for pod "router-1-jgouh"/"default". list of unattached/unmounted volumes=[server-certificate]
Aug 23 04:01:16 ip-172-18-3-67.ec2.internal atomic-openshift-node[2915]: I0823 04:01:16.252377 2915 server.go:655] Event(api.ObjectReference{Kind:"Pod", Namespace:"default", Name:"router-1-jgouh", UID:"408be495-6906-11e6-956b-0e7a4bcc54df", APIVersion:"v1", ResourceVersion:"8031", FieldPath:""}): type: 'Warning' reason: 'FailedSync' Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "router-1-jgouh"/"default". list of unattached/unmounted volumes=[server-certificate]
Aug 23 04:01:16 ip-172-18-3-67.ec2.internal atomic-openshift-node[2915]: I0823 04:01:16.252414 2915 server.go:655] Event(api.ObjectReference{Kind:"Pod", Namespace:"default", Name:"router-1-jgouh", UID:"408be495-6906-11e6-956b-0e7a4bcc54df", APIVersion:"v1", ResourceVersion:"8031", FieldPath:""}): type: 'Warning' reason: 'FailedMount' Unable to mount volumes for pod "router-1-jgouh_default(408be495-6906-11e6-956b-0e7a4bcc54df)": timeout expired waiting for volumes to attach/mount for pod "router-1-jgouh"/"default". list of unattached/unmounted volumes=[server-certificate]
Aug 23 04:01:17 ip-172-18-3-67.ec2.internal atomic-openshift-node[2915]: I0823 04:01:17.254963 2915 generic.go:181] GenericPLEG: Relisting
Aug 23 04:01:17 ip-172-18-3-67.ec2.internal atomic-openshift-node[2915]: I0823 04:01:17.256544 2915 kubelet.go:3551] Generating status for "router-1-jgouh_default(408be495-6906-11e6-956b-0e7a4bcc54df)"
Aug 23 04:01:17 ip-172-18-3-67.ec2.internal atomic-openshift-node[2915]: I0823 04:01:17.256574 2915 kubelet.go:3516] pod waiting > 0, pending

The problem does not occur if the default cert is specified when creating the router.

This may be a configuration problem. Please verify that /etc/origin/master/master-config.yaml contains the following lines after the controllers: '*' line:

controllerConfig:
  serviceServingCert:
    signer:
      certFile: service-signer.crt
      keyFile: service-signer.key

The service-signer.crt and service-signer.key are in: /etc/origin/master/

So... it turns out the feature this relies on is not enabled in Enterprise by default. The decision from Jordan and Clayton was to enable it. It's being tracked in https://github.com/openshift/openshift-ansible/issues/2345

To work around it, do as Phil suggested above. To generate the crt and key, cd to /etc/origin/master (or wherever your master-config.yaml is) and run:

oadm ca create-signer-cert --cert=service-signer.crt --key=service-signer.key --serial=service-signer-serial.txt

*** Bug 1369573 has been marked as a duplicate of this bug. ***

Meng, There is really nothing new here beyond how we come up with the default cert. Before, it was built into the router image; now it is created when the router is created. When creating a new router, if anything goes wrong with the secrets the router doesn't start. The router not starting is already tested. Beyond that, all other existing router tests should work as expected.
The problem that you ran into happened due to missing configuration information in the master-config.yaml file. That is being fixed in the upgrade code. Are there any specific tests that you would like to have added to the test suite?

Checked on OSE build v3.3.0.26 with the latest openshift-ansible: the controllerConfig was added to the master config, and the router can be created without problem. The newly created router provides a newly generated certificate. Verified the bug.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2016:1933

Commits pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/94b7153656c305965932d7bbfcb0ef66af2d3bc2
prometheus annotations dropped from router service

oc adm router creates the router service without the prometheus annotations when the default cert is not supplied. This is a regression introduced by PR 10345 (fix for bug 1349144)

Fixes bug 1491430

https://github.com/openshift/origin/commit/5584aaddd008d6267da958e86687e8bdb48e3421
Merge pull request #16334 from pecameron/bz1491430

Automatic merge from submit-queue (batch tested with PRs 15725, 16244, 15796, 16328, 16334)

prometheus annotations dropped from router service

oc adm router creates the router service without the prometheus annotations when the default cert is not supplied. This is a regression introduced by PR 10345 (fix for bug 1349144)

Fixes bug 1491430