Bug 1349144

Summary: Router provides an invalid, expired certificate by default
Product: OpenShift Container Platform
Reporter: Matt Wringe <mwringe>
Component: Networking
Assignee: Phil Cameron <pcameron>
Networking sub component: router
QA Contact: zhaozhanqi <zzhao>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
CC: aos-bugs, bbennett, bmeng, ccoleman, eparis, erjones, jkaur, tdawson, vlaad
Version: 3.1.0
Keywords: Reopened
Hardware: Unspecified   
OS: Unspecified   
Clones: 1410757
Last Closed: 2016-09-27 09:38:45 UTC
Type: Bug
Bug Blocks: 1410757    

Description Matt Wringe 2016-06-22 19:15:05 UTC
Description of problem:
The router uses an expired, invalid certificate by default. The router should be using a generated certificate signed by the OpenShift CA in the default case.

Version-Release number of selected component (if applicable):
3.1.1 up until the latest origin

How reproducible:
Always

Steps to Reproduce:
1. oc create serviceaccount router -n default
2. oadm policy add-scc-to-user privileged system:serviceaccount:default:router
3. oadm policy add-scc-to-user hostnetwork -z router
4. oadm router router.example.com --replicas=1 --credentials='./openshift.local.config/master/openshift-router.kubeconfig' --service-account=router

5. openssl s_client -showcerts -connect ${router_ip_address}:443

Actual results:

The returned certificate expired more than a year ago.


Expected results:

A self-signed but still valid certificate.


Additional info:

Comment 3 Ben Bennett 2016-06-24 19:32:41 UTC
Going forward, we should be generating a cert signed by the openshift CA and injecting it into the router image as a secret.  If we aren't, then we should.

For the older versions, we should just make sure whatever cert we are shipping is regenerated at a newer version.

Comment 4 Eric Jones 2016-06-24 22:03:05 UTC
@Ben,

How can we correct this in an existing environment where this is causing a problem?

Comment 5 Phil Cameron 2016-06-27 19:14:58 UTC
It appears that the committed router certificate is expired.

On netdev22 (running OSE 3.2.0.44) I ran:
# openssl s_client -showcerts -connect 172.30.113.75:443 -cert default_pub_keys.pem
CONNECTED(00000003)
depth=0 C = --, ST = SomeState, L = SomeCity, O = SomeOrganization, OU = SomeOrganizationalUnit, CN = ip-10-35-63-192, emailAddress = root@ip-10-35-63-192
verify error:num=18:self signed certificate
verify return:1
depth=0 C = --, ST = SomeState, L = SomeCity, O = SomeOrganization, OU = SomeOrganizationalUnit, CN = ip-10-35-63-192, emailAddress = root@ip-10-35-63-192
verify error:num=10:certificate has expired
notAfter=Feb 24 05:56:46 2015 GMT
verify return:1
depth=0 C = --, ST = SomeState, L = SomeCity, O = SomeOrganization, OU = SomeOrganizationalUnit, CN = ip-10-35-63-192, emailAddress = root@ip-10-35-63-192
notAfter=Feb 24 05:56:46 2015 GMT
verify return:1

Here default_pub_keys.pem is taken from a clone of openshift/origin:
images/router/haproxy/conf/default_pub_keys.pem
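The expiry check that `openssl s_client` reports can also be reproduced offline from its printed `notAfter=` lines. The Python sketch below is illustrative only: the helper names `not_after_values` and `is_expired` are hypothetical, and it relies on the standard library's `ssl.cert_time_to_seconds`, which parses timestamps in the `Feb 24 05:56:46 2015 GMT` form shown in the transcript above.

```python
import ssl
import time
from typing import List, Optional


def not_after_values(s_client_output: str) -> List[str]:
    """Collect each 'notAfter=...' value printed by `openssl s_client`."""
    values = []
    for line in s_client_output.splitlines():
        line = line.strip()
        if line.startswith("notAfter="):
            values.append(line[len("notAfter="):])
    return values


def is_expired(not_after: str, now: Optional[float] = None) -> bool:
    """True if a timestamp such as 'Feb 24 05:56:46 2015 GMT' lies before `now`."""
    # ssl.cert_time_to_seconds parses the OpenSSL GMT format into epoch seconds (UTC).
    expires_at = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return expires_at < now


if __name__ == "__main__":
    # Excerpt of the transcript above.
    transcript = """\
verify error:num=10:certificate has expired
notAfter=Feb 24 05:56:46 2015 GMT
verify return:1
"""
    for value in not_after_values(transcript):
        print(value, "->", "expired" if is_expired(value) else "still valid")
```

Passing an explicit `now` makes the check deterministic, which is handy when asking whether a shipped certificate will still be valid at some future release date.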

The file was committed here:
$ git log conf/default_pub_keys.pem 
commit b9f27c30b5b90a80cd267b7d9d558a778b56bd98
Author: Clayton Coleman <ccoleman>
Date:   Sat Feb 13 17:56:04 2016 -0500

    Simplify the router and ipfailover images
    
    Squash down to use origin as the base image instead of their originals.


So it appears that an expired certificate has been committed.

The next step is to find out where the file came from. Maybe Clayton can help.

Comment 6 Phil Cameron 2016-06-30 17:24:01 UTC
Generated new cert.
openshift/origin PR9648

Comment 7 Ben Bennett 2016-06-30 17:57:59 UTC
(In reply to Eric Jones from comment #4)
> @Ben,
> 
> How can we correct this in an existing environment where this is causing a
> problem?

Really, you shouldn't be using the default certificate from the image. It is advisable to provide the cert with the --default-cert arg to the router as described at https://docs.openshift.com/enterprise/3.2/install_config/install/deploy_router.html#using-wildcard-certificates

Comment 8 Phil Cameron 2016-06-30 18:04:41 UTC
https://github.com/openshift/origin/pull/9648

Comment 9 Phil Cameron 2016-07-08 14:38:29 UTC
Do we want to close this as WONTFIX, NOTABUG, ...?
If not, what needs to be done?
This has been stalled for a while now.

There are currently two openshift PRs for this: PR9648, the original (I made a mistake and pushed the branch to openshift), and PR9719 (based on my fork). Each has some discussion. The consensus seems to be not to supply any default certificate.

Comment 10 Ben Bennett 2016-07-13 17:07:37 UTC
For my sanity, the PRs are:
  https://github.com/openshift/origin/pull/9719 (wrong branch, good discussion)
  https://github.com/openshift/origin/pull/9648 (right branch, no discussion)

Comment 11 Phil Cameron 2016-07-14 13:47:35 UTC
This has been hanging for a long time. Who needs to be involved in the decision?

If we want a cert, merge PR9719; otherwise, let's figure out what to do.

Please comment on this if you want to be in the conversation.

Comment 12 Phil Cameron 2016-07-27 13:19:57 UTC
We are still discussing how to handle this.

Comment 13 Phil Cameron 2016-07-28 13:27:34 UTC
We have come to a decision on what to do:

For old releases (pre 3.1) we do nothing. Just point users to the existing docs on how to work with certs.

For 3.1/3.2/newer releases the admin can generate and manage the certs.

1. To create the cert:
oadm ca create-server-cert --cert=router.crt --key=router.key --hostnames=default.router.hostname [ ... openshift CA args ... ]
cat router.crt router.key [../path/to/openshift.ca] > router.pem

2a. To add to a new router:
oadm router --default-cert=router.pem ...

2b. To add to an existing router:
oc create secret tls router-certs --cert=router.pem --key=router.key
oc volumes dc/router --add --secret-name=router-certs --mount-path=/etc/pki/tls/private --name=server-certificate
oc set env dc/router DEFAULT_CERTIFICATE_PATH=/etc/pki/tls/private/tls.crt

Comment 14 Phil Cameron 2016-07-29 14:26:29 UTC
Decided to support comment #13 above.

There are some changes to the instructions:

Make the router cert
# mkdir cert ; cd cert
# CA=/etc/origin/master
# oadm ca create-server-cert --signer-cert=$CA/ca.crt --signer-key=$CA/ca.key --signer-serial=$CA/ca.serial.txt --hostnames=default.router.host --cert=router.crt --key=router.key
# cat router.crt router.key $CA/ca.crt > router.pem
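The `cat` step builds a single PEM file with the server certificate first, then its private key, then the CA certificate. Below is a minimal Python sketch of the same concatenation, assuming the three input files already exist; `build_pem_bundle` and `pem_block_types` are hypothetical helpers, not part of `oadm`. Unlike plain `cat`, it adds a newline after any file that lacks one, so an `-----END ...-----` line cannot run into the next `-----BEGIN ...-----` line.

```python
import tempfile
from pathlib import Path
from typing import List, Sequence


def build_pem_bundle(parts: Sequence[Path], out: Path) -> None:
    """Concatenate PEM files in the given order (cert, key, CA), like `cat`."""
    chunks = []
    for part in parts:
        text = part.read_text()
        if "-----BEGIN " not in text:
            raise ValueError(f"{part} does not look like a PEM file")
        # Guarantee a trailing newline so END/BEGIN markers never share a line.
        chunks.append(text if text.endswith("\n") else text + "\n")
    out.write_text("".join(chunks))


def pem_block_types(pem: str) -> List[str]:
    """Return PEM block labels in order, e.g. ['CERTIFICATE', 'RSA PRIVATE KEY']."""
    labels = []
    for line in pem.splitlines():
        if line.startswith("-----BEGIN ") and line.endswith("-----"):
            labels.append(line[len("-----BEGIN "):-len("-----")])
    return labels


if __name__ == "__main__":
    # Placeholder PEM bodies; real files come from `oadm ca create-server-cert`.
    work = Path(tempfile.mkdtemp())
    (work / "router.crt").write_text("-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----")
    (work / "router.key").write_text("-----BEGIN RSA PRIVATE KEY-----\n...\n-----END RSA PRIVATE KEY-----\n")
    (work / "ca.crt").write_text("-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n")
    build_pem_bundle([work / "router.crt", work / "router.key", work / "ca.crt"], work / "router.pem")
    print(pem_block_types((work / "router.pem").read_text()))
```

Listing the block labels is a cheap way to confirm the bundle order before putting `router.pem` into a secret.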


Create a secret
# oc create secret generic router-certs --from-file=router.pem
# oc get secret router-certs -o yaml

When deploying a new router, use 2a in comment #13.


Modify the router to access the secret:
# oc volumes dc/router --add --secret-name=router-certs --mount-path=/etc/pki/tls/private --name=server-certificate
# oc set env dc/router DEFAULT_CERTIFICATE_PATH=/etc/pki/tls/private/router.pem

The above modifies the dc/router:
# oc edit dc/router
.
.
.
        - name: DEFAULT_CERTIFICATE_PATH
          value: /etc/pki/tls/private/router.pem
.
.
.
        volumeMounts:
        - mountPath: /etc/pki/tls/private
          name: server-certificate
.
.
.
      volumes:
      - name: server-certificate
        secret:
          secretName: router-certs
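The `oc volumes` and `oc set env` commands amount to three additions to the DeploymentConfig: a secret volume, a mount in the router container, and the `DEFAULT_CERTIFICATE_PATH` variable. The sketch below models that as a plain dictionary transformation; `add_default_cert` is a hypothetical helper, it does not talk to the API server, and it assumes the router is the first container in the pod template.

```python
import copy
from typing import Any, Dict


def add_default_cert(dc: Dict[str, Any],
                     secret_name: str = "router-certs",
                     mount_path: str = "/etc/pki/tls/private",
                     cert_file: str = "router.pem",
                     volume_name: str = "server-certificate") -> Dict[str, Any]:
    """Return a copy of `dc` with the secret volume, mount and env var added (idempotent)."""
    dc = copy.deepcopy(dc)
    pod_spec = dc["spec"]["template"]["spec"]
    container = pod_spec["containers"][0]  # assumption: the router is the first container

    volumes = pod_spec.setdefault("volumes", [])
    if not any(v.get("name") == volume_name for v in volumes):
        volumes.append({"name": volume_name, "secret": {"secretName": secret_name}})

    mounts = container.setdefault("volumeMounts", [])
    if not any(m.get("name") == volume_name for m in mounts):
        mounts.append({"name": volume_name, "mountPath": mount_path})

    cert_path = f"{mount_path}/{cert_file}"
    env = container.setdefault("env", [])
    for var in env:
        if var.get("name") == "DEFAULT_CERTIFICATE_PATH":
            var["value"] = cert_path  # overwrite an existing setting
            break
    else:
        env.append({"name": "DEFAULT_CERTIFICATE_PATH", "value": cert_path})
    return dc


if __name__ == "__main__":
    dc = {"spec": {"template": {"spec": {"containers": [{"name": "router"}]}}}}
    print(add_default_cert(dc)["spec"]["template"]["spec"])
```

Making the change idempotent mirrors the intent of the commands: running the procedure twice should not add a second volume or a duplicate environment variable.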

The haproxy-config.template is modified to reference DEFAULT_CERTIFICATE_PATH for certificate selection. The template changes, when ready, will be pushed to openshift/origin. 

oadm changes to create the default router cert, when ready, will be pushed to openshift/origin.

Comment 15 Phil Cameron 2016-08-12 19:46:11 UTC
Created origin PR10345

The currently shipped default cert (pem) is removed from the build.
In its place, "oadm router ..." always provides a default cert.
The administrator can provide the default cert via the
--default-cert <cert>.pem option. When this is not done, an
annotation is added to the router service that generates a
secret containing the default cert. As with an admin-provided default
cert, a volume and volume mount are added to the router container.
When the container starts up, the default cert is checked. If it
doesn't exist, one is created from the secret's tls.crt and tls.key
files. The haproxy template always references {{.DefaultCertificate}}.
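The startup behavior described above (use the configured default cert if present, otherwise build it from the secret's `tls.crt` and `tls.key`) can be sketched as follows. This is an illustration under assumptions, not the router's actual code: `ensure_default_cert` and the paths in the example are hypothetical, and the real implementation may differ in detail.

```python
import tempfile
from pathlib import Path


def ensure_default_cert(default_cert: Path, secret_dir: Path) -> Path:
    """Use an existing default cert, or build one from the secret's tls.crt and tls.key."""
    if default_cert.exists():
        # Admin-supplied cert (--default-cert): already a combined PEM, use as is.
        return default_cert
    crt, key = secret_dir / "tls.crt", secret_dir / "tls.key"
    if not (crt.is_file() and key.is_file()):
        # Nothing to fall back on: fail so the router does not start without a cert.
        raise FileNotFoundError(
            f"no cert at {default_cert} and no tls.crt/tls.key in {secret_dir}")
    crt_text = crt.read_text()
    if not crt_text.endswith("\n"):
        crt_text += "\n"
    default_cert.parent.mkdir(parents=True, exist_ok=True)
    # Combine certificate and key into one PEM file, the form haproxy's crt option accepts.
    default_cert.write_text(crt_text + key.read_text())
    return default_cert


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    secret = root / "secret"  # stands in for the mounted service-serving-cert secret
    secret.mkdir()
    (secret / "tls.crt").write_text("-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n")
    (secret / "tls.key").write_text("-----BEGIN RSA PRIVATE KEY-----\n...\n-----END RSA PRIVATE KEY-----\n")
    print(ensure_default_cert(root / "certs" / "default.pem", secret).read_text())
```

Raising instead of continuing keeps a misconfigured router from coming up with no default certificate at all.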

Comment 17 openshift-github-bot 2016-08-19 17:28:33 UTC
Commit pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/3d0976340c9bd1a32b894d86042696057745db90
Router provides an invalid, expired certificate

We previously shipped a default certificate to be used if the admin
did not specify their own when deploying the router.  Over time the
certificate expired and we continued to ship the expired certificate.
This fix updates the expired cert. The replacement cert permits
routers deployed in v3.2 or earlier to continue to work after an
upgrade.

Because the default cert is in the router image, the same certificate
could end up being used across multiple different deployments. In v3.3
and beyond, when a router is deployed it ends up with its own default
certificate whether or not one is supplied by the admin.

When the admin does not provide the default cert the automated certificate
generation for services is used to generate a secret containing a default
certificate.  Unlike the user-passed default certificate (which must
include the concatenated private key PEM format), this secret contains
a key and a crt as separate files. The router combines them before use.

Fixes bug 1349144

Signed-off-by: Phil Cameron <pcameron>

Comment 18 Meng Bo 2016-08-23 08:08:09 UTC
Tested with latest OSE build v3.3.0.24

The router cannot be created unless --default-cert is specified, because the secret <router-name>-certs is not created automatically and so it cannot be mounted into the pod.

Found logs from node:
Aug 23 04:01:16 ip-172-18-3-67.ec2.internal atomic-openshift-node[2915]: E0823 04:01:16.251947    2915 kubelet.go:1945] Unable to mount volumes for pod "router-1-jgouh_default(408be495-6906-11e6-956b-0e7a4bcc54df)": timeout expired waiting for volumes to attach/mount for pod "router-1-jgouh"/"default". list of unattached/unmounted volumes=[server-certificate]; skipping pod
Aug 23 04:01:16 ip-172-18-3-67.ec2.internal atomic-openshift-node[2915]: E0823 04:01:16.251973    2915 pod_workers.go:183] Error syncing pod 408be495-6906-11e6-956b-0e7a4bcc54df, skipping: timeout expired waiting for volumes to attach/mount for pod "router-1-jgouh"/"default". list of unattached/unmounted volumes=[server-certificate]
Aug 23 04:01:16 ip-172-18-3-67.ec2.internal atomic-openshift-node[2915]: I0823 04:01:16.252377    2915 server.go:655] Event(api.ObjectReference{Kind:"Pod", Namespace:"default", Name:"router-1-jgouh", UID:"408be495-6906-11e6-956b-0e7a4bcc54df", APIVersion:"v1", ResourceVersion:"8031", FieldPath:""}): type: 'Warning' reason: 'FailedSync' Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "router-1-jgouh"/"default". list of unattached/unmounted volumes=[server-certificate]
Aug 23 04:01:16 ip-172-18-3-67.ec2.internal atomic-openshift-node[2915]: I0823 04:01:16.252414    2915 server.go:655] Event(api.ObjectReference{Kind:"Pod", Namespace:"default", Name:"router-1-jgouh", UID:"408be495-6906-11e6-956b-0e7a4bcc54df", APIVersion:"v1", ResourceVersion:"8031", FieldPath:""}): type: 'Warning' reason: 'FailedMount' Unable to mount volumes for pod "router-1-jgouh_default(408be495-6906-11e6-956b-0e7a4bcc54df)": timeout expired waiting for volumes to attach/mount for pod "router-1-jgouh"/"default". list of unattached/unmounted volumes=[server-certificate]
Aug 23 04:01:17 ip-172-18-3-67.ec2.internal atomic-openshift-node[2915]: I0823 04:01:17.254963    2915 generic.go:181] GenericPLEG: Relisting
Aug 23 04:01:17 ip-172-18-3-67.ec2.internal atomic-openshift-node[2915]: I0823 04:01:17.256544    2915 kubelet.go:3551] Generating status for "router-1-jgouh_default(408be495-6906-11e6-956b-0e7a4bcc54df)"
Aug 23 04:01:17 ip-172-18-3-67.ec2.internal atomic-openshift-node[2915]: I0823 04:01:17.256574    2915 kubelet.go:3516] pod waiting > 0, pending



The problem does not occur if the default cert is specified when creating the router.

Comment 19 Phil Cameron 2016-08-23 14:52:29 UTC
This may be a configuration problem. Please verify that in
/etc/origin/master/master-config.yaml,
after the
controllers: '*'
line, the following lines are present:
controllerConfig:
  serviceServingCert:
    signer:
      certFile: service-signer.crt
      keyFile: service-signer.key

The service-signer.crt and service-signer.key are in:
/etc/origin/master/
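A quick way to confirm this configuration is present is sketched below. It assumes `master-config.yaml` has already been parsed into a dictionary (for example with PyYAML's `yaml.safe_load`, not shown) and that relative `certFile`/`keyFile` paths are resolved against the directory holding `master-config.yaml`; `service_signer_problems` is a hypothetical helper.

```python
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def service_signer_problems(config: Dict[str, Any], config_dir: Path) -> List[str]:
    """List reasons the service serving cert signer is not usable; empty means OK."""
    problems = []
    # Walk controllerConfig -> serviceServingCert -> signer, tolerating missing/empty keys.
    controller = config.get("controllerConfig") or {}
    serving = controller.get("serviceServingCert") or {}
    signer = serving.get("signer") or {}
    for field in ("certFile", "keyFile"):
        value = signer.get(field)
        if not value:
            problems.append(f"controllerConfig.serviceServingCert.signer.{field} is not set")
            continue
        # Assumption: relative paths are relative to the master-config.yaml directory.
        path = config_dir / value
        if not path.is_file():
            problems.append(f"{field} points to a missing file: {path}")
    return problems


if __name__ == "__main__":
    master_dir = Path(tempfile.mkdtemp())  # stands in for /etc/origin/master
    config = {
        "controllers": "*",
        "controllerConfig": {"serviceServingCert": {"signer": {
            "certFile": "service-signer.crt", "keyFile": "service-signer.key"}}},
    }
    print(service_signer_problems(config, master_dir))  # both files are missing here
```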

Comment 20 Ben Bennett 2016-08-23 14:58:45 UTC
So... it turns out the feature this relies on is not enabled in Enterprise by default.  The decision from Jordan and Clayton was to enable it.  It's being tracked in https://github.com/openshift/openshift-ansible/issues/2345

To work around it, do as Phil suggested above.  To generate the crt and key, cd to /etc/origin/master (or wherever your master-config.yaml is) and run:
  oadm ca create-signer-cert --cert=service-signer.crt --key=service-signer.key --serial=service-signer-serial.txt

Comment 21 Ben Bennett 2016-08-23 21:55:20 UTC
*** Bug 1369573 has been marked as a duplicate of this bug. ***

Comment 22 Phil Cameron 2016-08-25 12:58:08 UTC
Meng, there is really nothing new here beyond how we come up with the default cert. Before, it was built into the router image; now it is created when the router is created. When creating a new router, if anything goes wrong with the secrets, the router doesn't start. The router not starting is already tested. Beyond that, all other existing router tests should work as expected.

The problem that you ran into happened due to missing configuration information in the master-config.yaml file. That is being fixed in the upgrade code.

Are there any specific tests that you would like to have added to the test suite?

Comment 25 Meng Bo 2016-08-29 06:37:24 UTC
Checked on OSE build v3.3.0.26 with the latest openshift-ansible: the controllerConfig was added to the master config, and the router can be created without problems.

The newly created router provides a newly generated certificate.

Verified the bug.

Comment 27 errata-xmlrpc 2016-09-27 09:38:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933

Comment 28 openshift-github-bot 2017-09-15 20:37:55 UTC
Commits pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/94b7153656c305965932d7bbfcb0ef66af2d3bc2
prometheus annotations dropped from router service

oc adm router
creates the router service without the prometheus annotations when
the default cert is not supplied. This is a regression introduced
by PR 10345 (fix for bug 1349144)

Fixes bug 1491430

https://github.com/openshift/origin/commit/5584aaddd008d6267da958e86687e8bdb48e3421
Merge pull request #16334 from pecameron/bz1491430

Automatic merge from submit-queue (batch tested with PRs 15725, 16244, 15796, 16328, 16334)
