Description of problem: Providing a non-wild card certificate to a wildcard route, causes the cluster to crash. Version-Release number of selected component (if applicable): 4.1 Beta 3 How reproducible: 100% Steps to Reproduce: 1: Create some certificates (mimics a customer getting their own certificates) DOMAIN=$(oc get ingresscontroller.operator default -n openshift-ingress-operator -o jsonpath='{.status.domain}{"\n"}' | sed 's/apps/gears/'); openssl req -new -newkey rsa:2048 -days 5 -nodes -x509 -subj "/C=US/ST=NC/L=Raleigh/O=Red Hat Inc./OU=CEE/CN=$DOMAIN" -keyout server.key -out server.crt 2: Put the certificate into a secret oc create secret tls broken-cert --cert=server.crt --key=server.key -n openshift-ingress 3: Tell the cluster to use this certificate as the default certificate for all wildcard domains, note my certificate is not a wild card certificate. oc patch ingresscontroller.operator default --type=merge -p "{\"spec\":{\"defaultCertificate\":{\"name\": \"broken-cert\"}}}" -n openshift-ingress-operator Actual results: This breaks a cluster Expected results: The ingress operator validates that the cert you pointed to is a wildcard before using it, and log an error if not. Additional info:
Moving to routing as they maintain ingress operator.
I'm not sure I agree with characterization of this bug. The claims I see are that providing an "invalid" certificate: 1. "causes the cluster to crash." 2. "This breaks a cluster" Can you more clearly describe what you mean by "crash" and "break"? I've executed your steps and the ingresscontroller happily uses the provided certificate. The ingresscontroller API doesn't describe any validation rules applied to the contents of the certificate secret[1]. Whether or not some validation _should_ be applied seems like a useful discussion, but can we agree that such a discussion should take place in the context of an RFE and not a bug report? [1] https://github.com/openshift/api/blob/master/operator/v1/types_ingress.go#L89
(In reply to Dan Mace from comment #2) > I'm not sure I agree with characterization of this bug. The claims I see are > that providing an "invalid" certificate: > > 1. "causes the cluster to crash." > 2. "This breaks a cluster" > > Can you more clearly describe what you mean by "crash" and "break"? I've > executed your steps and the ingresscontroller happily uses the provided > certificate. > > The ingresscontroller API doesn't describe any validation rules applied to > the contents of the certificate secret[1]. Whether or not some validation > _should_ be applied seems like a useful discussion, but can we agree that > such a discussion should take place in the context of an RFE and not a bug > report? > > [1] > https://github.com/openshift/api/blob/master/operator/v1/types_ingress.go#L89 Router no longer runs. Without a router authentication no longer works.
(In reply to Eric Rich from comment #3) > (In reply to Dan Mace from comment #2) > > I'm not sure I agree with characterization of this bug. The claims I see are > > that providing an "invalid" certificate: > > > > 1. "causes the cluster to crash." > > 2. "This breaks a cluster" > > > > Can you more clearly describe what you mean by "crash" and "break"? I've > > executed your steps and the ingresscontroller happily uses the provided > > certificate. > > > > The ingresscontroller API doesn't describe any validation rules applied to > > the contents of the certificate secret[1]. Whether or not some validation > > _should_ be applied seems like a useful discussion, but can we agree that > > such a discussion should take place in the context of an RFE and not a bug > > report? > > > > [1] > > https://github.com/openshift/api/blob/master/operator/v1/types_ingress.go#L89 > > Router no longer runs. > Without a router authentication no longer works. I didn't see crash loops when running the test. I am seeing the invalid cert being served. If there's actually a router crash occurring, I would expect status of the ingresscontroller/operator to reflect that. Can you provide the output of must-gather following a crash if that's what you're seeing? Given a reference to a well-formed certificate data in a valid secret reference I would expect the certificate to be used and served as-is, and so far what I see matches those expectations. I do agree that changing the certificate has potentially serious downstream implications, and I think our API docs should more clearly articulate the risks. We can certainly improve the API docs right away. That said, my claim is still that we're conforming to the API we publish. I'll let the rest of the team weigh in here, but my current position is that we need to spec out the behavior in an RFE or story.
Can't reproduce a crash, and some sort of validation of the certificate would be an RFE. I'm going to close this one. If I've misunderstood and you have some steps to reproduce a crash, please let me know and re-open.
https://jira.coreos.com/browse/RFE-298 filed for this.