Bug 1724189 - Nodes in Not Ready state after adding API named certificate as per https://docs.openshift.com/container-platform/4.1/authentication/certificates/api-server.html
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.2.0
Assignee: David Eads
QA Contact: Xingxing Xia
Depends On:
Blocks: 1728877
Reported: 2019-06-26 12:32 UTC by Miheer Salunke
Modified: 2019-10-16 06:32 UTC
CC: 17 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2019-10-16 06:32:41 UTC
Target Upstream Version:

Attachments

System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:2922 0 None None None 2019-10-16 06:32:53 UTC

Comment 4 Luis Sanchez 2019-06-28 12:54:34 UTC
I need more details.
At minimum, provide the cert details (e.g. openssl x509 -in certificate.crt -text -noout) and the contents of apiserver/cluster (oc describe apiserver cluster).
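For anyone following along, the inspection step requested above can be sketched like this. The certificate path and hostname are placeholders (not taken from the reporter's cluster), and -addext assumes OpenSSL 1.1.1+:

```shell
# Build a throwaway named certificate (placeholder hostname) and dump it
# the way comment 4 asks for. The Subject Alternative Name block is the
# part that matters in this bug.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/tls.key -out /tmp/tls.crt \
  -subj "/CN=api.mycluster.example.com" \
  -addext "subjectAltName=DNS:api.mycluster.example.com"
openssl x509 -in /tmp/tls.crt -text -noout | grep -A1 "Subject Alternative Name"
```

On a real cluster you would run only the second command against the certificate you intend to add, and check which DNS names appear in the SAN list.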

Comment 5 Luis Sanchez 2019-06-28 14:32:30 UTC
Please provide the output of must-gather.

Using the openshift-must-gather binary:

 openshift-must-gather inspect clusteroperator/kube-apiserver

Or using the must-gather image, e.g.:

  $ export KUBECONFIG=/path/to/kubeconfig
  $ export image=quay.io/openshift/origin-must-gather:latest # for example
  $ output_dir="${PWD}/must-gather.$(date --utc +%Y%m%d_%H%M%SZ)"
  $ mkdir -p "${output_dir}"
  $ docker run --rm --interactive --tty \
    --volume="${KUBECONFIG}":/root/.kube/config:z \
    --volume="${output_dir}":/must-gather:z \
    --workdir=/ \
    "${image}" \
    openshift-must-gather inspect clusteroperator/kube-apiserver

Comment 11 David Eads 2019-07-08 11:58:07 UTC
The node team can help get your kubelets honoring both old and new serving certs so that you can run pods to collect your data.

Node team: see comment 6 for what happened: https://bugzilla.redhat.com/show_bug.cgi?id=1724189#c6 . We have delivered code to stop people from doing this in the future, but the master kubelets need to trust the current serving cert that was set so that they can read pods and roll out a new revision.

Comment 14 Ryan Phillips 2019-07-23 22:02:22 UTC
Restoring the control plane should be possible via the documented steps in [1]. Have these steps been attempted?

1. https://docs.openshift.com/container-platform/4.1/disaster_recovery/scenario-3-expired-certs.html

Comment 15 Miheer Salunke 2019-07-25 03:38:58 UTC
(In reply to Ryan Phillips from comment #14)
> Restoring the control plane should be possible via the documented steps in
> [1]. Have these steps been attempted?
> 1.
> https://docs.openshift.com/container-platform/4.1/disaster_recovery/scenario-
> 3-expired-certs.html

No, I don't think so. But is it needed in this case? The certs don't seem to be expired.

I noticed https://docs.openshift.com/container-platform/4.1/authentication/certificates/api-server.html was updated with:
Do not provide a named certificate for the internal load balancer (host name api-int.<cluster_name>.<base_domain>). Doing so will leave your cluster in a degraded state.

If this means the API named certificate can be deployed with only the external API hostname and no api-int SAN, that would be an acceptable solution for our use case.
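The doc warning quoted above suggests a quick pre-flight check before adding a named certificate. A hypothetical sketch (the cert and cluster domain below are made up to demonstrate the failure case, not part of this bug's data):

```shell
# Hypothetical pre-flight check: refuse a named certificate whose SANs
# cover the internal load balancer name (api-int.<cluster>.<base_domain>).
# First, deliberately build a "bad" cert to demonstrate (placeholder names;
# -addext assumes OpenSSL 1.1.1+).
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/bad.key -out /tmp/bad.crt \
  -subj "/CN=api.mycluster.example.com" \
  -addext "subjectAltName=DNS:api.mycluster.example.com,DNS:api-int.mycluster.example.com"

# The check itself: grep the SAN list for an api-int entry.
if openssl x509 -in /tmp/bad.crt -noout -text | grep -q "DNS:api-int\."; then
  echo "REJECT: certificate covers api-int; do not add it as a named certificate"
fi
```

Running the same grep against a cert that lists only the external API hostname would pass silently, which matches the acceptable configuration described above.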

Comment 20 Greg Blomquist 2019-08-26 19:08:15 UTC
Cloned (copied actually) to 4.1.z: https://bugzilla.redhat.com/show_bug.cgi?id=1728877

https://github.com/openshift/origin/pull/23297 is merged in Origin master.  Moving to modified for 4.2.0.

Comment 27 Xingxing Xia 2019-09-16 03:22:43 UTC
Chuan, you can use the root CA from http://file.rdu.redhat.com/~xxia/rootCA/ , or create your own root CA using https://github.com/giantswarm/grumpy/blob/instance_migration/gen_certs.sh#L8-L11 :
openssl genrsa -out certs/ca.key 2048
openssl req -new -x509 -key certs/ca.key -out certs/ca.crt -config certs/ca_config.txt
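For a self-contained variant of those two commands (substituting a -subj literal for the ca_config.txt file from that script, which is not shown here), plus signing a test serving cert with the resulting CA, the flow would look roughly like:

```shell
mkdir -p certs
# Root CA key and self-signed CA certificate (subject is a placeholder;
# the referenced gen_certs.sh reads its subject from ca_config.txt instead).
openssl genrsa -out certs/ca.key 2048
openssl req -new -x509 -key certs/ca.key -out certs/ca.crt -days 1 \
  -subj "/CN=Test Root CA"
# Sign a test serving certificate with that CA (hostname is a placeholder).
openssl genrsa -out certs/server.key 2048
openssl req -new -key certs/server.key -out certs/server.csr \
  -subj "/CN=api.mycluster.example.com"
openssl x509 -req -in certs/server.csr -CA certs/ca.crt -CAkey certs/ca.key \
  -CAcreateserial -out certs/server.crt -days 1
# Confirm the serving cert chains to the CA.
openssl verify -CAfile certs/ca.crt certs/server.crt
```

The final verify should report "certs/server.crt: OK" when the signing step succeeded.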

Comment 31 errata-xmlrpc 2019-10-16 06:32:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

