During startup, a kubelet...
1. tries to use its current client-cert. If that cert is missing or invalid, it...
2. uses a bootstrap credential to create a CSR for a new client-cert
This flow happens on initial startup and when a cluster is restarted after being off for an extended period. On the masters, step 2 starts failing one day after installation.
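As a rough illustration of the two steps above, here is a minimal sketch of the decision, not the kubelet's actual code. The `Credential` type, `choose_startup_action`, and the 30-day client-cert lifetime are hypothetical names and values made up for this example; only the one-day limit on the master bootstrap credential comes from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Credential:
    """Hypothetical stand-in for a client-cert or a bootstrap credential."""
    name: str
    not_after: datetime

    def is_valid(self, now: datetime) -> bool:
        # A credential is usable only before its expiry time.
        return now < self.not_after


def choose_startup_action(client_cert: Optional[Credential],
                          bootstrap: Optional[Credential],
                          now: datetime) -> str:
    # Step 1: use the current client-cert if it exists and is still valid.
    if client_cert is not None and client_cert.is_valid(now):
        return "use-client-cert"
    # Step 2: otherwise use the bootstrap credential to request a new one.
    if bootstrap is not None and bootstrap.is_valid(now):
        return "submit-csr"
    # Neither credential is usable, so the kubelet cannot rejoin on its own.
    return "fail"


installed = datetime(2019, 1, 1, tzinfo=timezone.utc)
# Assumed lifetime for the example; the real client-cert lifetime may differ.
client_cert = Credential("client-cert", installed + timedelta(days=30))
# One-day validity, matching the behaviour described for the masters.
master_bootstrap = Credential("master-bootstrap", installed + timedelta(days=1))

# Initial startup: no client-cert yet, bootstrap credential still valid.
print(choose_startup_action(None, master_bootstrap, installed))
# Restart after a long shutdown: both credentials have expired.
print(choose_startup_action(client_cert, master_bootstrap,
                            installed + timedelta(days=40)))
```

Under these assumptions the first call prints `submit-csr` and the second prints `fail`, which is the master case described above: once the bootstrap credential has expired, step 2 has nothing to fall back on.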
Master kubelets do not use the same bootstrap credentials as the rest of the cluster. On the masters the bootstrap credential is initially a client-cert, and we cannot extend its lifetime indefinitely because client-certs are not individually revocable.
The master kubelets should be updated sometime after the initial boot to use the same serviceaccount token that the rest of the nodes use. Now that the MCO doesn't reboot a machine for every update, this should work.
To my knowledge, this is the only reason that clusters cannot be shut down shortly after installation. It's also the only reason that step 9 exists here: https://docs.openshift.com/container-platform/4.1/disaster_recovery/scenario-3-expired-certs.html
> Now that the MCO doesn't reboot a machine for every update, this should work.
It would be news to me if the MCD does this now.
If it does not, then we are looking at two reboots per node during install: one for the original pivot and one to apply the changed MC that includes the new bootstrap credentials.
I misunderstood how the kubelet CA updates were being handled. If they are rebooting all the machines, I guess you face a similar choice here.
Regardless, this is the only thing I'm aware of that prevents an immediate shutdown of a cluster after installation.
We really need the MCD to be more feature-rich to make this work. In particular, we need to be able to reproject files changed in the MC without a reboot. Rebooting the nodes twice during install is a disruptive change.
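To make the request concrete, here is a minimal sketch of the kind of check this would need. It is not how the MCD works today: `changed_paths`, `needs_reboot`, the `rewrite_in_place` allow-list, and the file paths are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Set


def changed_paths(old_files: Dict[str, str], new_files: Dict[str, str]) -> Set[str]:
    # A path counts as changed if it was added, removed, or its contents differ.
    all_paths = set(old_files) | set(new_files)
    return {p for p in all_paths if old_files.get(p) != new_files.get(p)}


def needs_reboot(old_files: Dict[str, str],
                 new_files: Dict[str, str],
                 rewrite_in_place: Set[str]) -> bool:
    # A reboot can be skipped only if every changed path is on the
    # (hypothetical) list of files that are safe to rewrite on a live node.
    return not changed_paths(old_files, new_files) <= rewrite_in_place


# Example paths only; they are placeholders, not a statement about real MCs.
safe = {"/etc/kubernetes/kubeconfig"}
old = {"/etc/kubernetes/kubeconfig": "bootstrap-v1", "/etc/example.conf": "a"}
new_creds_only = {"/etc/kubernetes/kubeconfig": "bootstrap-v2", "/etc/example.conf": "a"}
new_other_file = {"/etc/kubernetes/kubeconfig": "bootstrap-v1", "/etc/example.conf": "b"}

print(needs_reboot(old, new_creds_only, safe))
print(needs_reboot(old, new_other_file, safe))
```

With these made-up inputs the first call prints `False` (only the credential file changed) and the second prints `True`. A change limited to the bootstrap credentials could then be applied without the second reboot during install.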
For this reason, I'm deferring this to 4.3. I've talked to Antonio, and this functionality is a priority for the MCO in 4.3. I'll reference a Jira story tracking the progress when one exists.
Is this the root cause of this bug? https://bugzilla.redhat.com/show_bug.cgi?id=1693951