Description of problem:
Certificate recovery fails on a master that was recreated by the machine set.

Version-Release number of selected component (if applicable):
[zhouying@dhcp-140-138 ~]$ oc version --short
Payload: 4.1.0-0.nightly-2019-05-09-204138

How reproducible:
Always

Steps to Reproduce:
1. Follow the doc https://docs.google.com/document/d/1ONkxdDmQVLBNJrSJymfKPrndo7b4vgCA2zwL9xHYx6A/edit to force-rotate the certificates for the whole cluster;
2. On the AWS web console, stop the master instances one by one;
3. The machine set restarts the master instances;
4. Run the recovery steps on the restarted master.

Actual results:
4. The recovery step fails on a master that was restarted by the machine set:

  podman run -it --network=host -v /etc/kubernetes/:/etc/kubernetes/:Z --entrypoint=/usr/bin/cluster-kube-apiserver-operator "${KAO_IMAGE}" recovery-apiserver create

  failed to create recovery apiserver: failed to read kube-apiserver pod manifest at "/etc/kubernetes/manifests/kube-apiserver-pod.yaml": failed to open file "/etc/kubernetes/manifests/kube-apiserver-pod.yaml": open /etc/kubernetes/manifests/kube-apiserver-pod.yaml: no such file or directory

Expected results:
4. The recovery should succeed.

Additional info:
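For reference, a minimal sketch of the failing step as run on the master, assuming KAO_IMAGE is resolved from the release payload via oc adm release info --image-for (the payload pullspec and the final pre-check are illustrative, not part of the documented procedure):

  # Resolve the cluster-kube-apiserver-operator image from the release payload
  # (pullspec below is a placeholder for the nightly payload under test).
  KAO_IMAGE=$(oc adm release info --image-for=cluster-kube-apiserver-operator \
      "quay.io/openshift-release-dev/ocp-release:4.1.0")

  # On the master, start the recovery apiserver; this is the step that fails
  # when /etc/kubernetes/manifests/kube-apiserver-pod.yaml does not exist.
  sudo podman run -it --network=host -v /etc/kubernetes/:/etc/kubernetes/:Z \
      --entrypoint=/usr/bin/cluster-kube-apiserver-operator "${KAO_IMAGE}" \
      recovery-apiserver create

  # Hypothetical pre-check for the manifest the command expects to read.
  test -f /etc/kubernetes/manifests/kube-apiserver-pod.yaml || echo "manifest missing"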
Why don't you stop all the masters at once? That should avoid interactions with the machine-set controller running on the other masters.
You can't recover certs on a node that hasn't been installed yet. There are no components for which to fix certs.

> 2. On the AWS web console, stop the master instances one by one;

I have clarified the doc to explicitly state that all the masters should be stopped at once, to avoid interacting with the MCO and the creation of a 4th master.
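For illustration, one way to stop all the master instances in a single call with the AWS CLI (the instance IDs below are placeholders; map them from the Machine objects first):

  # List the Machine objects to map masters to their AWS instance IDs.
  oc get machines -n openshift-machine-api -o wide

  # Stop all three masters in one call, so no running master is left for the
  # machine controllers to react from (instance IDs are placeholders).
  aws ec2 stop-instances --instance-ids i-0aaaaaaaaaaaaaaa1 i-0bbbbbbbbbbbbbbb2 i-0ccccccccccccccc3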
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922