
Bug 1709213

Summary: [DR][MSTR-363] Can't do recovery on the master restarted by the machine-set
Product: OpenShift Container Platform
Component: Master
Version: 4.1.0
Target Release: 4.2.0
Status: CLOSED ERRATA
Severity: high
Priority: high
Hardware: Unspecified
OS: Unspecified
Reporter: zhou ying <yinzhou>
Assignee: Tomáš Nožička <tnozicka>
QA Contact: zhou ying <yinzhou>
CC: aos-bugs, jokerman, mfojtik, mmccomas, tnozicka, xxia
Keywords: Reopened
Doc Type: No Doc Update
Type: Bug
Last Closed: 2019-10-16 06:28:42 UTC

Description zhou ying 2019-05-13 08:13:57 UTC
Description of problem:
Recovery cannot be performed on a master that was restarted by the machine-set.

Version-Release number of selected component (if applicable):
[zhouying@dhcp-140-138 ~]$ oc version --short
Payload: 4.1.0-0.nightly-2019-05-09-204138

How reproducible:
Always

Steps to Reproduce:
1. Follow the doc https://docs.google.com/document/d/1ONkxdDmQVLBNJrSJymfKPrndo7b4vgCA2zwL9xHYx6A/edit to force certificate rotation for the whole cluster.
2. In the AWS web console, stop the master instances one by one.
3. The machine-set restarts the master instances.
4. Run the recovery steps on a restarted master.

Actual results:
4. The recovery step fails on a master that was restarted by the machine-set:
podman run -it --network=host -v /etc/kubernetes/:/etc/kubernetes/:Z --entrypoint=/usr/bin/cluster-kube-apiserver-operator "${KAO_IMAGE}" recovery-apiserver create
failed to create recovery apiserver: failed to read kube-apiserver pod manifest at "/etc/kubernetes/manifests/kube-apiserver-pod.yaml": failed to open file "/etc/kubernetes/manifests/kube-apiserver-pod.yaml": open /etc/kubernetes/manifests/kube-apiserver-pod.yaml: no such file or directory
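
The failure above comes from a missing static pod manifest. A minimal pre-flight sketch, assuming the default path is the one printed in the error message: the function name check_manifest is hypothetical and is not part of the recovery doc or of any OpenShift tooling. It only tests whether the file exists before the recovery command is attempted.

```shell
# Hypothetical helper (an assumption, not from the recovery doc).
# The default path is copied from the error message in this report.
check_manifest() {
    manifest="${1:-/etc/kubernetes/manifests/kube-apiserver-pod.yaml}"
    if [ -f "$manifest" ]; then
        echo "ok: $manifest exists"
        return 0
    fi
    # Without the manifest, "recovery-apiserver create" fails as shown above.
    echo "missing: $manifest" >&2
    return 1
}

# Example use on a master, before the podman command from this report:
#   check_manifest && podman run -it --network=host ... recovery-apiserver create
```

If the check fails, the node has no kube-apiserver manifest to read, so the recovery command cannot succeed on it.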

Expected results:
4. Recovery should succeed.

Additional info:

Comment 1 Tomáš Nožička 2019-05-13 09:01:25 UTC
Why don't you stop all the masters at once? That should avoid interactions with the machineset controller running on other masters.

Comment 4 Tomáš Nožička 2019-05-13 14:31:01 UTC
You can't recover certs on a node that hasn't been installed yet. There are no components for which to fix certs.


> 2. On the AWS web-console stop the masters instance one by one;

I have clarified the doc to state explicitly that all the masters must be stopped at once, to avoid interacting with the MCO and the creation of a 4th master.
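
For illustration only: the report used the AWS web console, but stopping every master in one request can also be sketched with the AWS CLI, since `aws ec2 stop-instances --instance-ids` accepts several IDs in one call. The function name and the instance IDs below are hypothetical placeholders. The helper only prints the command (a dry run) so it can be reviewed before being run by hand.

```shell
# Hypothetical dry-run helper (an assumption, not from the recovery doc).
# Prints a single stop-instances call covering every instance ID passed in,
# so that all masters are stopped together rather than one by one.
print_stop_all_masters() {
    if [ "$#" -eq 0 ]; then
        echo "usage: print_stop_all_masters <instance-id>..." >&2
        return 2
    fi
    echo "aws ec2 stop-instances --instance-ids $*"
}

# Example with placeholder IDs:
#   print_stop_all_masters i-aaaa i-bbbb i-cccc
```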

Comment 9 errata-xmlrpc 2019-10-16 06:28:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922