Bug 1709213 - [DR][MSTR-363] Can't do recovery on the master restarted by the machine-set
Summary: [DR][MSTR-363] Can't do recovery on the master restarted by the machine-set
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Master
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.2.0
Assignee: Tomáš Nožička
QA Contact: zhou ying
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-05-13 08:13 UTC by zhou ying
Modified: 2019-10-16 06:28 UTC
CC List: 6 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:28:42 UTC
Target Upstream Version:
Embargoed:




Links
Red Hat Product Errata RHBA-2019:2922 (Private: 0, Priority: None, Status: None, Summary: None), Last Updated: 2019-10-16 06:28:57 UTC

Description zhou ying 2019-05-13 08:13:57 UTC
Description of problem:
Can't run recovery on a master that was restarted by the machine-set.

Version-Release number of selected component (if applicable):
[zhouying@dhcp-140-138 ~]$ oc version --short
Payload: 4.1.0-0.nightly-2019-05-09-204138

How reproducible:
Always

Steps to Reproduce:
1. Follow the doc https://docs.google.com/document/d/1ONkxdDmQVLBNJrSJymfKPrndo7b4vgCA2zwL9xHYx6A/edit to force-rotate the certificates for the whole cluster;
2. In the AWS web console, stop the master instances one by one;
3. The machine-set will restart the master instances.
4. Run the recovery steps on the restarted master (a command-level sketch follows this list).
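For reference, a sketch of the recovery flow as described in the 4.1 disaster-recovery procedure; the KUBECONFIG path and the regenerate-certificates / recovery-apiserver destroy subcommands are taken from that procedure and may differ from the linked doc:

# Start the temporary recovery apiserver on the master
# (KAO_IMAGE is the cluster-kube-apiserver-operator image pullspec)
podman run -it --network=host -v /etc/kubernetes/:/etc/kubernetes/:Z --entrypoint=/usr/bin/cluster-kube-apiserver-operator "${KAO_IMAGE}" recovery-apiserver create

# Point the client at the recovery apiserver
export KUBECONFIG=/etc/kubernetes/static-pod-resources/recovery-kube-apiserver-pod/admin.kubeconfig

# Regenerate the cluster certificates
podman run -it --network=host -v /etc/kubernetes/:/etc/kubernetes/:Z --entrypoint=/usr/bin/cluster-kube-apiserver-operator "${KAO_IMAGE}" regenerate-certificates

# Tear the recovery apiserver down once the cluster is healthy again
podman run -it --network=host -v /etc/kubernetes/:/etc/kubernetes/:Z --entrypoint=/usr/bin/cluster-kube-apiserver-operator "${KAO_IMAGE}" recovery-apiserver destroy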

Actual results:
4. The recovery step fails on the master that was restarted by the machine-set:
podman run -it --network=host -v /etc/kubernetes/:/etc/kubernetes/:Z --entrypoint=/usr/bin/cluster-kube-apiserver-operator "${KAO_IMAGE}" recovery-apiserver create
failed to create recovery apiserver: failed to read kube-apiserver pod manifest at "/etc/kubernetes/manifests/kube-apiserver-pod.yaml": failed to open file "/etc/kubernetes/manifests/kube-apiserver-pod.yaml": open /etc/kubernetes/manifests/kube-apiserver-pod.yaml: no such file or directory
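The error is consistent with the replacement master never having been through the installer; an illustrative check on the node (assuming shell access) shows the expected static pod manifest is simply absent:

# The machine-set-provisioned master has no installer-laid static pod
# manifests, so the recovery tool has nothing to read:
ls /etc/kubernetes/manifests/
# kube-apiserver-pod.yaml is not listed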

Expected results:
4. Recovery should succeed.

Additional info:

Comment 1 Tomáš Nožička 2019-05-13 09:01:25 UTC
Why don't you stop all the masters at once? That should avoid interactions with the machine-set controller running on the other masters.
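For illustration, all three masters can be stopped in a single AWS CLI call, so no healthy master is left to run the machine-set controller (the instance IDs are hypothetical):

# Stop every master instance in one call; with all masters down at once,
# nothing remains to reconcile the machine-set and replace the nodes
aws ec2 stop-instances --instance-ids i-0masterA i-0masterB i-0masterC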

Comment 4 Tomáš Nožička 2019-05-13 14:31:01 UTC
You can't recover certs on a node that hasn't been installed yet. There is no component for which to fix certs.


> 2. In the AWS web console, stop the master instances one by one;

I have clarified the doc to explicitly state that all the masters must be stopped at once, to avoid interacting with the MCO and the creation of a 4th master.

Comment 9 errata-xmlrpc 2019-10-16 06:28:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

