1709213 – [DR][MSTR-363] Can't do recovery on the master restarted by the machine-set

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Master
Sub Component:
Version:	4.1.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.2.0
Assignee:	Tomáš Nožička
QA Contact:	zhou ying
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2019-05-13 08:13 UTC by zhou ying
Modified:	2019-10-16 06:28 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-10-16 06:28:42 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2019:2922	0	None	None	None	2019-10-16 06:28:57 UTC

Description zhou ying 2019-05-13 08:13:57 UTC

Description of problem:
Recovery cannot be performed on a master that was restarted by the machine-set.

Version-Release number of selected component (if applicable):
[zhouying@dhcp-140-138 ~]$ oc version --short
Payload: 4.1.0-0.nightly-2019-05-09-204138

How reproducible:
Always

Steps to Reproduce:
1. Follow the doc https://docs.google.com/document/d/1ONkxdDmQVLBNJrSJymfKPrndo7b4vgCA2zwL9xHYx6A/edit to force certificate rotation for the whole cluster.
2. In the AWS web console, stop the master instances one by one.
3. The machine-set restarts the master instance.
4. Run the recovery steps on the restarted master.

Actual results:
4. The recovery step fails on the master that was restarted by the machine-set:
podman run -it --network=host -v /etc/kubernetes/:/etc/kubernetes/:Z --entrypoint=/usr/bin/cluster-kube-apiserver-operator "${KAO_IMAGE}" recovery-apiserver create
failed to create recovery apiserver: failed to read kube-apiserver pod manifest at "/etc/kubernetes/manifests/kube-apiserver-pod.yaml": failed to open file "/etc/kubernetes/manifests/kube-apiserver-pod.yaml": open /etc/kubernetes/manifests/kube-apiserver-pod.yaml: no such file or directory

Expected results:
4. Recovery should succeed.

Additional info:

Comment 1 Tomáš Nožička 2019-05-13 09:01:25 UTC

Why don't you stop all the masters at once? That should avoid interactions with the machineset controller running on the other masters.
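
For anyone scripting this instead of using the web console, the sketch below shows one way to stop several instances in a single request. It assumes the AWS CLI is installed and configured. The function name `stop_masters_together`, the `RUN` switch, and the instance IDs are hypothetical and do not come from this report or the recovery doc. A single `aws ec2 stop-instances` call accepts multiple IDs, but whether that is enough to avoid the machineset controller reacting has not been verified here.

```shell
# Hypothetical helper; instance IDs are placeholders supplied by the caller.
# By default it only prints the AWS CLI call; set RUN=1 to actually execute it.
stop_masters_together() {
  if [ "$#" -eq 0 ]; then
    echo "usage: stop_masters_together <instance-id>..." >&2
    return 2
  fi
  if [ "${RUN:-0}" = "1" ]; then
    # One request covering every master instance ID passed in.
    aws ec2 stop-instances --instance-ids "$@"
  else
    echo "aws ec2 stop-instances --instance-ids $*"
  fi
}
```

Calling `stop_masters_together i-aaa i-bbb i-ccc` prints the command for review; running it with `RUN=1` set sends the request.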

Comment 4 Tomáš Nožička 2019-05-13 14:31:01 UTC

You can't recover certs on a node that hasn't been installed yet. There are no components for which to fix certs.


> 2. On the AWS web-console stop the masters instance one by one;

I have clarified the doc to explicitly state that all the masters should be stopped at once, to avoid interacting with the MCO and the creation of a 4th master.
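
The failure in the description comes from a missing static pod manifest, so a pre-flight check along the following lines might catch the situation before the recovery command is run. This is only a sketch: the function name `recovery_precheck` is hypothetical, the path is the one from the error message above, and the presence of the manifest is just a heuristic for "this node has an installed kube-apiserver", not an official test.

```shell
# Hypothetical pre-flight check; not part of cluster-kube-apiserver-operator.
# Default path is taken from the error message in this report.
MANIFEST_DEFAULT="/etc/kubernetes/manifests/kube-apiserver-pod.yaml"

# Returns 0 if the kube-apiserver static pod manifest exists, 1 otherwise.
recovery_precheck() {
  manifest="${1:-$MANIFEST_DEFAULT}"
  if [ -f "$manifest" ]; then
    echo "found ${manifest}"
    return 0
  fi
  # Without the manifest there is no kube-apiserver on this node to recover.
  echo "missing ${manifest}: recovery-apiserver create would fail here" >&2
  return 1
}
```

On a master, `recovery_precheck` could be run before the `podman run ... recovery-apiserver create` command from the description; a non-zero exit suggests the node is a freshly created machine rather than an installed master.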

Comment 9 errata-xmlrpc 2019-10-16 06:28:42 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922
