Bug 1732377

Summary: Can't create the recovery API server with 4.2 env
Product: OpenShift Container Platform
Component: kube-apiserver
Version: 4.2.0
Target Release: 4.2.0
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: urgent
Status: CLOSED ERRATA
Reporter: zhou ying <yinzhou>
Assignee: Tomáš Nožička <tnozicka>
QA Contact: zhou ying <yinzhou>
CC: aos-bugs, mfojtik, schoudha, sttts, tnozicka
Type: Bug
Last Closed: 2019-10-16 06:30:44 UTC

Description zhou ying 2019-07-23 09:14:43 UTC
Description of problem:
On a 4.2 environment, creating the recovery API server with `podman run -it --network=host -v /etc/kubernetes/:/etc/kubernetes/:Z --entrypoint=/usr/bin/cluster-kube-apiserver-operator "${KAO_IMAGE}" recovery-apiserver create` fails.


Version-Release number of selected component (if applicable):
Payload: 4.2.0-0.nightly-2019-07-21-222447 or later


How reproducible:
always

Steps to Reproduce:
1. Follow the doc https://docs.openshift.com/container-platform/4.1/disaster_recovery/scenario-3-expired-certs.html to perform certificate recovery;
2. Create the recovery API server by:
    `podman run -it --network=host -v /etc/kubernetes/:/etc/kubernetes/:Z --entrypoint=/usr/bin/cluster-kube-apiserver-operator "${KAO_IMAGE}" recovery-apiserver create`
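For reference, the creation step from the doc can be wrapped as a shell function; a minimal sketch, assuming it runs as root on a master node and that `KAO_IMAGE` has already been set to the cluster-kube-apiserver-operator image from the release payload (the function name is mine, not from the doc):

```shell
# Hedged sketch of step 2 above. Assumes root on a master node; the
# image argument is the cluster-kube-apiserver-operator image from the
# release payload (KAO_IMAGE). Function name is illustrative.
create_recovery_apiserver() {
  local kao_image=$1
  podman run -it --network=host \
    -v /etc/kubernetes/:/etc/kubernetes/:Z \
    --entrypoint=/usr/bin/cluster-kube-apiserver-operator \
    "${kao_image}" recovery-apiserver create
}
```

Called as `create_recovery_apiserver "${KAO_IMAGE}"`, this is equivalent to the one-liner in step 2.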

Actual results:
2. The recovery API server fails to come up and nothing is listening on port 7443:
[root@ip-10-0-137-57 ~]# export KUBECONFIG=/etc/kubernetes/static-pod-resources/recovery-kube-apiserver-pod/admin.kubeconfig
[root@ip-10-0-137-57 ~]# oc get node
The connection to the server localhost:7443 was refused - did you specify the right host or port?
[root@ip-10-0-137-57 ~]# until oc get namespace kube-system 2>/dev/null 1>&2; do echo 'Waiting for recovery apiserver to come up.'; sleep 1; done
Waiting for recovery apiserver to come up.
Waiting for recovery apiserver to come up.
......

[root@ip-10-0-137-57 ~]# netstat -pan |grep 7443
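The unbounded `until` loop above waits forever when the recovery apiserver never comes up. A hedged sketch of a bounded variant (the `oc get namespace kube-system` probe is from the transcript above; the function name, timeout default, and 5-second poll interval are mine):

```shell
# Bounded version of the wait loop from the doc: give up after a
# timeout instead of looping forever. Probe command is the same
# 'oc get namespace kube-system' used above; defaults are illustrative.
wait_for_recovery_apiserver() {
  local timeout=${1:-300} elapsed=0
  until oc get namespace kube-system >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "recovery apiserver not up after ${timeout}s" >&2
      return 1
    fi
    echo 'Waiting for recovery apiserver to come up.'
    sleep 5
    elapsed=$((elapsed + 5))
  done
}
```

With the bug reproduced, this would return non-zero after the timeout rather than printing the waiting message indefinitely.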

Expected results:
2. The recovery API server should start and listen on port 7443.

Additional info:

Comment 3 zhou ying 2019-08-07 01:04:48 UTC
The related Jira card is still in progress; is this issue fixed?

Comment 4 Tomáš Nožička 2019-08-07 10:51:40 UTC
Fixed.

(The Jira card is about having e2e so it doesn't happen next time.)

Comment 5 zhou ying 2019-08-08 02:47:34 UTC
Hi Tomáš:

   Confirmed with payload 4.2.0-0.nightly-2019-08-08-002434: the kube-apiserver-recovery pod is now running, but following the doc, after the kubelet is restarted all nodes go "NotReady" and never recover:

[root@ip-10-0-141-200 ~]# oc get node
NAME                                         STATUS     ROLES    AGE   VERSION
ip-10-0-141-200.us-east-2.compute.internal   NotReady   master   25h   v1.14.0+f390ff124
ip-10-0-141-33.us-east-2.compute.internal    NotReady   worker   25h   v1.14.0+f390ff124
ip-10-0-152-108.us-east-2.compute.internal   NotReady   master   25h   v1.14.0+f390ff124
ip-10-0-152-180.us-east-2.compute.internal   NotReady   worker   25h   v1.14.0+f390ff124
ip-10-0-169-234.us-east-2.compute.internal   NotReady   master   25h   v1.14.0+f390ff124
[root@ip-10-0-141-200 ~]# oc get csr
No resources found.
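The certificate-recovery procedure referenced in step 1 includes a follow-on step of approving the certificate signing requests that the restarted kubelets submit; the empty `oc get csr` output above suggests those CSRs were never issued or approved, which would leave the nodes NotReady. A hedged sketch of that approval step (the helper name is mine):

```shell
# Hedged sketch of the CSR-approval step from the recovery procedure:
# after the kubelets restart they request new certificates, and each
# pending CSR must be approved before nodes can go Ready again.
# Helper name is illustrative; it is a no-op when no CSRs are pending.
approve_pending_csrs() {
  local csr
  for csr in $(oc get csr -o name); do
    oc adm certificate approve "$csr"
  done
}
```

In the failure above this helper would do nothing, since `oc get csr` returns no resources; the question is why no CSRs appear at all.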

Comment 6 zhou ying 2019-08-08 08:23:10 UTC
Confirmed with payload 4.2.0-0.nightly-2019-08-08-002434: the recovery apiserver started and became ready:
[root@ip-10-0-142-67 kubernetes]# oc get po -n openshift-kube-apiserver
NAME                                                                READY   STATUS      RESTARTS   AGE
.....
kube-apiserver-recovery-ip-10-0-142-67.us-east-2.compute.internal   1/1     Running     0          15m

Comment 7 errata-xmlrpc 2019-10-16 06:30:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922