Bug 1732377

Summary: Can't create the recovery API server with 4.2 env
Product: OpenShift Container Platform
Component: kube-apiserver
Version: 4.2.0
Target Release: 4.2.0
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: urgent
Status: CLOSED ERRATA
Reporter: zhou ying <yinzhou>
Assignee: Tomáš Nožička <tnozicka>
QA Contact: zhou ying <yinzhou>
CC: aos-bugs, mfojtik, schoudha, sttts, tnozicka
Type: Bug
Last Closed: 2019-10-16 06:30:44 UTC

Description zhou ying 2019-07-23 09:14:43 UTC
Description of problem:
On a 4.2 environment, creating the recovery API server with `podman run -it --network=host -v /etc/kubernetes/:/etc/kubernetes/:Z --entrypoint=/usr/bin/cluster-kube-apiserver-operator "${KAO_IMAGE}" recovery-apiserver create` fails.


Version-Release number of selected component (if applicable):
Payload: 4.2.0-0.nightly-2019-07-21-222447 or later


How reproducible:
always

Steps to Reproduce:
1. Follow the doc https://docs.openshift.com/container-platform/4.1/disaster_recovery/scenario-3-expired-certs.html to perform certificate recovery;
2. Create the recovery API server by:
    `podman run -it --network=host -v /etc/kubernetes/:/etc/kubernetes/:Z --entrypoint=/usr/bin/cluster-kube-apiserver-operator "${KAO_IMAGE}" recovery-apiserver create`
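For reference, the creation step from the doc can be wrapped as a shell function; a minimal sketch, assuming it runs as root on a master node and that `KAO_IMAGE` has already been set to the cluster-kube-apiserver-operator image from the release payload (the function name is mine, not from the doc):

```shell
# Hedged sketch of step 2 above. Assumes root on a master node; the
# image argument is the cluster-kube-apiserver-operator image from the
# release payload (KAO_IMAGE). Function name is illustrative.
create_recovery_apiserver() {
  local kao_image=$1
  podman run -it --network=host \
    -v /etc/kubernetes/:/etc/kubernetes/:Z \
    --entrypoint=/usr/bin/cluster-kube-apiserver-operator \
    "${kao_image}" recovery-apiserver create
}
```

Called as `create_recovery_apiserver "${KAO_IMAGE}"`, this is equivalent to the one-liner in step 2.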

Actual results:
2. The recovery API server fails to come up and nothing is listening on port 7443:
[root@ip-10-0-137-57 ~]# export KUBECONFIG=/etc/kubernetes/static-pod-resources/recovery-kube-apiserver-pod/admin.kubeconfig
[root@ip-10-0-137-57 ~]# oc get node
The connection to the server localhost:7443 was refused - did you specify the right host or port?
[root@ip-10-0-137-57 ~]# until oc get namespace kube-system 2>/dev/null 1>&2; do echo 'Waiting for recovery apiserver to come up.'; sleep 1; done
Waiting for recovery apiserver to come up.
Waiting for recovery apiserver to come up.
......

[root@ip-10-0-137-57 ~]# netstat -pan |grep 7443
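The unbounded `until` loop above waits forever when the recovery apiserver never comes up. A hedged sketch of a bounded variant (the `oc get namespace kube-system` probe is from the transcript above; the function name, timeout default, and 5-second poll interval are mine):

```shell
# Bounded version of the wait loop from the doc: give up after a
# timeout instead of looping forever. Probe command is the same
# 'oc get namespace kube-system' used above; defaults are illustrative.
wait_for_recovery_apiserver() {
  local timeout=${1:-300} elapsed=0
  until oc get namespace kube-system >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "recovery apiserver not up after ${timeout}s" >&2
      return 1
    fi
    echo 'Waiting for recovery apiserver to come up.'
    sleep 5
    elapsed=$((elapsed + 5))
  done
}
```

With the bug reproduced, this would return non-zero after the timeout rather than printing the waiting message indefinitely.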

Expected results:
2. The recovery API server should start and listen on port 7443.

Additional info:

Comment 3 zhou ying 2019-08-07 01:04:48 UTC
The related Jira card is still in progress; is this issue fixed?

Comment 4 Tomáš Nožička 2019-08-07 10:51:40 UTC
Fixed.

(The Jira card is about having e2e so it doesn't happen next time.)

Comment 5 zhou ying 2019-08-08 02:47:34 UTC
Hi Tomáš:

   Confirmed with payload 4.2.0-0.nightly-2019-08-08-002434: the kube-apiserver-recovery pod is now running, but following the doc, after the kubelet is restarted all nodes go "NotReady" and never recover:

[root@ip-10-0-141-200 ~]# oc get node
NAME                                         STATUS     ROLES    AGE   VERSION
ip-10-0-141-200.us-east-2.compute.internal   NotReady   master   25h   v1.14.0+f390ff124
ip-10-0-141-33.us-east-2.compute.internal    NotReady   worker   25h   v1.14.0+f390ff124
ip-10-0-152-108.us-east-2.compute.internal   NotReady   master   25h   v1.14.0+f390ff124
ip-10-0-152-180.us-east-2.compute.internal   NotReady   worker   25h   v1.14.0+f390ff124
ip-10-0-169-234.us-east-2.compute.internal   NotReady   master   25h   v1.14.0+f390ff124
[root@ip-10-0-141-200 ~]# oc get csr
No resources found.
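The certificate-recovery procedure referenced in step 1 includes a follow-on step of approving the certificate signing requests that the restarted kubelets submit; the empty `oc get csr` output above suggests those CSRs were never issued or approved, which would leave the nodes NotReady. A hedged sketch of that approval step (the helper name is mine):

```shell
# Hedged sketch of the CSR-approval step from the recovery procedure:
# after the kubelets restart they request new certificates, and each
# pending CSR must be approved before nodes can go Ready again.
# Helper name is illustrative; it is a no-op when no CSRs are pending.
approve_pending_csrs() {
  local csr
  for csr in $(oc get csr -o name); do
    oc adm certificate approve "$csr"
  done
}
```

In the failure above this helper would do nothing, since `oc get csr` returns no resources; the question is why no CSRs appear at all.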

Comment 6 zhou ying 2019-08-08 08:23:10 UTC
Confirmed with payload 4.2.0-0.nightly-2019-08-08-002434: the recovery apiserver started and became ready:
[root@ip-10-0-142-67 kubernetes]# oc get po -n openshift-kube-apiserver
NAME                                                                READY   STATUS      RESTARTS   AGE
.....
kube-apiserver-recovery-ip-10-0-142-67.us-east-2.compute.internal   1/1     Running     0          15m

Comment 7 errata-xmlrpc 2019-10-16 06:30:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922