Bug 1575822

Summary: Install failed due to Approve bootstrap nodes timeout
Product: OpenShift Container Platform Reporter: Weihua Meng <wmeng>
Component: Installer Assignee: Scott Dodson <sdodson>
Status: CLOSED DUPLICATE QA Contact: Johnny Liu <jialiu>
Severity: high Docs Contact:
Priority: medium    
Version: 3.10.0 CC: ajuricic, akaiser, aos-bugs, bleanhar, fshaikh, ghuang, jokerman, mmccomas, openshift-bugs-escalate, pkanthal, sdodson, tzumainn, wmeng
Target Milestone: --- Keywords: Reopened
Target Release: 3.10.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-10-01 12:39:46 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Weihua Meng 2018-05-08 03:18:27 UTC
Description of problem:
Install failed due to Approve bootstrap nodes timeout
Is a timeout of 60 too short?
https://github.com/openshift/openshift-ansible/blob/master/playbooks/openshift-node/private/join.yml#L29
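
The timeout question above can be illustrated with a generic polling loop. This is a minimal sketch, not the openshift-ansible implementation: `wait_until`, `FakeClock`, and the 90-second readiness delay are all hypothetical, chosen only to show that a fixed deadline fails when the awaited condition becomes true after it, and that a longer deadline helps only if the condition ever becomes true.

```python
import time


def wait_until(check, timeout, interval=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Call check() until it returns True or `timeout` seconds elapse.

    Returns True if check() succeeded, False if the deadline passed first.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


class FakeClock:
    """Hypothetical test clock so the example runs instantly."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def simulate(timeout, ready_after=90.0):
    # Assumption: the condition (e.g. all CSRs present) becomes true
    # only after `ready_after` simulated seconds.
    fake = FakeClock()
    return wait_until(lambda: fake.now >= ready_after, timeout,
                      interval=5.0, clock=fake.time, sleep=fake.sleep)


print(simulate(60))   # deadline passes before the condition holds
print(simulate(600))  # condition holds well before the deadline
```

If the condition never becomes true (for example, because requests never reach the API server), no timeout value makes such a loop succeed.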

Version-Release number of the following components:
openshift-ansible-3.10.0-0.36.0

How reproducible:
Sometimes

Steps to Reproduce:
1. Install OCP 3.10 

Actual results:
Install failed.

TASK [Approve bootstrap nodes] *************************************************
Monday 07 May 2018  22:03:37 -0400 (0:00:00.054)       0:11:56.686 ************ 
fatal: [qe-wmengcrio4-master-etcd-1.0507-o68.qe.rhcloud.com]: FAILED! => {"changed": true, "failed": true, "finished": false, "msg": "Timed out accepting certificate signing requests. Failing as requested.", "nodes": [{"accepted": true, "csrs": {"csr-c4dbd": {"apiVersion": "certificates.k8s.io/v1beta1", "kind": "CertificateSigningRequest", "metadata": {"creationTimestamp": "2018-05-08T02:02:37Z", "generateName": "csr-", "name": "csr-c4dbd", "namespace": "", "resourceVersion": "522", "selfLink": "/apis/certificates.k8s.io/v1beta1/certificatesigningrequests/csr-c4dbd", "uid": "e1ad1550-5263-11e8-a773-42010af00002"}, "spec": {"groups": ["system:masters", "system:cluster-admins", "system:authenticated"], "request": 
<---snipped--->

TASK [Report approval errors] **************************************************
Monday 07 May 2018  22:04:57 -0400 (0:00:00.753)       0:13:16.684 ************ 
fatal: [qe-wmengcrio4-master-etcd-1.0507-o68.qe.rhcloud.com]: FAILED! => {"changed": false, "failed": true, "msg": "Node approval failed"}

All CSRs are approved:
[root@qe-wmengcrio4-master-etcd-1 ~]# oc get csr
NAME                                                   AGE       REQUESTOR                                                 CONDITION
csr-4ncp2                                              17m       system:node:qe-wmengcrio4-node-registry-router-2          Approved,Issued
csr-c4dbd                                              18m       system:admin                                              Approved,Issued
csr-f8jzn                                              17m       system:node:qe-wmengcrio4-master-etcd-1                   Approved,Issued
csr-fdcts                                              17m       system:node:qe-wmengcrio4-glusterfs-node-3                Approved,Issued
csr-svgzt                                              18m       system:admin                                              Approved,Issued
csr-w5n89                                              16m       system:node:qe-wmengcrio4-glusterfs-node-1                Approved,Issued
node-csr-RFT_-eZU1T2IgMJ6Gz75MFp8Mg0ewwPnq-eeDEIimG8   17m       system:serviceaccount:openshift-infra:node-bootstrapper   Approved,Issued
node-csr-eKmKiTQKE7SG2AG4yZXzHPWSGnlNOm8w-yfdFRMLaOw   17m       system:serviceaccount:openshift-infra:node-bootstrapper   Approved,Issued
node-csr-gVKh4T1wNpSLX_furDmY1JxEvb3ku7T0kDVhEQud-xo   17m       system:serviceaccount:openshift-infra:node-bootstrapper   Approved,Issued
[root@qe-wmengcrio4-master-etcd-1 ~]#
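
A check like the one above can be scripted. The following is a rough sketch that assumes the four-column `oc get csr` layout shown in this report (NAME, AGE, REQUESTOR, CONDITION); other client versions may print different columns. The function name `pending_csrs` and the `csr-example` row are hypothetical and only illustrate a non-approved entry.

```python
def pending_csrs(table_text):
    """Return names of CSRs whose CONDITION column is not 'Approved,Issued'."""
    pending = []
    for line in table_text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and shell prompt lines,
        # none of which are four-field data rows.
        if len(fields) != 4 or fields[0] == "NAME":
            continue
        name, _age, _requestor, condition = fields
        if condition != "Approved,Issued":
            pending.append(name)
    return pending


# Two rows copied from the report plus one hypothetical pending row.
sample = """\
NAME          AGE   REQUESTOR                                          CONDITION
csr-4ncp2     17m   system:node:qe-wmengcrio4-node-registry-router-2   Approved,Issued
csr-c4dbd     18m   system:admin                                       Approved,Issued
csr-example   1m    system:node:example-node                           Pending
"""

print(pending_csrs(sample))
```

An empty result for the real output would match the observation that every CSR had been approved by the time the command was run.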

Expected results:
Install succeeds

Comment 2 Gan Huang 2018-05-18 02:47:08 UTC
This occurs consistently when scaling up nodes.

Even after raising the timeout to 600s (https://github.com/openshift/openshift-ansible/blob/master/playbooks/openshift-node/private/join.yml#L29), the failure still occurs.

Adding test blocker.

Comment 4 Gan Huang 2018-05-18 02:53:15 UTC
Please let me know what other info is needed.

Comment 9 Scott Dodson 2018-05-21 18:51:14 UTC
Let's re-open if this can be reproduced without the load balancer misconfiguration.

Comment 10 Weihua Meng 2018-05-29 07:24:01 UTC
I have not hit this issue recently with the latest build, v3.10.0-0.53.0.
No need to try old versions.

Comment 24 Scott Dodson 2018-10-01 12:39:46 UTC
Closing this as a dupe of 1628964, which is being used to track the backport of CSR approval fixes from 3.11 to 3.10.z.

*** This bug has been marked as a duplicate of bug 1628964 ***