1575822 – Install failed due to Approve bootstrap nodes timeout

Summary: Install failed due to Approve bootstrap nodes timeout

Keywords:
Status:	CLOSED DUPLICATE of bug 1628964
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Installer
Sub Component:
Version:	3.10.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	high
Target Milestone:	---
Target Release:	3.10.0
Assignee:	Scott Dodson
QA Contact:	Johnny Liu
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-05-08 03:18 UTC by Weihua Meng
Modified:	2018-10-01 12:39 UTC
CC List:	13 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-10-01 12:39:46 UTC
Target Upstream Version:
Embargoed:

Description Weihua Meng 2018-05-08 03:18:27 UTC

Description of problem:
Install failed because the "Approve bootstrap nodes" task timed out.
Is the timeout of 60 too short?
https://github.com/openshift/openshift-ansible/blob/master/playbooks/openshift-node/private/join.yml#L29

Version-Release number of the following components:
openshift-ansible-3.10.0-0.36.0

How reproducible:
Sometimes

Steps to Reproduce:
1. Install OCP 3.10 

Actual results:
Install failed.

TASK [Approve bootstrap nodes] *************************************************
Monday 07 May 2018  22:03:37 -0400 (0:00:00.054)       0:11:56.686 ************ 
fatal: [qe-wmengcrio4-master-etcd-1.0507-o68.qe.rhcloud.com]: FAILED! => {"changed": true, "failed": true, "finished": false, "msg": "Timed out accepting certificate signing requests. Failing as requested.", "nodes": [{"accepted": true, "csrs": {"csr-c4dbd": {"apiVersion": "certificates.k8s.io/v1beta1", "kind": "CertificateSigningRequest", "metadata": {"creationTimestamp": "2018-05-08T02:02:37Z", "generateName": "csr-", "name": "csr-c4dbd", "namespace": "", "resourceVersion": "522", "selfLink": "/apis/certificates.k8s.io/v1beta1/certificatesigningrequests/csr-c4dbd", "uid": "e1ad1550-5263-11e8-a773-42010af00002"}, "spec": {"groups": ["system:masters", "system:cluster-admins", "system:authenticated"], "request": 
<---snipped--->

TASK [Report approval errors] **************************************************
Monday 07 May 2018  22:04:57 -0400 (0:00:00.753)       0:13:16.684 ************ 
fatal: [qe-wmengcrio4-master-etcd-1.0507-o68.qe.rhcloud.com]: FAILED! => {"changed": false, "failed": true, "msg": "Node approval failed"}

All CSRs are approved when checked afterwards:
[root@qe-wmengcrio4-master-etcd-1 ~]# oc get csr
NAME                                                   AGE       REQUESTOR                                                 CONDITION
csr-4ncp2                                              17m       system:node:qe-wmengcrio4-node-registry-router-2          Approved,Issued
csr-c4dbd                                              18m       system:admin                                              Approved,Issued
csr-f8jzn                                              17m       system:node:qe-wmengcrio4-master-etcd-1                   Approved,Issued
csr-fdcts                                              17m       system:node:qe-wmengcrio4-glusterfs-node-3                Approved,Issued
csr-svgzt                                              18m       system:admin                                              Approved,Issued
csr-w5n89                                              16m       system:node:qe-wmengcrio4-glusterfs-node-1                Approved,Issued
node-csr-RFT_-eZU1T2IgMJ6Gz75MFp8Mg0ewwPnq-eeDEIimG8   17m       system:serviceaccount:openshift-infra:node-bootstrapper   Approved,Issued
node-csr-eKmKiTQKE7SG2AG4yZXzHPWSGnlNOm8w-yfdFRMLaOw   17m       system:serviceaccount:openshift-infra:node-bootstrapper   Approved,Issued
node-csr-gVKh4T1wNpSLX_furDmY1JxEvb3ku7T0kDVhEQud-xo   17m       system:serviceaccount:openshift-infra:node-bootstrapper   Approved,Issued
[root@qe-wmengcrio4-master-etcd-1 ~]#
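
For triage, the check above (whether any CSR is still unapproved) can be scripted. The following is a minimal sketch, not part of openshift-ansible: it assumes the tabular `oc get csr` output format shown above, and the function name `unapproved_csrs` and the trimmed sample rows are illustrative only.

```python
from typing import List


def unapproved_csrs(oc_output: str) -> List[str]:
    """Return names of CSRs whose CONDITION column lacks 'Approved'.

    Assumes four whitespace-separated columns per row:
    NAME, AGE, REQUESTOR, CONDITION (as in the `oc get csr` output above).
    """
    names = []
    for line in oc_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or malformed rows
        name, condition = fields[0], fields[-1]
        if "Approved" not in condition.split(","):
            names.append(name)
    return names


# Two rows copied from the output above (column spacing trimmed).
SAMPLE = """\
NAME        AGE   REQUESTOR                                          CONDITION
csr-4ncp2   17m   system:node:qe-wmengcrio4-node-registry-router-2   Approved,Issued
csr-c4dbd   18m   system:admin                                       Approved,Issued
"""

if __name__ == "__main__":
    print(unapproved_csrs(SAMPLE))
```

An empty result for the full output would match the observation that every CSR was approved even though the task reported a timeout.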

Expected results:
Install succeeds

Comment 2 Gan Huang 2018-05-18 02:47:08 UTC

This occurs consistently when scaling up nodes.

Increasing the timeout to 600s (https://github.com/openshift/openshift-ansible/blob/master/playbooks/openshift-node/private/join.yml#L29) does not help.

Adding test blocker.

Comment 4 Gan Huang 2018-05-18 02:53:15 UTC

Please let me know what other information is needed.

Comment 9 Scott Dodson 2018-05-21 18:51:14 UTC

Let's re-open if this can be reproduced without the load balancer misconfiguration.

Comment 10 Weihua Meng 2018-05-29 07:24:01 UTC

I have not hit this issue recently with the latest build, v3.10.0-0.53.0.
There is no need to test older versions.

Comment 24 Scott Dodson 2018-10-01 12:39:46 UTC

Closing this as a duplicate of 1628964, which is being used to track the backport of CSR approval fixes from 3.11 to 3.10.z.

*** This bug has been marked as a duplicate of bug 1628964 ***
