Bug 1622945
Summary: | [3.11] Installation stuck at TASK [Approve node certificates when bootstrapping] | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Junqi Zhao <juzhao> | |
Component: | Installer | Assignee: | Michael Gugino <mgugino> | |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Johnny Liu <jialiu> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | 3.11.0 | CC: | aos-bugs, ekuric, fgrosjea, hgomes, jokerman, mark.vinkx, mgoldman, mgugino, mmccomas, ndordet, roxenham, rsandu, scortopa, simon.krenger, wabouham, wmeng, wsun | |
Target Milestone: | --- | Keywords: | Regression, TestBlocker | |
Target Release: | 3.11.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | If docs needed, set a value | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1623204 1623248 1625817 | Environment: | |
Last Closed: | 2018-12-21 15:23:13 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1565405, 1623204, 1623248, 1625817 |
Description
Junqi Zhao
2018-08-28 08:39:03 UTC
I also hit this issue. It seems this bug is easy to reproduce in larger clusters. In a 3-node cluster (1 master node + 1 infra node + 1 compute node), installation completes successfully. In a 5-node cluster (3 master nodes + 1 infra node + 1 compute node), installation fails. In comment 0, the cluster is 1 master node + 2 infra nodes + 2 compute nodes.

This was introduced recently (since the .24 build), and it is blocking master HA environment setup and QE's testing.

The PR is merged into openshift-ansible-3.11.0-0.25.0; moving the bug to ON_QA.

Verified this bug with openshift-ansible-3.11.0-0.25.0.git.0.7497e69.el7.noarch, and it PASSES.

    TASK [Dump the bootstrap hostnames] ********************************************
    Wednesday 29 August 2018 11:37:05 +0800 (0:00:00.265) 0:00:50.049 ******
    ok: [ec2-52-54-185-242.compute-1.amazonaws.com] => {
        "msg": [
            "ip-172-18-12-30.ec2.internal",
            "ip-172-18-3-190.ec2.internal",
            "ip-172-18-7-182.ec2.internal",
            "ip-172-18-2-12.ec2.internal",
            "ip-172-18-8-189.ec2.internal"
        ]
    }

    TASK [Approve node certificates when bootstrapping] ****************************
    Wednesday 29 August 2018 11:37:05 +0800 (0:00:00.083) 0:00:50.132 ******
    FAILED - RETRYING: Approve node certificates when bootstrapping (30 retries left).
    FAILED - RETRYING: Approve node certificates when bootstrapping (29 retries left).
    FAILED - RETRYING: Approve node certificates when bootstrapping (28 retries left).
    FAILED - RETRYING: Approve node certificates when bootstrapping (27 retries left).
    FAILED - RETRYING: Approve node certificates when bootstrapping (26 retries left).
    FAILED - RETRYING: Approve node certificates when bootstrapping (25 retries left).
    changed: [ec2-52-54-185-242.compute-1.amazonaws.com] => {"attempts": 7, "changed": true, "client_approve_results": [], "rc": 0, "server_approve_results": ["certificatesigningrequest.certificates.k8s.io/csr-gzgp8 approved\n", "certificatesigningrequest.certificates.k8s.io/csr-8m2cf approved\n"]}

    [root@ip-172-18-12-30 ~]# oc get csr
    NAME                                                  AGE   REQUESTOR                                                CONDITION
    csr-4fb8x                                             22m   system:node:ip-172-18-3-190.ec2.internal                 Approved,Issued
    csr-4kbph                                             24m   system:admin                                             Approved,Issued
    csr-6qrkd                                             24m   system:admin                                             Approved,Issued
    csr-7bb2s                                             19m   system:node:ip-172-18-12-30.ec2.internal                 Approved,Issued
    csr-8lvvf                                             24m   system:admin                                             Approved,Issued
    csr-8m2cf                                             18m   system:node:ip-172-18-8-189.ec2.internal                 Approved,Issued
    csr-bgx75                                             22m   system:node:ip-172-18-7-182.ec2.internal                 Approved,Issued
    csr-bvl28                                             19m   system:node:ip-172-18-3-190.ec2.internal                 Approved,Issued
    csr-cz2b7                                             24m   system:admin                                             Approved,Issued
    csr-gzgp8                                             18m   system:node:ip-172-18-2-12.ec2.internal                  Approved,Issued
    csr-lfbrj                                             24m   system:admin                                             Approved,Issued
    csr-n5dnc                                             24m   system:admin                                             Approved,Issued
    csr-pj4pt                                             19m   system:node:ip-172-18-7-182.ec2.internal                 Approved,Issued
    csr-q7kcz                                             19m   system:node:ip-172-18-8-189.ec2.internal                 Approved,Issued
    csr-t6dj9                                             19m   system:node:ip-172-18-2-12.ec2.internal                  Approved,Issued
    csr-zqxlr                                             22m   system:node:ip-172-18-12-30.ec2.internal                 Approved,Issued
    node-csr-6PKGRUMJDch4nq6Hd45QCer0so5VZjCe7DdCrbmkMBI  19m   system:serviceaccount:openshift-infra:node-bootstrapper  Approved,Issued
    node-csr-p2foh5KptJCXUbKmE3DtN3HSEOUwKaiCnq5nPCZWzo4  19m   system:serviceaccount:openshift-infra:node-bootstrapper  Approved,Issued

Hit this with openshift-ansible-3.11.0-0.25.0.git.0.7497e69.el7.noarch:

    TASK [Approve node certificates when bootstrapping] ****************************
    Tuesday 04 September 2018 17:24:17 +0800 (0:00:00.083) 0:17:50.596 *****
    FAILED - RETRYING: Approve node certificates when bootstrapping (30 retries left).
    FAILED - RETRYING: Approve node certificates when bootstrapping (29 retries left).
    FAILED - RETRYING: Approve node certificates when bootstrapping (3 retries left).
    FAILED - RETRYING: Approve node certificates when bootstrapping (2 retries left).
    FAILED - RETRYING: Approve node certificates when bootstrapping (1 retries left).
    fatal: [share3-wmengah311-master-etcd-zone1-1.0904-beu.qe.rhcloud.com]: FAILED! => {"attempts": 30, "changed": false, "msg": "Cound not find csr for nodes: share3-wmengah311-master-etcd-zone1-1", "state": "unknown"}

    [root@share3-wmengah311-master-etcd-zone1-1 ~]# oc get node
    NAME                                    STATUS   ROLES    AGE   VERSION
    share3-wmengah311-master-etcd-zone2-1   Ready    master   29m   v1.11.0+d4cacc0
    share3-wmengah311-master-etcd-zone2-2   Ready    master   29m   v1.11.0+d4cacc0

    [root@share3-wmengah311-master-etcd-zone1-1 ~]# oc get csr
    NAME                                                  AGE   REQUESTOR                                                CONDITION
    csr-5h8hn                                             11m   system:admin                                             Pending
    csr-d6n6v                                             11m   system:admin                                             Pending
    csr-dk679                                             11m   system:admin                                             Pending
    csr-fg4ms                                             11m   system:admin                                             Approved,Issued
    csr-hdbv7                                             8m    system:node:share3-wmengah311-master-etcd-zone2-2        Pending
    csr-htrq2                                             7m    system:node:share3-wmengah311-master-etcd-zone2-1        Pending
    csr-k742q                                             11m   system:admin                                             Approved,Issued
    csr-lv872                                             8m    system:node:share3-wmengah311-master-etcd-zone2-1        Pending
    csr-lz9mg                                             7m    system:node:share3-wmengah311-master-etcd-zone2-2        Pending
    csr-n2nvz                                             3m    system:node:share3-wmengah311-master-etcd-zone1-1        Pending
    csr-wwg5d                                             1m    system:node:share3-wmengah311-master-etcd-zone1-1        Pending
    csr-xv2vv                                             11m   system:admin                                             Approved,Issued
    csr-xvnml                                             7m    system:node:share3-wmengah311-master-etcd-zone1-1        Pending
    csr-znd2l                                             8m    system:node:share3-wmengah311-master-etcd-zone1-1        Pending
    node-csr--kVBjtGE8wbaDAscL3IH5_6rWZGku46qSV0pL_XqNgs  7m    system:serviceaccount:openshift-infra:node-bootstrapper  Pending
    node-csr-2UaLHRSRPOXm9nhI98SBfyzgNkX3V-C7SIxP5tbqJqU  7m    system:serviceaccount:openshift-infra:node-bootstrapper  Pending
    node-csr-67v1yVAJlG-gIHZHvOA90d6Dhjb9YhCou94wvg2EARM  7m    system:serviceaccount:openshift-infra:node-bootstrapper  Pending
    node-csr-Tp8zgZHGhOkFjteTYtr_OlDakdJ_25omEPT07u4eIuA  7m    system:serviceaccount:openshift-infra:node-bootstrapper  Pending
    node-csr-_e2GsHH_F2ccF0SY21cBhgB_rQtgW2lqbh2zfmfrm-8  7m    system:serviceaccount:openshift-infra:node-bootstrapper  Pending
    node-csr-zqu_gDIn5YLNGWw5zXaU0mIe8gseNLlQoQUmBNCvVaw  7m    system:serviceaccount:openshift-infra:node-bootstrapper  Pending

Seeing a similar issue with openshift-ansible-3.10.41-1.git.0.fd15dd7.el7.noarch.

(In reply to Marek Goldmann from comment #10)
> Seeing similar issue with
> openshift-ansible-3.10.41-1.git.0.fd15dd7.el7.noarch.

The same here with version 3.10.

Also witnessed in openshift-ansible-3.10.41-1.git.0.fd15dd7.el7.noarch. Not able to get through the deployment.

We should probably consider that to be a new bug, since QE has already VERIFIED this bug and we may be dealing with a new problem. With the new bug, can you include your complete inventory, as well as the output of `oc get nodes` and `oc get csr -o yaml`? The last will likely contain private data for signed certificates, so please mark it as a private attachment unless it's just a test environment you don't care about. My suspicion is that the name on the CSR is different from what we're expecting it to be.

Created a new 3.10 bug, https://bugzilla.redhat.com/show_bug.cgi?id=1625817, to track the 3.10 issue, since this original bug has been fixed and verified by QE. Changing this bug's status back to VERIFIED; if something still needs to be fixed in this bug, feel free to re-open it.

Closing bugs that were verified and targeted for GA but for some reason were not picked up by errata. This bug fix should be present in current 3.11 release content.
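For anyone triaging a similar hang, the suspicion above (that the name on the CSR differs from the name the installer expects) can be checked by hand against the plain `oc get csr` output. The following is only a rough sketch, not the logic used by openshift-ansible: the function names `parse_csr_table` and `summarize` are hypothetical, and it assumes the four-column NAME / AGE / REQUESTOR / CONDITION layout shown in the listings in this bug.

```python
"""Sketch: compare expected node names with requestors in `oc get csr` output.

Assumes the four-column text layout shown in this bug report. The helper
names are hypothetical and are not part of openshift-ansible.
"""

NODE_PREFIX = "system:node:"


def parse_csr_table(text):
    """Parse the plain-text table printed by `oc get csr` into dicts."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a 4-column data row.
        if len(fields) != 4 or fields[0] == "NAME":
            continue
        name, age, requestor, condition = fields
        rows.append({"name": name, "age": age,
                     "requestor": requestor, "condition": condition})
    return rows


def summarize(expected_nodes, rows):
    """Group node-requested CSRs by node name and flag mismatches.

    Returns a dict with:
      missing    - expected nodes with no `system:node:<name>` CSR at all
      unexpected - node names seen in CSRs that were not expected
      pending    - node name -> names of CSRs still in Pending state
    """
    by_node = {}
    for row in rows:
        if row["requestor"].startswith(NODE_PREFIX):
            node = row["requestor"][len(NODE_PREFIX):]
            by_node.setdefault(node, []).append(row)

    missing = [n for n in expected_nodes if n not in by_node]
    unexpected = sorted(n for n in by_node if n not in expected_nodes)
    pending = {}
    for node, node_rows in by_node.items():
        names = [r["name"] for r in node_rows if r["condition"] == "Pending"]
        if names:
            pending[node] = names
    return {"missing": missing, "unexpected": unexpected, "pending": pending}
```

To use it, pass the captured text of `oc get csr` to `parse_csr_table` and the node names from the inventory to `summarize`. A non-empty `missing` list together with a non-empty `unexpected` list would point at a hostname mismatch (for example short name versus FQDN), while entries under `pending` only mean the CSRs exist but have not been approved yet.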