Bug 1623204

Summary: [3.10] Installation stuck at TASK [Approve node certificates when bootstrapping]
Product: OpenShift Container Platform Reporter: Scott Dodson <sdodson>
Component: Installer Assignee: Michael Gugino <mgugino>
Status: CLOSED ERRATA QA Contact: Johnny Liu <jialiu>
Severity: high Docs Contact:
Priority: high    
Version: 3.10.0 CC: aos-bugs, jialiu, jokerman, juzhao, mark.vinkx, mgoldman, mgugino, mmccomas, scortopa, wabouham, wmeng
Target Milestone: --- Keywords: Regression, TestBlocker
Target Release: 3.10.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1622945 Environment:
Last Closed: 2018-09-04 07:10:49 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1622945, 1623248, 1625817    
Bug Blocks:    

Comment 1 Scott Dodson 2018-08-28 20:16:27 UTC
*** Bug 1623248 has been marked as a duplicate of this bug. ***

Comment 4 Johnny Liu 2018-08-30 02:50:33 UTC
Verified this bug with openshift-ansible-3.10.41-1.git.0.fd15dd7.el7.noarch; it passes.

Test environment: 3 masters + 2 infra nodes + 2 compute nodes

TASK [Dump the bootstrap hostnames] ********************************************
Thursday 30 August 2018  10:41:10 +0800 (0:00:00.492)       0:18:10.996 ******* 
ok: [qe-jialiu310z-master-etcd-1.0830-tir.qe.rhcloud.com] => {
    "msg": [
        "qe-jialiu310z-master-etcd-1", 
        "qe-jialiu310z-master-etcd-2", 
        "qe-jialiu310z-master-etcd-3", 
        "qe-jialiu310z-node-infra-1", 
        "qe-jialiu310z-node-infra-2", 
        "qe-jialiu310z-node-1", 
        "qe-jialiu310z-node-2"
    ]
}

TASK [Approve node certificates when bootstrapping] ****************************
Thursday 30 August 2018  10:41:11 +0800 (0:00:00.051)       0:18:11.047 ******* 
FAILED - RETRYING: Approve node certificates when bootstrapping (30 retries left).

FAILED - RETRYING: Approve node certificates when bootstrapping (29 retries left).

changed: [qe-jialiu310z-master-etcd-1.0830-tir.qe.rhcloud.com] => {"attempts": 3, "changed": true, "client_approve_results": ["certificatesigningrequest.certificates.k8s.io \"node-csr-AqJ8l3NzCYjVlx7Cr389kIx-DHdMvOmUO0tySt2Hy4k\" approved\n", "certificatesigningrequest.certificates.k8s.io \"node-csr-WxMB-aiwjgta9QJFSLzf4xuWnGuoTYtizzSe99yK4uw\" approved\n", "certificatesigningrequest.certificates.k8s.io \"node-csr-wx62rG28vKVtf-PcteavQlW8hVXr6zMEW70vuNRfUzA\" approved\n", "certificatesigningrequest.certificates.k8s.io \"node-csr-kZL50Fxsyh2eeXSKyn14LqidM9krUXsIhRZncUcETLM\" approved\n"], "failed": false, "rc": 0, "server_approve_results": ["certificatesigningrequest.certificates.k8s.io \"csr-tw9nx\" approved\n", "certificatesigningrequest.certificates.k8s.io \"csr-5txzl\" approved\n", "certificatesigningrequest.certificates.k8s.io \"csr-nr74w\" approved\n", "certificatesigningrequest.certificates.k8s.io \"csr-qbjv9\" approved\n", "certificatesigningrequest.certificates.k8s.io \"csr-nt66h\" approved\n", "certificatesigningrequest.certificates.k8s.io \"csr-pvmtv\" approved\n", "certificatesigningrequest.certificates.k8s.io \"csr-d6s9x\" approved\n", "certificatesigningrequest.certificates.k8s.io \"csr-g4d85\" approved\n", "certificatesigningrequest.certificates.k8s.io \"csr-4lstl\" approved\n", "certificatesigningrequest.certificates.k8s.io \"csr-4l4hg\" approved\n", "certificatesigningrequest.certificates.k8s.io \"csr-nn4mj\" approved\n", "certificatesigningrequest.certificates.k8s.io \"csr-l8ngc\" approved\n", "certificatesigningrequest.certificates.k8s.io \"csr-5xwds\" approved\n"]}


[root@qe-jialiu310z-master-etcd-1 ~]# oc get node
NAME                          STATUS    ROLES     AGE       VERSION
qe-jialiu310z-master-etcd-1   Ready     master    12m       v1.10.0+b81c8f8
qe-jialiu310z-master-etcd-2   Ready     master    12m       v1.10.0+b81c8f8
qe-jialiu310z-master-etcd-3   Ready     master    12m       v1.10.0+b81c8f8
qe-jialiu310z-node-1          Ready     compute   8m        v1.10.0+b81c8f8
qe-jialiu310z-node-2          Ready     compute   8m        v1.10.0+b81c8f8
qe-jialiu310z-node-infra-1    Ready     infra     8m        v1.10.0+b81c8f8
qe-jialiu310z-node-infra-2    Ready     infra     8m        v1.10.0+b81c8f8


[root@qe-jialiu310z-master-etcd-1 ~]# oc get csr
NAME                                                   AGE       REQUESTOR                                                 CONDITION
csr-4l4hg                                              12m       system:admin                                              Approved,Issued
csr-4lstl                                              12m       system:admin                                              Approved,Issued
csr-5txzl                                              8m        system:node:qe-jialiu310z-node-1                          Approved,Issued
csr-5xwds                                              8m        system:node:qe-jialiu310z-node-infra-1                    Approved,Issued
csr-b6pcz                                              12m       system:admin                                              Approved,Issued
csr-d6s9x                                              9m        system:node:qe-jialiu310z-master-etcd-2                   Approved,Issued
csr-g4d85                                              8m        system:node:qe-jialiu310z-master-etcd-3                   Approved,Issued
csr-l8ngc                                              8m        system:node:qe-jialiu310z-node-infra-2                    Approved,Issued
csr-nn4mj                                              9m        system:node:qe-jialiu310z-master-etcd-3                   Approved,Issued
csr-nr74w                                              8m        system:node:qe-jialiu310z-master-etcd-1                   Approved,Issued
csr-nt66h                                              9m        system:node:qe-jialiu310z-master-etcd-1                   Approved,Issued
csr-pvmtv                                              12m       system:admin                                              Approved,Issued
csr-qbjv9                                              8m        system:node:qe-jialiu310z-master-etcd-2                   Approved,Issued
csr-sfxk8                                              12m       system:admin                                              Approved,Issued
csr-tw9nx                                              8m        system:node:qe-jialiu310z-node-2                          Approved,Issued
csr-vpkjz                                              12m       system:admin                                              Approved,Issued
node-csr-AqJ8l3NzCYjVlx7Cr389kIx-DHdMvOmUO0tySt2Hy4k   8m        system:serviceaccount:openshift-infra:node-bootstrapper   Approved,Issued
node-csr-WxMB-aiwjgta9QJFSLzf4xuWnGuoTYtizzSe99yK4uw   8m        system:serviceaccount:openshift-infra:node-bootstrapper   Approved,Issued
node-csr-kZL50Fxsyh2eeXSKyn14LqidM9krUXsIhRZncUcETLM   8m        system:serviceaccount:openshift-infra:node-bootstrapper   Approved,Issued
node-csr-wx62rG28vKVtf-PcteavQlW8hVXr6zMEW70vuNRfUzA   8m        system:serviceaccount:openshift-infra:node-bootstrapper   Approved,Issued
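
A listing like the one above can also be checked mechanically. The following is a minimal sketch, not part of openshift-ansible: it assumes the `oc get csr` output has exactly the four whitespace-separated columns shown above (NAME, AGE, REQUESTOR, CONDITION), and the function names `parse_csr_table` and `unapproved_csrs` are hypothetical.

```python
def parse_csr_table(text):
    """Parse the plain-text table printed by `oc get csr` into dicts.

    Assumes four whitespace-separated columns per row, as in the
    listing above: NAME, AGE, REQUESTOR, CONDITION.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    rows = []
    for line in lines[1:]:  # lines[0] is the header row
        name, age, requestor, condition = line.split()
        rows.append({
            "name": name,
            "age": age,
            "requestor": requestor,
            "condition": condition,
        })
    return rows


def unapproved_csrs(rows):
    """Return the names of CSRs whose condition is not Approved,Issued."""
    return [r["name"] for r in rows if r["condition"] != "Approved,Issued"]
```

An empty list from `unapproved_csrs` corresponds to the state shown in this comment, where every CSR is `Approved,Issued`.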

Comment 6 errata-xmlrpc 2018-09-04 07:10:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2578

Comment 7 Marek Goldmann 2018-09-05 07:20:12 UTC
Moving my comment https://bugzilla.redhat.com/show_bug.cgi?id=1622945#c10 here.

I'm seeing the same issue even with the updated package: openshift-ansible-3.10.41-1.git.0.fd15dd7.el7.noarch.

I think it may be related to my inventory, which has one master (openshift_node_group_name=node-config-master-infra) and two nodes (openshift_node_group_name=node-config-compute). It fails every time with the updated package. When I downgrade to openshift-ansible-3.10.34-1.git.0.48df172None.noarch, everything works.

Comment 8 Scott Dodson 2018-09-05 17:20:07 UTC
Since the errata tool has already transitioned this bug to CLOSED, can you please open a new bug? In the new bug, please include your complete inventory, as well as the output of `oc get nodes` and `oc get csr -o yaml`.

The last of these will likely contain private data for signed certificates, so please mark it as a private attachment unless it's just a test environment you don't care about.

My suspicion is that the name on the CSR differs from what we expect.
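
To illustrate the kind of mismatch suspected here, the sketch below compares a list of expected node names against CSR requestors of the form `system:node:<name>`, as seen in the `oc get csr` listing in Comment 4. It is only an illustration under that assumption, not the logic openshift-ansible actually uses; the function name `nodes_missing_server_csr` and the `example.com` hostname are hypothetical.

```python
NODE_PREFIX = "system:node:"


def nodes_missing_server_csr(expected_hostnames, requestors):
    """Return expected hostnames that have no `system:node:<name>` requestor.

    `requestors` is the REQUESTOR column of `oc get csr`. Requestors
    without the node prefix (for example system:admin) are ignored.
    """
    seen = {
        r[len(NODE_PREFIX):]
        for r in requestors
        if r.startswith(NODE_PREFIX)
    }
    return [h for h in expected_hostnames if h not in seen]


# Hypothetical case: the installer expects a short name, but the node
# registered under a fully qualified name, so no CSR ever matches.
expected = ["qe-jialiu310z-node-1"]
requestors = ["system:admin", "system:node:qe-jialiu310z-node-1.example.com"]
missing = nodes_missing_server_csr(expected, requestors)
```

If the approval task waits for a CSR per expected name, a node that appears in `missing` would keep the task retrying until the retries run out.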

Comment 9 Serena Cortopassi 2018-09-06 08:36:41 UTC
@Scott Dodson, as suggested, I opened https://bugzilla.redhat.com/show_bug.cgi?id=1625911

I also suspect something is wrong in CSR name generation.