Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1608784

Summary: install OCP v3.11 failed at TASK [Approve bootstrap nodes]
Product: OpenShift Container Platform
Reporter: Weihua Meng <wmeng>
Component: Installer
Assignee: Michael Gugino <mgugino>
Status: CLOSED ERRATA
QA Contact: Weihua Meng <wmeng>
Severity: high
Docs Contact:
Priority: high
Version: 3.11.0
CC: aos-bugs, bleanhar, dma, jokerman, mgugino, mmccomas, wmeng
Target Milestone: ---
Target Release: 3.11.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-10-11 07:22:15 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Weihua Meng 2018-07-26 09:52:14 UTC
Description of problem:
Installation of OCP v3.11 fails at TASK [Approve bootstrap nodes].

Version-Release number of the following components:
openshift-ansible-3.11.0-0.9.0.git.0.195bae3None.noarch.rpm

How reproducible:
Always (3 out of 3)

Steps to Reproduce:
1. Install OCP 3.11 on RHEL Atomic Host on AWS EC2 (vm_type: m4.xlarge), with 1 master + 1 infra + 1 compute node.

Actual results:
Install failed.
TASK [Approve bootstrap nodes] *************************************************
Thursday 26 July 2018  04:52:37 -0400 (0:00:00.085)       0:21:38.148 ********* 
fatal: [ec2-xxx.compute-1.amazonaws.com]: FAILED! => {"changed": true, "finished": false, "msg": "Timed out accepting certificate signing requests. Failing as requested.", "nodes": [{"client_accepted": true, "csrs": {"csr-8qvc5": {"apiVersion": "certificates.k8s.io/v1beta1", "kind": "CertificateSigningRequest", "metadata": {"creationTimestamp": "2018-07-26T08:45:30Z", "generateName": "csr-", "name": "csr-8qvc5", "namespace": "", "resourceVersion": "689", "selfLink": "/apis/certificates.k8s.io/v1beta1/certificatesigningrequests/csr-8qvc5", "uid": "40841d9d-90b0-11e8-a412-0e9ba41fd52c"}, "spec": {"groups": ["system:masters", "system:cluster-admins", "system:authenticated"], "request": "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0KTUlJQkJEQ0JyQUlCQURCS01SVXdFd1lEVlFRS0V3eHplWE4wWlcwNmJtOWtaWE14TVRBdkJnTlZCQU1US0hONQpjM1JsYlRwdWIyUmxPbWx3TFRFM01pMHhPQzB3TFRFNU5pNWxZekl1YVc1MFpYSnVZV3d3V1RBVEJnY3Foa2pPClBRSUJCZ2dxaGtqT1BRTUJCd05DQUFRVmlzcDd1akJ4aWxON0w4amc1MnkxM3dnOERZRm0vTGNVRGxDR1FubWYKZytObERjNE5Wei80MThXM055TDdza1pvcGJySHE1N0hJdjVMVlBNYXJkK0FvQUF3Q2dZSUtvWkl6ajBFQXdJRApSd0F3UkFJZ1RDNStnaTk1ajg2TlpuNzlQQVVUbjZ3SU1aNnJxT2ZJR0ZQMyszSnZBbllDSUdDcWNoUnVOSDE2CldmY1ltTGllRGh0UThzbmpqeGRuWDFpZDN2S29LRFZWCi0tLS0tRU5EIENFUlRJRklDQVRFIFJFUVVFU1QtLS0tLQo=", "usages": ["digital signature", "key encipherment", "client auth"], "username": "system:admin"}, "status": {"certificate": 
"LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUNoekNDQVcrZ0F3SUJBZ0lVRk1jaElISGR4ZHdZOHB4cUlzbTdxWjZHalRvd0RRWUpLb1pJaHZjTkFRRUwKQlFBd0pqRWtNQ0lHQTFVRUF3d2JiM0JsYm5Ob2FXWjBMWE5wWjI1bGNrQXhOVE15TlRrME5Ua3dNQjRYRFRFNApNRGN5TmpBNE5ERXdNRm9YRFRFNU1EY3lOakE0TkRFd01Gb3dTakVWTUJNR0ExVUVDaE1NYzNsemRHVnRPbTV2ClpHVnpNVEV3THdZRFZRUURFeWh6ZVhOMFpXMDZibTlrWlRwcGNDMHhOekl0TVRndE1DMHhPVFl1WldNeUxtbHUKZEdWeWJtRnNNRmt3RXdZSEtvWkl6ajBDQVFZSUtvWkl6ajBEQVFjRFFnQUVGWXJLZTdvd2NZcFRleS9JNE9kcwp0ZDhJUEEyQlp2eTNGQTVRaGtKNW40UGpaUTNPRFZjLytOZkZ0emNpKzdKR2FLVzZ4NnVleHlMK1MxVHpHcTNmCmdLTlVNRkl3RGdZRFZSMFBBUUgvQkFRREFnV2dNQk1HQTFVZEpRUU1NQW9HQ0NzR0FRVUZCd01DTUF3R0ExVWQKRXdFQi93UUNNQUF3SFFZRFZSME9CQllFRkM5Tmk3VXk0ay9mOVhoZlhOWmFYR29NMXdmMU1BMEdDU3FHU0liMwpEUUVCQ3dVQUE0SUJBUUFQMUEzL0JDbnNrTWRxekV5V0svVHpsMm9heHhacVJiNndnR2FIQ0xGV0xvcThDMkNXCmVZU2MwWDNSWUZuQ2dOM3gzblFMQXpmOERIY3NMZ1psNGFMVjh4WmVpUGF0b0l6YlQ4aE96Z3NPYXBGM0pBZ28KUW1IVnRId1lnZFVZRHY3WUdYYWd4ZEg2Uk5zK05rbTZHN2N2djlXR1lLMm9TZSs2MDd4RlRPNmlkTVBZSlNxdApibFc2OUs3ZTloWFlPbFpNeElXUHNtOGpBWVlhS281WUNDR2JtZDZlSXdlUTBpWHJ0TVFJUlFXMTlNLythRDdYCk5FYWZKR2JUaU5QdDV4Nzg3Uk00YytEYUpYbm42SzRldURmakFET3pKM0dnVFp1L3Vjd2lrYjFBMVd4UERpY3IKdW5xRDZLMm1ndlRzWk1YY1RSdnNTeGFsWTVudEMrYTgzSVQyCi0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K", "conditions": [{"lastUpdateTime": "2018-07-26T08:45:40Z", "message": "Auto approving kubelet client certificate after SubjectAccessReview.", "reason": "AutoApproved", "type": "Approved"}]}}, "csr-q7z49": {"apiVersion": "certificates.k8s.io/v1beta1", "kind": "CertificateSigningRequest", "metadata": {"creationTimestamp": "2018-07-26T08:48:43Z", "generateName": "csr-", "name": "csr-q7z49", "namespace": "", "resourceVersion": "1971", "selfLink": "/apis/certificates.k8s.io/v1beta1/certificatesigningrequests/csr-q7z49", "uid": "b3bfb38b-90b0-11e8-a412-0e9ba41fd52c"}, "spec": {"groups": ["system:nodes", "system:authenticated"], "request": 
...

Failure summary:


  1. Hosts:    ec2-xxx.compute-1.amazonaws.com
     Play:     Approve any pending CSR requests from inventory nodes
     Task:     Report approval errors
     Message:  Node approval failed
tools/launch_instance.rb:458:in `block in run_ansible_playbook': ansible failed execution, see logs (RuntimeError)
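
The "request" and "certificate" values in the failed task output above are base64-encoded PEM blocks. As a minimal sketch of how to read them, the snippet below decodes only the first 48 characters of the "request" value for csr-8qvc5 (copied from the output above); the variable names are illustrative, and the full value would decode to the complete PEM request in the same way.

```python
import base64

# First 48 characters of the "request" value for csr-8qvc5 in the task output.
# Only this prefix is decoded here; it covers the PEM header line.
request_prefix = "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0K"

# b64decode returns bytes; PEM text is plain ASCII.
pem_header = base64.b64decode(request_prefix).decode("ascii")

print(pem_header, end="")  # -----BEGIN CERTIFICATE REQUEST-----
```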

on master
[root@ip-172-18-0-196 ~]# oc get csr
NAME                                                   AGE       REQUESTOR                                                 CONDITION
csr-8qvc5                                              21m       system:admin                                              Approved,Issued
csr-q7z49                                              18m       system:node:ip-172-18-0-196.ec2.internal                  Approved,Issued
csr-rrmtd                                              17m       system:node:ip-172-18-0-196.ec2.internal                  Approved,Issued
csr-t99p6                                              21m       system:admin                                              Approved,Issued
node-csr-G9BQPQAbXwUU3uQC_ZrFpBWagFmujL0rPxkfbiRkWmU   17m       system:serviceaccount:openshift-infra:node-bootstrapper   Approved,Issued
node-csr-Mq2fQJQsj5g7v3-raLL4Htn-p2NT577oHnHrCMmUFM0   17m       system:serviceaccount:openshift-infra:node-bootstrapper   Approved,Issued
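
A quick way to confirm that no CSR in a listing like the one above is still waiting for approval is to check the CONDITION column of each row. The following is a rough sketch only: find_unissued is a hypothetical helper (not part of oc or openshift-ansible), and it assumes the four-column layout shown above.

```python
# Rows copied from the `oc get csr` output above.
OC_GET_CSR = """\
NAME                                                   AGE       REQUESTOR                                                 CONDITION
csr-8qvc5                                              21m       system:admin                                              Approved,Issued
csr-q7z49                                              18m       system:node:ip-172-18-0-196.ec2.internal                  Approved,Issued
csr-rrmtd                                              17m       system:node:ip-172-18-0-196.ec2.internal                  Approved,Issued
csr-t99p6                                              21m       system:admin                                              Approved,Issued
node-csr-G9BQPQAbXwUU3uQC_ZrFpBWagFmujL0rPxkfbiRkWmU   17m       system:serviceaccount:openshift-infra:node-bootstrapper   Approved,Issued
node-csr-Mq2fQJQsj5g7v3-raLL4Htn-p2NT577oHnHrCMmUFM0   17m       system:serviceaccount:openshift-infra:node-bootstrapper   Approved,Issued
"""


def find_unissued(listing):
    """Return (name, condition) pairs for CSRs that are not Approved,Issued."""
    unissued = []
    for line in listing.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) != 4:
            continue  # ignore blank or malformed rows
        name, _age, _requestor, condition = fields
        if condition != "Approved,Issued":
            unissued.append((name, condition))
    return unissued


print(find_unissued(OC_GET_CSR))  # [] -> every listed CSR is approved and issued
```

An empty result matches what the listing shows: the CSRs that exist on the master are all approved, even though the installer task timed out.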

Expected results:
Install succeeds.

Comment 3 Michael Gugino 2018-08-09 23:09:35 UTC
I tried this on Fedora Atomic Host, 3 hosts, install succeeds as expected.

Please attach failure logs directly to this BZ.  The links above are all expired.

Also, always attach inventory and variables, as well as what playbook you are running.

Comment 4 Weihua Meng 2018-08-10 02:30:29 UTC
That is good news.
If the install succeeded two weeks later, it was likely fixed during this period.

(In reply to Michael Gugino from comment #3)
> I tried this on Fedora Atomic Host, 3 hosts, install succeeds as expected.

Which playbook was used?
Is it the same one I used when I reported the bug two weeks ago?
If not, then changes made during those two weeks likely fixed it.

> 
> Please attach failure logs directly to this BZ.  The links above are all
> expired.

They are gone because more than two weeks have passed.
I did not realize the logs would be needed for such a long time; sorry about that.

> 
> Also, always attach inventory and variables, as well as what playbook you
> are running.

Comment 5 Michael Gugino 2018-08-10 13:27:55 UTC
@wmeng

I need install logs, inventory, and need to know what/how you ran ansible-playbook.

Please retry whatever was done to discover this problem and provide this information so I can try to figure out what the problem is.

Comment 7 Weihua Meng 2018-08-12 02:02:05 UTC
Removing testblocker: more than two weeks have passed, and the issue does not occur with the latest build, 3.11.0-0.13.0.

The bug is still reproducible with OCP v3.11.0-0.9.0.

Comment 8 Michael Gugino 2018-08-13 15:39:49 UTC
I don't see any immediate reason why this would have failed in v3.11.0-0.9.0.  The output of the csr module appears to indicate that all the CSRs are approved; the problem is a timeout with no additional info.

Results:

   "results":[
      {
         "cmd":"/usr/local/bin/oc adm certificate approve csr-6p2xq",
         "results":{ },
         "returncode":0
      },
      {
         "cmd":"/usr/local/bin/oc adm certificate approve csr-75qvw",
         "results":{ },
         "returncode":0
      },
      {
         "cmd":"/usr/local/bin/oc adm certificate approve csr-n6hk8",
         "results":{ },
         "returncode":0
      },
      {
         "cmd":"/usr/local/bin/oc adm certificate approve node-csr-4nCWplUj64E5xCyQ8-mVxTTDExShGyZ0Z6synaGCwZI",
         "results":{ },
         "returncode":0
      },
      {
         "cmd":"/usr/local/bin/oc adm certificate approve node-csr-aE-RL4RCYc5kqZaVP64iPdcDE8Kpt8xCGbF4Kr8w3mM",
         "results":{ },
         "returncode":0
      },
      {
         "cmd":"/usr/local/bin/oc adm certificate approve node-csr-zEm4fCqhtwG_QLnsOBibyEP6N2vFQcqQ6UnpWxFa1hE",
         "results":{ },
         "returncode":0
      }


As you can see, only 6 results are posted. At 2 per node, that covers 3 nodes, but the bootstrap host list has 4 nodes, so we should have 8 total:

TASK [Dump the bootstrap hostnames] ********************************************
Sunday 12 August 2018  09:27:44 +0800 (0:00:00.218)       0:20:19.480 ********* 
ok: [qe-wmengah31109-master-etcd-1.0812-v8n.qe.rhcloud.com] => {
    "msg": [
        "qe-wmengah31109-master-etcd-1", 
        "qe-wmengah31109-node-registry-router-1", 
        "qe-wmengah31109-node-1", 
        "qe-wmengah31109-node-2"
    ]
}
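
The count check in this comment can be written out as a small sketch. The host names and CSR names are copied from the output above; CSRS_PER_NODE = 2 is taken from the comment's "2 for each" and is an assumption about this cluster, not something read from the installer.

```python
# Hosts from TASK [Dump the bootstrap hostnames].
bootstrap_hosts = [
    "qe-wmengah31109-master-etcd-1",
    "qe-wmengah31109-node-registry-router-1",
    "qe-wmengah31109-node-1",
    "qe-wmengah31109-node-2",
]

# CSR names from the `oc adm certificate approve` commands in "results".
approved_csrs = [
    "csr-6p2xq",
    "csr-75qvw",
    "csr-n6hk8",
    "node-csr-4nCWplUj64E5xCyQ8-mVxTTDExShGyZ0Z6synaGCwZI",
    "node-csr-aE-RL4RCYc5kqZaVP64iPdcDE8Kpt8xCGbF4Kr8w3mM",
    "node-csr-zEm4fCqhtwG_QLnsOBibyEP6N2vFQcqQ6UnpWxFa1hE",
]

CSRS_PER_NODE = 2  # assumption, per the comment above

expected = CSRS_PER_NODE * len(bootstrap_hosts)
missing = expected - len(approved_csrs)

print(f"expected {expected}, approved {len(approved_csrs)}, missing {missing}")
# expected 8, approved 6, missing 2
```

A shortfall of 2 is consistent with one node never submitting its CSRs, which would leave the task waiting until it timed out.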

Most likely fixed by: 446e64cd3744b72fce9512ab1225e75475a3104b but it's not clear why.

Comment 9 Scott Dodson 2018-08-16 13:28:21 UTC
Can we please test with openshift-ansible-3.11.0-0.16.0 which contains the commit mentioned in the previous comment?

Comment 10 Weihua Meng 2018-08-17 06:22:15 UTC
Fixed.
openshift-ansible-3.11.0-0.16.0.git.0.e82689aNone.noarch

Installation succeeded and cluster is working well.

Comment 12 errata-xmlrpc 2018-10-11 07:22:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652