Bug 1812787
Summary: | RHEL scaleup fails with error "Failed to approve node CSR" | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Yang Yang <yanyang>
Component: | Node | Assignee: | Ryan Phillips <rphillips>
Status: | CLOSED DUPLICATE | QA Contact: | Sunil Choudhary <schoudha>
Severity: | high | Docs Contact: |
Priority: | high | |
Version: | 4.4 | CC: | aos-bugs, gpei, jokerman, rphillips, scuppett, wjiang, wsun
Target Milestone: | --- | Keywords: | Regression, TestBlocker
Target Release: | 4.4.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | |
: | 1815010 | Environment: |
Last Closed: | 2020-03-31 18:14:37 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | 1815010, 1817382 | |
Bug Blocks: | | |
Attachments: | | |
Logs show that wsun443121-fcgdb-rhel-1.wsun443121.qe.devcluster.openshift.com did have its CSR approved; however, rhel-0 and rhel-2 did not. The resulting failure message obscured the fact that rhel-1 was approved.

Is this reliably reproducible?

Adding TestBlocker since it always happens for UPI on bare metal with a proxy.

Still could reproduce it on bare metal today.

It has also been reproduced on GCP.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.

Created attachment 1669573
Scaleup debug logs

Description of problem:
RHEL scaleup fails with error "Failed to approve node CSR". Debug logs are in the attachment.

Version-Release number of the following components:
rpm -q openshift-ansible
rpm -q ansible
ansible --version

openshift-ansible-4.4.0-202003060720.git.0.085aeb0.el7

How reproducible:
Always

Steps to Reproduce:
1. Install a UPI cluster on bare metal with the RHCOS workers removed
2. Scale up with RHEL nodes

Actual results:
Scaleup fails

Expected results:
Scaleup succeeds

Additional info:
TASK [openshift_node : Approve node CSR] ***************************************
Thursday 12 March 2020 15:11:25 +0800 (0:06:53.930) 0:12:25.891 ********
FAILED - RETRYING: Approve node CSR (6 retries left).
FAILED - RETRYING: Approve node CSR (5 retries left).
FAILED - RETRYING: Approve node CSR (4 retries left).
FAILED - RETRYING: Approve node CSR (3 retries left).
FAILED - RETRYING: Approve node CSR (2 retries left).
FAILED - RETRYING: Approve node CSR (1 retries left).
failed: [wsun443121-fcgdb-rhel-0.wsun443121.qe.devcluster.openshift.com -> localhost] (item=wsun443121-fcgdb-rhel-0.wsun443121.qe.devcluster.openshift.com) => {"ansible_loop_var": "item", "attempts": 6, "changed": true, "cmd": "count=0; for csr in `oc --kubeconfig=/tmp/installer-MmSdAO/auth/kubeconfig get csr --no-headers | grep \" system:node:wsun443121-fcgdb-rhel-0 \" | cut -d \" \" -f1`;\ndo\n oc --kubeconfig=/tmp/installer-MmSdAO/auth/kubeconfig adm certificate approve ${csr};\n if [ $? -eq 0 ];\n then\n count=$((count+1));\n fi;\ndone; exit $((!count));\n", "delta": "0:00:00.196222", "end": "2020-03-12 15:11:58.259209", "item": "wsun443121-fcgdb-rhel-0.wsun443121.qe.devcluster.openshift.com", "msg": "non-zero return code", "rc": 1, "start": "2020-03-12 15:11:58.062987", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
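
For readability, the escaped "cmd" string from the failed task above can be unrolled into the sketch below. It is an illustrative rewrite, not the openshift-ansible source: the function names select_csrs and approve_node_csrs are hypothetical, and `return` stands in for the original `exit` so the logic can live in a function. The point it shows is that the task exits non-zero whenever no CSR was approved. The empty "stdout" and "stderr" in the log are consistent with the grep filter matching no rows, so the loop body never ran, although the log alone does not say why no row matched.

```shell
#!/bin/sh
# Sketch of the "Approve node CSR" shell logic from the task output above.
# Function names are hypothetical; the pipeline mirrors the logged command.

# select_csrs NODE
# Reads `oc get csr --no-headers` style rows on stdin and prints the first
# column (the CSR name) of every row whose text contains
# " system:node:NODE " with a space on both sides.
select_csrs() {
    grep " system:node:$1 " | cut -d " " -f1
}

# approve_node_csrs NODE KUBECONFIG
# Approves every matching CSR and counts the successful approvals.
# Returns 0 if at least one CSR was approved, 1 if none were, which is
# what `exit $((!count))` does in the logged command.
approve_node_csrs() {
    node=$1
    kubeconfig=$2
    count=0
    for csr in $(oc --kubeconfig="$kubeconfig" get csr --no-headers | select_csrs "$node"); do
        if oc --kubeconfig="$kubeconfig" adm certificate approve "$csr"; then
            count=$((count + 1))
        fi
    done
    return $(( ! count ))
}
```

Calling `approve_node_csrs wsun443121-fcgdb-rhel-0 /tmp/installer-MmSdAO/auth/kubeconfig` would reproduce the logged behavior for rhel-0: a return code of 1 with no output when no row matches the filter.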