| Summary: | Nodes not ready after 50 retries | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Gan Huang <ghuang> |
| Component: | Installer | Assignee: | Jason DeTiberus <jdetiber> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Johnny Liu <jialiu> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 3.3.0 | CC: | aos-bugs, bleanhar, ghuang, jokerman, mmccomas |
| Target Milestone: | --- | Keywords: | UpcomingRelease |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Last Closed: | 2016-09-08 11:57:02 UTC | Type: | Bug |
It looks like we are checking the return code, which I would expect to be 0, but maybe we need to use a Go template or JSON template to return the status for now?

Never mind my previous comment; I missed the actual failure when reading from my phone earlier. Looking closer at the output, this appears to be a legitimate failure. The node that failed the check was 'ip-172-18-6-178.ec2.internal', which isn't listed in the 'oc get nodes' output. Is there an error in the node logs for that host?

Sorry, I didn't capture the logs, but I did check the node status after the failed installation.

# oc get nodes
NAME STATUS AGE
ip-172-18-10-178.ec2.internal Ready 53m
ip-172-18-10-183.ec2.internal Ready 53m
ip-172-18-6-179.ec2.internal Ready 53m
ip-172-18-7-238.ec2.internal Ready 53m
ip-172-18-7-239.ec2.internal Ready 53m

It looks like all of the nodes were Ready. I will collect more information for you if I hit the issue again.

Hi Huang Gan, has this issue happened recently? If not, I suggest we close this.

Hi Breton, I haven't experienced the issue again recently, so I agree to close it for now.

Sounds good. Thanks for the help, Huang Gan.
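The first comment's idea of reading the node status from structured output, rather than relying on the return code alone, could look roughly like the Python sketch below. This is not what the installer does. It assumes the input is the output of `oc get node <name> -o json` and that it follows the Kubernetes Node schema, where `status.conditions` holds an entry with `type` "Ready" and `status` "True" for a ready node. The function name `is_node_ready` and both sample documents are hypothetical and hand-written, not taken from this environment.

```python
import json


def is_node_ready(node_json: str) -> bool:
    """Return True if the Node object has a Ready condition whose status is "True"."""
    node = json.loads(node_json)
    # A freshly created or malformed object may lack status or conditions.
    conditions = (node.get("status") or {}).get("conditions") or []
    return any(
        cond.get("type") == "Ready" and cond.get("status") == "True"
        for cond in conditions
    )


# Hypothetical, trimmed-down Node documents used only for illustration.
SAMPLE_READY = json.dumps({
    "kind": "Node",
    "metadata": {"name": "ip-172-18-10-178.ec2.internal"},
    "status": {"conditions": [
        {"type": "OutOfDisk", "status": "False"},
        {"type": "Ready", "status": "True"},
    ]},
})

SAMPLE_NOT_READY = json.dumps({
    "kind": "Node",
    "metadata": {"name": "ip-172-18-6-178.ec2.internal"},
    "status": {"conditions": [
        {"type": "Ready", "status": "Unknown"},
    ]},
})

if __name__ == "__main__":
    print(is_node_ready(SAMPLE_READY))
    print(is_node_ready(SAMPLE_NOT_READY))
```

A check like this distinguishes "registered but not Ready" from "Ready". It would not have changed the outcome reported here, because the failing command returned "not found" rather than a node object.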
Description of problem:
I tried to install an HA environment (2 masters + 6 nodes + 3 etcd). The installation failed at TASK [openshift_manage_node : Wait for Node Registration], but all nodes had actually become Ready when I checked the status manually.

Version-Release number of selected component (if applicable):
openshift-ansible-3.3.12-1.git.0.b26c8c2.el7.noarch.rpm

How reproducible:
30%

Steps to Reproduce:
1. # cat inventory_hosts
<--snip-->
[masters]
ec2-52-207-222-117.compute-1.amazonaws.com ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="/home/slave1/workspace/Launch-Environment-Flexy/private/config/keys/libra.pem" openshift_public_hostname=ec2-52-207-222-117.compute-1.amazonaws.com
ec2-54-234-204-42.compute-1.amazonaws.com ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="/home/slave1/workspace/Launch-Environment-Flexy/private/config/keys/libra.pem" openshift_public_hostname=ec2-54-234-204-42.compute-1.amazonaws.com

[nodes]
ec2-52-207-222-117.compute-1.amazonaws.com ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="/home/slave1/workspace/Launch-Environment-Flexy/private/config/keys/libra.pem" openshift_public_hostname=ec2-52-207-222-117.compute-1.amazonaws.com openshift_node_labels="{'role': 'node'}" openshift_scheduleable=false
ec2-54-234-204-42.compute-1.amazonaws.com ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="/home/slave1/workspace/Launch-Environment-Flexy/private/config/keys/libra.pem" openshift_public_hostname=ec2-54-234-204-42.compute-1.amazonaws.com openshift_node_labels="{'role': 'node'}" openshift_scheduleable=false
ec2-54-165-53-104.compute-1.amazonaws.com ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="/home/slave1/workspace/Launch-Environment-Flexy/private/config/keys/libra.pem" openshift_public_hostname=ec2-54-165-53-104.compute-1.amazonaws.com openshift_node_labels="{'role': 'node','registry': 'enabled','router': 'enabled'}"
ec2-54-172-58-88.compute-1.amazonaws.com ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="/home/slave1/workspace/Launch-Environment-Flexy/private/config/keys/libra.pem" openshift_public_hostname=ec2-54-172-58-88.compute-1.amazonaws.com openshift_node_labels="{'role': 'node','registry': 'enabled','router': 'enabled'}"
ec2-54-164-208-60.compute-1.amazonaws.com ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="/home/slave1/workspace/Launch-Environment-Flexy/private/config/keys/libra.pem" openshift_public_hostname=ec2-54-164-208-60.compute-1.amazonaws.com openshift_node_labels="{'role': 'node'}"
ec2-52-90-219-39.compute-1.amazonaws.com ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="/home/slave1/workspace/Launch-Environment-Flexy/private/config/keys/libra.pem" openshift_public_hostname=ec2-52-90-219-39.compute-1.amazonaws.com openshift_node_labels="{'role': 'node'}"

[etcd]
ec2-54-234-204-42.compute-1.amazonaws.com ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="/home/slave1/workspace/Launch-Environment-Flexy/private/config/keys/libra.pem" openshift_public_hostname=ec2-54-234-204-42.compute-1.amazonaws.com
ec2-54-165-53-104.compute-1.amazonaws.com ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="/home/slave1/workspace/Launch-Environment-Flexy/private/config/keys/libra.pem" openshift_public_hostname=ec2-54-165-53-104.compute-1.amazonaws.com
ec2-54-172-58-88.compute-1.amazonaws.com ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="/home/slave1/workspace/Launch-Environment-Flexy/private/config/keys/libra.pem" openshift_public_hostname=ec2-54-172-58-88.compute-1.amazonaws.com

2. Trigger the installation

Actual results:
TASK [openshift_manage_node : Wait for Node Registration] **********************
Thursday 18 August 2016 03:34:47 +0000 (0:00:00.072) 0:28:35.787 *******
ok: [ec2-52-207-222-117.compute-1.amazonaws.com] => (item=ip-172-18-10-178.ec2.internal) => {"changed": false, "cmd": ["oc", "get", "node", "ip-172-18-10-178.ec2.internal"], "delta": "0:00:00.164148", "end": "2016-08-17 23:34:50.005681", "item": "ip-172-18-10-178.ec2.internal", "rc": 0, "start": "2016-08-17 23:34:49.841533", "stderr": "", "stdout": "NAME STATUS AGE\nip-172-18-10-178.ec2.internal Ready 59s", "stdout_lines": ["NAME STATUS AGE", "ip-172-18-10-178.ec2.internal Ready 59s"], "warnings": []}
ok: [ec2-52-207-222-117.compute-1.amazonaws.com] => (item=ip-172-18-10-183.ec2.internal) => {"changed": false, "cmd": ["oc", "get", "node", "ip-172-18-10-183.ec2.internal"], "delta": "0:00:00.136783", "end": "2016-08-17 23:34:51.713095", "item": "ip-172-18-10-183.ec2.internal", "rc": 0, "start": "2016-08-17 23:34:51.576312", "stderr": "", "stdout": "NAME STATUS AGE\nip-172-18-10-183.ec2.internal Ready 1m", "stdout_lines": ["NAME STATUS AGE", "ip-172-18-10-183.ec2.internal Ready 1m"], "warnings": []}
ok: [ec2-52-207-222-117.compute-1.amazonaws.com] => (item=ip-172-18-7-239.ec2.internal) => {"changed": false, "cmd": ["oc", "get", "node", "ip-172-18-7-239.ec2.internal"], "delta": "0:00:00.129385", "end": "2016-08-17 23:34:53.420047", "item": "ip-172-18-7-239.ec2.internal", "rc": 0, "start": "2016-08-17 23:34:53.290662", "stderr": "", "stdout": "NAME STATUS AGE\nip-172-18-7-239.ec2.internal Ready 1m", "stdout_lines": ["NAME STATUS AGE", "ip-172-18-7-239.ec2.internal Ready 1m"], "warnings": []}
ok: [ec2-52-207-222-117.compute-1.amazonaws.com] => (item=ip-172-18-7-238.ec2.internal) => {"changed": false, "cmd": ["oc", "get", "node", "ip-172-18-7-238.ec2.internal"], "delta": "0:00:00.130831", "end": "2016-08-17 23:34:55.121077", "item": "ip-172-18-7-238.ec2.internal", "rc": 0, "start": "2016-08-17 23:34:54.990246", "stderr": "", "stdout": "NAME STATUS AGE\nip-172-18-7-238.ec2.internal Ready 1m", "stdout_lines": ["NAME STATUS AGE", "ip-172-18-7-238.ec2.internal Ready 1m"], "warnings": []}
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (50 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (49 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (48 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (47 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (46 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (45 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (44 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (43 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (42 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (41 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (40 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (39 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (38 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (37 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (36 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (35 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (34 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (33 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (32 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (31 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (30 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (29 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (28 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (27 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (26 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (25 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (24 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (23 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (22 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (21 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (20 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (19 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (18 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (17 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (16 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (15 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (14 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (13 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (12 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (11 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (10 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (9 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (8 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (7 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (6 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (5 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (4 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (3 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (2 retries left).
FAILED - RETRYING: TASK: openshift_manage_node : Wait for Node Registration (1 retries left).
failed: [ec2-52-207-222-117.compute-1.amazonaws.com] (item=ip-172-18-6-178.ec2.internal) => {"changed": false, "cmd": ["oc", "get", "node", "ip-172-18-6-178.ec2.internal"], "delta": "0:00:00.127775", "end": "2016-08-17 23:40:31.439373", "failed": true, "item": "ip-172-18-6-178.ec2.internal", "rc": 1, "start": "2016-08-17 23:40:31.311598", "stderr": "Error from server: nodes \"ip-172-18-6-178.ec2.internal\" not found", "stdout": "", "stdout_lines": [], "warnings": []}
ok: [ec2-52-207-222-117.compute-1.amazonaws.com] => (item=ip-172-18-6-179.ec2.internal) => {"changed": false, "cmd": ["oc", "get", "node", "ip-172-18-6-179.ec2.internal"], "delta": "0:00:00.128504", "end": "2016-08-17 23:40:33.133694", "item": "ip-172-18-6-179.ec2.internal", "rc": 0, "start": "2016-08-17 23:40:33.005190", "stderr": "", "stdout": "NAME STATUS AGE\nip-172-18-6-179.ec2.internal Ready 6m", "stdout_lines": ["NAME STATUS AGE", "ip-172-18-6-179.ec2.internal Ready 6m"], "warnings": []}

NO MORE HOSTS LEFT *************************************************************
to retry, use: --limit @/home/slave1/workspace/Launch-Environment-Flexy/private-openshift-ansible/playbooks/byo/config.retry

PLAY RECAP *********************************************************************
ec2-52-207-222-117.compute-1.amazonaws.com : ok=346 changed=101 unreachable=0 failed=1
ec2-52-90-219-39.compute-1.amazonaws.com : ok=143 changed=43 unreachable=0 failed=0
ec2-54-164-208-60.compute-1.amazonaws.com : ok=143 changed=43 unreachable=0 failed=0
ec2-54-165-53-104.compute-1.amazonaws.com : ok=189 changed=61 unreachable=0 failed=0
ec2-54-172-58-88.compute-1.amazonaws.com : ok=189 changed=61 unreachable=0 failed=0
ec2-54-234-204-42.compute-1.amazonaws.com : ok=331 changed=117 unreachable=0 failed=0
localhost : ok=14 changed=8 unreachable=0 failed=0

Checking the node status after the failed installation:

# oc get nodes
NAME STATUS AGE
ip-172-18-10-178.ec2.internal Ready 53m
ip-172-18-10-183.ec2.internal Ready 53m
ip-172-18-6-179.ec2.internal Ready 53m
ip-172-18-7-238.ec2.internal Ready 53m
ip-172-18-7-239.ec2.internal Ready 53m

Expected results:
Installation succeeds.

Additional info:
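The mismatch between the nodes the task waited for and the nodes that actually registered is easy to miss in the raw log, so the Python sketch below compares the two. Both inputs are copied from this report: the expected names are the `item=` values of the "Wait for Node Registration" task, and the table is the `oc get nodes` output captured after the failed run. The helper names `parse_node_names` and `missing_nodes` are hypothetical, and the parser assumes the default table layout with one header row followed by one node per line.

```python
from typing import List

# Node names the task iterated over, in the order they appear in the log.
EXPECTED_NODES = [
    "ip-172-18-10-178.ec2.internal",
    "ip-172-18-10-183.ec2.internal",
    "ip-172-18-7-239.ec2.internal",
    "ip-172-18-7-238.ec2.internal",
    "ip-172-18-6-178.ec2.internal",
    "ip-172-18-6-179.ec2.internal",
]

# Output of `oc get nodes` captured after the failed installation.
OC_GET_NODES = """\
NAME STATUS AGE
ip-172-18-10-178.ec2.internal Ready 53m
ip-172-18-10-183.ec2.internal Ready 53m
ip-172-18-6-179.ec2.internal Ready 53m
ip-172-18-7-238.ec2.internal Ready 53m
ip-172-18-7-239.ec2.internal Ready 53m
"""


def parse_node_names(oc_output: str) -> List[str]:
    """Return the first column of every non-empty row after the header."""
    rows = [line for line in oc_output.splitlines() if line.strip()]
    return [row.split()[0] for row in rows[1:]]


def missing_nodes(expected: List[str], oc_output: str) -> List[str]:
    """Return expected node names that do not appear in the oc output."""
    registered = set(parse_node_names(oc_output))
    return [name for name in expected if name not in registered]


if __name__ == "__main__":
    print(missing_nodes(EXPECTED_NODES, OC_GET_NODES))
    # → ['ip-172-18-6-178.ec2.internal']
```

With the data from this report, five nodes are registered and one expected name is absent, which matches the "not found" error returned for that item.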