Description of problem:
Upgrading an older v3.9 build to the latest v3.9, with openshift_release set in the hosts file, failed at the task [openshift_web_console : Verify that the web console is running] while installing the web console.

TASK [openshift_web_console : Verify that the web console is running] **********
task path: /usr/share/ansible/openshift-ansible/roles/openshift_web_console/tasks/install.yml:158
FAILED - RETRYING: Verify that the web console is running (60 retries left).
....
FAILED - RETRYING: Verify that the web console is running (1 retries left).
fatal: [x.x.x.x]: FAILED! => {"attempts": 60, "changed": false, "cmd": ["curl", "-k", "https://webconsole.openshift-web-console.svc/healthz"], "delta": "0:00:01.013142", "end": "2018-02-22 03:51:26.238385", "msg": "non-zero return code", "rc": 7, "start": "2018-02-22 03:51:25.225243", "stderr": " % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed connect to webconsole.openshift-web-console.svc:443; Connection refused", "stderr_lines": [" % Total % Received % Xferd Average Speed Time Time Time Current", " Dload Upload Total Spent Left Speed", "", " 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed connect to webconsole.openshift-web-console.svc:443; Connection refused"], "stdout": "", "stdout_lines": []}
...ignoring
...

TASK [openshift_web_console : Report console errors] ***************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_web_console/tasks/install.yml:211
fatal: [x.x.x.x]: FAILED! => {"changed": false, "msg": "Console install failed."}

== debug info ==
When the upgrade failed, running "curl -k https://webconsole.openshift-web-console.svc/healthz" manually on the master host returned the expected reply:
# curl -k https://webconsole.openshift-web-console.svc/healthz
ok

Version-Release number of the following components:
openshift-ansible-3.9.0-0.47.0.git.0.f8847bb.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Install an old version of v3.9 OCP.
2. Run the upgrade against the above OCP with openshift_release specified in the hosts file (workaround for bz1547898): openshift_release=v3.9

Actual results:
The upgrade failed.

Expected results:
The upgrade succeeds.

Additional info (tasks that ran after the "Verify that the web console is running" failure shown above):

TASK [openshift_web_console : Check status in the openshift-web-console namespace] ***
task path: /usr/share/ansible/openshift-ansible/roles/openshift_web_console/tasks/install.yml:176
changed: [x.x.x.x] => {"changed": true, "cmd": ["oc", "status", "--config=/tmp/console-ansible-ShiWVE/admin.kubeconfig", "-n", "openshift-web-console"], "delta": "0:00:00.773984", "end": "2018-02-22 03:51:28.249952", "rc": 0, "start": "2018-02-22 03:51:27.475968", "stderr": "", "stderr_lines": [], "stdout": "In project openshift-web-console on server https://qe-jliu-39-master-etcd-1:8443\n\nsvc/webconsole - 172.30.148.25:443 -> 8443\n  deployment/webconsole deploys registry.reg-aws.openshift.com:443/openshift3/ose-web-console:v3.9.0\n\nView details with 'oc describe <resource>/<name>' or list everything with 'oc get all'.", "stdout_lines": ["In project openshift-web-console on server https://qe-jliu-39-master-etcd-1:8443", "", "svc/webconsole - 172.30.148.25:443 -> 8443", "  deployment/webconsole deploys registry.reg-aws.openshift.com:443/openshift3/ose-web-console:v3.9.0", "", "View details with 'oc describe <resource>/<name>' or list everything with 'oc get all'."]}

TASK [openshift_web_console : debug] *******************************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_web_console/tasks/install.yml:181
ok: [x.x.x.x] => {
    "msg": [
        "In project openshift-web-console on server https://qe-jliu-39-master-etcd-1:8443",
        "",
        "svc/webconsole - 172.30.148.25:443 -> 8443",
        "  deployment/webconsole deploys registry.reg-aws.openshift.com:443/openshift3/ose-web-console:v3.9.0",
        "",
        "View details with 'oc describe <resource>/<name>' or list everything with 'oc get all'."
    ]
}

TASK [openshift_web_console : Get pods in the openshift-web-console namespace] ***
task path: /usr/share/ansible/openshift-ansible/roles/openshift_web_console/tasks/install.yml:183
changed: [x.x.x.x] => {"changed": true, "cmd": ["oc", "get", "pods", "--config=/tmp/console-ansible-ShiWVE/admin.kubeconfig", "-n", "openshift-web-console", "-o", "wide"], "delta": "0:05:00.638511", "end": "2018-02-22 03:56:30.187713", "rc": 0, "start": "2018-02-22 03:51:29.549202", "stderr": "", "stderr_lines": [], "stdout": "NAME READY STATUS RESTARTS AGE IP NODE\nwebconsole-6fcb5b98f6-5hgkl 1/1 Running 0 15m 10.128.0.16 qe-jliu-39-master-etcd-1", "stdout_lines": ["NAME READY STATUS RESTARTS AGE IP NODE", "webconsole-6fcb5b98f6-5hgkl 1/1 Running 0 15m 10.128.0.16 qe-jliu-39-master-etcd-1"]}

TASK [openshift_web_console : debug] *******************************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_web_console/tasks/install.yml:188
ok: [x.x.x.x] => {
    "msg": [
        "NAME READY STATUS RESTARTS AGE IP NODE",
        "webconsole-6fcb5b98f6-5hgkl 1/1 Running 0 15m 10.128.0.16 qe-jliu-39-master-etcd-1"
    ]
}

TASK [openshift_web_console : Report console errors] ***************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_web_console/tasks/install.yml:211
fatal: [x.x.x.x]: FAILED! => {"changed": false, "msg": "Console install failed."}
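For reference, curl exit code 7 in the failure above means the TCP connection itself was refused, not a TLS or HTTP-level error: nothing was accepting connections behind the service's cluster IP yet, which is consistent with the pod not having become ready. The distinction can be sketched in Python (a hypothetical helper for illustration, not part of openshift-ansible):

```python
import socket

def tcp_ready(host, port, timeout=2.0):
    # True if something is accepting TCP connections at host:port.
    # A refused connection is what curl reports as exit code 7; for a
    # Kubernetes service it usually means no ready endpoints exist yet.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Note that a check at this level only confirms a listener exists; the task's curl against /healthz additionally exercises the HTTP path into the pod.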
It looks like you might not have included all of the output after the "Verify that the web console is running" task. Is there output with the pod logs? If not, can you include the log from the following?

oc logs webconsole-6fcb5b98f6-5hgkl -n openshift-web-console

I don't see anything obviously wrong. It's odd that the curl failed when the pod is running and ready.
I just noticed that you ran the curl manually, and it worked. I wonder if the pod simply took more than 5 minutes to become ready. cc Scott
The oc status output normally has a line like

deployment #1 running for 16 hours - 3 pods

that's missing here. If I'm reading it right, the `oc get pods` call took 5 minutes (!), so it's possible that the pod became ready between the curl check and that command's output, particularly since manually running curl later worked.
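The 5-minute figure above comes from the "delta" field in the ansible result for that task ("delta": "0:05:00.638511"), which records the command's wall-clock duration as H:MM:SS.ffffff. A small parser, purely for illustration:

```python
from datetime import timedelta

def parse_delta(s):
    # Ansible "delta" strings look like "0:05:00.638511" (H:MM:SS.ffffff).
    hours, minutes, seconds = s.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes),
                     seconds=float(seconds))
```

parse_delta("0:05:00.638511").total_seconds() comes out just over 300, the full five minutes the oc get pods call apparently took, versus about one second for the failing curl.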
jiajliu - If you still have the ansible output, please include the events that were printed as well. Since the events will have expired at this point, you will only be able to find those in the ansible output.
@Samuel Padgett I've attached all output from PLAY [Upgrade web console] to the end. Hope it helps.
Thanks for the log. It looks like we simply aren't waiting long enough for the console to become ready. From the events, the masters were not schedulable for a few minutes after the console was deployed. Since the masters were just recently upgraded and the console pod runs on the masters, it's possible for the container to take more than 5 minutes to become ready, in particular if it needs to pull an image (not the case here, but possible). Note that the same curl command succeeded when run manually later.

Given that deployment configs default to 10 minutes before failing, it seems reasonable for openshift-ansible to use the same timeout.

https://github.com/openshift/openshift-ansible/pull/7266

Please try again once these changes go in, and reopen if you still see the failure.
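For context, the verify task polls the healthz URL with ansible's retries/delay/until pattern, and the fix extends that window to 10 minutes. The waiting logic is equivalent to the following sketch (Python rather than ansible YAML; the exact retries/delay values shown are an assumption — see the PR for the real change):

```python
import time

def wait_until_ready(check, retries=120, delay=5):
    # Poll `check` up to `retries` times, sleeping `delay` seconds
    # between attempts, mirroring ansible's until/retries/delay loop.
    # 120 retries x 5s would give a 10-minute window, matching the
    # default timeout deployment configs use before reporting failure.
    for _ in range(retries):
        if check():
            return True
        time.sleep(delay)
    return False
```

With the original 60 retries the window was only half that, which the events show was not always enough right after a master upgrade.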
Verified on openshift-ansible-3.9.0-0.53.0.git.0.f8f01ef.el7.noarch