Bug 1547923 - Upgrade failed at task [Verify that the web console is running] during minor version upgrade
Status: CLOSED CURRENTRELEASE
Product: OpenShift Container Platform
Classification: Red Hat
Component: Management Console
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.9.0
Assigned To: Samuel Padgett
QA Contact: liujia
Docs Contact:
Depends On:
Blocks:
Reported: 2018-02-22 05:00 EST by liujia
Modified: 2018-03-27 05:49 EDT

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-02-28 08:04:04 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers
Tracker: Red Hat Product Errata
ID: RHBA-2018:0489
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat OpenShift Container Platform 3.9 RPM Release Advisory
Last Updated: 2018-03-28 14:06:38 EDT

Description liujia 2018-02-22 05:00:04 EST
Description of problem:
Upgrading an older v3.9 build to the latest v3.9, with openshift_release set in the hosts file, failed at task [openshift_web_console : Verify that the web console is running] while installing the web console.

TASK [openshift_web_console : Verify that the web console is running] **********
task path: /usr/share/ansible/openshift-ansible/roles/openshift_web_console/tasks/install.yml:158
FAILED - RETRYING: Verify that the web console is running (60 retries left).
....
FAILED - RETRYING: Verify that the web console is running (1 retries left).
fatal: [x.x.x.x]: FAILED! => {"attempts": 60, "changed": false, "cmd": ["curl", "-k", "https://webconsole.openshift-web-console.svc/healthz"], "delta": "0:00:01.013142", "end": "2018-02-22 03:51:26.238385", "msg": "non-zero return code", "rc": 7, "start": "2018-02-22 03:51:25.225243", "stderr": "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n                                 Dload  Upload   Total   Spent    Left  Speed\n\r  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (7) Failed connect to webconsole.openshift-web-console.svc:443; Connection refused", "stderr_lines": ["  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current", "                                 Dload  Upload   Total   Spent    Left  Speed", "", "  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (7) Failed connect to webconsole.openshift-web-console.svc:443; Connection refused"], "stdout": "", "stdout_lines": []}
...ignoring

...
...

TASK [openshift_web_console : Report console errors] ***************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_web_console/tasks/install.yml:211
fatal: [x.x.x.x]: FAILED! => {"changed": false, "msg": "Console install failed."}

== Debug info
After the upgrade failed, running "curl -k https://webconsole.openshift-web-console.svc/healthz" manually on the master hosts returned the expected reply.

# curl -k https://webconsole.openshift-web-console.svc/healthz
ok


Version-Release number of the following components:
openshift-ansible-3.9.0-0.47.0.git.0.f8847bb.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Install an older v3.9 build of OCP.
2. Run the upgrade against that cluster with openshift_release specified in the hosts file (workaround for bz1547898):
openshift_release=v3.9

Actual results:
Upgrade failed.

Expected results:
Upgrade succeeds.

Additional info:
TASK [openshift_web_console : Verify that the web console is running] **********
task path: /usr/share/ansible/openshift-ansible/roles/openshift_web_console/tasks/install.yml:158
FAILED - RETRYING: Verify that the web console is running (60 retries left).
....
FAILED - RETRYING: Verify that the web console is running (1 retries left).
fatal: [x.x.x.x]: FAILED! => {"attempts": 60, "changed": false, "cmd": ["curl", "-k", "https://webconsole.openshift-web-console.svc/healthz"], "delta": "0:00:01.013142", "end": "2018-02-22 03:51:26.238385", "msg": "non-zero return code", "rc": 7, "start": "2018-02-22 03:51:25.225243", "stderr": "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n                                 Dload  Upload   Total   Spent    Left  Speed\n\r  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (7) Failed connect to webconsole.openshift-web-console.svc:443; Connection refused", "stderr_lines": ["  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current", "                                 Dload  Upload   Total   Spent    Left  Speed", "", "  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (7) Failed connect to webconsole.openshift-web-console.svc:443; Connection refused"], "stdout": "", "stdout_lines": []}
...ignoring

TASK [openshift_web_console : Check status in the openshift-web-console namespace] ***
task path: /usr/share/ansible/openshift-ansible/roles/openshift_web_console/tasks/install.yml:176
changed: [x.x.x.x] => {"changed": true, "cmd": ["oc", "status", "--config=/tmp/console-ansible-ShiWVE/admin.kubeconfig", "-n", "openshift-web-console"], "delta": "0:00:00.773984", "end": "2018-02-22 03:51:28.249952", "rc": 0, "start": "2018-02-22 03:51:27.475968", "stderr": "", "stderr_lines": [], "stdout": "In project openshift-web-console on server https://qe-jliu-39-master-etcd-1:8443\n\nsvc/webconsole - 172.30.148.25:443 -> 8443\n  deployment/webconsole deploys registry.reg-aws.openshift.com:443/openshift3/ose-web-console:v3.9.0\n\nView details with 'oc describe <resource>/<name>' or list everything with 'oc get all'.", "stdout_lines": ["In project openshift-web-console on server https://qe-jliu-39-master-etcd-1:8443", "", "svc/webconsole - 172.30.148.25:443 -> 8443", "  deployment/webconsole deploys registry.reg-aws.openshift.com:443/openshift3/ose-web-console:v3.9.0", "", "View details with 'oc describe <resource>/<name>' or list everything with 'oc get all'."]}

TASK [openshift_web_console : debug] *******************************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_web_console/tasks/install.yml:181
ok: [x.x.x.x] => {
    "msg": [
        "In project openshift-web-console on server https://qe-jliu-39-master-etcd-1:8443", 
        "", 
        "svc/webconsole - 172.30.148.25:443 -> 8443", 
        "  deployment/webconsole deploys registry.reg-aws.openshift.com:443/openshift3/ose-web-console:v3.9.0", 
        "", 
        "View details with 'oc describe <resource>/<name>' or list everything with 'oc get all'."
    ]
}

TASK [openshift_web_console : Get pods in the openshift-web-console namespace] ***
task path: /usr/share/ansible/openshift-ansible/roles/openshift_web_console/tasks/install.yml:183
changed: [x.x.x.x] => {"changed": true, "cmd": ["oc", "get", "pods", "--config=/tmp/console-ansible-ShiWVE/admin.kubeconfig", "-n", "openshift-web-console", "-o", "wide"], "delta": "0:05:00.638511", "end": "2018-02-22 03:56:30.187713", "rc": 0, "start": "2018-02-22 03:51:29.549202", "stderr": "", "stderr_lines": [], "stdout": "NAME                          READY     STATUS    RESTARTS   AGE       IP            NODE\nwebconsole-6fcb5b98f6-5hgkl   1/1       Running   0          15m       10.128.0.16   qe-jliu-39-master-etcd-1", "stdout_lines": ["NAME                          READY     STATUS    RESTARTS   AGE       IP            NODE", "webconsole-6fcb5b98f6-5hgkl   1/1       Running   0          15m       10.128.0.16   qe-jliu-39-master-etcd-1"]}

TASK [openshift_web_console : debug] *******************************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_web_console/tasks/install.yml:188
ok: [x.x.x.x] => {
    "msg": [
        "NAME                          READY     STATUS    RESTARTS   AGE       IP            NODE", 
        "webconsole-6fcb5b98f6-5hgkl   1/1       Running   0          15m       10.128.0.16   qe-jliu-39-master-etcd-1"
    ]
}

TASK [openshift_web_console : Report console errors] ***************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_web_console/tasks/install.yml:211
fatal: [x.x.x.x]: FAILED! => {"changed": false, "msg": "Console install failed."}
Comment 1 Samuel Padgett 2018-02-22 09:48:29 EST
It looks like you might not have included all of the output after the "Verify that the web console is running" task. Is there output with the pod logs?

If not, can you include the log from...?

oc logs webconsole-6fcb5b98f6-5hgkl -n openshift-web-console

I don't see anything obviously wrong. It's odd that the curl failed when the pod is running and ready.
Comment 2 Samuel Padgett 2018-02-22 09:57:03 EST
I just noticed that you ran the curl manually, and it worked. I wonder if the pod simply took more than 5 minutes to become ready.

cc Scott
Comment 3 Samuel Padgett 2018-02-22 10:06:08 EST
The oc status output normally has a line like

deployment #1 running for 16 hours - 3 pods

that's missing here. If I'm reading it right, the `oc get pods` call took 5 minutes (!), so it's possible that the pod became ready between the curl check and that command's output, particularly since manually running curl later worked.
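
The 5-minute figure can be checked from the "start" and "end" timestamps in the Ansible task output quoted in the description. A minimal sketch, assuming those fields use the timestamp format shown there; the helper name task_duration is hypothetical and not part of openshift-ansible:

```python
from datetime import datetime, timedelta

# Timestamp format of the "start"/"end" fields in the Ansible command output.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def task_duration(start: str, end: str) -> timedelta:
    """Return the wall-clock time between two Ansible task timestamps."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


# Values copied from the "oc status" and "oc get pods" tasks in the report.
oc_status = task_duration("2018-02-22 03:51:27.475968", "2018-02-22 03:51:28.249952")
oc_get_pods = task_duration("2018-02-22 03:51:29.549202", "2018-02-22 03:56:30.187713")

print("oc status:  ", oc_status)    # 0:00:00.773984
print("oc get pods:", oc_get_pods)  # 0:05:00.638511
```

Both results match the "delta" fields Ansible reported, so the pod listing reflects the cluster state roughly five minutes after the last failed curl attempt.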
Comment 4 Samuel Padgett 2018-02-22 11:42:10 EST
jiajliu@redhat.com - If you still have the ansible output, please include the events that were printed as well. Since the events will have expired at this point, you will only be able to find those in the ansible output.
Comment 6 liujia 2018-02-22 20:58:31 EST
@Samuel Padgett

I've attached all output from PLAY [Upgrade web console] to the end. Hope it helps.
Comment 7 Samuel Padgett 2018-02-23 10:40:13 EST
Thanks for the log.

It looks like we simply aren't waiting long enough for the console to be ready. From the events, the masters were not schedulable for a few minutes after the console was deployed. Since the masters were just recently upgraded and the console pod runs on the masters, it's possible the container will take more than 5 minutes to be ready, in particular if it needs to pull an image (not the case here, but possible). Note that the same curl command succeeded when run manually later.

Given that deployment configs default to 10 minutes before failing, it seems reasonable to use the same for openshift-ansible.
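
To illustrate the effect of the longer wait, here is a rough sketch of a retry loop in the style of the failing check. It is not the openshift-ansible code: the 5-second delay is an assumption inferred from 60 retries covering about 5 minutes, the 400-second readiness time is made up for the demonstration, and wait_until_ready and FakeClock are hypothetical names. A fake clock stands in for real sleeping so the example runs instantly.

```python
import time
from typing import Callable


def wait_until_ready(check: Callable[[], bool], retries: int, delay: float,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Call check() up to `retries` times, pausing `delay` seconds between tries.

    Returns the attempt number that succeeded, or raises TimeoutError.
    """
    for attempt in range(1, retries + 1):
        if check():
            return attempt
        if attempt < retries:
            sleep(delay)
    raise TimeoutError(f"not ready after {retries} attempts")


class FakeClock:
    """Simulated clock: sleeping only advances a counter."""

    def __init__(self) -> None:
        self.now = 0.0

    def sleep(self, seconds: float) -> None:
        self.now += seconds


READY_AFTER = 400.0  # hypothetical: console becomes ready 400 s after deploy

for retries in (60, 120):  # about 5 minutes vs. about 10 minutes at 5 s per try
    clock = FakeClock()
    try:
        attempt = wait_until_ready(lambda: clock.now >= READY_AFTER,
                                   retries=retries, delay=5, sleep=clock.sleep)
        print(f"{retries} retries: ready on attempt {attempt}")
    except TimeoutError as err:
        print(f"{retries} retries: {err}")
```

With these assumed numbers the 60-retry budget gives up before the console is ready, while the 120-retry budget succeeds on attempt 81.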

https://github.com/openshift/openshift-ansible/pull/7266

Please try again when these changes go in and reopen if you still see the failure.
Comment 12 liujia 2018-02-27 01:46:01 EST
Verified on openshift-ansible-3.9.0-0.53.0.git.0.f8f01ef.el7.noarch
