Bug 1730065

Summary: 3.10 to 3.11 upgrade fails on openshift_web_console ansible Task
Product: OpenShift Container Platform
Reporter: Sam Yangsao <syangsao>
Component: Cluster Version Operator
Assignee: Russell Teague <rteague>
Status: CLOSED ERRATA
QA Contact: Weihua Meng <wmeng>
Severity: high
Priority: unspecified
Version: 3.11.0
CC: aos-bugs, jokerman, mmccomas, rteague, wmeng
Target Milestone: ---
Target Release: 3.11.z
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Doc Text: Improve error handling for tasks waiting for web console deployment.
Last Closed: 2019-08-13 14:09:19 UTC
Type: Bug

Comment 4 Russell Teague 2019-07-15 19:34:59 UTC
This is potentially a transient error caused by the api server being overloaded.

"stderr": "Error from server (TooManyRequests): the server has received too many requests and has asked us to try again later (get deployments.extensions webconsole)\\n"

The error in the log was due to insufficient 'until' condition handling on the previous task, "Verify that the console is running".
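
To illustrate the kind of handling the 'until' condition was missing, here is a minimal Python sketch of a polling loop that treats a 'TooManyRequests' response as transient and keeps retrying instead of failing. This is not the openshift-ansible task or the actual fix. The function name wait_until_ready, the check callable, the retry counts, and the assumption that the check prints the number of ready replicas on stdout are all hypothetical.

```python
import time

# Substring taken from the stderr quoted above; used to spot a busy API server.
TRANSIENT_MARKER = "TooManyRequests"


def wait_until_ready(check, retries=60, delay=10.0, sleep=time.sleep):
    """Poll `check` until the deployment reports at least one ready replica.

    `check` is a hypothetical callable returning (returncode, stdout, stderr).
    Assumption: on success, stdout holds the number of ready replicas.
    Returns True when ready, False when the retries are exhausted.
    Raises RuntimeError on a failure that is not recognised as transient.
    """
    for attempt in range(1, retries + 1):
        rc, out, err = check()
        if rc == 0:
            try:
                ready_replicas = int(out.strip() or 0)
            except ValueError:
                ready_replicas = 0  # unparsable output counts as "not ready yet"
            if ready_replicas > 0:
                return True
        elif TRANSIENT_MARKER not in err:
            # A real failure: stop immediately instead of masking it with retries.
            raise RuntimeError(f"non-transient failure: {err.strip()}")
        # Either not ready yet or the API server was busy: wait and try again.
        if attempt < retries:
            sleep(delay)
    return False
```

The point of the sketch is that a failed query and a "not ready yet" answer are both retried, so one overloaded-API response does not abort the wait, while any other error still fails fast.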

Comment 5 Sam Yangsao 2019-07-15 19:48:30 UTC
(In reply to Russell Teague from comment #4)

The webconsole and console seem to be working properly.  Is it OK to continue and proceed with the node upgrade?

Thanks!

Comment 6 Sam Yangsao 2019-07-15 20:18:18 UTC
(In reply to Sam Yangsao from comment #5)

Ignore this question. After chatting with another team member, we understand the control plane needs to be upgraded completely before proceeding with the node upgrade.

Still need some guidance on the original error and how to proceed.  Thanks!

Comment 7 Russell Teague 2019-07-15 21:01:00 UTC
The PR referenced above was opened to better handle situations where the API server is busy and returns the 'TooManyRequests' error.

The upgrade is not complete because the web console is not the last item upgraded in the upgrade process.  The upgrade can be run again to complete the control plane upgrade.

Before re-running the control plane upgrade, you can also verify that the web console upgrade will complete by running just the web console playbook, playbooks/openshift-web-console/config.yml.  Running this playbook will install/upgrade the web console to the OpenShift version specified in the inventory.

Comment 8 Sam Yangsao 2019-07-16 16:22:03 UTC
Just a heads up, customer re-ran the `upgrade_control_plane.yml` playbook and confirmed that the playbook completed.  They are now going through the node upgrade playbook.  Thanks!

Comment 10 Weihua Meng 2019-08-08 08:20:04 UTC
Fixed.

openshift-ansible-3.11.135-1.git.0.b7ad55a.el7

Upgrade succeeded with no errors.

Comment 12 errata-xmlrpc 2019-08-13 14:09:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2352