Bug 1609907 - [free-int] Timeout waiting to approve node during control-plane upgrade
Summary: [free-int] Timeout waiting to approve node during control-plane upgrade
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 3.11.0
Assignee: Russell Teague
QA Contact: liujia
URL:
Whiteboard:
Duplicates: 1612144
Depends On:
Blocks: 1614003 1656757
 
Reported: 2018-07-30 19:13 UTC by Justin Pierce
Modified: 2018-12-06 09:29 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The upgrade tasks included a node CSR approval step, which is no longer necessary once nodes are at 3.10.
Fix: The node approval task has been removed from the node upgrade playbook.
Clone Of:
Clones: 1614003 1656757
Environment:
Last Closed: 2018-10-11 07:22:47 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2018:2652 | 0 | None | None | None | 2018-10-11 07:23:06 UTC

Description Justin Pierce 2018-07-30 19:13:52 UTC
Description of problem:

During an upgrade of the control-plane of a test cluster using 3.11.0-0.10.0:
11:26:55 fatal: [free-int-master-3c664 -> None]: FAILED! => {"changed": false, "finished": false, "msg": "Timed out accepting certificate signing requests. Failing as requested.", "nodes": [{"client_accepted": false, "csrs": {}, "denied": false, "name": "ip-172-31-50-177.ec2.internal", "server_accepted": false}], "results": [], "state": "approve", "timeout": true}
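
For triage, the failure payload in the log line above can be parsed to see which nodes were left unapproved. The following is a minimal Python sketch, not part of openshift-ansible; the helper name `pending_nodes` is hypothetical, and the payload is copied verbatim from the text after `FAILED! =>`. It assumes that a node counts as pending when either its client or its server CSR was not accepted.

```python
import json

# Failure payload copied from the log line above (the text after "FAILED! =>").
payload = (
    '{"changed": false, "finished": false, '
    '"msg": "Timed out accepting certificate signing requests. Failing as requested.", '
    '"nodes": [{"client_accepted": false, "csrs": {}, "denied": false, '
    '"name": "ip-172-31-50-177.ec2.internal", "server_accepted": false}], '
    '"results": [], "state": "approve", "timeout": true}'
)


def pending_nodes(result):
    """Return names of nodes whose client or server CSR was not accepted.

    Hypothetical helper for reading the log; the field names come from the
    payload itself, the "pending" rule is an assumption.
    """
    return [
        node["name"]
        for node in result.get("nodes", [])
        if not (node.get("client_accepted") and node.get("server_accepted"))
    ]


result = json.loads(payload)
print("timed out:", result["timeout"])
print("pending nodes:", pending_nodes(result))
```

In this payload the `csrs` map for the node is empty, which suggests the task was waiting on requests that were never submitted rather than on requests that were denied.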



Version-Release number of the following components:
3.11.0-0.10.0 

How reproducible:
Retrying the operation produced the same result.

Steps to Reproduce:
1. Run control plane upgrade

Comment 2 Russell Teague 2018-07-30 19:18:46 UTC
In testing, skipping this task [1] allowed the upgrade to complete.

[1] https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_node/tasks/upgrade/restart.yml#L48-L53

Comment 3 Scott Dodson 2018-07-31 19:26:49 UTC
We're quite certain that this will also happen during 3.10.z.

Comment 4 Russell Teague 2018-08-08 19:31:46 UTC
For 3.11: https://github.com/openshift/openshift-ansible/pull/9488

Comment 5 openshift-github-bot 2018-08-10 15:30:40 UTC
Commit pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/a1d87c62487f8d2454e8ddd759cddf8dd9233002
Merge pull request #9488 from mtnbikenc/fix-1609907

[Bug 1609907] Remove node CSR approval from upgrade in 3.11

Comment 6 Russell Teague 2018-08-13 12:44:48 UTC
*** Bug 1612144 has been marked as a duplicate of this bug. ***

Comment 7 Russell Teague 2018-08-13 12:45:35 UTC
openshift-ansible-3.11.0-0.14.0

Comment 8 liujia 2018-08-14 05:15:33 UTC
Verified on openshift-ansible-3.11.0-0.14.0.git.0.7bd4429None.noarch

Comment 9 Scott Dodson 2018-08-15 19:40:30 UTC
*** Bug 1612144 has been marked as a duplicate of this bug. ***

Comment 11 errata-xmlrpc 2018-10-11 07:22:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652

