Bug 1752088 - Recovery e2e test hangs when cloud-controller pod is on machine that is chosen for deletion
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.2.0
Assignee: Mike Fedosin
QA Contact: Jianwei Hou
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-09-13 17:02 UTC by Clayton Coleman
Modified: 2019-10-16 06:41 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:41:11 UTC
Target Upstream Version:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-api-provider-aws pull 260 | 0 | None | closed | Bug 1752088: UPSTREAM: <carry>: openshift: Revendor to bring https://github.com/openshift/cluster-api/pull/72 | 2020-09-27 07:28:08 UTC
Github | openshift cluster-api-provider-azure pull 83 | 0 | None | closed | Bug 1752088: UPSTREAM: <carry>: openshift: Revendor to bring https://github.com/openshift/cluster-api/pull/72 | 2020-09-27 07:28:08 UTC
Github | openshift cluster-api-provider-gcp pull 62 | 0 | None | closed | Bug 1752088: UPSTREAM: <carry>: openshift: Revendor to bring https://github.com/openshift/cluster-api/pull/72 | 2020-09-27 07:28:08 UTC
Github | openshift cluster-api-provider-openstack pull 67 | 0 | None | closed | Bug 1752088: UPSTREAM: <carry>: openshift: Revendor to bring https://github.com/openshift/cluster-api/pull/72 | 2020-09-27 07:28:08 UTC
Github | openshift cluster-api pull 49 | 0 | None | closed | Bug 1752088: Refactor isDeleteAllowed to remove most logic | 2020-09-27 07:28:08 UTC
Github | openshift cluster-api pull 72 | 0 | None | closed | Bug 1752088: UPSTREAM: <carry>: openshift: Drop isDeleteAllowed func | 2020-09-27 07:28:07 UTC
Red Hat Product Errata | RHBA-2019:2922 | 0 | None | None | None | 2019-10-16 06:41:23 UTC

Description Clayton Coleman 2019-09-13 17:02:27 UTC
I am currently trying to get the disruptive tests to always run. The naive implementation in the test uses the machine API to select and delete machines (which, in 4.3 or 4.4, the etcd operator will help recover automatically).

However, sometimes the test hangs because of code in the cloud machine controller:

I0913 16:51:22.363617       1 controller.go:203] Deleting machine hosting this controller is not allowed. Skipping reconciliation of machine "ci-ln-wbhfl5t-d5d6b-2jqb5-master-0"

Mike indicated that this is just unnecessary code carried over from upstream and has a PR to fix it. If we can merge that safely for 4.2, it unblocks implementing the recovery test using the simple path (and is probably safer in the long run, since quorum guard already protects masters).
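
For readers unfamiliar with the guard behind the log line above, here is a rough sketch of the kind of check involved. It is not the actual cluster-api code: the Machine struct, the controllerNode parameter, shouldSkipDelete, and the guardEnabled switch are hypothetical simplifications. The only names taken from this report are isDeleteAllowed and the quoted log message.

```go
package main

import "fmt"

// Machine is a hypothetical, trimmed-down stand-in for the Machine API object.
type Machine struct {
	Name     string
	NodeName string // node backing this machine; empty if not linked yet
}

// isDeleteAllowed sketches a self-protection guard: deletion is refused
// only when the machine backs the node the controller itself runs on.
// This is an assumed simplification, not the upstream implementation.
func isDeleteAllowed(controllerNode string, m Machine) bool {
	if controllerNode == "" || m.NodeName == "" {
		// Nothing to compare against, so do not block deletion.
		return true
	}
	return m.NodeName != controllerNode
}

// shouldSkipDelete reports whether reconciliation of a deleted machine
// would be skipped. guardEnabled=false models the behavior after the
// guard is dropped (hypothetical switch, for illustration only).
func shouldSkipDelete(controllerNode string, m Machine, guardEnabled bool) bool {
	return guardEnabled && !isDeleteAllowed(controllerNode, m)
}

func main() {
	m := Machine{Name: "ci-ln-wbhfl5t-d5d6b-2jqb5-master-0", NodeName: "master-0"}

	// With the guard in place, the delete is skipped on every reconcile,
	// which is how a test waiting for the machine to go away can hang.
	if shouldSkipDelete("master-0", m, true) {
		fmt.Printf("Deleting machine hosting this controller is not allowed. Skipping reconciliation of machine %q\n", m.Name)
	}

	// Without the guard, the same machine is no longer skipped.
	fmt.Println("skipped without guard:", shouldSkipDelete("master-0", m, false))
}
```

In this sketch the skip recurs for as long as the controller pod stays on that machine, which matches the hang described above. Dropping the check leaves protection of masters to quorum guard.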

Would like to see the PR https://github.com/openshift/cluster-api/pull/49 merged in 4.2 so we can unblock.

Comment 1 Michael Gugino 2019-09-13 17:05:51 UTC
PR to address: https://github.com/openshift/cluster-api/pull/49

Would need to be vendored into all actuators.

Comment 3 Alberto 2019-09-17 12:49:59 UTC
Merged on aws/gcp. Waiting for tests to go green on Azure.
PR pending approval on OpenStack, so assigning now to mfedosin@redhat.com for awareness.

Comment 4 Alberto 2019-09-17 13:35:35 UTC
All PRs merged

Comment 6 Jianwei Hou 2019-09-19 06:55:41 UTC
Verified on 4.2.0-0.nightly-2019-09-18-114152

Deleting a machine that is hosting the machine-controller is now allowed.

Comment 7 errata-xmlrpc 2019-10-16 06:41:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

