Bug 1480311 - Deleting a guestnode resource without a node removal may lead to node fencing
Summary: Deleting a guestnode resource without a node removal may lead to node fencing
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pcs
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Tomas Jelinek
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-08-10 16:24 UTC by Radek Steiger
Modified: 2018-11-13 13:02 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-11-13 13:02:51 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1459503 0 urgent CLOSED OpenStack is not compatible with pcs management of remote and guest nodes 2021-02-22 00:41:40 UTC

Internal Links: 1459503

Description Radek Steiger 2017-08-10 16:24:18 UTC
> Description of problem:

In the upstream version of pcs we have a safety check that prevents deleting a guestnode resource unless the guest node has first been removed from the nodelist. In RHEL, however, it has been reduced to a warning:

"Warning: This command is not sufficient for removing remote and guest nodes. To complete the removal, remove pacemaker authkey and stop and disable pacemaker_remote on the node(s) manually."

The problem is that in rare cases this can lead to a stonith action, so we might want to make it an error instead and require the --force flag, so that the user takes full responsibility for the action.

It is possible that a proper fix would be more suitable on the pacemaker side, though.


> Version-Release number of selected component (if applicable):

pcs-0.9.158-6.el7.x86_64
pacemaker-1.1.16-12.el7.x86_64


> How reproducible:

Rarely.


> Steps to Reproduce:

Run a loop on one of the cluster nodes, where GUEST is the hostname of the guest node being repeatedly added and removed. Also make sure fencing is configured properly for all nodes, including the guest node.

The qarsh command is optional; it only serves to break out of the loop when the guest node dies. Also, a Dummy resource isn't the right resource for a guest node, but it should be enough to get the idea across.

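i=0; while [ $? -eq 0 ]; do
  i=$((i + 1)); echo "PASS $i";
  # Optional: leave the loop once the guest node stops responding.
  /usr/bin/qarsh -l root -t 5 GUEST "uptime" || break;
  pcs resource create GuestResource ocf:heartbeat:Dummy --disabled;
  pcs cluster node add-guest GUEST GuestResource;
  # Delete the resource without running 'pcs cluster node remove-guest' first.
  pcs resource delete GuestResource;
done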

Note: Adding sleeps between actions makes absolutely no difference.


> Actual results:

It may take a few dozen attempts to reproduce, but the guest node eventually gets fenced.


> Expected results:

Get an error when trying to delete a guest node resource whose guest node hasn't first been removed by running 'pcs cluster node remove-guest'. Or, ideally, fix the real race condition that leads to fencing (which is probably not on the pcs side).

Comment 1 Tomas Jelinek 2018-11-13 13:02:51 UTC
We cannot make this a forcible error because that would break OpenStack, see bz1459503. The correct way to remove a guest node is to use the 'pcs cluster node remove-guest' command which should be working correctly.
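
For illustration only, the removal order described above might be scripted roughly as follows. This is a sketch, not a tested procedure: the function name cycle_guest_node and the PCS variable are hypothetical, GUEST and GuestResource are taken from the reproducer, and it assumes the resource still exists after 'pcs cluster node remove-guest' and can then be deleted.

```shell
#!/bin/sh
# Sketch: add a guest node, then remove it via 'remove-guest' before
# deleting its resource. PCS is a hypothetical override for dry runs.
PCS="${PCS:-pcs}"

cycle_guest_node() {
  guest="$1"      # hostname of the guest node
  resource="$2"   # resource that hosts the guest node
  "$PCS" resource create "$resource" ocf:heartbeat:Dummy --disabled &&
  "$PCS" cluster node add-guest "$guest" "$resource" &&
  # Remove the guest node through pcs first ...
  "$PCS" cluster node remove-guest "$guest" &&
  # ... and only then delete the resource (assumes it is still present).
  "$PCS" resource delete "$resource"
}
```

Setting PCS=echo before calling cycle_guest_node GUEST GuestResource prints the four pcs invocations in order without touching a cluster.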

