Bug 1739306

Summary: Pre-provisioned nodes with fencing requires host_mac
Product: Red Hat OpenStack
Component: documentation
Version: 13.0 (Queens)
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Reporter: David Vallee Delisle <dvd>
Assignee: Mikey Ariel <mariel>
QA Contact: RHOS Documentation Team <rhos-docs>
CC: aschultz, ddf-bot, jjoyce, jschluet, lmiccini, mariel, slinaber, tvignaud, vcojot
Keywords: Triaged, ZStream
Hardware: All   
OS: Linux   
Last Closed: 2020-07-07 15:13:33 UTC
Type: Bug
Bug Blocks: 1734843    

Description David Vallee Delisle 2019-08-09 02:30:40 UTC
Description of problem:
When using pre-provisioned nodes, it's not a given that the operator keeps an inventory of the MAC <-> IP mapping of the overcloud nodes the way Ironic does.

Our OSP13 fencing docs [1] mention that we need the MAC address of each overcloud node to configure a fencing device.

When I read the RHEL7 fencing doc [2], it doesn't mention that MACs are required.

When I look at the code [3], I see that the MAC is used only as a unique identifier in the name of the fencing device, and is not used at all in the fencing mechanism itself. So it doesn't look like the MAC is required to configure a fencing device.

I believe it would be easier for operators to have the overcloud node names as a unique identifier here, unless I'm missing something.


Additional info:
[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html-single/high_availability_for_compute_instances/index#instanceha-install

[2] https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/high_availability_add-on_reference/s1-fencedevicecreate-haar

[3] https://github.com/openstack/puppet-tripleo/blob/stable/queens/manifests/fencing.pp#L84-L89
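
For context, the fencing parameters the guide [1] asks for look roughly like this (a sketch with placeholder values; the point is that host_mac is the only per-node identifier in each device entry):

  parameter_defaults:
    EnableFencing: true
    FencingConfig:
      devices:
      - agent: fence_ipmilan
        host_mac: "11:22:33:44:55:66"   # MAC of a NIC on the overcloud node
        params:
          ipaddr: 10.0.0.101            # IPMI/BMC address of that node
          login: admin
          passwd: password
          lanplus: true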

Comment 2 David Vallee Delisle 2019-08-09 02:32:57 UTC
My colleague also opened doc bug 1734843 for this.

Comment 3 Luca Miccini 2019-08-28 05:59:48 UTC
(In reply to David Vallee Delisle from comment #0)
> Description of problem:
> When using pre-provisioned nodes, it's not a given that the operator will
> keep an inventory of mac <-> ip of the overcloud nodes like ironic does.
> 
> Our OSP13 fencing docs [1] mentions that we need the mac address of each
> overcloud nodes to configure a fencing device.
> 
> When I read the RHEL7 fencing doc [2], it doesn't mention that MACs are
> required.
> 
> When I look at the code [3], I see that the MAC is used only as a unique
> identifier in the name of the fencing device, and not used at all in the
> fencing mechanism itself. So it doesn't look like the MAC is required to
> have a fencing device.
> 
> I believe it would be easier for operators to have the overcloud node names
> as a unique identifier here, unless I'm missing something.
> 
> 
> Additional info:
> [1]
> https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/
> html-single/high_availability_for_compute_instances/index#instanceha-install
> 
> [2]
> https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/
> html/high_availability_add-on_reference/s1-fencedevicecreate-haar
> 
> [3]
> https://github.com/openstack/puppet-tripleo/blob/stable/queens/manifests/
> fencing.pp#L84-L89

Hi David,

The MAC address's primary use is to act as the discriminator that matches a node to its respective IPMI details.

If you look at the following code:

https://github.com/openstack/puppet-tripleo/blob/cd526608cdad090bed8cfc7a6bb65ca8aa6ae0e6/manifests/fencing.pp#L78

  $ipmilan_devices = local_fence_devices('fence_ipmilan', $all_devices)
  create_resources('pacemaker::stonith::fence_ipmilan', $ipmilan_devices, $common_params)

the fencing manifest first calls local_fence_devices (lib/puppet/parser/functions/local_fence_devices.rb) whose purpose is:

  newfunction(:local_fence_devices, :arity =>2, :type => :rvalue,
              :doc => ("Given an array of fence device configs, limit them" +
                       "to fence devices whose MAC address is present on" +
                       "some of the local NICs, and prepare a hash which can be" +
                       "passed to create_resources function")) do |args|

so it matches the MAC addresses specified in the yaml file against the ones detected on the server's interfaces, via the standard Puppet function has_interface_with (called as function_has_interface_with from Ruby):

    # filter by local mac address
    local_devices = agent_type_devices.select do |device|
      function_has_interface_with(['macaddress', device['host_mac']])
    end


As you can see, the current fencing setup depends heavily on having the servers' MAC addresses specified in the yaml file.

Historically there hasn't been a better way to match IPMI details to servers. Without scheduling hints there is no guarantee that the "controller-0" ironic node would end up with the "controller-0" hostname, so using the MAC allowed us to match things safely.
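
(To illustrate the scheduling hints mentioned above: they pin ironic nodes to predictable hostnames with something roughly like the snippet below, with each ironic node tagged with the matching 'node:controller-N' capability. Even with hints in place, the fencing code above still keys off host_mac, so they don't remove the requirement by themselves.)

  parameter_defaults:
    ControllerSchedulerHints:
      'capabilities:node': 'controller-%index%'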

To get to your question regarding pre-provisioned nodes - you can use a manually generated yaml file containing what is required (especially the mac address :) ), or configure stonith manually after the deployment (not recommended).
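
A minimal sketch of such a hand-written environment file (placeholder addresses and credentials; the important bit is that each host_mac matches a NIC that is actually present on the corresponding pre-provisioned node):

  parameter_defaults:
    EnableFencing: true
    FencingConfig:
      devices:
      - agent: fence_ipmilan
        host_mac: "52:54:00:aa:bb:01"
        params:
          ipaddr: 192.168.24.101
          login: admin
          passwd: changeme
          lanplus: true
      - agent: fence_ipmilan
        host_mac: "52:54:00:aa:bb:02"
        params:
          ipaddr: 192.168.24.102
          login: admin
          passwd: changeme
          lanplus: true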

As for the name of the stonith resource: we wholeheartedly agree with you that a more descriptive name would have been better. We could try to have a look at how much work it would be to use something different without breaking backwards compatibility. Please let us know if you want us to scope this out (keep in mind that this would be a very low priority RFE).

Cheers,
Luca

Comment 4 Luca Miccini 2019-08-28 06:03:34 UTC
*** Bug 1734843 has been marked as a duplicate of this bug. ***

Comment 17 Mikey Ariel 2020-07-07 15:13:33 UTC
Since the updated content addresses the original issue and no further requests were made, I am closing this bug as fixed. If the issue still occurs, please reopen this bug or create a new one.