Bug 2221643 - fence_ibm_powervs fencing agent performance enhancements needed (RHEL9)
Summary: fence_ibm_powervs fencing agent performance enhancements needed (RHEL9)
Keywords:
Status: MODIFIED
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: fence-agents
Version: 9.3
Hardware: ppc64le
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: rc
Target Release: 9.3
Assignee: Oyvind Albrigtsen
QA Contact: Brandon Perkins
URL:
Whiteboard:
Depends On: 2155453
Blocks:
 
Reported: 2023-07-10 13:26 UTC by Oyvind Albrigtsen
Modified: 2023-08-10 15:39 UTC

Fixed In Version: fence-agents-4.10.0-48.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2155453
Environment:
Last Closed:
Type: Bug
Target Upstream Version:
Embargoed:




Links
System                 ID               Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker  CLUSTERQE-6807   0        None      None    None     2023-07-10 14:21:46 UTC
Red Hat Issue Tracker  RHELPLAN-161860  0        None      None    None     2023-07-10 13:29:18 UTC
Red Hat Issue Tracker  RHELPLAN-161861  0        None      None    None     2023-07-10 13:29:21 UTC

Description Oyvind Albrigtsen 2023-07-10 13:26:54 UTC
+++ This bug was initially created as a clone of Bug #2155453 +++

Description of problem:
The current version of the fence_ibm_powervs fence agent uses --method=onoff for the reboot action. As a result, failover takes twice as long as it does with the proposed change (see the pull request under Additional info).
 
Version-Release number of selected component (if applicable):
File: https://github.com/ClusterLabs/fence-agents/blob/main/agents/ibm_powervs/fence_ibm_powervs.py
Change Date: Oct 25 2022 
Tested Version: https://github.com/ClusterLabs/fence-agents/blob/3373431dc49d6e429bbf613765385cb33a56e917/agents/ibm_powervs/fence_ibm_powervs.py
 
How reproducible:
always
 
Steps to Reproduce:
1.	Deploy two PowerVS LPARs for SAP HANA using the RHEL 8.4 image
see https://cloud.ibm.com/docs/power-iaas?topic=power-iaas-creating-power-virtual-server 
and https://cloud.ibm.com/docs/sap?topic=sap-hana-iaas-offerings-profiles-power-vs  
2.	Install SAP HANA 2.0 on both nodes 
see https://help.sap.com/docs/SAP_HANA_PLATFORM/2c1988d620e04368aa4103bf26f17727/7eb0167eb35e4e2885415205b8383584.html?locale=en-US
3.	Set up SAP HANA System Replication and its Pacemaker HSR cluster policy
see https://access.redhat.com/articles/3004101
4.	Create PowerVS fence agent with command:
pcs stonith create fence_device fence_ibm_powervs token=${APIKEY} crn=${IBMCLOUD_CRN} instance=${GUID} region=${CLOUD_REGION} api-type=private proxy=http://${PROXY_IP}:3128  pcmk_host_map="${NODE1}:${POWERVSI_01};${NODE2}:${POWERVSI_02}" pcmk_reboot_timeout=600 pcmk_monitor_timeout=600
5.	Delay the reboot by setting GRUB_TIMEOUT=3600 in /etc/default/grub and running "grub2-mkconfig -o /boot/grub2/grub.cfg" (set a password for the console user before rebooting)
6.	Reboot the primary HSR node using "sync; echo b > /proc/sysrq-trigger"
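
Step 5 can be scripted. The following is a minimal sketch, assuming /etc/default/grub holds at most one line starting with GRUB_TIMEOUT=. The helper name set_grub_timeout is hypothetical and not part of any tool named in this report. It only edits the file it is given, so it can be tried on a copy first.

```shell
# Hypothetical helper: set GRUB_TIMEOUT in a grub defaults file.
# Usage: set_grub_timeout FILE SECONDS
set_grub_timeout() {
    file=$1
    seconds=$2
    if grep -q '^GRUB_TIMEOUT=' "$file"; then
        # Rewrite the existing setting via a temporary file, then copy the
        # result back so the original file keeps its inode and permissions.
        sed "s/^GRUB_TIMEOUT=.*/GRUB_TIMEOUT=${seconds}/" "$file" > "$file.tmp" &&
            cat "$file.tmp" > "$file" &&
            rm -f "$file.tmp"
    else
        # No setting present yet: append one.
        printf 'GRUB_TIMEOUT=%s\n' "$seconds" >> "$file"
    fi
}
```

As root, this would be followed by the grub2-mkconfig call from step 5, for example: set_grub_timeout /etc/default/grub 3600 && grub2-mkconfig -o /boot/grub2/grub.cfg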
 
Actual results:
The cluster status shows output like the following:
Node List:
  * Node sap-ha-s1-1: online:
    * Resources:
      * fence_device    (stonith:fence_ibm_powervs):     Started (Monitoring)
      * SAPHanaTopology_ASD_00  (ocf::heartbeat:SAPHanaTopology):        Started
      * SAPHana_ASD_00  (ocf::heartbeat:SAPHana):        Slave
  * Node sap-ha-s1-2: UNCLEAN (offline):
    * Resources:
      * fence_device    (stonith:fence_ibm_powervs):     Started (Monitoring)
      * SAPHanaTopology_ASD_00  (ocf::heartbeat:SAPHanaTopology):        Started
      * vip_ASD_00      (ocf::heartbeat:IPaddr2):        Started
      * SAPHana_ASD_00  (ocf::heartbeat:SAPHana):        Master
-> Node sap-ha-s1-2 is flagged UNCLEAN, and node sap-ha-s1-1 takes over only after at least 2 minutes.
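
To detect this state from a script, the node lines of such output can be filtered. This is a minimal sketch, assuming the status text uses the "* Node NAME: STATE" layout shown in this report; other Pacemaker versions may format it differently. The helper name unclean_nodes is hypothetical, and it reads from stdin so that it is not tied to one particular status command.

```shell
# Hypothetical helper: print the name of every node flagged UNCLEAN.
# Reads cluster status text (grouped by node) from stdin.
unclean_nodes() {
    awk '$1 == "*" && $2 == "Node" && /UNCLEAN/ {
        sub(/:$/, "", $3)   # strip the trailing colon from the node name
        print $3
    }'
}
```

It could be fed from a one-shot, group-by-node status call, for example: crm_mon -1 -n | unclean_nodes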
 
Expected results:
-> Node sap-ha-s1-2 should be flagged (offline), and node sap-ha-s1-1 should take over the VIP and the HANA primary role in less than 2 minutes (in most cases, 1 minute).
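
The takeover time can be measured rather than estimated by eye. This is a minimal sketch of a polling timer; the name wait_until is hypothetical, and the command that decides whether the takeover has completed is left to the tester (it is an assumption of this sketch that such a check exits with status 0 once the takeover is done).

```shell
# Hypothetical helper: run COMMAND about once per second until it succeeds
# or TIMEOUT_SECONDS have passed. Prints the elapsed seconds; returns 0 on
# success and 1 on timeout.
# Usage: wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
wait_until() {
    timeout=$1
    shift
    start=$(date +%s)
    while :; do
        if "$@" >/dev/null 2>&1; then
            status=0
        else
            status=1
        fi
        # Measure after the check so a slow check is counted too.
        elapsed=$(( $(date +%s) - start ))
        if [ "$status" -eq 0 ]; then
            echo "$elapsed"
            return 0
        fi
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "$elapsed"
            return 1
        fi
        sleep 1
    done
}
```

Started right after step 6 with a timeout of 120, the printed value can be compared directly against the 2-minute limit stated above.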
 
Additional info:
https://github.com/ClusterLabs/fence-agents/pull/542

