Bug 2221643

Summary: fence_ibm_powervs fencing agent performance enhancements needed (RHEL9)
Product: Red Hat Enterprise Linux 9 Reporter: Oyvind Albrigtsen <oalbrigt>
Component: fence-agents Assignee: Oyvind Albrigtsen <oalbrigt>
Status: MODIFIED --- QA Contact: Brandon Perkins <bperkins>
Severity: low Docs Contact:
Priority: unspecified    
Version: 9.3 CC: andreas.schauberer, bperkins, cfeist, cluster-maint, fdanapfe, ksatarin
Target Milestone: rc Keywords: Triaged
Target Release: 9.3   
Hardware: ppc64le   
OS: Linux   
Whiteboard:
Fixed In Version: fence-agents-4.10.0-48.el9 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 2155453 Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2155453    
Bug Blocks:    

Description Oyvind Albrigtsen 2023-07-10 13:26:54 UTC
+++ This bug was initially created as a clone of Bug #2155453 +++

Description of problem:
The current version of the fence agent fence_ibm_powervs uses --method=onoff for the reboot action. As a result, failover takes twice as long as it does with the proposed change.
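
To illustrate why an on/off reboot tends to be slower, here is a minimal simulation. It is a sketch under stated assumptions, not the agent's code: FakePowerAPI, its poll counts, and the single-request reboot variant are hypothetical, and this report does not state what the proposed change actually does.

```python
# Illustrative model only. FakePowerAPI and its timings are hypothetical;
# this is not the IBM Power Virtual Server API or the agent's real code.


class FakePowerAPI:
    """Simulated instance whose power state changes after a few polls."""

    def __init__(self, polls_per_transition=3):
        self.state = "on"
        self.target = "on"
        self.polls_left = 0
        self.polls_per_transition = polls_per_transition
        self.calls = 0  # total number of simulated API requests

    def request(self, target):
        """Ask for a power state change ('on' or 'off')."""
        self.calls += 1
        self.target = target
        self.polls_left = self.polls_per_transition

    def reboot(self):
        """Single request asking the platform to restart the instance."""
        self.calls += 1

    def status(self):
        """Poll the power state; the transition completes after N polls."""
        self.calls += 1
        if self.polls_left > 0:
            self.polls_left -= 1
            if self.polls_left == 0:
                self.state = self.target
        return self.state


def wait_for(api, wanted):
    """Poll until the instance reports the wanted state."""
    while api.status() != wanted:
        pass


def reboot_onoff(api):
    """Reboot as two confirmed steps: power off, then power on."""
    api.request("off")
    wait_for(api, "off")
    api.request("on")
    wait_for(api, "on")
    return api.calls


def reboot_single_request(api):
    """Contrast case: one reboot request, no polling for transitions."""
    api.reboot()
    return api.calls


if __name__ == "__main__":
    print("onoff calls:", reboot_onoff(FakePowerAPI()))
    print("single-request calls:", reboot_single_request(FakePowerAPI()))
```

With three polls per transition, the on/off sequence issues 8 simulated requests and the single-request variant issues 1. Real timings depend on the platform and are not modeled here.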
 
Version-Release number of selected component (if applicable):
File: https://github.com/ClusterLabs/fence-agents/blob/main/agents/ibm_powervs/fence_ibm_powervs.py
Change Date: Oct 25 2022 
Tested Version: https://github.com/ClusterLabs/fence-agents/blob/3373431dc49d6e429bbf613765385cb33a56e917/agents/ibm_powervs/fence_ibm_powervs.py
 
How reproducible:
always
 
Steps to Reproduce:
1.	Deploy two PowerVS LPARs for SAP HANA using a RHEL 8.4 image
see https://cloud.ibm.com/docs/power-iaas?topic=power-iaas-creating-power-virtual-server 
and https://cloud.ibm.com/docs/sap?topic=sap-hana-iaas-offerings-profiles-power-vs  
2.	Install SAP HANA 2.0 on both nodes 
see https://help.sap.com/docs/SAP_HANA_PLATFORM/2c1988d620e04368aa4103bf26f17727/7eb0167eb35e4e2885415205b8383584.html?locale=en-US
3.	Set up SAP HANA System Replication and its pacemaker HSR cluster policy 
see https://access.redhat.com/articles/3004101
4.	Create PowerVS fence agent with command:
pcs stonith create fence_device fence_ibm_powervs token=${APIKEY} crn=${IBMCLOUD_CRN} instance=${GUID} region=${CLOUD_REGION} api-type=private proxy=http://${PROXY_IP}:3128  pcmk_host_map="${NODE1}:${POWERVSI_01};${NODE2}:${POWERVSI_02}" pcmk_reboot_timeout=600 pcmk_monitor_timeout=600
5.	Delay the reboot by setting GRUB_TIMEOUT=3600 in /etc/default/grub and running "grub2-mkconfig -o /boot/grub2/grub.cfg" (set a password for the console user before rebooting)
6.	Reboot the primary HSR node using "sync; echo b > /proc/sysrq-trigger"
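
The pcs command in step 4 joins node-to-instance pairs into a single pcmk_host_map value. A minimal sketch of how that argument list could be assembled follows. The helper names build_host_map and build_stonith_command are hypothetical, the values in the usage part are placeholders, and the sketch only builds the arguments without running pcs.

```python
def build_host_map(node_to_instance):
    """Return a pcmk_host_map value in the form 'node:instance;node:instance'."""
    return ";".join(
        f"{node}:{instance}" for node, instance in node_to_instance.items()
    )


def build_stonith_command(name, agent, options, node_to_instance):
    """Build the argv list for 'pcs stonith create' (hypothetical helper).

    options holds agent parameters such as token, crn, instance, or region,
    passed through unchanged as key=value pairs.
    """
    cmd = ["pcs", "stonith", "create", name, agent]
    cmd += [f"{key}={value}" for key, value in options.items()]
    cmd.append(f"pcmk_host_map={build_host_map(node_to_instance)}")
    return cmd


if __name__ == "__main__":
    # Placeholder values; substitute the real ones from step 4.
    argv = build_stonith_command(
        "fence_device",
        "fence_ibm_powervs",
        {"api-type": "private", "pcmk_reboot_timeout": 600},
        {"node1": "instance-id-1", "node2": "instance-id-2"},
    )
    print(" ".join(argv))
```

Building the command as a list rather than a shell string avoids quoting problems with the semicolon in pcmk_host_map.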
 
Actual results:
Messages like the following will be shown:
Node List:
  * Node sap-ha-s1-1: online:
    * Resources:
      * fence_device    (stonith:fence_ibm_powervs):     Started (Monitoring)
      * SAPHanaTopology_ASD_00  (ocf::heartbeat:SAPHanaTopology):        Started
      * SAPHana_ASD_00  (ocf::heartbeat:SAPHana):        Slave
  * Node sap-ha-s1-2: UNCLEAN (offline):
    * Resources:
      * fence_device    (stonith:fence_ibm_powervs):     Started (Monitoring)
      * SAPHanaTopology_ASD_00  (ocf::heartbeat:SAPHanaTopology):        Started
      * vip_ASD_00      (ocf::heartbeat:IPaddr2):        Started
      * SAPHana_ASD_00  (ocf::heartbeat:SAPHana):        Master
-> Node sap-ha-s1-2 is flagged UNCLEAN, and node sap-ha-s1-1 takes over only after at least 2 minutes.
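
When timing the takeover, it can help to extract node states from status output like the one above. The following is a minimal sketch that assumes the node lines always look like the ones in this report. The regular expression and the function names are illustrative and are not part of pacemaker or pcs.

```python
import re

# Matches lines such as "  * Node sap-ha-s1-2: UNCLEAN (offline):".
# Assumption: node lines follow the layout shown in this report.
NODE_LINE = re.compile(r"^\s*\*\s+Node\s+(?P<name>\S+):\s+(?P<state>.+?):?\s*$")


def node_states(status_text):
    """Return a dict mapping node name to its reported state string."""
    states = {}
    for line in status_text.splitlines():
        match = NODE_LINE.match(line)
        if match:
            states[match.group("name")] = match.group("state")
    return states


def unclean_nodes(status_text):
    """Return the names of nodes whose state contains UNCLEAN."""
    return [
        name
        for name, state in node_states(status_text).items()
        if "UNCLEAN" in state
    ]


if __name__ == "__main__":
    sample = """Node List:
  * Node sap-ha-s1-1: online:
    * Resources:
      * fence_device    (stonith:fence_ibm_powervs):     Started (Monitoring)
  * Node sap-ha-s1-2: UNCLEAN (offline):
    * Resources:
      * vip_ASD_00      (ocf::heartbeat:IPaddr2):        Started
"""
    print(node_states(sample))
    print(unclean_nodes(sample))
```

Polling the cluster status at a fixed interval and recording when the UNCLEAN list becomes empty gives a rough measure of the failover time.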
 
Expected results:
-> Node sap-ha-s1-2 should be flagged (offline), and node sap-ha-s1-1 should take over the VIP and the HANA primary role in less than 2 minutes (in most cases, 1 minute).
 
Additional info:
https://github.com/ClusterLabs/fence-agents/pull/542