Bug 1855888 - SAPHana: check_for_primary() uses mode instead of actual mode in global.ini as fallback [RHEL 7]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: resource-agents
Version: 7.8
Hardware: All
OS: Linux
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: 7.9
Assignee: Frank Danapfel
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On: 1855885
Blocks: 1918784 1918786 1943756
 
Reported: 2020-07-10 20:05 UTC by Reid Wahl
Modified: 2024-12-20 19:09 UTC (History)

Fixed In Version: resource-agents-4.1.1-61.el7_9.13
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1855885
Clones: 1918784 1918786 1943756
Environment:
Last Closed: 2021-08-31 09:11:01 UTC
Target Upstream Version:
Embargoed:
aarnold: mirror+


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | SUSE/SAPHanaSR/commit/ec9fd4e5 | 0 | None | None | None | 2021-03-27 05:10:58 UTC
Github | SUSE SAPHanaSR issues 40 | 0 | None | open | Use 'actual_mode' instead of 'mode' when global.ini is used as fallback to detect replication mode | 2021-02-09 16:28:38 UTC
Red Hat Knowledge Base (Solution) | 4657331 | 0 | None | None | None | 2020-07-18 06:50:55 UTC
Red Hat Product Errata | RHBA-2021:3332 | 0 | None | None | None | 2021-08-31 09:11:11 UTC

Description Reid Wahl 2020-07-10 20:05:13 UTC
+++ This bug was initially created as a clone of Bug #1855885 +++

Description of problem:

The SAPHana resource agent uses the `system_replication/mode` attribute from global.ini as a fallback if the `$hdbState` command fails. The expectation is that a takeover event updates the `mode` parameter so that it's usually a valid representation of which node is currently primary.

However, `mode` is a static parameter that does not change with a takeover event. Instead, the takeover updates the `actual mode` parameter.

Our resource agent needs to be updated to query the correct parameter in the event that `$hdbState` fails. This way, we can respond more appropriately to edge-case situations like missing hdb* binaries.
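
To illustrate the intended fallback, here is a minimal sketch in Python. The resource agent itself is a shell script, so this is not the agent's code. The function name `replication_role` and the set `SECONDARY_MODES` are hypothetical, the key is accepted in both the `actual_mode` and `actual mode` spellings because I have not confirmed which one appears in the file, and a real global.ini may contain syntax that `configparser` does not handle.

```python
import configparser

# Assumed set of replication modes that indicate a secondary site.
SECONDARY_MODES = {"sync", "syncmem", "async"}


def replication_role(global_ini_text: str) -> str:
    """Classify a node from the [system_replication] section of global.ini.

    Prefers the takeover-aware value and only falls back to the static
    `mode` value when no "actual" value is present.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(global_ini_text)
    if not parser.has_section("system_replication"):
        return "unknown"

    section = parser["system_replication"]
    # Both spellings are tried; which one the file uses is an assumption.
    value = (
        section.get("actual_mode")
        or section.get("actual mode")
        or section.get("mode")
        or ""
    ).strip().lower()

    if value == "primary":
        return "primary"
    if value in SECONDARY_MODES:
        return "secondary"
    return "unknown"


# A node that was originally configured as secondary and then took over.
after_takeover = """
[system_replication]
mode = sync
actual_mode = primary
operation_mode = logreplay
"""
print(replication_role(after_takeover))
```

With the static `mode` key alone, the same node would be classified as secondary, which is the misdetection this bug is about.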


Adapted from an SAP engineer in a support collaboration email:
~~~
# # Before takeover
# # node1 is primary, node2 is secondary
	
global.ini on node1:
  mode  = primary
  actual mode = primary
  operation_mode = logreplay

global.ini on node2:
  mode  = sync
  actual mode = sync
  operation_mode = logreplay

hdbnsutil -sr_state on node1:
  mode: primary
  operation mode: primary

hdbnsutil -sr_state on node2:
  mode: sync
  operation mode: logreplay


# # After takeover/failover
# # node1 is secondary, node2 is primary

global.ini on node1:
  mode  = primary
  actual mode = sync
  operation_mode = logreplay

global.ini on node2:
  mode  = sync
  actual mode = primary
  operation_mode = logreplay

hdbnsutil -sr_state on node1:
  mode: sync
  operation mode: logreplay

hdbnsutil -sr_state on node2:
  mode: primary
  operation mode: primary


Just have a look how we change the parameter values depending on the source – global.ini or hdbnsutil -sr_state. I highlighted major differences.

This behavior doesn’t depend on removing binaries, it’s normal HANA parameter change after the takeover/re-registering secondaries to new primary. Confirmation is provided in SAP Note 1999880 - FAQ: SAP HANA System Replication:

47. Why do I see deviating values in the system replication mode parameter?

The value in parameter global.ini -> [system_replication] -> mode depends on the original role of the site:

·         If site was originally configured as primary site: mode = 'primary'

·         If site was originally configured as secondary / tertiary site: mode = 'sync', 'async', ... (dependent on the system replication mode)

As a consequence the mode value can be different in two identically configured systems if a takeover happened in one system, but not in the other.
~~~

-----

Version-Release number of selected component (if applicable):

resource-agents-sap-hana-4.1.1-53.el7

-----

How reproducible:

Always

-----

Steps to Reproduce:

Assuming SAP's description of the parameters is correct, it's trivial to look at the check_for_primary() function and see that we're apparently querying the wrong one.

I believe the following steps will reproduce an issue occurring as a result of this:

1. Make the node with `mode = sync` the primary node via takeover.
2. Move the hdb* binaries to another location on that node so that the binaries are "missing."

-----

Actual results:

Both nodes end up in demoted state because the RA reads "mode = sync" from the global.ini file as a fallback on the primary.

-----

Expected results:

Pacemaker does not take any corrective action because the RA reads "actual mode = primary" from the global.ini file as a fallback.
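
The difference between the actual and expected results can be worked through with the post-takeover values from the email quoted above. This is only a sketch: `observed_roles` and `role` are hypothetical helpers, every non-primary value is treated as secondary for simplicity, and the model assumes a node uses global.ini only when its hdb* binaries are missing.

```python
# Values after takeover, copied from the email quoted in the description.
GLOBAL_INI = {
    "node1": {"mode": "primary", "actual mode": "sync"},
    "node2": {"mode": "sync", "actual mode": "primary"},
}
# "mode:" as reported by `hdbnsutil -sr_state` after takeover.
SR_STATE = {"node1": "sync", "node2": "primary"}


def role(value):
    """Simplified mapping: anything other than 'primary' counts as secondary."""
    return "primary" if value == "primary" else "secondary"


def observed_roles(fallback_key, nodes_without_binaries):
    """Role each node reports, using global.ini only where hdbnsutil is unavailable."""
    roles = {}
    for node in GLOBAL_INI:
        if node in nodes_without_binaries:
            value = GLOBAL_INI[node][fallback_key]
        else:
            value = SR_STATE[node]
        roles[node] = role(value)
    return roles


# Binaries are missing on node2, the current primary.
print(observed_roles("mode", {"node2"}))         # current fallback key
print(observed_roles("actual mode", {"node2"}))  # proposed fallback key
```

Under these assumptions the first call reports both nodes as secondary (the "both demoted" outcome), while the second still reports node2 as primary.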

-----

Additional info:

Related to closed BZ1783581.

--- Additional comment from Reid Wahl on 2020-07-10 20:02:01 UTC ---

I don't think this is an issue in the Scale Out RAs, but it wouldn't hurt to confirm.

Comment 9 Chris Williams 2020-11-11 21:50:06 UTC
Red Hat Enterprise Linux 7 shipped its final minor release on September 29th, 2020. 7.9 was the last minor release scheduled for RHEL 7.
From initial triage it does not appear the remaining Bugzillas meet the inclusion criteria for Maintenance Phase 2 and will now be closed.

From the RHEL life cycle page:
https://access.redhat.com/support/policy/updates/errata#Maintenance_Support_2_Phase
"During Maintenance Support 2 Phase for Red Hat Enterprise Linux version 7, Red Hat defined Critical and Important impact Security Advisories (RHSAs) and selected (at Red Hat discretion) Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available."

If this BZ was closed in error and meets the above criteria, please re-open it, flag it for 7.9.z, provide suitable business and technical justifications, and follow the process for Accelerated Fixes:
https://source.redhat.com/groups/public/pnt-cxno/pnt_customer_experience_and_operations_wiki/support_delivery_accelerated_fix_release_handbook  

Feature Requests can be re-opened and moved to RHEL 8 if the desired functionality is not already present in the product.

Please reach out to the applicable Product Experience Engineer[0] if you have any questions or concerns.  

[0] https://bugzilla.redhat.com/page.cgi?id=agile_component_mapping.html&product=Red+Hat+Enterprise+Linux+7

Comment 34 errata-xmlrpc 2021-08-31 09:11:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (resource-agents bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3332

