Description of problem:
In RHEL 8.7, with pacemaker 2.0+ managing SQL Server Availability Groups, automatic failover between the nodes keeps happening. With a 3-replica setup, when SQL Server is shut down on the primary, failover bounces back and forth between the replicas. This appears to be caused by the pacemaker behavior change that introduced the "on-fail=demote" property.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Set up a SQL Server 3-node replica in RHEL 8.7 with pacemaker 2.0+
2. Stop SQL Server on the primary
3. Observe that pacemaker fails the HA resource over back and forth and attempts to start SQL Server indefinitely

Actual results:

Expected results:

Additional info:
Hi, Could you attach the output of "pcs cluster report" covering the time of one such incident? It sounds like a monitor is failing repeatedly.
Created attachment 1976118 [details] Pacemaker logs
Pacemaker logs when we observed the issue:

May 25 13:51:20 ag(ag_cluster)[458795]: INFO: monitor: 2023/05/25 13:51:20 ag-helper invoked with required-synchronized-secondaries-to-commit [-1]; current-master [server3
May 25 13:51:20 ag(ag_cluster)[458795]: INFO: monitor: server2]; disable-primary-on-quorum-timeout-after [60]; primary-write-lease-duration [73]; monitor-interval-timeout [60]
May 25 13:51:20 ag(ag_cluster)[458795]: INFO: monitor: 2023/05/25 13:51:20 From RetryExecuteWithTimeout - Attempt 1 to connect to the instance at localhost:1433
May 25 13:51:20 ag(ag_cluster)[458795]: INFO: monitor: 2023/05/25 13:51:20 Connected to the instance at localhost:1433
May 25 13:51:25 ag(ag_cluster)[458795]: INFO: monitor: 2023/05/25 13:51:25 Monitor Caller is: monitor.
May 25 13:51:25 ag(ag_cluster)[458795]: INFO: monitor: 2023/05/25 13:51:25 [DEBUG] AG Helper Monitor Role info: AVAILABILITY GROUP ag1 on instance server2
May 25 13:51:25 ag(ag_cluster)[458795]: INFO: monitor: 2023/05/25 13:51:25 Replica is PRIMARY (1)
May 25 13:51:25 ag(ag_cluster)[458795]: INFO: monitor: 2023/05/25 13:51:25 Offlining replica...
May 25 13:51:25 ag(ag_cluster)[458795]: INFO: monitor: 2023/05/25 13:51:25 Replica is RESOLVING (0)
May 25 13:51:25 ag(ag_cluster)[458795]: INFO: monitor: 2023/05/25 13:51:25 Instance name is server2.
May 25 13:51:25 ag(ag_cluster)[458795]: INFO: monitor: 2023/05/25 13:51:25 Current master is server3
May 25 13:51:25 ag(ag_cluster)[458795]: INFO: monitor: server2.
May 25 13:51:25 ag(ag_cluster)[458795]: INFO: monitor: 2023/05/25 13:51:25 Setting the role to Secondary.
May 25 13:51:25 ag(ag_cluster)[458795]: INFO: monitor: 2023/05/25 13:51:25 Setting replica to SECONDARY role...
May 25 13:51:25 ag(ag_cluster)[458795]: INFO: monitor: 2023/05/25 13:51:25 Could not set replica to SECONDARY role. Failover Failed.
May 25 13:51:25 ag(ag_cluster)[458795]: INFO: monitor: PROMOTION_SCORE: -INFINITY
May 25 13:51:25 ag(ag_cluster)[458795]: INFO: lease_expiry after monitor update:
Created attachment 1976119 [details] Pacemaker logs
Created attachment 1976120 [details] Pacemaker logs
More details/explanation for the logs attached:

Timestamp 1: Before shutting down SQL Server on the primary node (node 1). Pacemaker1.log is the primary. Pacemaker2 and pacemaker3 are the secondaries.

Timestamp 2: After shutting down SQL Server on the primary node (node 1). Pacemaker1 is in an unavailable status. Pacemaker2 is the new primary. Pacemaker3 is the secondary.
Hi, Those look like logs from the agent only. Do you have /var/log/pacemaker/pacemaker.log from all nodes covering the same time? If logging to that file is disabled, any lines from the system log (/var/log/syslog or systemd journal) containing "pacemaker" should help.
These logs are from /var/log/pacemaker; attaching them for your reference.

We cannot find a file or directory named /var/log/syslog in RHEL, and systemd is not a built-in command in RHEL.

In pacemaker2_7_19.log, please take a look at:
Jul 19 14:41:37 ag(ag_cluster)[44254]: INFO: monitor: 2023/07/19 14:41:37 Current master is server1
Jul 19 14:41:37 ag(ag_cluster)[44254]: INFO: monitor: server2.
Created attachment 1976684 [details] pacemaker logs_updated
Created attachment 1976685 [details] pacemaker logs_updated
Created attachment 1976686 [details] pacemaker logs_updated
What we are observing is that "start-failure-is-fatal=true" does not work as expected / is not being adhered to.

The expected behavior is that the cluster will try to start the resource on another node that has a higher score, but instead the old primary resource itself is being promoted even after the start failure.

So start-failure-is-fatal=true is not being adhered to for automatic failover when the old primary SQL Server is shut down.
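For reference, a minimal sketch of how the property in question is typically set and inspected with pcs; the values and commands below are only illustrative and are not taken from this cluster's configuration:

    # Set the cluster-wide property (the behavior we expect to be honored)
    pcs property set start-failure-is-fatal=true

    # Confirm the effective value, including defaults
    pcs property list --all | grep start-failure-is-fatal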
(In reply to Aravind Mahadevan from comment #8)
> These logs are from /var/log/pacemaker; attaching them for your reference.
>
> We cannot find a file or directory named /var/log/syslog in RHEL, and
> systemd is not a built-in command in RHEL.
>
> In pacemaker2_7_19.log, please take a look at:
> Jul 19 14:41:37 ag(ag_cluster)[44254]: INFO: monitor: 2023/07/19
> 14:41:37 Current master is server1
> Jul 19 14:41:37 ag(ag_cluster)[44254]: INFO: monitor: server2.

Apologies, I meant /var/log/messages.

For the systemd journal, the command is like:

journalctl --since "YYYY-MM-DD HH:MM:SS" --until "YYYY-MM-DD HH:MM:SS"

replacing the date/times appropriately. Either /var/log/messages or that output would be sufficient.
(In reply to Aravind Mahadevan from comment #12)
> What we are observing is that "start-failure-is-fatal=true" does not work as
> expected / is not being adhered to.
>
> The expected behavior is that the cluster will try to start the resource on
> another node that has a higher score, but instead the old primary resource
> itself is being promoted even after the start failure.
>
> So start-failure-is-fatal=true is not being adhered to for automatic
> failover when the old primary SQL Server is shut down.

Hi Aravind,

I don't need logs anymore; I believe I understand what's happening.

on-fail="demote" can be set for promote actions and for monitor actions for the promoted role, so one of those is what is actually failing. Start is technically a separate action (a "promote" is done after the start completes), so start-failure-is-fatal is not used.

When the action fails, the resource is demoted, then the cluster recalculates whether and where a new instance should be promoted. The node with the failure *is* eligible, so if promotion scores have not changed, it will be promoted again. (This is similar to a regular monitor failing, and recovery being attempted with a restart on the same node.)

There is no equivalent of start-failure-is-fatal or migration-threshold for promotion failures, but the same effect can be achieved with rule-based location constraints. For example:

pcs constraint location <resource> rule role="Promoted" score="-INFINITY" \
    "fail-count-<resource>#promote_0" gt 0 \
    or "fail-count-<resource>#monitor_<interval>" gt 0

replacing <resource> with the resource name and <interval> with the configured monitor interval in milliseconds (so, if you have interval="10", use 10000).

That tells the cluster to disallow the resource for promotion on any node where promotion has previously failed. If you want the node to still be eligible for promotion but just at a lower preference, you could use a finite number instead of -INFINITY. Cleaning up the fail count will make the node eligible again.

Let me know if that works for you.
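For concreteness, a sketch of the suggested constraint with assumed values filled in; the resource name ag_cluster and the 10-second promoted-role monitor interval (10000 ms) are assumptions for illustration, not taken from the actual configuration:

    pcs constraint location ag_cluster rule role="Promoted" score="-INFINITY" \
        "fail-count-ag_cluster#promote_0" gt 0 \
        or "fail-count-ag_cluster#monitor_10000" gt 0

    # Once the underlying problem is addressed, clearing the fail count makes
    # the node eligible for promotion again
    pcs resource cleanup ag_cluster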
https://bugzilla.redhat.com/show_bug.cgi?id=2224249 and https://bugzilla.redhat.com/show_bug.cgi?id=2221772 are handled by @
@kgaillot, I'd request a call to be scheduled between yourself, me, and our engineering dev Yunxi Jia, as there is some confusion and disagreement about this ticket, https://bugzilla.redhat.com/show_bug.cgi?id=2221772, as well as https://bugzilla.redhat.com/show_bug.cgi?id=2224249. Since both are handled by you, kindly let us know if we could have a call today or tomorrow at the earliest and share your preferred timeslots/working timezone so that we can schedule it accordingly. Without a call, things would be delayed and we wouldn't be able to finish our HA testing for our upcoming SQL on Linux CU release.
(In reply to Aravind Mahadevan from comment #16)
> @kgaillot, I'd request a call to be scheduled between yourself, me, and our
> engineering dev Yunxi Jia, as there is some confusion and disagreement about
> this ticket, https://bugzilla.redhat.com/show_bug.cgi?id=2221772, as well as
> https://bugzilla.redhat.com/show_bug.cgi?id=2224249. Since both are handled
> by you, kindly let us know if we could have a call today or tomorrow at the
> earliest and share your preferred timeslots/working timezone so that we can
> schedule it accordingly. Without a call, things would be delayed and we
> wouldn't be able to finish our HA testing for our upcoming SQL on Linux CU
> release.

Certainly. I am available the rest of today (until around 6PM US Central / 11PM UTC) and tomorrow morning (around 9AM to noon US Central / 2PM to 5PM UTC).
(In reply to Ken Gaillot from comment #17)
> (In reply to Aravind Mahadevan from comment #16)
> > @kgaillot, I'd request a call to be scheduled between yourself, me, and our
> > engineering dev Yunxi Jia, as there is some confusion and disagreement about
> > this ticket, https://bugzilla.redhat.com/show_bug.cgi?id=2221772, as well as
> > https://bugzilla.redhat.com/show_bug.cgi?id=2224249. Since both are handled
> > by you, kindly let us know if we could have a call today or tomorrow at the
> > earliest and share your preferred timeslots/working timezone so that we can
> > schedule it accordingly. Without a call, things would be delayed and we
> > wouldn't be able to finish our HA testing for our upcoming SQL on Linux CU
> > release.
>
> Certainly. I am available the rest of today (until around 6PM US Central /
> 11PM UTC) and tomorrow morning (around 9AM to noon US Central / 2PM to 5PM
> UTC).

Hi Aravind,

Not sure if my attempt to comment via Google Calendar went anywhere :) I have to leave early tomorrow, so the meeting would have to start no later than noon US Central time (5PM UTC).
8:00am PST works for you? @
10am CST tomorrow (8am PST) works for you hopefully? Sent an updated invite @kgaillot
(In reply to Aravind Mahadevan from comment #20)
> 10am CST tomorrow (8am PST) works for you hopefully? Sent an updated invite
> @kgaillot

Yes, see you then
Agent code : https://github.com/microsoft/mssql-server-ha/blob/master/ag/ag
(In reply to Aravind Mahadevan from comment #22)
> Agent code : https://github.com/microsoft/mssql-server-ha/blob/master/ag/ag

As discussed, the most important change needed is that the regular expression used when setting current_master at lines 94 and 301 should use (Master|Promoted) instead of Master. That will handle the RHEL 9 output as well as the output from earlier versions. I'm out of time today, so I'll investigate further tomorrow.

FYI, agent tracing can be a big help during testing. You can turn it on with "pcs resource update <resource> trace_ra=1". It logs every single line that the agent executes, so the logs grow really quickly; I would only use it briefly. The output will be in the /var/lib/heartbeat/trace_ra/ag directory.
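To illustrate the role-name difference that the (Master|Promoted) change accounts for: newer Pacemaker status output (RHEL 9) reports the promoted instance's role as "Promoted", while earlier releases report "Master", so a pattern meant to match the promoted instance needs to accept both. A rough sketch only; the agent's actual parsing code may use a different command or output format:

    # Illustrative only: match either role spelling in one-shot cluster status output
    crm_mon -1 | grep -E '(Master|Promoted)'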
I'll ask our team to try agent tracing while testing once. @kgaillot: Any further insights from your investigation? Please do let us know; your inputs are very much appreciated!