Bug 1463033 - attrd_updater returns before the update is applied anywhere
Status: NEW
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pacemaker
Version: 7.3
Priority: unspecified
Severity: high
Target Milestone: rc
Assigned To: Ken Gaillot
QA Contact: cluster-qe@redhat.com
 
Reported: 2017-06-19 20:52 EDT by Andrew Beekhof
Modified: 2017-08-01 12:23 EDT
CC List: 3 users

Type: Bug

Attachments: None

Description Andrew Beekhof 2017-06-19 20:52:54 EDT
Description of problem:

A timing window exists whereby attrd_updater -QA may fail to report values that were previously set by attrd_updater -U.

Version-Release number of selected component (if applicable):

all

How reproducible:

Unclear; timing- and hardware-dependent. Reported by several customers.

Steps to Reproduce:
1. Set a value with attrd_updater -U.
2. Immediately query the same attribute with attrd_updater -QA (see the example below).
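
For illustration, a minimal shell reproduction along these lines (the attribute name "test_attr" and the value are made up for the example; whether a given run actually hits the window depends on timing):

  # set a node attribute, then immediately query it on all nodes
  attrd_updater -n test_attr -U 1
  attrd_updater -n test_attr -QA   # may intermittently not report test_attr yet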

Actual results:

Value not found

Expected results:

Value is always visible

Additional info:


The current sequence is:

t1. receive message over local IPC
t2. send local ACK
t3. forward message over CPG (to peers and itself)
t4. all nodes get the CPG message and apply the update

CPG ensures that all active peers (including ourselves) receive the message, and in the same order (any peer that cannot is evicted); however, the timing depends on the (corosync) token timeouts.
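
For reference, the token timeout corosync is actually using can be inspected with corosync-cmapctl; treat this as a sketch, since the exact cmap key can vary by corosync version:

  # runtime token timeout in milliseconds (corosync 2.x)
  corosync-cmapctl -g runtime.config.totem.token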

The minimum bar is to send the ack only after the update has been applied locally. While this would make the window much shorter, it would still exist, since other nodes may not yet have applied the update: querying a slow or comparatively more overloaded node could result in the same (bad) behaviour.

This may push us towards adding some kind of internal ack phase.
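
In the meantime, as an illustration of the consequence for callers (not a fix proposed in this report), a client that needs read-after-write visibility has to poll until the value shows up, roughly like this (the attribute name is hypothetical, and this assumes attrd_updater -Q exits non-zero while the attribute is not yet visible):

  # hypothetical workaround: retry the query briefly until the attribute appears
  for i in 1 2 3 4 5; do
      attrd_updater -n test_attr -QA && break
      sleep 1
  done
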
Comment 2 Ken Gaillot 2017-08-01 12:23:31 EDT
Due to capacity constraints, this is unlikely to be addressed in the 7.5 timeframe.
