Bug 2111711 - pacemaker command "crm_attribute" intermittently fails with error code 102
Summary: pacemaker command "crm_attribute" intermittently fails with error code 102
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: libqb
Version: 8.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Christine Caulfield
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 2149647 2151300 2151301 2151302
Reported: 2022-07-28 00:58 UTC by Joshua Baker
Modified: 2023-05-16 11:17 UTC (History)

Fixed In Version: libqb-1.0.3-13.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2149647 2151300 2151301 2151302
Environment:
Last Closed: 2023-05-16 09:10:22 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Links
| System | ID | Private | Priority | Status | Summary | Last Updated |
|---|---|---|---|---|---|---|
| Red Hat Issue Tracker | CLUSTERQE-6252 | 0 | None | None | None | 2022-12-01 17:12:18 UTC |
| Red Hat Issue Tracker | KCSOPP-2034 | 0 | None | None | None | 2022-07-28 22:36:05 UTC |
| Red Hat Issue Tracker | RHELPLAN-129336 | 0 | None | None | None | 2022-07-28 01:08:19 UTC |
| Red Hat Knowledge Base (Solution) | 6972496 | 0 | None | None | None | 2022-08-18 14:06:13 UTC |
| Red Hat Product Errata | RHBA-2023:3012 | 0 | None | None | None | 2023-05-16 09:10:37 UTC |

Description Joshua Baker 2022-07-28 00:58:40 UTC
Description of problem:
The "crm_attribute" command intermittently exits with error code 102. As a result, query commands ( -G ) return no value, which leads to subsequent errors and unexpected failovers in the SAPHana resource agent, which runs this command during monitor operations.
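
One way a caller could tell "the tool failed" apart from "the attribute is empty" is to check the exit status before trusting the output. The following is a minimal C sketch, not code from the SAPHana resource agent: `run_query` and `query_with_retry` are hypothetical helpers, and treating exit status 102 as a condition worth retrying is an assumption based only on the log in this report.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>

/* Exit status seen in this report ("Exiting crm_attribute | with status 102"). */
#define EXIT_DISCONNECT 102

/*
 * Hypothetical helper: run `cmd` through the shell, copy the first line of
 * its stdout into `out` (trailing newline removed), and return the command's
 * exit status, or -1 if it could not be run or did not exit normally.
 */
int run_query(const char *cmd, char *out, size_t out_len)
{
    assert(cmd != NULL && out != NULL && out_len > 0);
    out[0] = '\0';

    FILE *fp = popen(cmd, "r");
    if (fp == NULL) {
        return -1;
    }

    if (fgets(out, (int)out_len, fp) != NULL) {
        out[strcspn(out, "\n")] = '\0';
    }

    /* Drain any remaining output so the child is not killed by SIGPIPE. */
    int c;
    while ((c = fgetc(fp)) != EOF) {
    }

    int status = pclose(fp);
    if (status == -1 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}

/*
 * Retry only while the command exits with EXIT_DISCONNECT (assumed to be
 * transient here). Any other status, success or failure, is returned at once.
 */
int query_with_retry(const char *cmd, char *out, size_t out_len, int max_attempts)
{
    int rc = -1;
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        rc = run_query(cmd, out, out_len);
        if (rc != EXIT_DISCONNECT) {
            break;
        }
    }
    return rc;
}
```

A caller would pass its own "crm_attribute" query command line as `cmd` and treat an empty `out` as meaningful only when the returned status is 0.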

The messages below are commonly observed in the pacemaker logs (with debugging enabled) when the error occurs. I have been unable to find the source of the errors, though:

Jul 22 04:37:52 node2 pacemaker-based     [3239965] (pcmk__new_client)  debug: New IPC client 6e179eb3-6ee0-43fa-bb4d-de1da5a9fc3d for PID 2329429 with uid 0 and gid 0
Jul 22 04:37:52 node2 pacemaker-based     [3239965] (handle_new_connection)     debug: IPC credentials authenticated (/dev/shm/qb-3239965-2329429-15-S3euL9/qb)
Jul 22 04:37:52 node2 pacemaker-based     [3239965] (qb_ipcs_shm_connect)       debug: connecting to client [2329429]
Jul 22 04:37:52 node2 pacemaker-based     [3239965] (qb_rb_open_2)      debug: shm size:524301; real_size:528384; rb->word_size:132096
Jul 22 04:37:52 node2 pacemaker-based     [3239965] (qb_rb_open_2)      debug: shm size:524301; real_size:528384; rb->word_size:132096
Jul 22 04:37:52 node2 pacemaker-based     [3239965] (qb_rb_open_2)      debug: shm size:524301; real_size:528384; rb->word_size:132096
Jul 22 04:37:52 node2 crm_attribute       [2329429] (crm_ipc_connect)   debug: Could not establish cib_rw connection: Resource temporarily unavailable (11)
Jul 22 04:37:52 node2 crm_attribute       [2329429] (cib_native_signon_raw)     info: Could not connect to CIB manager for crm_attribute
Jul 22 04:37:52 node2 crm_attribute       [2329429] (cib_native_signon_raw)     info: Connection to CIB manager for crm_attribute failed: Transport endpoint is not connected
Jul 22 04:37:52 node2 crm_attribute       [2329429] (cib_native_signoff)        debug: Disconnecting from the CIB manager
Jul 22 04:37:52 node2 crm_attribute       [2329429] (crm_xml_cleanup)   info: Cleaning up memory from libxml2
Jul 22 04:37:52 node2 crm_attribute       [2329429] (crm_exit)  info: Exiting crm_attribute | with status 102
Jul 22 04:37:52 node2 pacemaker-based     [3239965] (handle_new_connection)     error: Error in connection setup (/dev/shm/qb-3239965-2329429-15-S3euL9/qb): Broken pipe (32)



Version-Release number of selected component (if applicable):

libqb-1.0.3-12.el8.x86_64
pacemaker-2.0.5-9.el8_4.3.x86_64 
pacemaker-cli-2.0.5-9.el8_4.3.x86_64 
resource-agents-sap-hana-0.154.0-2.el8_4.4.noarch


How reproducible:
Steps to reproduce this issue are not currently known.


Comment 1 Reid Wahl 2022-07-28 01:29:47 UTC
The issue is happening within libqb. I doubt it's a bug, but we're not sure what to look for in the customer's environment that might explain the intermittent IPC failures. We'd appreciate any insight, including guesses. This is causing outages for a customer's SAP HANA cluster.

No red flags jump out at us. Backup software (Veeam) is running. They have MDATP antivirus/security software installed, but they seem to be running only the MDATP audispd plugin; it doesn't look like the antivirus itself is running.

The pacemaker error from comment 0 comes from here, at this entry point to libqb.
```
bool
crm_ipc_connect(crm_ipc_t * client)
{
...
    client->need_reply = FALSE;
    client->ipc = qb_ipcc_connect(client->name, client->buf_size);

    if (client->ipc == NULL) {
        crm_debug("Could not establish %s connection: %s (%d)", client->name, pcmk_strerror(errno), errno);
        return FALSE;
    }
```
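
The errno 11 reported by the `crm_debug` call above is `EAGAIN` on Linux ("Resource temporarily unavailable"), which usually signals a transient condition. Purely as an illustration, and not as the fix shipped for this bug, a caller-side retry on `EAGAIN` could be sketched as follows. `connect_fn`, `connect_with_retry`, `flaky_ctx`, and `flaky_connect` are hypothetical names introduced here; the stand-in connector replaces a real connect call so the example is self-contained.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical connector: returns a handle, or NULL with errno set. */
typedef void *(*connect_fn)(void *ctx);

/*
 * Call `fn` up to `max_attempts` times, sleeping `delay_ms` between tries.
 * Only EAGAIN is retried; any other error is returned immediately.
 * On failure, errno is left as set by the last call to `fn`.
 */
void *connect_with_retry(connect_fn fn, void *ctx, int max_attempts, long delay_ms)
{
    assert(fn != NULL && max_attempts > 0 && delay_ms >= 0);
    struct timespec delay = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };

    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        errno = 0;
        void *handle = fn(ctx);
        if (handle != NULL) {
            return handle;
        }
        if (errno != EAGAIN || attempt == max_attempts) {
            break; /* permanent error, or out of attempts */
        }
        nanosleep(&delay, NULL);
    }
    return NULL;
}

/* Stand-in connector for demonstration: fails with EAGAIN a fixed number of times. */
struct flaky_ctx {
    int failures_left;
    int calls;
    void *handle;
};

void *flaky_connect(void *ctx)
{
    struct flaky_ctx *f = ctx;
    f->calls++;
    if (f->failures_left > 0) {
        f->failures_left--;
        errno = EAGAIN;
        return NULL;
    }
    return f->handle;
}
```

Bounding the number of attempts keeps a monitor operation from blocking indefinitely if the failure turns out not to be transient.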

Comment 7 Joshua Baker 2022-07-28 23:48:59 UTC
@Reid and Ken

I requested that they disable MD ATP and EDR services on the servers. As you stated Reid, in the most recent sosreport from July 22nd only "mdatp_audisp_plugin" appeared active. Previous sosreports had a lot more running though, so I requested they work with MS to ensure that the services are disabled. Will see what they say tomorrow, but please let me know if we need to add any further request for CU.

Comment 8 Ken Gaillot 2022-08-01 15:34:02 UTC
(In reply to Joshua Baker from comment #7)
> @Reid and Ken
> 
> I requested that they disable MD ATP and EDR services on the servers. As you
> stated Reid, in the most recent sosreport from July 22nd only
> "mdatp_audisp_plugin" appeared active. Previous sosreports had a lot more
> running though, so I requested they work with MS to ensure that the services
> are disabled. Will see what they say tomorrow, but please let me know if we
> need to add any further request for CU.

Just to be clear, this would only be for diagnosing the issue -- we don't expect them to leave it off.

Comment 52 errata-xmlrpc 2023-05-16 09:10:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (libqb bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:3012

