
Bug 1057697

Summary: pacemaker crmd component can infinitely block for systemd service metadata retrieval
Product: Red Hat Enterprise Linux 7
Component: pacemaker
Version: 7.0
Status: CLOSED CURRENTRELEASE
Severity: urgent
Priority: urgent
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Reporter: David Vossel <dvossel>
Assignee: Andrew Beekhof <abeekhof>
QA Contact: Cluster QE <mspqa-list>
CC: cluster-maint, dvossel, fdinitto, jkortus
Fixed In Version: pacemaker-1.1.10-23.el7
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-06-13 11:52:46 UTC
Attachments:
- crm report with non-blocking patch
- crm report WITHOUT non-blocking patch

Description David Vossel 2014-01-24 15:53:13 UTC
Description of problem:

The crmd component can block indefinitely while retrieving systemd service metadata. This is because we use a blocking socket to monitor child processes after executing a fork().
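As a rough illustration of the failure mode and of what the upstream patch changes, here is a minimal standalone sketch (hypothetical; not the actual services_linux.c code). A parent doing a blocking read() on a pipe from a forked child hangs forever if the child never writes and never exits; a non-blocking descriptor polled with a timeout cannot.

/* Hypothetical demo of the blocking-read failure mode; not pacemaker code. */
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int pipefd[2];
    char buf[128];

    if (pipe(pipefd) != 0)
        return 1;

    if (fork() == 0) {      /* child: simulate a wedged agent */
        pause();            /* never writes, never exits */
        _exit(0);
    }
    close(pipefd[1]);       /* parent keeps only the read end */

    /* The buggy pattern: a blocking read() here would hang forever,
     * which is how the crmd froze. */
    /* read(pipefd[0], buf, sizeof(buf)); */

    /* The fixed pattern: non-blocking fd plus poll() with a timeout. */
    fcntl(pipefd[0], F_SETFL, O_NONBLOCK);
    struct pollfd pfd = { .fd = pipefd[0], .events = POLLIN };
    if (poll(&pfd, 1, 30000) == 0)          /* 30s, as in the logs below */
        fprintf(stderr, "timed out waiting for child\n");
    else if (pfd.revents & POLLIN)
        read(pipefd[0], buf, sizeof(buf));
    return 0;
}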

There is a patch for this upstream that solves the blocking problem.

https://github.com/ClusterLabs/pacemaker/commit/b0ab1ccdb55dbead40fae097e4f84e445878afb1

It is not completely understood why the blocking occurs, though. There is likely work to be done in the services library's use of systemd. We also need to discuss how to recover the crmd when something like this occurs.

Version-Release number of selected component (if applicable):

pacemaker-1.1.10-22.el7

How reproducible:

100% reproducible when managing samba as an HA resource with the smb.service file.

Steps to Reproduce:
1. pcs resource create samba systemd:smb meta target-role=Stopped


Actual results:

The crmd on every node in the cluster is frozen, blocking on a read operation triggered by the metadata actions surrounding the addition of the smb.service resource to the cluster. The smb service doesn't even have to be enabled for this to occur.

Expected results:

The crmd doesn't block indefinitely while retrieving smb.service metadata.

Additional info:

This is very bad. If this occurs, every node's crmd component in the cluster freezes at the same time. There is no recovering from this. No fencing actions take place because the crmd isn't processing any errors or timeouts. The cib processes continue to work, making it appear that services are still manageable, but in reality the cluster is completely hosed. No policy engine transitions are possible.

Comment 1 David Vossel 2014-01-24 15:53:48 UTC
Backtrace of the blocking read:

#0  0x00007fb5ca056aad in read () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007fb5cbfcaba2 in services_os_action_execute (op=op@entry=0x15b64c0, synchronous=synchronous@entry=1) at services_linux.c:546
        fdsi = {ssi_signo = 3226078000, ssi_errno = 32767, ssi_code = -1068889312, ssi_pid = 32767, ssi_uid = 1179966265, ssi_fd = 0, ssi_tid = 3422311072, ssi_band = 32693, ssi_overrun = 4294967295, 
          ssi_trapno = 0, ssi_status = 0, ssi_int = 0, ssi_ptr = 140418788118048, ssi_utime = 140418805650656, ssi_stime = 0, ssi_addr = 532575944823, 
          __pad = "o\vJ\300\377\177\000\000w\000\000\000|\000\000\000\346\347\374หต\177\000\000\320\033[\001\000\000\000\000p\232]\001\000\000\000\000\002\000\000\000\000\000\000"}
        s = <optimized out>
        poll_rc = <optimized out>
        status = 0
        timeout = 30000
        sfd = 18
        fds = {{fd = 17, events = 1, revents = 16}, {fd = 19, events = 1, revents = 16}, {fd = 18, events = 1, revents = 1}}
        wait_rc = 0
        start = 1390573388
        lpc = <optimized out>
        stdout_fd = {17, 18}
        stderr_fd = {19, 20}
        mask = {__val = {65536, 0 <repeats 15 times>}}
        old_mask = {__val = {0 <repeats 16 times>}}
        st = {st_dev = 64771, st_ino = 8930030, st_nlink = 1, st_mode = 33261, st_uid = 0, st_gid = 0, __pad0 = 0, st_rdev = 0, st_size = 5115, st_blksize = 4096, st_blocks = 16, st_atim = {
            tv_sec = 1390573388, tv_nsec = 633000000}, st_mtim = {tv_sec = 1390260345, tv_nsec = 0}, st_ctim = {tv_sec = 1390571498, tv_nsec = 779000000}, __unused = {0, 0, 0}}
        __func__ = "services_os_action_execute"
#2  0x00007fb5cbfc9056 in services_action_sync (op=op@entry=0x15b64c0) at services.c:437
        rc = 1
        __func__ = "services_action_sync"
#3  0x00007fb5cbdbcf69 in generic_get_metadata (output=0x7fffc04a0d60, type=0x15d9a70 "Dummy", provider=0x15b1bd0 "heartbeat", standard=0x15b1bb0 "ocf") at lrmd_client.c:1751
        action = 0x15b64c0
#4  lrmd_api_get_metadata (lrmd=<optimized out>, class=0x15b1bb0 "ocf", provider=0x15b1bd0 "heartbeat", type=0x15d9a70 "Dummy", output=0x7fffc04a0d60, options=<optimized out>) at lrmd_client.c:1792
No locals.
#5  0x000000000041e477 in get_rsc_metadata (provider=0x15b1bd0 "heartbeat", class=<optimized out>, type=0x15d9a70 "Dummy") at lrm.c:441
        lrm_state = <optimized out>
        rc = 0
        metadata = 0x0

Comment 2 David Vossel 2014-01-24 15:56:14 UTC
I just noticed in the backtrace above that the blocking is occurring on an OCF agent. Something about an earlier failure with a systemd resource triggers this. I'll create a crm_report.

Comment 3 David Vossel 2014-01-24 15:57:44 UTC
*** Bug 1057391 has been marked as a duplicate of this bug. ***

Comment 5 David Vossel 2014-01-24 16:20:45 UTC
Created attachment 855066 [details]
crm report with non-blocking patch

Comment 6 David Vossel 2014-01-24 18:02:28 UTC
Created attachment 855127 [details]
crm report WITHOUT non blocking patch

Comment 7 David Vossel 2014-01-24 18:18:30 UTC

With the NON_BLOCKING patch, instead of blocking we time out during the metadata retrieval. There's a read error about the signal fd being 'temporarily unavailable'; a small standalone demo of that errno follows the log excerpt below.

cat rhel7-node1/messages | grep -e "temp" -e "timed out"
Jan 24 10:02:02 rhel7-node1 crmd[1752]: warning: services_os_action_execute: Filesystem_meta-data_0:1896 - timed out after 30000ms
Jan 24 10:02:32 rhel7-node1 crmd[1752]: warning: services_os_action_execute: IPaddr2_meta-data_0:1905 - timed out after 30000ms
Jan 24 10:02:32 rhel7-node1 crmd[1752]: error: services_os_action_execute: Read from signal fd 16 failed: Resource temporarily unavailable (11)
Jan 24 10:03:02 rhel7-node1 crmd[1752]: warning: services_os_action_execute: Dummy_meta-data_0:1926 - timed out after 30000ms
Jan 24 10:03:02 rhel7-node1 crmd[1752]: error: services_os_action_execute: Read from signal fd 16 failed: Resource temporarily unavailable (11)
Jan 24 10:03:32 rhel7-node1 crmd[1752]: warning: services_os_action_execute: Dummy_meta-data_0:2013 - timed out after 30000ms
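
That EAGAIN is exactly what read() returns on a non-blocking signalfd when no signal is pending. A minimal standalone demo of the errno (hypothetical; not pacemaker code):

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/signalfd.h>
#include <unistd.h>

int main(void)
{
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGCHLD);
    sigprocmask(SIG_BLOCK, &mask, NULL);  /* signalfd needs the signal blocked */

    int sfd = signalfd(-1, &mask, SFD_NONBLOCK);
    struct signalfd_siginfo fdsi;

    /* No child has exited, so nothing is pending on the fd. */
    if (read(sfd, &fdsi, sizeof(fdsi)) < 0)
        printf("read failed: %s (%d)\n", strerror(errno), errno);
        /* -> read failed: Resource temporarily unavailable (11) */
    return 0;
}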

WITHOUT the non-blocking patch, we can block on read indefinitely while trying to retrieve metadata. We can see in the logs where we get stuck.

cat rhel7-node1/pacemaker.log | grep crmd:
<snip>
crmd: (       lrm.c:440   )   trace: get_rsc_metadata: 	Retreiving metadata for controld::ocf:pacemaker

Attaching gdb to the blocked crmd process shows this:

#0  0x00007fb811db7aad in read () from /lib64/libc.so.6
#1  0x00007fb813d2bc5a in read (__nbytes=128, __buf=0x7fff4a95ba00, __fd=17) at /usr/include/bits/unistd.h:44
#2  services_os_action_execute (op=op@entry=0x11885f0, synchronous=synchronous@entry=1) at services_linux.c:546
#3  0x00007fb813d2a0f6 in services_action_sync (op=op@entry=0x11885f0) at services.c:437
#4  0x00007fb813b1e639 in generic_get_metadata (output=0x7fff4a95bc60, type=0x121d890 "controld", provider=0x121d8d0 "pacemaker", standard=0x121d8b0 "ocf") at lrmd_client.c:1751
#5  lrmd_api_get_metadata (lrmd=<optimized out>, class=0x121d8b0 "ocf", provider=0x121d8d0 "pacemaker", type=0x121d890 "controld", output=0x7fff4a95bc60, 
    options=<optimized out>) at lrmd_client.c:1792
#6  0x000000000041e52a in get_rsc_metadata (provider=0x121d8d0 "pacemaker", class=<optimized out>, type=0x121d890 "controld") at lrm.c:441
#7  get_rsc_restart_list (op=<optimized out>, rsc=0x1185710) at lrm.c:526
#8  append_restart_list (version=0x11792d0 "3.0.8", update=0x1188570, op=0x7fff4a95bd60, rsc=0x1185710) at lrm.c:612
#9  build_operation_update (parent=parent@entry=0x1216fd0, rsc=rsc@entry=0x1185710, op=op@entry=0x7fff4a95bd60, src=src@entry=0x43d970 <__FUNCTION__.23368> "do_update_resource")
    at lrm.c:677
#10 0x0000000000421461 in do_update_resource (lrm_state=0x112ea40, op=0x7fff4a95bd60, rsc=0x1185710) at lrm.c:1938
#11 process_lrm_event (lrm_state=0x112ea40, op=0x7fff4a95bd60) at lrm.c:2062
#12 0x00007fb813b1cdff in lrmd_dispatch_internal (lrmd=lrmd@entry=0x112fe10, msg=msg@entry=0x1176ed0) at lrmd_client.c:291
#13 0x00007fb813b1d169 in lrmd_ipc_dispatch (buffer=<optimized out>, length=<optimized out>, userdata=0x112fe10) at lrmd_client.c:313
#14 0x00007fb813f5f765 in mainloop_gio_callback (gio=<optimized out>, condition=G_IO_IN, data=0x112ffe0) at mainloop.c:639
#15 0x00007fb8115dbaf6 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#16 0x00007fb8115dbe78 in g_main_context_iterate.isra.22 () from /lib64/libglib-2.0.so.0
#17 0x00007fb8115dc28a in g_main_loop_run () from /lib64/libglib-2.0.so.0
#18 0x000000000040627e in crmd_init () at main.c:154
#19 0x000000000040604e in main (argc=1, argv=0x7fff4a95c138) at main.c:121


The NON_BLOCKING patch is a definite improvement, but I don't understand what is causing the timeouts to occur even with it. It is as if we lose the ability to poll the child process signal fd: "error: services_os_action_execute: Read from signal fd 16 failed: Resource temporarily unavailable (11)"
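
As the upstream analysis in comment 9 later established, this is the classic signalfd-versus-threads pitfall: signalfd only becomes readable if the signal stays pending, and a thread created without SIGCHLD blocked in its own mask can have the process-directed signal delivered to it instead, so poll() on the fd never fires. A minimal sketch of the safe pattern (hypothetical demo, not pacemaker code; compile with -lpthread):

#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <sys/signalfd.h>
#include <unistd.h>

static void *helper(void *arg)
{
    /* Simulates a library thread.  Because SIGCHLD was blocked
     * process-wide *before* pthread_create(), this thread inherits
     * the mask and cannot consume the signal.  A thread created
     * without that mask is the hazard. */
    pause();
    return NULL;
}

int main(void)
{
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGCHLD);
    sigprocmask(SIG_BLOCK, &mask, NULL);   /* must precede any threads */

    int sfd = signalfd(-1, &mask, 0);

    pthread_t tid;
    pthread_create(&tid, NULL, helper, NULL);

    if (fork() == 0)
        _exit(0);                          /* child exits immediately */

    struct signalfd_siginfo fdsi;
    read(sfd, &fdsi, sizeof(fdsi));        /* reliably sees the SIGCHLD */
    printf("got SIGCHLD from pid %u\n", fdsi.ssi_pid);
    return 0;
}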

-- Vossel

Comment 8 Andrew Beekhof 2014-02-12 11:50:41 UTC
David, did you have an easy reproducer for this?

Comment 9 Andrew Beekhof 2014-02-12 11:57:31 UTC
Summarising an upstream discussion:

The timeouts are occurring due to the interaction of pacemaker's signal handling code with the unexpected use of threads by the glib dbus code.

The solution has been to remove all use of glib dbus code and instead use the libdbus API directly.
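
For illustration, a direct libdbus call has roughly this shape (a hedged sketch against systemd's D-Bus Manager interface; the actual pacemaker changes are in the commits listed below, which also build async support on dbus_connection_send_with_reply()):

#include <dbus/dbus.h>
#include <stdio.h>

int main(void)
{
    DBusError err;
    dbus_error_init(&err);

    /* libdbus, used this way, creates no helper threads behind the
     * caller's back. */
    DBusConnection *conn = dbus_bus_get(DBUS_BUS_SYSTEM, &err);
    if (conn == NULL) {
        fprintf(stderr, "dbus_bus_get: %s\n", err.message);
        return 1;
    }

    DBusMessage *msg = dbus_message_new_method_call(
        "org.freedesktop.systemd1",          /* destination */
        "/org/freedesktop/systemd1",         /* object path */
        "org.freedesktop.systemd1.Manager",  /* interface */
        "ListUnits");                        /* method */

    /* Synchronous call with an explicit timeout. */
    DBusMessage *reply = dbus_connection_send_with_reply_and_block(
        conn, msg, 30000 /* ms */, &err);
    dbus_message_unref(msg);

    if (reply == NULL) {
        fprintf(stderr, "call failed: %s\n", err.message);
        return 1;
    }
    printf("got reply\n");
    dbus_message_unref(reply);
    return 0;
}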

This is implemented in the following cherry-picked patches:

+ David Vossel (7 days ago) 719498e: Low: services: Keep dbus build support optional  (HEAD, origin/rhel-7.0, rhel-7.0)
+ Andrew Beekhof (8 days ago) 04d86af: Refactor: dbus: Use native function for adding arguments to messages 
+ Andrew Beekhof (9 days ago) c124464: Fix: Bug rhbz#1057697 - Use native DBus library for upstart support to avoid problematic use of threads 

^^^ All upstart-related code has been dropped from the cherry-picked patch.
There were some minor changes that were needed, however:

+ Andrew Beekhof (9 days ago) 6080e33: Fix: Portability: Use basic types for DBus compatability struct 
+ Andrew Beekhof (9 days ago) 5be9e5b: Refactor: systemd: Simplify dbus API usage 
+ Andrew Beekhof (13 days ago) c7ca5ec: Fix: Bug rhbz#1057697 - Use native DBus library for systemd async support to avoid problematic use of threads 
+ Andrew Beekhof (2 weeks ago) 6b3f1e2: Fix: Bug rhbz#1057697 - Use native DBus library for systemd support to avoid problematic use of threads 
+ Andrew Beekhof (6 months ago) 385cb2b: Feature: services: Detect missing agents and permission errors before forking

Comment 10 Jaroslav Kortus 2014-02-12 12:40:16 UTC
If I understand correctly, we should not be seeing behaviour like in comment 0 or comment 7 with the patches.

Is it samba-specific, or can we reproduce it with other agents too? Would a unit test be feasible to add, or is this just a thread race as suggested by Andrew?

Comment 11 David Vossel 2014-02-12 15:57:56 UTC
(In reply to Jaroslav Kortus from comment #10)
> If I get it correctly we should not be seeing behaviour like in comment 0 or
> in comment 7 with the patches.
> 
> Is it samba specific or can we reproduce it with other agents too? Would

Any systemd agent will trigger this bug. The bug is in pacemaker's systemd management.

> unit test be feasible to add or is this just a thread race as suggested by
> Andrew?

This is definitely a thread race condition. If there were a unit test, it would be one that guarantees none of pacemaker's components ever uses a thread (which they don't now). It was not understood that the glib2 dbus API would launch a new thread; that was unintended.
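
One conceivable shape for such a guard (hypothetical; not an actual pacemaker test) is to assert the process is still single-threaded by reading the Threads: field of /proc/self/status on Linux:

#include <stdio.h>

/* Returns the number of threads in this process, or -1 on error. */
static int thread_count(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    int n = -1;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if (sscanf(line, "Threads: %d", &n) == 1)
            break;
    }
    fclose(f);
    return n;
}

int main(void)
{
    if (thread_count() != 1) {
        fprintf(stderr, "unexpected extra threads detected\n");
        return 1;
    }
    return 0;
}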

Comment 12 David Vossel 2014-02-12 16:03:59 UTC
(In reply to Andrew Beekhof from comment #8)
> David, did you have an easy reproducer for this?

No, it is timing-related. I hit it often with my samba configuration, but I can't guarantee it will happen in another environment.

Comment 16 Ludek Smid 2014-06-13 11:52:46 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.