Description of problem:
It takes about 1 second to get stonith agent metadata using crm_resource --show-metadata. Recently (bz1262001) pcs switched to reading all information about resource and stonith agents from pacemaker. When listing all fence agents using pcs, the command takes a long time to finish when many fence agents are installed. Also, whenever pcs needs to work with metadata, there is a noticeable slowdown compared to the previous version. Since pacemaker changes bits and pieces in the metadata, we prefer that pcs not get metadata directly from the agents.

Version-Release number of selected component (if applicable):
[root@rh68-node1:~]# rpm -q pacemaker
pacemaker-1.1.15-1.el6.x86_64

How reproducible:
always, easily

Steps to Reproduce:
[root@rh68-node1:~]# time fence_apc -o metadata > /dev/null
real 0m0.053s
user 0m0.037s
sys 0m0.012s

[root@rh68-node1:~]# time crm_resource --show-metadata stonith:fence_apc > /dev/null
real 0m1.017s
user 0m0.070s
sys 0m0.007s

# listing using pcs, one agent per line
[root@rh68-node1:~]# time pcs stonith list | wc
44 337 2087
real 0m44.872s
user 0m2.318s
sys 0m0.437s

Actual results:
It takes about 1 second to get metadata.

Expected results:
It should take about the same time as getting it from the agent directly.
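The timing comparison in the steps above can be repeated over several runs with a small helper. This is only a sketch: the names time_command and compare are hypothetical, and it assumes the fence agent and crm_resource are on PATH and accept the same arguments shown in the report.

```python
import subprocess
import sys
import time


def time_command(argv, runs=3):
    """Return the mean wall-clock seconds taken by argv over `runs` runs."""
    total = 0.0
    for _ in range(runs):
        start = time.monotonic()
        # Discard output, like the "> /dev/null" redirection in the report.
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        total += time.monotonic() - start
    return total / runs


def compare(agent="fence_apc"):
    """Time the agent directly and via crm_resource (assumed to be installed)."""
    direct = time_command([agent, "-o", "metadata"])
    via_pacemaker = time_command(
        ["crm_resource", "--show-metadata", f"stonith:{agent}"])
    return direct, via_pacemaker
```

Calling compare() on a cluster node returns the two averages as a (direct, via_pacemaker) tuple; it raises FileNotFoundError if either command is missing.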
This does seem odd. Reassigning to RHEL7, as RHEL6 is only getting high-priority bugfixes now, and the behavior is present on RHEL7.
If I am not mistaken, part of the issue may be that there are two round trips hidden in the stonith query, as opposed to the resource query:

 lrmd API client          lrmd API client
 (crm_resource)           (crm_resource)
    |      ^                 |      ^
    |      |                 |      |
    v      |                 v      |
 lrmd API handler         lrmd API handler
     (lrmd)                   (lrmd)
                             |      ^
                             |      |
                             v      |
                       stonith-ng API handler
                            (stonithd)
Sorry, there is in fact no message-routing round trip at all in the context of pcs' use of crm_resource.

Results in a RHEL 7.3 VM:
- "/usr/sbin/fence_apc -o metadata" takes around 0.084s
- "crm_resource --show-metadata stonith:fence_apc" takes around 1.022s

Using strace with timeouts, I can see a significant pause (750-800 ms) after the forked process that execs fence_apc has exited and before the WNOHANG wait resumes.
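For illustration only, here is one way a pause of this kind can arise in general: a parent that polls for child exit with waitpid(WNOHANG) and sleeps a fixed interval between polls will not notice a fast child until the next poll. This is not pacemaker's code; the function name reap_by_polling and the intervals are hypothetical, and the sketch assumes a POSIX system with fork().

```python
import os
import time


def reap_by_polling(child_runtime, poll_interval):
    """Fork a child that runs for child_runtime seconds, then poll for its
    exit with waitpid(WNOHANG), sleeping poll_interval between polls.

    Returns the seconds elapsed until the parent notices the exit.
    """
    start = time.monotonic()
    pid = os.fork()
    if pid == 0:
        # Child: stand-in for a short-lived agent printing its metadata.
        time.sleep(child_runtime)
        os._exit(0)
    while True:
        done_pid, _status = os.waitpid(pid, os.WNOHANG)
        if done_pid == pid:
            return time.monotonic() - start
        # The child may exit right after this check, but the parent
        # only finds out once the full interval has elapsed.
        time.sleep(poll_interval)
```

With a child that runs for 0.05 s, reap_by_polling(0.05, 1.0) returns after roughly one second, while reap_by_polling(0.05, 0.1) returns after roughly a tenth of a second. The elapsed time is dominated by the poll interval rather than by the child.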
Looks like commit 12cf7b901733a96e4a7844e9f596430c5e8c2a3c introduced an unnecessary block-for-a-second penalty. Proposed and tested fix (boost by a factor of 10): https://github.com/ClusterLabs/pacemaker/pull/1214 Unfortunately, it currently fails an lrmd test (more investigation pending).
This will not be ready for 7.4; bumping to 7.5.
This will not make it in time for 7.5
Due to developer time constraints, this is unlikely to be done in the 7.9 time frame and so will be fixed for RHEL 8 only (Bug 1552654)