Bug 1384109

Summary: It takes about 1 second to get stonith agent metadata using crm_resource
Product: Red Hat Enterprise Linux 7
Component: pacemaker
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Severity: low
Priority: high
Status: CLOSED WONTFIX
Reporter: Tomas Jelinek <tojeline>
Assignee: Jan Pokorný [poki] <jpokorny>
QA Contact: cluster-qe <cluster-qe>
CC: cluster-maint, jpokorny, kgaillot, mnovacek, phagara
Target Milestone: rc
Target Release: 7.9
Doc Type: No Doc Update
Bug Blocks: 1552654
Last Closed: 2020-02-21 16:56:40 UTC
Type: Bug

Description Tomas Jelinek 2016-10-12 14:56:07 UTC
Description of problem:
It takes about 1 second to get stonith agent metadata using crm_resource --show-metadata.

Recently (bz1262001) pcs switched to reading all information about resource and stonith agents from pacemaker. When listing all fence agents using pcs, it takes quite a long time for the command to finish if more fence agents are installed.

Also, whenever pcs needs to work with metadata, there is a noticeable slowdown compared to the previous version.

Since pacemaker changes bits and pieces in the metadata, we prefer that pcs not get the metadata directly from the agents.


Version-Release number of selected component (if applicable):
[root@rh68-node1:~]# rpm -q pacemaker
pacemaker-1.1.15-1.el6.x86_64


How reproducible:
always, easily


Steps to Reproduce:
[root@rh68-node1:~]# time fence_apc -o metadata > /dev/null

real    0m0.053s
user    0m0.037s
sys     0m0.012s
[root@rh68-node1:~]# time crm_resource --show-metadata stonith:fence_apc > /dev/null

real    0m1.017s
user    0m0.070s
sys     0m0.007s

# listing using pcs, one agent per line
[root@rh68-node1:~]# time pcs stonith list | wc
     44     337    2087
real    0m44.872s
user    0m2.318s
sys     0m0.437s
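The 44-second total is consistent with the per-agent overhead measured above: pcs queries crm_resource once per fence agent, so the ~1 s penalty multiplies by the number of installed agents. A rough back-of-the-envelope check (the two numbers are taken directly from the transcripts above):

```python
# Rough sanity check: per-agent crm_resource overhead multiplied by the
# number of fence agents reported by `pcs stonith list | wc` above.
per_agent_overhead_s = 1.0   # ~1 s per `crm_resource --show-metadata` call
agent_count = 44             # lines of output from `pcs stonith list`

estimated_total_s = agent_count * per_agent_overhead_s
print(estimated_total_s)     # 44.0, in line with the observed 44.872 s
```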


Actual results:
It takes about 1 second to get metadata.


Expected results:
It should take about the same time as getting it from the agent directly.

Comment 2 Ken Gaillot 2016-10-12 21:45:21 UTC
This does seem odd.

Reassigning to RHEL7, as RHEL6 is only getting high-priority bugfixes now, and the behavior is present on RHEL7.

Comment 3 Jan Pokorný [poki] 2017-01-26 19:39:16 UTC
If I am not mistaken, part of the issue may be that there are two
roundtrips hidden in the stonith query as opposed to the resource
one:

 stonith query              resource query

lrmd API client             lrmd API client
(crm_resource)              (crm_resource)

  |        ^                  |        ^
  |        |                  |        |
  v        |                  v        |

lrmd API handler            lrmd API handler
    (lrmd)                      (lrmd)

  |        ^
  |        |
  v        |

 stonith-ng API
    handler
  (stonithd)

Comment 4 Jan Pokorný [poki] 2017-01-27 13:38:53 UTC
Sorry, there's in fact no message routing round trip at all in the
context of pcs' use of crm_resource.

Results in RHEL 7.3 VM:

- "/usr/sbin/fence_apc -o metadata" takes around 0.084s

- "crm_resource --show-metadata stonith:fence_apc" around 1.022s

Using strace with timestamps, I can see a significant pause (750-800 ms)
after the forked process that execs fence_apc has exited and before the
WNOHANG wait resumes.
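That pause is the symptom of a child-reaping loop that polls waitpid(..., WNOHANG) on a fixed interval instead of waking on SIGCHLD. A minimal sketch of that pattern (not pacemaker's actual code; the child command and the one-second interval are illustrative) reproduces the delay:

```python
import subprocess
import time

def run_with_polling_wait(cmd, poll_interval=1.0):
    """Reap the child by polling (Popen.poll() is a WNOHANG waitpid)
    on a fixed interval, instead of waking on SIGCHLD."""
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL)
    while proc.poll() is None:         # WNOHANG check: has the child exited?
        time.sleep(poll_interval)      # <-- the block-for-a-second penalty
    return proc.returncode

start = time.monotonic()
run_with_polling_wait(["sleep", "0.05"])  # stands in for the ~50 ms agent call
elapsed = time.monotonic() - start
print(f"elapsed: {elapsed:.2f}s")         # ~1 s despite the 50 ms child
```

An event-driven wait (e.g. a mainloop child watch triggered by SIGCHLD) avoids the fixed sleep entirely, which is the kind of change the fix below aims at.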

Comment 5 Jan Pokorný [poki] 2017-01-27 15:42:36 UTC
Looks like commit 12cf7b901733a96e4a7844e9f596430c5e8c2a3c introduced
an unnecessary block-for-a-second penalty.

Proposed and tested fix (a speedup by a factor of 10):
https://github.com/ClusterLabs/pacemaker/pull/1214

Unfortunately, it currently fails at an lrmd test
(more investigation pending).

Comment 6 Ken Gaillot 2017-05-10 14:50:03 UTC
This will not be ready for 7.4; bumping to 7.5.

Comment 7 Ken Gaillot 2017-10-18 22:33:50 UTC
This will not make it in time for 7.5.

Comment 9 Ken Gaillot 2020-02-21 16:56:40 UTC
Due to developer time constraints, this is unlikely to be done in the 7.9 time frame and so will be fixed for RHEL 8 only (Bug 1552654).