Bug 1292858

Summary: pcs should timeout during network requests
Product: Red Hat Enterprise Linux 7
Reporter: Chris Feist <cfeist>
Component: pcs
Assignee: Ondrej Mular <omular>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: unspecified
Priority: medium
Version: 7.3
CC: cfeist, cluster-maint, idevat, mlisik, rsteiger, tojeline, vlad.socaciu
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Fixed In Version: pcs-0.9.156-1.el7
Last Closed: 2017-08-01 18:22:57 UTC
Type: Bug
Bug Blocks: 1334429, 1395959

Description Chris Feist 2015-12-18 14:51:59 UTC
Description of problem:
If pcs connects to a remote node but the node hangs or does not respond, 'pcs status' hangs indefinitely. We should have sane timeouts, with error messages, for this case.

This was originally discovered with an MTU mismatch: the initial TCP connection to pcsd succeeded, but no other packets could get through.

As a result, 'pcs status' hung at the point where it queried pcsd.
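The failure mode described above, where the TCP handshake completes but later packets never arrive, can be reproduced locally. The following is a minimal Python sketch, not pcs code. `query_with_timeout` is a hypothetical helper, and a local listener that never accepts or replies stands in for the MTU-mismatched pcsd. The point it illustrates is that a connect timeout alone would not help here, because connect() succeeds. The timeout must also cover the reads that follow.

```python
import socket
import time


def query_with_timeout(host: str, port: int, timeout: float) -> str:
    """Send a request and wait for a reply, giving up after `timeout` seconds.

    Hypothetical helper for illustration only; it is not part of pcs.
    create_connection() applies `timeout` to connect() and to every later
    blocking call on the socket, so a peer that completes the TCP handshake
    but never sends data cannot block the caller forever.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"status\n")
            data = sock.recv(4096)  # raises socket.timeout if nothing arrives
            return data.decode(errors="replace")
    except socket.timeout:
        return f"Error: {host}:{port}: Connection timed out after {timeout:g} s"


if __name__ == "__main__":
    # Simulate the reported failure: the kernel completes the TCP handshake
    # for the queued connection (so connect() succeeds), but the application
    # never accepts it or sends anything back.
    silent = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    silent.bind(("127.0.0.1", 0))
    silent.listen(1)  # never call accept() or send()
    host, port = silent.getsockname()

    start = time.monotonic()
    print(query_with_timeout(host, port, timeout=1.0))
    print(f"returned after {time.monotonic() - start:.1f} s")
    silent.close()
```

Without the `timeout` argument, the same `recv()` call would block indefinitely. That matches how 'pcs status' behaved before the fix.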

Comment 5 Ondrej Mular 2017-02-03 16:19:11 UTC
upstream patches:
https://github.com/ClusterLabs/pcs/commit/076b8b6ea473835810596f967bef41d7cf1f
https://github.com/ClusterLabs/pcs/commit/731127b8cfffd29c8546bd4a8a461f7aade5

A new option, --request-timeout, has been added to pcs. It sets the timeout, in seconds, for each request pcs sends to another node.

TEST:
Two-node cluster: rhel7-node1, rhel7-node2
Block port 2224 (pcsd) on rhel7-node2:
[root@rhel7-node2 ~]# iptables -I OUTPUT -p tcp --dport 2224 -j DROP
[root@rhel7-node2 ~]# iptables -I INPUT -p tcp --dport 2224 -j DROP

Then run some commands on rhel7-node1. They should time out instead of hanging:
[root@rhel7-node1 ~]# pcs cluster auth rhel7-node2 -uhacluster --request-timeout=3
Password: 
Error: Operation timed out
Error: Unable to communicate with rhel7-node2

[root@rhel7-node1 ~]# pcs stonith sbd status --request-timeout=3
Warning: rhel7-node2: Connection timeout (Connection timed out after 3001 milliseconds)
Warning: Unable to get status of SBD from node 'rhel7-node2'
SBD STATUS
<node name>: <installed> | <enabled> | <running>
rhel7-node1:  NO |  NO |  NO
rhel7-node2: N/A | N/A | N/A
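The `pcs stonith sbd status` output above shows the intended behavior. The unreachable node produces warnings and an N/A row, and the rest of the report is still printed. Below is a minimal Python sketch of that pattern. It assumes a hypothetical per-node `fetch` callable that enforces the timeout itself and raises `TimeoutError` when the timeout expires. The parallel queries and the table layout are illustrative and are not taken from the pcs source.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Tuple

Status = Tuple[str, str, str]  # (installed, enabled, running)


def collect_status(
    nodes: List[str],
    fetch: Callable[[str, float], Status],
    request_timeout: float,
) -> Tuple[Dict[str, Optional[Status]], List[str]]:
    """Query every node, passing `request_timeout` (seconds) to each request.

    `fetch` is hypothetical and must enforce the timeout itself (for example
    with a socket timeout). A failure on one node becomes two warnings and an
    N/A row instead of aborting, or hanging, the whole command.
    """
    results: Dict[str, Optional[Status]] = {}
    warnings: List[str] = []
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        futures = {node: pool.submit(fetch, node, request_timeout) for node in nodes}
        for node, future in futures.items():
            try:
                results[node] = future.result()
            except OSError as exc:  # TimeoutError and socket.timeout are OSErrors
                warnings.append(f"Warning: {node}: {exc}")
                warnings.append(f"Warning: Unable to get status of SBD from node '{node}'")
                results[node] = None
    return results, warnings


def format_table(results: Dict[str, Optional[Status]]) -> str:
    """Render results in a layout resembling the output above."""
    lines = ["SBD STATUS", "<node name>: <installed> | <enabled> | <running>"]
    for node, status in results.items():
        cells = status if status is not None else ("N/A", "N/A", "N/A")
        lines.append(f"{node}: " + " | ".join(f"{c:>3}" for c in cells))
    return "\n".join(lines)


if __name__ == "__main__":
    def fake_fetch(node: str, timeout: float) -> Status:
        # Stand-in for a real query: rhel7-node2 behaves like the blocked node.
        if node == "rhel7-node2":
            raise TimeoutError("Connection timeout")
        return ("NO", "NO", "NO")

    res, warns = collect_status(["rhel7-node1", "rhel7-node2"], fake_fetch, 3)
    print("\n".join(warns))
    print(format_table(res))
```

Converting each per-node failure into data, rather than letting it propagate, keeps one unreachable node from hiding the status of the others.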

Comment 7 Ivan Devat 2017-02-20 08:19:28 UTC
After Fix:

[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.156-1.el7.x86_64

[vm-rhel72-1 ~] $ pcs stonith sbd status --request-timeout=3
Warning: vm-rhel72-3: Connection timeout (Connection timed out after 3001 milliseconds)
Warning: Unable to get status of SBD from node 'vm-rhel72-3'
SBD STATUS
<node name>: <installed> | <enabled> | <running>
vm-rhel72-1: YES |  NO |  NO
vm-rhel72-3: N/A | N/A | N/A

Comment 9 Tomas Jelinek 2017-02-20 13:53:54 UTC
*** Bug 1395959 has been marked as a duplicate of this bug. ***

Comment 12 errata-xmlrpc 2017-08-01 18:22:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1958