Bug 1305913

Summary: pcs cluster setup may get stuck due to pipe issue
Product: Red Hat Enterprise Linux 6 Reporter: Radek Steiger <rsteiger>
Component: pcs Assignee: Tomas Jelinek <tojeline>
Status: CLOSED ERRATA QA Contact: cluster-qe <cluster-qe>
Severity: unspecified Docs Contact:
Priority: medium    
Version: 6.8 CC: cfeist, cluster-maint, idevat, omular, tojeline
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: pcs-0.9.148-4.el6 Doc Type: Bug Fix
Doc Text:
Cause: User runs 'pcs cluster setup' command. Consequence: Pcs gets stuck occasionally. Fix: Close file descriptors when running external processes from pcs. Result: Pcs does not get stuck.
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-05-10 19:27:22 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Radek Steiger 2016-02-09 14:59:05 UTC
> Description of problem:

Running cluster setup may sometimes get stuck, with the pcsd-cli.rb subprocesses never returning. It looks like a problem with the pipe between the Python caller and the Ruby subprocesses when calling read_tokens.

This is best reproduced by running the cluster setup in a loop with a 10-second delay between attempts (timing seems to be key here). It might take several minutes before the setup gets stuck.
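
One plausible mechanism for a hang like this, consistent with the fix described in the Doc Text but not confirmed in this report, is that a subprocess started concurrently inherits the write end of another subprocess's pipe, so the reader never sees end-of-file. The sketch below illustrates that effect in isolation on a POSIX system. The function name sibling_holds_pipe_open is hypothetical, and the pipe is marked inheritable by hand to mimic Python 2 (assumed to be what pcs ran under on RHEL 6), where subprocess.Popen defaulted to close_fds=False.

```python
import os
import select
import subprocess
import sys


def sibling_holds_pipe_open(close_fds):
    """Return True if an unrelated child keeps a pipe's write end open.

    Hypothetical helper for illustration only; it is not pcs code.
    """
    read_fd, write_fd = os.pipe()
    # Python 3 creates non-inheritable descriptors; undo that to mimic
    # the Python 2 behaviour where children inherited everything.
    os.set_inheritable(write_fd, True)
    sibling = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"],
        close_fds=close_fds,
    )
    try:
        # The parent is done writing and closes its copy of the write end.
        os.close(write_fd)
        # If nobody else holds the write end, the read end reports EOF
        # (becomes readable) at once; otherwise select() times out.
        readable, _, _ = select.select([read_fd], [], [], 1.0)
        return not readable
    finally:
        sibling.kill()
        sibling.wait()
        os.close(read_fd)


if __name__ == "__main__":
    print("leaked without close_fds:", sibling_holds_pipe_open(False))
    print("leaked with close_fds:   ", sibling_holds_pipe_open(True))
```

With close_fds=False the sleeping sibling keeps the write end alive, so a reader waiting for end-of-file would block for as long as the sibling runs. With close_fds=True the descriptor is closed in the child before it executes and the reader sees end-of-file immediately.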


> Version-Release number of selected component (if applicable):

[root@virt-031 ~]# rpm -q pcs
pcs-0.9.148-3.el6.x86_64


> How reproducible:

~ 10%


> Steps to Reproduce:
1. while [ $? -eq 0 ]; do sleep 10; date; pcs cluster setup --name STSRHTS24987 virt-030 virt-031 virt-033 --force --debug; done
2. wait until the loop above hangs
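
Instead of watching the loop by hand, the hang can be detected with a timeout. The following is a minimal sketch, not part of the original report: the function name run_until_hang and the timeout value in the usage note are assumptions. Like the shell loop in step 1, it stops as soon as the command exits with a non-zero status.

```python
import subprocess
import sys
import time


def run_until_hang(command, timeout, delay, max_attempts):
    """Run command repeatedly; return the attempt number that timed out.

    Returns None if no attempt exceeded the timeout. Raises RuntimeError
    if the command fails, mirroring the `$? -eq 0` check in the shell loop.
    """
    for attempt in range(1, max_attempts + 1):
        time.sleep(delay)
        try:
            # subprocess.run kills the direct child when the timeout expires.
            result = subprocess.run(command, timeout=timeout)
        except subprocess.TimeoutExpired:
            return attempt
        if result.returncode != 0:
            raise RuntimeError(
                "attempt %d exited with status %d" % (attempt, result.returncode)
            )
    return None


if __name__ == "__main__":
    # Harmless stand-in command so the sketch runs anywhere.
    print(run_until_hang([sys.executable, "-c", "pass"], 10, 0, 3))
```

For this bug, the command would be the one from step 1, for example `["pcs", "cluster", "setup", "--name", "STSRHTS24987", "virt-030", "virt-031", "virt-033", "--force", "--debug"]` with `delay=10` and a generous timeout such as 300 seconds (an assumed value). Only the direct child is killed on timeout, so any stuck pcsd-cli.rb processes may need to be cleaned up separately.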


> Actual results:

Debug output gets stuck here:

<...snip...>
Destroying cluster on nodes: virt-030, virt-031, virt-033...
Running: /usr/bin/ruby -I/usr/lib/pcsd/ /usr/lib/pcsd/pcsd-cli.rb read_tokens
--Debug Input Start--
{}
--Debug Input End--
Running: /usr/bin/ruby -I/usr/lib/pcsd/ /usr/lib/pcsd/pcsd-cli.rb read_tokens
--Debug Input Start--
{}
--Debug Input End--
Running: /usr/bin/ruby -I/usr/lib/pcsd/ /usr/lib/pcsd/pcsd-cli.rb read_tokens
--Debug Input Start--
{}
--Debug Input End--

processes:

root	  6808  4.8  1.7 455108 18080 pts/3    Sl+  15:57   0:00          \_ /usr/bin/python /usr/sbin/pcs cluster setup --name STSRHTS24987 virt-030 virt-031 virt-033 --force --debug
root	  6983 11.0  0.6  41772  6756 pts/3    R+   15:57   0:00              \_ /usr/bin/ruby -I/usr/lib/pcsd/ /usr/lib/pcsd/pcsd-cli.rb read_tokens
root	  6985 11.0  0.6  41776  6756 pts/3    R+   15:57   0:00              \_ /usr/bin/ruby -I/usr/lib/pcsd/ /usr/lib/pcsd/pcsd-cli.rb read_tokens
root	  6986 11.0  0.6  41768  6760 pts/3    R+   15:57   0:00              \_ /usr/bin/ruby -I/usr/lib/pcsd/ /usr/lib/pcsd/pcsd-cli.rb read_tokens


> Expected results:

Cluster setup finishes in all cases.

Comment 5 Tomas Jelinek 2016-02-15 12:47:45 UTC
patch in upstream: https://github.com/feist/pcs/commit/59dde9ba191bc079aee08ac25576b51b9a85681c

test in bug description
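
The Doc Text describes the fix as closing file descriptors when pcs runs external processes. The sketch below shows the general shape of that technique with Python's subprocess module. It is not the upstream patch itself (see the commit linked above for that), and the helper name run_external is hypothetical.

```python
import subprocess
import sys


def run_external(args, stdin_text=""):
    """Run a command, feed it stdin_text, and return (output, returncode).

    Hypothetical helper for illustration only; it is not pcs code.
    """
    process = subprocess.Popen(
        args,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        # Close every descriptor except stdin, stdout and stderr in the
        # child, so pipes belonging to other subprocesses are not leaked.
        close_fds=True,
        universal_newlines=True,
    )
    # communicate() writes the input, closes stdin, and waits for exit.
    output, _ = process.communicate(stdin_text)
    return output, process.returncode


if __name__ == "__main__":
    echo = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
    print(run_external(echo, "{}"))
```

On POSIX, close_fds defaulted to False in Python 2 and has defaulted to True since Python 3.2, so passing it explicitly matters mainly for code that must run under Python 2.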

Comment 6 Ivan Devat 2016-02-17 12:29:37 UTC
The problem occurs only occasionally and is not easy to reproduce. After the fix, only a successful cluster setup was tested.

Comment 10 errata-xmlrpc 2016-05-10 19:27:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0739.html