Bug 968877

Summary: pcs+systemd: `pcs cluster auth` fails immediately after `systemctl start pcsd.service`, succeeds a few seconds later
Product: Red Hat Enterprise Linux 7
Component: pcs
Version: 7.0
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Reporter: Marian Csontos <mcsontos>
Assignee: Chris Feist <cfeist>
QA Contact: Cluster QE <mspqa-list>
CC: cluster-maint, rsteiger
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Fixed In Version: pcs-0.9.49-3.el7
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-06-13 10:22:02 UTC
Bug Blocks: 1328870, 1428392

Description Marian Csontos 2013-05-30 07:40:24 UTC
Description of problem:
`pcs cluster auth` fails with "Unable to communicate with $node" when run soon after the pcsd service is started on all nodes. With a short delay between the two steps it works fine.

Version-Release number of selected component (if applicable):
pcs-0.9.41-1.el7.x86_64
kernel-3.9.0-0.55.el7.x86_64
corosync-2.3.0-3.el7.x86_64
pacemaker-1.1.10-1.el7.x86_64
dlm-4.0.1-1.el7.x86_64

How reproducible:
In some test cases 100%, in others 0%.
This happens in a complex setup and I am unable to provide a standalone reproducer, but I am happy to provide as much debugging output as you want.

Steps to Reproduce:

    #!/bin/bash
    # 1. [Running tests. Cluster is not involved. Step 3 is consistently failing after some tests.]
    # NODES is expected to hold a space-separated list of node names.
    setpcs() {
        # 2. Start the pcsd service on every node:
        for node in $NODES; do ssh "root@$node" service pcsd start || return 1; done
        # 3. Authenticate from every node against all nodes:
        for node in $NODES; do ssh "root@$node" pcs cluster auth -u hacluster -p password $NODES || return 1; done
    }
    set -xv
    setpcs

Actual results:
`pcs cluster auth` fails with:
    Unable to communicate with zaphodc1-node03

Expected results:
pcs cluster auth should pass.

Additional info:

# starting pcsd service on all nodes:
> service pcsd start
Redirecting to /bin/systemctl start  pcsd.service
# resulting in the following in /var/log/messages:
May 29 12:07:42 zaphodc1-node03 systemd[1]: Starting PCS GUI...
May 29 12:07:42 zaphodc1-node03 systemd[1]: Started PCS GUI.

# set authentication on all nodes soon after starting services on all nodes:
> pcs cluster auth -u hacluster -p password zaphodc1-node01 zaphodc1-node02 zaphodc1-node03
zaphodc1-node01: Authorized
zaphodc1-node02: Authorized
Unable to communicate with zaphodc1-node03

# With a 2 s delay between starting pcsd and authenticating, it has not failed so far.
# The first run usually takes longer (1-2 s), while a second run succeeds in almost no time.
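Since a short delay avoids the failure, one possible workaround on affected versions is a bounded retry instead of a fixed sleep. The following is only a minimal sketch: the `retry` helper is hypothetical (it is not part of pcs), and it assumes that `pcs cluster auth` exits non-zero when it prints "Unable to communicate", as the `|| return 1` in the reproducer script suggests.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds, at most ATTEMPTS times,
# sleeping DELAY seconds between attempts. Returns 0 on the first success and
# 1 if every attempt failed. Plain sh has no `local`, so the variables below
# are global.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while :; do
        "$@" && return 0
        [ "$n" -ge "$attempts" ] && return 1
        n=$((n + 1))
        sleep "$delay"
    done
}

# Possible usage (assumes pcs signals the failure through its exit status):
#   retry 5 1 pcs cluster auth -u hacluster -p password $NODES
```

A bounded retry keeps the test suite from hanging if pcsd never comes up, while not paying the full delay when the first attempt already succeeds.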

Comment 2 Marian Csontos 2013-05-30 10:42:38 UTC
I consulted the systemctl man page (and #systemd to confirm): `systemctl start SERVICE` should not finish and report success until the service is ready to serve requests (unless the --no-block option is given).
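For context, systemd can only wait for readiness when the unit reports it (for example `Type=notify` or `Type=forking`); for a `Type=simple` unit the start job completes as soon as the process has been spawned, which may be before the daemon is listening. Whether that applied to pcsd.service at the time is an assumption here. A caller that cannot rely on the unit can probe the listening port itself. The sketch below uses hypothetical helpers `port_open` and `wait_for_port`, relies on bash's `/dev/tcp` redirection for the connect test, and assumes pcsd listens on its default TCP port 2224.

```shell
#!/bin/sh
# port_open HOST PORT
# Hypothetical helper: succeeds if a TCP connection to HOST:PORT can be opened.
# The connect is done by a short-lived bash process through its /dev/tcp
# pseudo-device; the connection is closed when that process exits.
port_open() {
    bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# wait_for_port HOST PORT [TRIES]
# Probes once per second, at most TRIES times (default 10).
# Returns 0 as soon as the port accepts a connection, 1 otherwise.
wait_for_port() {
    host=$1
    port=$2
    tries=${3:-10}
    while :; do
        port_open "$host" "$port" && return 0
        tries=$((tries - 1))
        [ "$tries" -le 0 ] && return 1
        sleep 1
    done
}

# Possible usage, with 2224 assumed to be the pcsd port:
#   wait_for_port zaphodc1-node03 2224 && pcs cluster auth -u hacluster -p password $NODES
```

An open port only shows that the daemon accepts connections, not that it can already answer authentication requests, so this is a heuristic rather than a guarantee.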

Comment 3 Chris Feist 2013-07-01 20:38:55 UTC
This should be fixed upstream with this commit:

https://github.com/feist/pcs/commit/3dfe1589d01b97d583cc675aa4a148029814d5d9

Comment 4 Chris Feist 2013-07-01 22:25:59 UTC
Before fix:

[root@rh7-2 ~]# rpm -q pcs
pcs-0.9.43-1.el7.x86_64
[root@rh7-2 ~]# systemctl stop pcsd
[root@rh7-2 ~]# systemctl start pcsd ; pcs cluster auth rh7-2 ; sleep 5 ; pcs cluster auth rh7-2
Unable to communicate with rh7-2
rh7-2: Already authorized


After fix:
[root@rh7-2 ~]# rpm -q pcs
pcs-0.9.49-3.el7.x86_64
[root@rh7-2 ~]# systemctl stop pcsd
[root@rh7-2 ~]# systemctl start pcsd ; pcs cluster auth rh7-2 ; sleep 5 ; pcs cluster auth rh7-2
rh7-2: Already authorized
rh7-2: Already authorized
[root@rh7-2 ~]#

Comment 8 Ludek Smid 2014-06-13 10:22:02 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.