Description of problem:
`pcs cluster auth` fails with "Unable to communicate with $node" when run soon after the pcsd service is started on all nodes. With a short delay it works fine.
Version-Release number of selected component (if applicable):
pcs-0.9.41-1.el7.x86_64
kernel-3.9.0-0.55.el7.x86_64
corosync-2.3.0-3.el7.x86_64
pacemaker-1.1.10-1.el7.x86_64
dlm-4.0.1-1.el7.x86_64
How reproducible:
In some test-cases 100%, in others 0%.
This happens in a complex setup and I am unable to provide a standalone reproducer, but I am happy to provide as much debugging output as you want.
Steps to Reproduce:
#!/bin/bash
# 1. [Running tests. Cluster is not involved. Step 3 is consistently failing after some tests.]
setpcs() {
    # 2. start pcsd service on every node:
    for node in $NODES; do ssh "root@$node" service pcsd start || return 1; done
    # 3. authenticate from every node against all nodes:
    for node in $NODES; do ssh "root@$node" pcs cluster auth -u hacluster -p password $NODES || return 1; done
}
set -xv
setpcs
Actual results:
pcs cluster auth failing with:
Unable to communicate with zaphodc1-node03
Expected results:
pcs cluster auth should pass.
Additional info:
# starting pcsd service on all nodes:
> service pcsd start
Redirecting to /bin/systemctl start pcsd.service
# resulting in the following in /var/log/messages:
May 29 12:07:42 zaphodc1-node03 systemd[1]: Starting PCS GUI...
May 29 12:07:42 zaphodc1-node03 systemd[1]: Started PCS GUI.
# set authentication on all nodes soon after starting services on all nodes:
> pcs cluster auth -u hacluster -p password zaphodc1-node01 zaphodc1-node02 zaphodc1-node03
zaphodc1-node01: Authorized
zaphodc1-node02: Authorized
Unable to communicate with zaphodc1-node03
# With a 2s delay it has not failed so far.
# The first run usually takes longer (1-2s) while the second succeeds in almost no time.
I consulted the systemctl manpage (and #systemd to confirm): `systemctl start SERVICE` should not finish and report success until the service is ready to serve requests (unless the --no-block option is given).
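Until pcsd reports readiness correctly, the fixed 2s delay mentioned above could be replaced in a test harness by a bounded retry. The following is only a minimal sketch of such a workaround, not a fix for the bug: `retry` is a hypothetical helper that is not part of pcs, and it assumes that `pcs cluster auth` exits non-zero when it cannot reach a node (the `|| return 1` in the reproducer relies on the same assumption).

```bash
#!/bin/bash
# retry is a hypothetical helper for test scripts; it is not part of pcs.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or until ATTEMPTS runs have failed.
retry() {
    local attempts=$1 delay=$2 i
    shift 2
    for ((i = 1; i <= attempts; i++)); do
        # Stop as soon as the command succeeds.
        "$@" && return 0
        # Sleep only between attempts, not after the last one.
        if ((i < attempts)); then
            sleep "$delay"
        fi
    done
    return 1
}
```

In the reproducer, step 3 could then be written as `retry 5 2 pcs cluster auth -u hacluster -p password $NODES`; the attempt count and delay are arbitrary example values.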
This request was resolved in Red Hat Enterprise Linux 7.0.
Contact your manager or support representative in case you have further questions about the request.