Description of problem:
nfs-ganesha cluster creation fails during pcs authentication due to a timing issue.

Version-Release number of selected component (if applicable):
gdeploy-2.0.1-3.el7rhgs.noarch

How reproducible:
Always

Steps to Reproduce:
1. Set up nfs-ganesha using gdeploy.

Actual results:
nfs-ganesha cluster creation fails during pcs authentication.

Expected results:
nfs-ganesha cluster should be created successfully.

Additional info:

TASK [Pcs cluster authenticate the hacluster users in all the nodes] ***********
changed: [dhcp46-139.lab.eng.blr.redhat.com] => (item=dhcp46-111.lab.eng.blr.redhat.com)
changed: [dhcp46-115.lab.eng.blr.redhat.com] => (item=dhcp46-111.lab.eng.blr.redhat.com)
failed: [dhcp46-124.lab.eng.blr.redhat.com] (item=dhcp46-111.lab.eng.blr.redhat.com) => {"changed": true, "cmd": "pcs cluster auth -u hacluster -p hacluster dhcp46-111.lab.eng.blr.redhat.com", "delta": "0:00:02.100926", "end": "2016-11-09 17:15:22.376325", "failed": true, "item": "dhcp46-111.lab.eng.blr.redhat.com", "rc": 1, "start": "2016-11-09 17:15:20.275399", "stderr": "Error: Some nodes had a newer tokens than the local node. Local node's tokens were updated. Please repeat the authentication if needed.\nError: Unable to communicate with pcsd", "stdout": "", "stdout_lines": [], "warnings": []}
changed: [dhcp46-111.lab.eng.blr.redhat.com] => (item=dhcp46-111.lab.eng.blr.redhat.com)
changed: [dhcp46-115.lab.eng.blr.redhat.com] => (item=dhcp46-115.lab.eng.blr.redhat.com)
changed: [dhcp46-124.lab.eng.blr.redhat.com] => (item=dhcp46-115.lab.eng.blr.redhat.com)
failed: [dhcp46-139.lab.eng.blr.redhat.com] (item=dhcp46-115.lab.eng.blr.redhat.com) => {"changed": true, "cmd": "pcs cluster auth -u hacluster -p hacluster dhcp46-115.lab.eng.blr.redhat.com", "delta": "0:00:01.840204", "end": "2016-11-09 17:15:24.252541", "failed": true, "item": "dhcp46-115.lab.eng.blr.redhat.com", "rc": 1, "start": "2016-11-09 17:15:22.412337", "stderr": "Error: Some nodes had a newer tokens than the local node. Local node's tokens were updated. Please repeat the authentication if needed.\nError: Unable to communicate with pcsd", "stdout": "", "stdout_lines": [], "warnings": []}
Arthy, can you please double-check whether the proposed solution of pausing for a couple of seconds works?
The actual issue is caused by https://bugzilla.redhat.com/show_bug.cgi?id=1265925 (thanks to Arthy), which is beyond the scope of gdeploy. However, adding a sleep of 2-3 seconds will reduce the error frequency.
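As a rough illustration of the pause-and-repeat idea discussed here, the sketch below retries the `pcs cluster auth` command from the log with a short delay between failed attempts. It is not the change made in gdeploy; the function names `pcs_auth` and `auth_with_retry`, the attempt count, and the 2-second default are assumptions chosen for illustration only.

```python
import subprocess
import time


def pcs_auth(node, user="hacluster", password="hacluster"):
    """Run the `pcs cluster auth` command seen in the log and return its exit code."""
    cmd = ["pcs", "cluster", "auth", "-u", user, "-p", password, node]
    return subprocess.run(cmd, capture_output=True, text=True).returncode


def auth_with_retry(node, run=pcs_auth, attempts=3, delay=2.0, sleep=time.sleep):
    """Authenticate `node`, pausing `delay` seconds after each failed attempt.

    `run` and `sleep` are injectable so the logic can be exercised without pcs.
    Returns the number of attempts used; raises RuntimeError if all attempts fail.
    """
    for attempt in range(1, attempts + 1):
        if run(node) == 0:
            return attempt
        if attempt < attempts:
            # Hypothetical pause, mirroring the 2-3 second sleep suggested above.
            sleep(delay)
    raise RuntimeError(f"pcs cluster auth failed for {node} after {attempts} attempts")
```

Since the underlying race lives in pcs itself, a retry like this can only make the failure less likely; it does not remove the cause.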
Commit https://github.com/gluster/gdeploy/pull/216/commits/e47fad6c fixes the issue.
I have tried the proposed solution, and the error frequency is reduced (hit 1 out of 5 times).
Verified the fix in build gdeploy-2.0.1-4.el7rhgs.noarch.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2017-0482.html