+++ This bug was initially created as a clone of Bug #1372728 +++
+++ This bug was initially created as a clone of Bug #1363595 +++

Description of problem:

One of the nodes remains in stopped state in pcs status, with "/usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 137: [: too many arguments ]" messages in the logs.

Version-Release number of selected component (if applicable):

[root@dhcp41-253 ~]# rpm -qa|grep glusterfs
glusterfs-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-cli-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-ganesha-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-libs-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-client-xlators-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-fuse-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-server-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-geo-replication-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-api-3.8.1-0.4.git56fcf39.el7rhgs.x86_64

[root@dhcp41-253 ~]# rpm -qa|grep ganesha
glusterfs-ganesha-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.0-0.14dev26.el7.centos.x86_64
nfs-ganesha-2.4.0-0.14dev26.el7.centos.x86_64

How reproducible:

Observed twice

Steps to Reproduce:
1. Try creating nfs-ganesha cluster on 4 nodes.
2. Observe that sometimes, after gluster nfs-ganesha enable, one of the nodes remains in stopped state in pcs status and the messages below are seen in /var/log/messages:

Aug 3 12:22:10 dhcp41-253 lrmd[645]: notice: nfs-mon_monitor_10000:7257:stderr [ /usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 137: [: too many arguments ]
Aug 3 12:22:25 dhcp41-253 lrmd[645]: notice: nfs-mon_monitor_10000:7271:stderr [ /usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 137: [: too many arguments ]
Aug 3 12:22:40 dhcp41-253 lrmd[645]: notice: nfs-mon_monitor_10000:7285:stderr [ /usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 137: [: too many arguments ]
Aug 3 12:22:55 dhcp41-253 lrmd[645]: notice: nfs-mon_monitor_10000:7326:stderr [ /usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 137: [: too many arguments ]
Aug 3 12:23:10 dhcp41-253 lrmd[645]: notice: nfs-mon_monitor_10000:7340:stderr [ /usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 137: [: too many arguments ]
Aug 3 12:23:25 dhcp41-253 lrmd[645]: notice: nfs-mon_monitor_10000:7354:stderr [ /usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 137: [: too many arguments ]
Aug 3 12:23:40 dhcp41-253 lrmd[645]: notice: nfs-mon_monitor_10000:7368:stderr [ /usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 137: [: too many arguments ]

pcs status output:

4 nodes and 16 resources configured

Online: [ dhcp41-206.lab.eng.blr.redhat.com dhcp41-253.lab.eng.blr.redhat.com dhcp43-133.lab.eng.blr.redhat.com dhcp43-181.lab.eng.blr.redhat.com ]

Full list of resources:

 Clone Set: nfs_setup-clone [nfs_setup]
     Started: [ dhcp41-206.lab.eng.blr.redhat.com dhcp41-253.lab.eng.blr.redhat.com dhcp43-133.lab.eng.blr.redhat.com dhcp43-181.lab.eng.blr.redhat.com ]
 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ dhcp41-206.lab.eng.blr.redhat.com dhcp41-253.lab.eng.blr.redhat.com dhcp43-133.lab.eng.blr.redhat.com dhcp43-181.lab.eng.blr.redhat.com ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ dhcp41-206.lab.eng.blr.redhat.com dhcp43-133.lab.eng.blr.redhat.com dhcp43-181.lab.eng.blr.redhat.com ]
     Stopped: [ dhcp41-253.lab.eng.blr.redhat.com ]
 dhcp43-133.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp43-133.lab.eng.blr.redhat.com
 dhcp41-206.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp41-206.lab.eng.blr.redhat.com
 dhcp41-253.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp41-206.lab.eng.blr.redhat.com
 dhcp43-181.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp43-181.lab.eng.blr.redhat.com

Failed Actions:
* nfs-grace_monitor_0 on dhcp41-253.lab.eng.blr.redhat.com 'unknown error' (1): call=17, status=complete, exitreason='none',
    last-rc-change='Tue Aug 2 17:37:52 2016', queued=0ms, exec=55ms

PCSD Status:
  dhcp43-133.lab.eng.blr.redhat.com: Online
  dhcp41-206.lab.eng.blr.redhat.com: Online
  dhcp41-253.lab.eng.blr.redhat.com: Online
  dhcp43-181.lab.eng.blr.redhat.com: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Actual results:

One of the nodes remains in stopped state in pcs status, with "/usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 137: [: too many arguments ]" messages in the logs.

Expected results:

There should not be any errors in the logs and all the nodes should be up.

Additional info:

sosreports and logs will be attached.
--- Additional comment from Shashank Raj on 2016-08-03 03:10:38 EDT ---

sosreports, ganesha logs and ganesha_mon script can be accessed under http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1363595

--- Additional comment from Worker Ant on 2016-09-02 09:41:45 EDT ---

REVIEW: http://review.gluster.org/15390 (common-ha: ganesha_mon: line 137: [: too many arguments ]" messages) posted (#1) for review on master by Kaleb KEITHLEY (kkeithle)

--- Additional comment from Worker Ant on 2016-09-06 10:03:11 EDT ---

REVIEW: http://review.gluster.org/15409 (common-ha: ganesha_mon: line 137: [: too many arguments ]" messages) posted (#1) for review on release-3.8 by Kaleb KEITHLEY (kkeithle)
REVIEW: http://review.gluster.org/15411 (common-ha: ganesha_mon: line 137: [: too many arguments ]" messages) posted (#1) for review on release-3.9 by Kaleb KEITHLEY (kkeithle)
COMMIT: http://review.gluster.org/15411 committed in release-3.9 by Niels de Vos (ndevos)
------
commit d955d8ff5c5cc015fb626110c173b0a7d45dc0ed
Author: Kaleb S. KEITHLEY <kkeithle>
Date:   Tue Sep 6 10:17:25 2016 -0400

    common-ha: ganesha_mon: line 137: [: too many arguments ]" messages

    ensure that there are always valid, non-null arguments to /bin/test

    Here there be dragons. Very racy, but if the races lose, they lose in
    a way that's consistent with what we're testing for anyway, namely
    that the ganesha.nfsd process is gone.

    Change-Id: I88b770dd874ffa8576711f8009f27122a4fb0130
    BUG: 1373529
    Signed-off-by: Kaleb S. KEITHLEY <kkeithle>
    Reviewed-on: http://review.gluster.org/15411
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Niels de Vos <ndevos>
    Smoke: Gluster Build System <jenkins.org>
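
For anyone hitting the same message elsewhere: the commit text says the fix is to make sure /bin/test always gets valid, non-null arguments. The snippet below is only a minimal sketch of that general class of failure in bash. It is not the ganesha_mon code and not the committed patch; the variable name pid_list and its value are hypothetical, chosen to show how an unquoted expansion that yields more than one word produces the error, and how quoting plus a default value avoids it.

```shell
#!/bin/bash
# Hypothetical illustration only; this is NOT the ganesha_mon script.

# Assume a lookup unexpectedly returned two words instead of one.
pid_list="1234 5678"

# Unquoted expansion: the shell splits the value into two words, so `[`
# receives `1234 5678 = x` and fails with a usage error (status 2).
# In bash the message is "[: too many arguments".
unquoted_status=0
[ ${pid_list} = "x" ] 2>/dev/null || unquoted_status=$?

# Guarded form: quoting keeps the value as a single operand, and the
# `:-none` default guarantees the operand is never null or missing.
# The comparison is then simply false (status 1), not an error.
guarded_status=0
[ "${pid_list:-none}" = "x" ] || guarded_status=$?

echo "unquoted=${unquoted_status} guarded=${guarded_status}"
```

With the guarded form the test can still evaluate to false, but it no longer aborts with a syntax error, so the caller sees an ordinary true/false result.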