Bug 1372728

Summary: Node remains in stopped state in pcs status with "/usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 137: [: too many arguments ]" messages in logs.
Product: [Community] GlusterFS
Reporter: Kaleb KEITHLEY <kkeithle>
Component: common-ha
Assignee: Kaleb KEITHLEY <kkeithle>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.8
CC: bugs, jthottan, kkeithle, ndevos, skoduri, sraj
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-3.8.4
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1363595
: 1373529
Environment:
Last Closed: 2016-09-13 04:32:50 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1363595    
Bug Blocks: 1373529    

Description Kaleb KEITHLEY 2016-09-02 13:45:39 UTC
+++ This bug was initially created as a clone of Bug #1363595 +++

Description of problem:

One of the nodes remains in the stopped state in pcs status, and "/usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 137: [: too many arguments ]" messages appear in the logs.

Version-Release number of selected component (if applicable):

[root@dhcp41-253 ~]# rpm -qa|grep glusterfs
glusterfs-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-cli-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-ganesha-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-libs-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-client-xlators-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-fuse-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-server-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-geo-replication-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-api-3.8.1-0.4.git56fcf39.el7rhgs.x86_64

[root@dhcp41-253 ~]# rpm -qa|grep ganesha
glusterfs-ganesha-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.0-0.14dev26.el7.centos.x86_64
nfs-ganesha-2.4.0-0.14dev26.el7.centos.x86_64

How reproducible:

Observed twice

Steps to Reproduce:
1. Create an nfs-ganesha cluster on 4 nodes.
2. Observe that sometimes, after gluster nfs-ganesha enable, one of the nodes remains in the stopped state in pcs status and the following messages appear in /var/log/messages:

Aug  3 12:22:10 dhcp41-253 lrmd[645]:  notice: nfs-mon_monitor_10000:7257:stderr [ /usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 137: [: too many arguments ]
Aug  3 12:22:25 dhcp41-253 lrmd[645]:  notice: nfs-mon_monitor_10000:7271:stderr [ /usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 137: [: too many arguments ]
Aug  3 12:22:40 dhcp41-253 lrmd[645]:  notice: nfs-mon_monitor_10000:7285:stderr [ /usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 137: [: too many arguments ]
Aug  3 12:22:55 dhcp41-253 lrmd[645]:  notice: nfs-mon_monitor_10000:7326:stderr [ /usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 137: [: too many arguments ]
Aug  3 12:23:10 dhcp41-253 lrmd[645]:  notice: nfs-mon_monitor_10000:7340:stderr [ /usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 137: [: too many arguments ]
Aug  3 12:23:25 dhcp41-253 lrmd[645]:  notice: nfs-mon_monitor_10000:7354:stderr [ /usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 137: [: too many arguments ]
Aug  3 12:23:40 dhcp41-253 lrmd[645]:  notice: nfs-mon_monitor_10000:7368:stderr [ /usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 137: [: too many arguments ]


pcs status output:

4 nodes and 16 resources configured

Online: [ dhcp41-206.lab.eng.blr.redhat.com dhcp41-253.lab.eng.blr.redhat.com dhcp43-133.lab.eng.blr.redhat.com dhcp43-181.lab.eng.blr.redhat.com ]

Full list of resources:

 Clone Set: nfs_setup-clone [nfs_setup]
     Started: [ dhcp41-206.lab.eng.blr.redhat.com dhcp41-253.lab.eng.blr.redhat.com dhcp43-133.lab.eng.blr.redhat.com dhcp43-181.lab.eng.blr.redhat.com ]
 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ dhcp41-206.lab.eng.blr.redhat.com dhcp41-253.lab.eng.blr.redhat.com dhcp43-133.lab.eng.blr.redhat.com dhcp43-181.lab.eng.blr.redhat.com ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ dhcp41-206.lab.eng.blr.redhat.com dhcp43-133.lab.eng.blr.redhat.com dhcp43-181.lab.eng.blr.redhat.com ]
     Stopped: [ dhcp41-253.lab.eng.blr.redhat.com ]
 dhcp43-133.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):        Started dhcp43-133.lab.eng.blr.redhat.com
 dhcp41-206.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):        Started dhcp41-206.lab.eng.blr.redhat.com
 dhcp41-253.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):        Started dhcp41-206.lab.eng.blr.redhat.com
 dhcp43-181.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):        Started dhcp43-181.lab.eng.blr.redhat.com

Failed Actions:
* nfs-grace_monitor_0 on dhcp41-253.lab.eng.blr.redhat.com 'unknown error' (1): call=17, status=complete, exitreason='none',
    last-rc-change='Tue Aug  2 17:37:52 2016', queued=0ms, exec=55ms


PCSD Status:
  dhcp43-133.lab.eng.blr.redhat.com: Online
  dhcp41-206.lab.eng.blr.redhat.com: Online
  dhcp41-253.lab.eng.blr.redhat.com: Online
  dhcp43-181.lab.eng.blr.redhat.com: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled


Actual results:

One of the nodes remains in the stopped state in pcs status, and "/usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 137: [: too many arguments ]" messages appear in the logs.

Expected results:

There should be no errors in the logs, and all nodes should be up.

Additional info:

sosreports and logs will be attached.

--- Additional comment from Shashank Raj on 2016-08-03 03:10:38 EDT ---

The sosreports, ganesha logs, and ganesha_mon script are available at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1363595

--- Additional comment from Worker Ant on 2016-09-02 09:41:45 EDT ---

REVIEW: http://review.gluster.org/15390 (common-ha: ganesha_mon: line 137: [: too many arguments ]" messages) posted (#1) for review on master by Kaleb KEITHLEY (kkeithle)

Comment 1 Worker Ant 2016-09-06 14:03:11 UTC
REVIEW: http://review.gluster.org/15409 (common-ha: ganesha_mon: line 137: [: too many arguments ]" messages) posted (#1) for review on release-3.8 by Kaleb KEITHLEY (kkeithle)

Comment 2 Worker Ant 2016-09-07 11:49:21 UTC
COMMIT: http://review.gluster.org/15409 committed in release-3.8 by Kaleb KEITHLEY (kkeithle) 
------
commit 2b2b301d6b8a9781525d938e3896069bcfc60909
Author: Kaleb S. KEITHLEY <kkeithle>
Date:   Tue Sep 6 10:01:03 2016 -0400

    common-ha: ganesha_mon: line 137: [: too many arguments ]" messages
    
    ensure that there are always valid, non-null arguments to /bin/test
    
    Here there be dragons. Very racy, but if the races lose, they lose
    in a way that's consistent with what we're testing for anyway, namely
    that the ganesha.nfsd process is gone.
    
    Change-Id: I88b770dd874ffa8576711f8009f27122a4fb0130
    BUG: 1372728
    Signed-off-by: Kaleb S. KEITHLEY <kkeithle>
    Reviewed-on: http://review.gluster.org/15409
    Reviewed-by: Niels de Vos <ndevos>
    CentOS-regression: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
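
The commit message above describes the fix as ensuring that /bin/test always receives valid, non-null arguments. As background, `[` is an ordinary command, so an unquoted expansion that is empty or that splits into several words changes the number of arguments it receives; for example, `[ $state = ok ]` with `state="a b"` makes bash report "too many arguments". The following is a minimal sketch of the defensive pattern, not the actual ganesha_mon change: the function name `pid_is_alive` and the pid-file approach are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper (an illustration, not the code from the commit).
# Returns 0 only when the given pid file names a live process.
pid_is_alive() {
    local pid_file="$1"
    local pid=""

    # The file can disappear between any check and the read (a race), so
    # read it in one step and tolerate failure; pid is then simply empty.
    pid=$(cat "${pid_file}" 2>/dev/null)

    # Reject empty, zero, or non-numeric content (including embedded spaces
    # or newlines) so that later commands always get one well-formed operand.
    case "${pid}" in
        ''|0|*[!0-9]*) return 1 ;;
    esac

    # kill -0 sends no signal; it only reports whether the process exists
    # and may be signalled by the caller. Quoting keeps it a single argument.
    kill -0 "${pid}" 2>/dev/null
}
```

With this shape, a missing, empty, or malformed pid file is treated the same as a dead process, which matches the commit's observation that losing the race is consistent with the condition being tested for.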

Comment 3 Niels de Vos 2016-09-12 05:40:03 UTC
All 3.8.x bugs are now reported against version 3.8 (without .x). For more information, see http://www.gluster.org/pipermail/gluster-devel/2016-September/050859.html

Comment 4 Niels de Vos 2016-09-16 18:28:44 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.4, please open a new bug report.

glusterfs-3.8.4 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/announce/2016-September/000060.html
[2] https://www.gluster.org/pipermail/gluster-users/