Bug 1415072 - [NFS-Ganesha] Too many nfs-mon_monitor_10000 error messages seen in /var/log/messages
Summary: [NFS-Ganesha] Too many nfs-mon_monitor_10000 error messages seen in /var/log/...
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: common-ha
Version: rhgs-3.2
Hardware: x86_64
OS: Linux
unspecified
low
Target Milestone: ---
: ---
Assignee: Kaleb KEITHLEY
QA Contact: Arthy Loganathan
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-01-20 07:29 UTC by Arthy Loganathan
Modified: 2017-08-21 12:42 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-21 12:42:52 UTC
Embargoed:


Attachments (Terms of Use)

Description Arthy Loganathan 2017-01-20 07:29:57 UTC
Description of problem:
Too many nfs-mon_monitor_10000 error messages seen in /var/log/messages

n 20 12:48:04 dhcp46-111 lrmd[22195]:  notice: nfs-mon_monitor_10000:31431:stderr [ /usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 142: [: too many arguments ]
Jan 20 12:48:20 dhcp46-111 lrmd[22195]:  notice: nfs-mon_monitor_10000:31462:stderr [ /usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 142: [: too many arguments ]

Also, the machine got failovered to other node, even though pacemaker and ganesha services are up. Not sure, this issue is seen because of these error messages.

Version-Release number of selected component (if applicable):
nfs-ganesha-gluster-2.4.1-6.el7rhgs.x86_64
nfs-ganesha-2.4.1-6.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-12.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a ganesha cluster and volume.
2. Enable nfs-ganesha on the volume and mount it.
3. Run IOs from mount point.

Actual results:
Too many nfs-mon_monitor_10000 error messages seen in /var/log/messages

Expected results:
No such error messages should seen.

Additional info:

[root@dhcp46-111 ~]# service pacemaker status -l
Redirecting to /bin/systemctl status  -l pacemaker.service
● pacemaker.service - Pacemaker High Availability Cluster Manager
   Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; disabled; vendor preset: disabled)
   Active: active (running) since Wed 2017-01-18 15:24:44 IST; 1 day 21h ago
     Docs: man:pacemakerd
           http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/index.html
 Main PID: 22191 (pacemakerd)
   CGroup: /system.slice/pacemaker.service
           ├─ 6654 /bin/bash /usr/lib/ocf/resource.d/heartbeat/ganesha_mon monitor
           ├─ 6666 sleep 5
           ├─22191 /usr/sbin/pacemakerd -f
           ├─22193 /usr/libexec/pacemaker/cib
           ├─22194 /usr/libexec/pacemaker/stonithd
           ├─22195 /usr/libexec/pacemaker/lrmd
           ├─22196 /usr/libexec/pacemaker/attrd
           ├─22197 /usr/libexec/pacemaker/pengine
           └─22198 /usr/libexec/pacemaker/crmd

Jan 20 12:56:10 dhcp46-111.lab.eng.blr.redhat.com lrmd[22195]:   notice: nfs-mon_monitor_10000:6418:stderr [ /usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 142: [: too many arguments ]
Jan 20 12:56:25 dhcp46-111.lab.eng.blr.redhat.com lrmd[22195]:   notice: nfs-mon_monitor_10000:6432:stderr [ /usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 142: [: too many arguments ]
Jan 20 12:56:40 dhcp46-111.lab.eng.blr.redhat.com lrmd[22195]:   notice: nfs-mon_monitor_10000:6474:stderr [ /usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 142: [: too many arguments ]
Jan 20 12:56:55 dhcp46-111.lab.eng.blr.redhat.com lrmd[22195]:   notice: nfs-mon_monitor_10000:6488:stderr [ /usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 142: [: too many arguments ]
Jan 20 12:57:11 dhcp46-111.lab.eng.blr.redhat.com lrmd[22195]:   notice: nfs-mon_monitor_10000:6502:stderr [ /usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 142: [: too many arguments ]
Jan 20 12:57:26 dhcp46-111.lab.eng.blr.redhat.com lrmd[22195]:   notice: nfs-mon_monitor_10000:6516:stderr [ /usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 142: [: too many arguments ]
Jan 20 12:57:41 dhcp46-111.lab.eng.blr.redhat.com lrmd[22195]:   notice: nfs-mon_monitor_10000:6558:stderr [ /usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 142: [: too many arguments ]
Jan 20 12:57:56 dhcp46-111.lab.eng.blr.redhat.com lrmd[22195]:   notice: nfs-mon_monitor_10000:6572:stderr [ /usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 142: [: too many arguments ]
Jan 20 12:58:11 dhcp46-111.lab.eng.blr.redhat.com lrmd[22195]:   notice: nfs-mon_monitor_10000:6586:stderr [ /usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 142: [: too many arguments ]
Jan 20 12:58:26 dhcp46-111.lab.eng.blr.redhat.com lrmd[22195]:   notice: nfs-mon_monitor_10000:6600:stderr [ /usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 142: [: too many arguments ]

[root@dhcp46-111 ~]# service nfs-ganesha status
Redirecting to /bin/systemctl status  nfs-ganesha.service
● nfs-ganesha.service - NFS-Ganesha file server
   Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2017-01-19 18:25:16 IST; 16h ago
     Docs: http://github.com/nfs-ganesha/nfs-ganesha/wiki
  Process: 32563 ExecStop=/bin/dbus-send --system --dest=org.ganesha.nfsd --type=method_call /org/ganesha/nfsd/admin org.ganesha.nfsd.admin.shutdown (code=exited, status=0/SUCCESS)
  Process: 415 ExecStartPost=/bin/bash -c prlimit --pid $MAINPID --nofile=$NOFILE:$NOFILE (code=exited, status=0/SUCCESS)
  Process: 407 ExecStart=/bin/bash -c ${NUMACTL} ${NUMAOPTS} /usr/bin/ganesha.nfsd ${OPTIONS} ${EPOCH} (code=exited, status=0/SUCCESS)
 Main PID: 411 (ganesha.nfsd)
   CGroup: /system.slice/nfs-ganesha.service
           └─411 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -E 637729...

Jan 19 18:25:16 dhcp46-111.lab.eng.blr.redhat.com systemd[1]: Starting NFS-Ganesha file server...
Jan 19 18:25:16 dhcp46-111.lab.eng.blr.redhat.com systemd[1]: Started NFS-Ganesha file server.
[root@dhcp46-111 ~]# pcs status
Cluster name: ganesha-ha-360
Stack: corosync
Current DC: dhcp46-111.lab.eng.blr.redhat.com (version 1.1.15-11.el7_3.2-e174ec8) - partition with quorum
Last updated: Fri Jan 20 12:47:17 2017		Last change: Fri Jan 20 11:20:53 2017 by root via crm_attribute on dhcp46-124.lab.eng.blr.redhat.com

4 nodes and 24 resources configured

Online: [ dhcp46-111.lab.eng.blr.redhat.com dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ]

Full list of resources:

 Clone Set: nfs_setup-clone [nfs_setup]
     Started: [ dhcp46-111.lab.eng.blr.redhat.com dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ]
 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ dhcp46-111.lab.eng.blr.redhat.com dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ dhcp46-115.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ]
     Stopped: [ dhcp46-111.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com ]
 Resource Group: dhcp46-111.lab.eng.blr.redhat.com-group
     dhcp46-111.lab.eng.blr.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Started dhcp46-139.lab.eng.blr.redhat.com
     dhcp46-111.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp46-139.lab.eng.blr.redhat.com
     dhcp46-111.lab.eng.blr.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Started dhcp46-139.lab.eng.blr.redhat.com
 Resource Group: dhcp46-115.lab.eng.blr.redhat.com-group
     dhcp46-115.lab.eng.blr.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Started dhcp46-115.lab.eng.blr.redhat.com
     dhcp46-115.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp46-115.lab.eng.blr.redhat.com
     dhcp46-115.lab.eng.blr.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Started dhcp46-115.lab.eng.blr.redhat.com
 Resource Group: dhcp46-139.lab.eng.blr.redhat.com-group
     dhcp46-139.lab.eng.blr.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Started dhcp46-139.lab.eng.blr.redhat.com
     dhcp46-139.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp46-139.lab.eng.blr.redhat.com
     dhcp46-139.lab.eng.blr.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Started dhcp46-139.lab.eng.blr.redhat.com
 Resource Group: dhcp46-124.lab.eng.blr.redhat.com-group
     dhcp46-124.lab.eng.blr.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Started dhcp46-139.lab.eng.blr.redhat.com
     dhcp46-124.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp46-139.lab.eng.blr.redhat.com
     dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Started dhcp46-139.lab.eng.blr.redhat.com

Failed Actions:
* nfs-grace_monitor_5000 on dhcp46-139.lab.eng.blr.redhat.com 'not running' (7): call=145, status=complete, exitreason='none',
    last-rc-change='Thu Jan 19 23:19:55 2017', queued=0ms, exec=0ms
* nfs-grace_monitor_5000 on dhcp46-111.lab.eng.blr.redhat.com 'not running' (7): call=208, status=complete, exitreason='none',
    last-rc-change='Thu Jan 19 18:24:49 2017', queued=0ms, exec=0ms
* nfs-mon_monitor_10000 on dhcp46-111.lab.eng.blr.redhat.com 'unknown error' (1): call=13, status=Timed Out, exitreason='none',
    last-rc-change='Thu Jan 19 12:51:12 2017', queued=0ms, exec=0ms


Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
s

Comment 2 Arthy Loganathan 2017-01-20 07:34:52 UTC
sosreports and ganesha logs are at,
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1415072/


Note You need to log in before you can comment on or make changes to this bug.