Bug 1222065
| Field | Value |
| --- | --- |
| Summary | GlusterD fills the logs when the NFS-server is disabled |
| Product | [Community] GlusterFS |
| Component | glusterd |
| Version | 3.7.0 |
| Status | CLOSED CURRENTRELEASE |
| Severity | high |
| Priority | high |
| Reporter | Niels de Vos <ndevos> |
| Assignee | krishnan parthasarathi <kparthas> |
| CC | alex, amukherj, bugs, enrico.tagliavini, gluster-bugs, kaushal, kparthas, manu, matorola, nsathyan, plarsen |
| Keywords | Reopened, Triaged |
| Target Milestone | --- |
| Target Release | --- |
| Hardware | Unspecified |
| OS | Unspecified |
| URL | http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10956 |
| Fixed In Version | glusterfs-3.7.2 |
| Doc Type | Bug Fix |
| Cloned As | 1222378, 1227179 (view as bug list) |
| Bug Depends On | 1222378 |
| Bug Blocks | 1227179, 1233025 |
| Type | Bug |
| Last Closed | 2015-07-06 13:53:17 UTC |
Description (Niels de Vos, 2015-05-15 16:37:09 UTC)
COMMIT: http://review.gluster.org/10963 committed in release-3.7 by Kaushal M (kaushal)

```
commit efdb183c62be68f1a371237d1caac4347288f538
Author: Krishnan Parthasarathi <kparthas>
Date:   Tue May 19 14:48:01 2015 +0530

    glusterd: fix repeated connection to nfssvc failed msgs

    ... and disable reconnect timer on rpc_clnt_disconnect.

    Root Cause
    ----------
    The gluster-NFS service would not be started if there are no started
    volumes that have the nfs service enabled. Before this fix, glusterd
    would initiate a connect even when the gluster-NFS service was not
    (re)started. Compounding that, glusterd_conn_disconnect does not
    disable the reconnect timer, so the reconnect timer could still be
    in execution when the timer event was attempted to be removed.

Change-Id: Iadcb5cff9eafefa95eaf3a1a9413eeb682d3aaac
BUG: 1222065
Signed-off-by: Krishnan Parthasarathi <kparthas>
Reviewed-on: http://review.gluster.org/10830
Reviewed-by: Atin Mukherjee <amukherj>
Reviewed-by: Gaurav Kumar Garg <ggarg>
Reviewed-by: Kaushal M <kaushal>
Reviewed-on: http://review.gluster.org/10963
Tested-by: Gluster Build System <jenkins.com>
Tested-by: NetBSD Build System <jenkins.org>
Reviewed-by: Emmanuel Dreyfus <manu>
```

The required changes to fix this bug have not made it into glusterfs-3.7.1. This bug is now being tracked for glusterfs-3.7.2.

This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.7.2, please reopen this bug report.

glusterfs-3.7.2 has been announced on the Gluster Packaging mailing list [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://www.gluster.org/pipermail/packaging/2015-June/000006.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

```
[root@node01 mator]# rpm -qa glusterfs*
glusterfs-server-3.7.2-3.fc22.x86_64
glusterfs-fuse-3.7.2-3.fc22.x86_64
glusterfs-libs-3.7.2-3.fc22.x86_64
glusterfs-rdma-3.7.2-3.fc22.x86_64
glusterfs-cli-3.7.2-3.fc22.x86_64
glusterfs-client-xlators-3.7.2-3.fc22.x86_64
glusterfs-geo-replication-3.7.2-3.fc22.x86_64
glusterfs-api-3.7.2-3.fc22.x86_64
glusterfs-3.7.2-3.fc22.x86_64

[root@node01 mator]# gluster vol info vol2
Volume Name: vol2
Type: Distribute
Volume ID: 070d64bd-a120-43fb-b524-fb85403343ec
Status: Stopped
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: node01:/export/sdb1/brick2
Options Reconfigured:
nfs.disable: on
```

No repeated NFS socket-related messages in the logs so far. Let's try to play a bit:

```
[root@node01 mator]# gluster vol set vol2 nfs.disable off
volume set: success

[root@node01 mator]# gluster vol info vol2
Volume Name: vol2
Type: Distribute
Volume ID: 070d64bd-a120-43fb-b524-fb85403343ec
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: node01:/export/sdb1/brick2
Options Reconfigured:
nfs.disable: off
```

```
# tail -f /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
[2015-06-25 08:31:02.835985] I [glusterd-utils.c:4096:glusterd_nfs_pmap_deregister] 0-: De-registered ACL v3 successfully
[2015-06-25 08:31:02.870606] I [run.c:190:runner_log] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x147)[0x7f08802c9b87] (--> /lib64/libglusterfs.so.0(runner_log+0xf6)[0x7f08803146e6] (--> /usr/lib64/glusterfs/3.7.2/xlator/mgmt/glusterd.so(glusterd_hooks_run_hooks+0x51d)[0x7f0874e3403d] (--> /usr/lib64/glusterfs/3.7.2/xlator/mgmt/glusterd.so(+0xd6512)[0x7f0874e34512] (--> /lib64/libpthread.so.0(+0x7555)[0x7f087f12d555] ))))) 0-management: Ran script: /var/lib/glusterd/hooks/1/set/post/S30samba-set.sh --volname=vol2 -o nfs.disable=off --gd-workdir=/var/lib/glusterd
[2015-06-25 08:31:02.885468] I [run.c:190:runner_log] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x147)[0x7f08802c9b87] (--> /lib64/libglusterfs.so.0(runner_log+0xf6)[0x7f08803146e6] (--> /usr/lib64/glusterfs/3.7.2/xlator/mgmt/glusterd.so(glusterd_hooks_run_hooks+0x51d)[0x7f0874e3403d] (--> /usr/lib64/glusterfs/3.7.2/xlator/mgmt/glusterd.so(+0xd6512)[0x7f0874e34512] (--> /lib64/libpthread.so.0(+0x7555)[0x7f087f12d555] ))))) 0-management: Ran script: /var/lib/glusterd/hooks/1/set/post/S32gluster_enable_shared_storage.sh --volname=vol2 -o nfs.disable=off --gd-workdir=/var/lib/glusterd
[2015-06-25 08:31:06.109914] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/5cf388ba8b9610c52604425a16955178.socket failed (Invalid argument)
[2015-06-25 08:31:09.110526] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/5cf388ba8b9610c52604425a16955178.socket failed (Invalid argument)
[2015-06-25 08:31:09.942705] I [glusterd-handler.c:1404:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-06-25 08:31:09.943677] I [glusterd-handler.c:1404:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-06-25 08:31:09.944671] I [glusterd-handler.c:1404:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-06-25 08:31:12.111402] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/5cf388ba8b9610c52604425a16955178.socket failed (Invalid argument)
[2015-06-25 08:31:15.112063] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/5cf388ba8b9610c52604425a16955178.socket failed (Invalid argument)
...
The message "I [MSGID: 106006] [glusterd-svc-mgmt.c:330:glusterd_svc_common_rpc_notify] 0-management: nfs has disconnected from glusterd." repeated 13 times between [2015-06-25 08:29:46.906695] and [2015-06-25 08:31:42.117820]
...
[2015-06-25 08:33:42.145842] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/5cf388ba8b9610c52604425a16955178.socket failed (Invalid argument)
The message "I [MSGID: 106006] [glusterd-svc-mgmt.c:330:glusterd_svc_common_rpc_notify] 0-management: nfs has disconnected from glusterd." repeated 39 times between [2015-06-25 08:31:45.118640] and [2015-06-25 08:33:42.145947]
```

Missed a "gluster vol start vol2" in the previous message. Can someone reopen this bug please? Thanks.

```
11:45 < mator> ndevos, how do i reopen bug in bugzilla.redhat.com ? https://bugzilla.redhat.com/show_bug.cgi?id=1222065
11:45 <@glusterbot> Bug 1222065: high, high, ---, kparthas, CLOSED CURRENTRELEASE, GlusterD fills the logs when the NFS-server is disabled
11:47 < ndevos> mator: are you still hitting that bug with glusterfs-3.7.2? the exact same problem?
11:48 < mator> ndevos, yeah
```

Please attach the current logs and the exact version of the packages that you used. Thanks!

Version info is in my previous comment; logs are in the attachment. Thanks.

Created attachment 1044665 [details]
glusterfs log node01
Log file attachment; it covers the upgrade to 3.7.2 and the NFS-related log messages, from node01.
(In reply to Anatoly Pugachev from comment #10)
> Created attachment 1044665 [details]
> glusterfs log node01
>
> Log file attachment; it covers the upgrade to 3.7.2 and the NFS-related
> log messages, from node01.

IIUC, you are seeing repeated log messages when nfs.disable is set to "off" on vol2. As a side effect of setting nfs.disable to off, there is at least one volume (i.e. vol2) that is started and has the nfs service enabled. This tells glusterd that it has to start the NFS service if it is not running. I wonder why glusterd is failing to 'reach' the gluster NFS process via the unix domain socket file mentioned in the log messages. Could you check whether the gluster NFS process is running on the node where you see these log messages in glusterd's log file? It would also help if you could attach the gluster NFS and glusterd log files.

Well, I was unable to reproduce this with glusterfs-3.7.2, unlike in my previous attempt. We can close this bug report as solved now. Thanks.

Thanks for the confirmation!

With 3.6.4 this is still happening.

Still happening with glusterfs 3.7.5 built on Oct 7 2015 16:27:17:

```
[2015-11-05 04:06:29.300039] W [socket.c:588:__socket_rwv] 0-nfs: readv on /var/run/gluster/0bba59acfb6bd5aaa82b3079be6b8686.socket failed (Invalid argument)
```

Please fix. Thanks!

(In reply to Alex Atkinson from comment #15)
> Still happening. glusterfs 3.7.5 built on Oct 7 2015 16:27:17
>
> [2015-11-05 04:06:29.300039] W [socket.c:588:__socket_rwv] 0-nfs: readv on
> /var/run/gluster/0bba59acfb6bd5aaa82b3079be6b8686.socket failed (Invalid
> argument)
>
> Please fix. Thanks!

Could you please provide the reproducer steps here. Is it exactly the same as what the BZ began with?

Simple setup, defaults: two hosts, each with an ext4 mount as the brick. Replica 2, no stripes. Mounted on a client with mount.glusterfs.

```
rpm -q glusterfs-server
glusterfs-server-3.7.5-19.el7rhgs.x86_64
```

Same issue.
Single node/brick setup. An nfs-server is already running on the server outside of gluster, but other than that it is a simple ext4-backed volume being used by RHEV.

Clearing NEEDINFO