+++ This bug was initially created as a clone of Bug #979861 +++

glusterd is reported to be not operational by the `gluster' command despite glusterd being alive:

[root@wingo ~]# gluster volume info
No volumes present
Connection failed. Please check if gluster daemon is operational.
[root@wingo ~]# gluster volume status
Connection failed. Please check if gluster daemon is operational.
[root@wingo ~]# gluster peer status
peer status: failed
Connection failed. Please check if gluster daemon is operational.
[root@wingo ~]# pgrep glusterd
2751
[root@wingo ~]#
[root@wingo ~]# telnet localhost 24007
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
^]

==========================================================================

The setup comprises four machines: tex, mater, van, wingo.

I performed volume operations at intervals of 30 seconds each; approximately 15 set and unset operations were done on the volume.

Uploaded sosreports from the machines. They have the hostnames as part of the filename.

--- Additional comment from krishnan parthasarathi on 2013-07-02 16:42:01 IST ---

Root cause:
The bug synopsis has a rather apocalyptic tone for what is being observed :-)

What I guess is being called inconsistent is the following message on stderr, which is generally associated with the glusterd service being down:
"Connection failed. Please check if gluster daemon is operational"

The CLI prints that message because it is unable to make RPC(s) to glusterd. This happens because (for reasons explained below) CLI requests are being made from port no. > 1024, and glusterd 'drops' such requests.

How did the system run out of port no. < 1024? Executing the gluster CLI in a loop results in an active close of the CLI's TCP connection with glusterd. Actively closed TCP connections go into the TIME_WAIT state, which means the port is 'held' by the system for up to 2*MSL (2 mins). At this rate we keep piling up TCP connections in TIME_WAIT state. This is still a serious (transient) resource leak, since we would expect monitoring agents to constantly consult glusterd via the gluster CLI for volume health and status.
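One rough way to observe the pile-up described above on an affected node is sketched below. This is not from the report itself; it assumes the default glusterd port 24007 and that iproute2's `ss' (or legacy `netstat') is installed:

# drive the CLI in a loop for a minute, the way a monitoring agent might
for i in $(seq 1 60); do gluster volume status >/dev/null 2>&1; sleep 1; done

# count CLI connections to glusterd still held in TIME_WAIT
ss -tan state time-wait '( dport = :24007 )' | wc -l

# on older systems without ss:
netstat -tan | grep ':24007' | grep -c TIME_WAIT

Each actively closed CLI connection should show up here for roughly two minutes, which is how the pool of privileged (< 1024) source ports gets exhausted.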
REVIEW: http://review.gluster.org/5280 (cli,glusterd: Use unix domain sockets for cli-glusterd communication) posted (#1) for review on master by Kaushal M (kaushal)
REVIEW: http://review.gluster.org/5280 (cli,glusterd: Use unix domain sockets for cli-glusterd communication) posted (#2) for review on master by Kaushal M (kaushal)
REVIEW: http://review.gluster.org/5280 (cli,glusterd: Use unix domain sockets for cli-glusterd communication) posted (#3) for review on master by Kaushal M (kaushal)
REVIEW: http://review.gluster.org/5280 (cli,glusterd: Use unix domain sockets for cli-glusterd communication) posted (#4) for review on master by Kaushal M (kaushal)
Version: 3.4.0.12rhs.beta3-1.el6rhs.x86_64

Facing the below issue:

1) Created a distributed volume. Starting the volume failed with 'volume start failed', but on trying to start the volume again it reports that the volume has already been started.

gluster volume create vol_12 10.70.34.85:/rhs/brick1/A1 10.70.34.105:/rhs/brick1/A2 10.70.34.86:/rhs/brick1/A3 10.70.34.85:/rhs/brick1/A4 10.70.34.105:/rhs/brick1/A5
volume create: vol_12: success: please start the volume to access data

[root@fillmore tmp]# gluster v start vol_12
volume start: vol_12: failed: Commit failed on 10.70.34.85. Please check the log file for more details.
[root@fillmore tmp]# gluster v start vol_12
volume start: vol_12: failed: Volume vol_12 already started
[root@fillmore tmp]# gluster v i vol_12

Volume Name: vol_12
Type: Distribute
Volume ID: 570901ec-377a-4690-b81d-8a4824deb797
Status: Started
Number of Bricks: 5
Transport-type: tcp
Bricks:
Brick1: 10.70.34.85:/rhs/brick1/A1
Brick2: 10.70.34.105:/rhs/brick1/A2
Brick3: 10.70.34.86:/rhs/brick1/A3
Brick4: 10.70.34.85:/rhs/brick1/A4
Brick5: 10.70.34.105:/rhs/brick1/A5

-----------part of log from 10.70.34.85--------------
[2013-07-08 09:17:02.303536] E [rpcsvc.c:519:rpcsvc_handle_rpc_call] 0-glusterd: Request received from non-privileged port. Failing request
[2013-07-08 09:17:02.314110] E [rpcsvc.c:519:rpcsvc_handle_rpc_call] 0-glusterd: Request received from non-privileged port. Failing request
[2013-07-08 09:17:02.413361] E [rpcsvc.c:519:rpcsvc_handle_rpc_call] 0-glusterd: Request received from non-privileged port. Failing request
[2013-07-08 09:17:02.418914] E [rpcsvc.c:519:rpcsvc_handle_rpc_call] 0-glusterd:
-------------------------------------------------------
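To tie the 'Commit failed on 10.70.34.85' above to the same root cause, one could check that node for the rejection message and for connections to glusterd coming from high source ports. A sketch only; the log path below assumes the default /var/log/glusterfs location of these builds:

# on 10.70.34.85: count how often glusterd rejected a request
grep -c 'Request received from non-privileged port' /var/log/glusterfs/etc-glusterfs-glusterd.vol.log

# check whether connections to glusterd (port 24007) use source ports
# above 1023, which glusterd treats as non-privileged and fails
ss -tan '( dport = :24007 )'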
sosreports for comment 5: http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/980754/
REVIEW: http://review.gluster.org/5280 (cli,glusterd: Use unix domain sockets for cli-glusterd communication) posted (#5) for review on master by Kaushal M (kaushal)
REVIEW: http://review.gluster.org/5280 (cli,glusterd: Changes to cli-glusterd communication) posted (#6) for review on master by Kaushal M (kaushal)
REVIEW: http://review.gluster.org/5280 (cli,glusterd: Changes to cli-glusterd communication) posted (#7) for review on master by Kaushal M (kaushal)
REVIEW: http://review.gluster.org/5280 (cli,glusterd: Changes to cli-glusterd communication) posted (#8) for review on master by Kaushal M (kaushal)
REVIEW: http://review.gluster.org/5280 (cli,glusterd: Changes to cli-glusterd communication) posted (#9) for review on master by Kaushal M (kaushal)
REVIEW: http://review.gluster.org/5280 (cli,glusterd: Changes to cli-glusterd communication) posted (#10) for review on master by Kaushal M (kaushal)
REVIEW: http://review.gluster.org/5280 (cli,glusterd: Changes to cli-glusterd communication) posted (#11) for review on master by Kaushal M (kaushal)
REVIEW: http://review.gluster.org/5280 (cli,glusterd: Changes to cli-glusterd communication) posted (#12) for review on master by Kaushal M (kaushal)
COMMIT: http://review.gluster.org/5280 committed in master by Vijay Bellur (vbellur)
------
commit fc637b14cfad4d08e72bee7064194c8007a388d0
Author: Kaushal M <kaushal>
Date:   Wed Jul 3 16:31:22 2013 +0530

    cli,glusterd: Changes to cli-glusterd communication

    Glusterd changes:
    With this patch, glusterd creates a socket file in
    DATADIR/run/glusterd.socket, and listens on it for cli requests. It
    listens for 2 rpc programs on the socket file,
    - The glusterd cli rpc program, for all cli commands
    - A reduced glusterd handshake program, just for the 'system:: getspec'
      command

    The location of the socket file can be changed with the glusterd option
    'glusterd-sockfile'.

    To retain compatibility with the '--remote-host' cli option, glusterd
    also listens for the cli requests on port 24007. But, for the sake of
    security, it listens using a reduced cli rpc program on the port. The
    reduced rpc program only contains read-only procs used for
    'volume (info|list|status)', 'peer status' and 'system:: getwd' cli
    commands.

    CLI changes:
    The gluster cli now uses the glusterd socket file for communicating
    with glusterd by default. A new option '--gluster-sock' has been added
    to allow specifying the sockfile used to connect. Using the
    '--remote-host' option will make cli connect to the given host & port.

    Tests changes:
    cluster.rc has been modified to make use of socket files and use
    different log files for each glusterd. Some of the tests using
    cluster.rc have been fixed.

    Change-Id: Iaf24bc22f42f8014a5fa300ce37c7fc9b1b92b53
    BUG: 980754
    Signed-off-by: Kaushal M <kaushal>
    Reviewed-on: http://review.gluster.org/5280
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Krishnan Parthasarathi <kparthas>
    Reviewed-by: Vijay Bellur <vbellur>
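For reference, after this change the CLI talks to the local glusterd over the unix domain socket by default, and the options named in the commit message would be used roughly as sketched below. The socket path shown assumes DATADIR expands to /var/run, which can differ per build:

# default: the CLI connects over the local unix domain socket,
# so privileged TCP source ports are no longer needed
gluster volume info

# point the CLI at a non-default socket file (option name as per the
# commit message above)
gluster --gluster-sock=/var/run/glusterd.socket volume status

# TCP port 24007 still serves the reduced, read-only cli program,
# so remote read-only queries via --remote-host keep working
gluster --remote-host=10.70.34.85 volume info
gluster --remote-host=10.70.34.85 peer status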
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.5.0, please reopen this bug report.

glusterfs-3.5.0 has been announced on the Gluster Developers mailing list [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6137
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user