Description of problem:
--------------------------------------------------------------------
On a three node cluster, created and started 600 distributed-replicate (2x3) volumes. All the bricks and the self-heal daemons were running properly. Then created a new volume of type 2x3; the self-heal daemon stopped running and the following warning is logged continuously, every 7 seconds:
---------------------------------------------------------------------
[2018-05-22 09:10:54.352926] W [socket.c:3266:socket_connect] 0-glustershd: Ignore failed connection attempt on /var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or directory)
[2018-05-22 09:11:01.354185] W [socket.c:3266:socket_connect] 0-glustershd: Ignore failed connection attempt on /var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or directory)
[2018-05-22 09:11:08.355858] W [socket.c:3266:socket_connect] 0-glustershd: Ignore failed connection attempt on /var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or directory)
[2018-05-22 09:11:15.358315] W [socket.c:3266:socket_connect] 0-glustershd: Ignore failed connection attempt on /var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or directory)
[2018-05-22 09:11:22.360205] W [socket.c:3266:socket_connect] 0-glustershd: Ignore failed connection attempt on /var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or directory)

Version-Release number of selected component (if applicable):

How reproducible:
1/1

Steps to Reproduce:
1. On a three node cluster, created 600 volumes of type distributed-replicate (2x3) and started them using a script (a sketch of such a loop is shown after this description)
2. Created a new distributed-replicate (2x3) volume manually and started it
3. The volume started successfully

Actual results:
The self-heal daemon went down, and the following warning is logged continuously, every 7 seconds:

[2018-05-22 08:48:09.064406] W [socket.c:3266:socket_connect] 0-glustershd: Ignore failed connection attempt on /var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or directory)
[2018-05-22 08:48:16.065553] W [socket.c:3266:socket_connect] 0-glustershd: Ignore failed connection attempt on /var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or directory)
[2018-05-22 08:48:23.066968] W [socket.c:3266:socket_connect] 0-glustershd: Ignore failed connection attempt on /var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or directory)
[2018-05-22 08:48:30.068186] W [socket.c:3266:socket_connect] 0-glustershd: Ignore failed connection attempt on /var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or directory)
[2018-05-22 08:48:37.069355] W [socket.c:3266:socket_connect] 0-glustershd: Ignore failed connection attempt on /var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or directory)

Expected results:
The self-heal daemon should be running.

Additional info:
[root@dhcp37-214 ~]# gluster vol info deadpool

Volume Name: deadpool
Type: Distributed-Replicate
Volume ID: 25cf7f2f-3369-4ffc-8349-ce7c146b9ff2
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.37.214:/bricks/brick0/rel
Brick2: 10.70.37.178:/bricks/brick0/rel
Brick3: 10.70.37.46:/bricks/brick0/rel
Brick4: 10.70.37.214:/bricks/brick1/rel
Brick5: 10.70.37.178:/bricks/brick1/rel
Brick6: 10.70.37.46:/bricks/brick1/rel
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
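The reproduction script itself is not attached to this report; the following is a minimal sketch of such a loop. The volume names and per-volume brick sub-directories are assumptions made for illustration, while the peer addresses are taken from the vol info output above:
---------------------------------------------------------------------
#!/bin/bash
# Sketch only: create and start 600 distributed-replicate (2x3) volumes,
# two bricks per node, six bricks per volume.
for i in $(seq 1 600); do
    gluster volume create "vol$i" replica 3 \
        10.70.37.214:/bricks/brick0/vol$i 10.70.37.178:/bricks/brick0/vol$i 10.70.37.46:/bricks/brick0/vol$i \
        10.70.37.214:/bricks/brick1/vol$i 10.70.37.178:/bricks/brick1/vol$i 10.70.37.46:/bricks/brick1/vol$i
    gluster volume start "vol$i"
done
---------------------------------------------------------------------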
REVIEW: https://review.gluster.org/20197 (glusterd: Fix for shd status) posted (#1) for review on master by Sanju Rakonde
COMMIT: https://review.gluster.org/20197 committed in master by "Atin Mukherjee" <amukherj> with a commit message- glusterd: Fix for shd not coming up

Problem: After creating and starting n (where n is large) distribute-replicate volumes using a script, if we create and start the (n+1)th distribute-replicate volume manually, the self-heal daemon is down.

Solution: In glusterd_proc_stop, if the process is still running after being sent SIGTERM, it is sent SIGKILL. Since SIGKILL does not give the process a chance to perform any cleanup, we need to remove the pidfile ourselves.

Fixes: bz#1589253
Change-Id: I7c114334eec74c8d0f21b3e45cf7db6b8ef28af1
Signed-off-by: Sanju Rakonde <srakonde>
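For illustration, the stop sequence described in the commit message can be sketched in shell as below. This is a conceptual sketch, not the actual glusterd_proc_stop C code; the pidfile path and the grace period are assumptions:
---------------------------------------------------------------------
#!/bin/bash
# Try a graceful stop first; escalate to SIGKILL only if the daemon survives.
# SIGKILL cannot be caught, so the process never runs its own cleanup and the
# pidfile must be removed by the caller, otherwise a stale pidfile is left behind.
pidfile=/var/run/gluster/glustershd.pid      # assumed path, for illustration only
pid=$(cat "$pidfile")

kill -TERM "$pid" 2>/dev/null
sleep 1                                      # assumed grace period
if kill -0 "$pid" 2>/dev/null; then          # still running after SIGTERM?
    kill -KILL "$pid" 2>/dev/null
    rm -f "$pidfile"                         # the cleanup that SIGKILL skipped
fi
---------------------------------------------------------------------
The design point is simply that SIGTERM lets the daemon clean up after itself, while SIGKILL does not, so whoever sends SIGKILL inherits that cleanup.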
REVIEW: https://review.gluster.org/20277 (glusterd: removing the unnecessary glusterd message) posted (#1) for review on master by Sanju Rakonde
COMMIT: https://review.gluster.org/20277 committed in master by "Atin Mukherjee" <amukherj> with a commit message- glusterd: removing the unnecessary glusterd message

Fixes: bz#1589253
Change-Id: I5510250a3d094e19e471b3ee47bf13ea9ee8aff5
Signed-off-by: Sanju Rakonde <srakonde>
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-5.0, please open a new bug report.

glusterfs-5.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-October/000115.html
[2] https://www.gluster.org/pipermail/gluster-users/