Description of problem:
--------------------------------------------------------------------
On a three-node cluster, created and started 600 volumes of type 2X3. All the bricks and the self-heal daemon were running properly. Then created a new volume of type 2X3; the self-heal daemon stopped running, and the following warning is logged continuously, once every 7 seconds.
---------------------------------------------------------------------
[2018-05-22 09:10:54.352926] W [socket.c:3266:socket_connect] 0-glustershd: Ignore failed connection attempt on /var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or directory)
[2018-05-22 09:11:01.354185] W [socket.c:3266:socket_connect] 0-glustershd: Ignore failed connection attempt on /var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or directory)
[2018-05-22 09:11:08.355858] W [socket.c:3266:socket_connect] 0-glustershd: Ignore failed connection attempt on /var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or directory)
[2018-05-22 09:11:15.358315] W [socket.c:3266:socket_connect] 0-glustershd: Ignore failed connection attempt on /var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or directory)
[2018-05-22 09:11:22.360205] W [socket.c:3266:socket_connect] 0-glustershd: Ignore failed connection attempt on /var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or directory)

Version-Release number of selected component (if applicable):
3.12.2-9

How reproducible:
1/1

Steps to Reproduce:
1. On a three-node cluster, created 600 volumes of type replicate (2X3) and started them using a script
2. Created a new volume of type replicate (2X3) and started it
3. Volume started successfully

Actual results:
The self-heal daemon went down, and the following warning is logged continuously, once every 7 seconds:
[2018-05-22 08:48:09.064406] W [socket.c:3266:socket_connect] 0-glustershd: Ignore failed connection attempt on /var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or directory)
[2018-05-22 08:48:16.065553] W [socket.c:3266:socket_connect] 0-glustershd: Ignore failed connection attempt on /var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or directory)
[2018-05-22 08:48:23.066968] W [socket.c:3266:socket_connect] 0-glustershd: Ignore failed connection attempt on /var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or directory)
[2018-05-22 08:48:30.068186] W [socket.c:3266:socket_connect] 0-glustershd: Ignore failed connection attempt on /var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or directory)
[2018-05-22 08:48:37.069355] W [socket.c:3266:socket_connect] 0-glustershd: Ignore failed connection attempt on /var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or directory)

Expected results:
Self-heal daemon should be running

Additional info:
[root@dhcp37-214 ~]# gluster vol info deadpool

Volume Name: deadpool
Type: Distributed-Replicate
Volume ID: 25cf7f2f-3369-4ffc-8349-ce7c146b9ff2
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.37.214:/bricks/brick0/rel
Brick2: 10.70.37.178:/bricks/brick0/rel
Brick3: 10.70.37.46:/bricks/brick0/rel
Brick4: 10.70.37.214:/bricks/brick1/rel
Brick5: 10.70.37.178:/bricks/brick1/rel
Brick6: 10.70.37.46:/bricks/brick1/rel
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
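
The script used in step 1 is not attached to this report. As a rough sketch only, the reproduction steps could be driven by something like the following, which builds the `gluster volume create` and `gluster volume start` command lines without running them. The host addresses and the `/bricks/brick0`, `/bricks/brick1` layout are taken from the `gluster vol info` output above; the volume naming scheme (`vol1` ... `vol600`), the use of the volume name as the brick directory, and both function names are assumptions made for illustration.

```python
from typing import List

# Hosts as listed in the `gluster vol info deadpool` output of this report.
HOSTS = ["10.70.37.214", "10.70.37.178", "10.70.37.46"]


def volume_commands(name: str, brick_dir: str = "") -> List[str]:
    """Return the create and start commands for one 2 x 3 volume.

    brick_dir is the per-volume directory under /bricks/brickN; using the
    volume name as the default is an assumption, not something the report states.
    """
    brick_dir = brick_dir or name
    # Six bricks with "replica 3" give two replica sets of three bricks each.
    bricks = [
        f"{host}:/bricks/brick{subvol}/{brick_dir}"
        for subvol in range(2)
        for host in HOSTS
    ]
    create = f"gluster volume create {name} replica 3 " + " ".join(bricks)
    start = f"gluster volume start {name}"
    return [create, start]


def reproduction_commands(count: int = 600, prefix: str = "vol") -> List[str]:
    """Commands for step 1: create and start `count` volumes (hypothetical names)."""
    commands: List[str] = []
    for index in range(1, count + 1):
        commands.extend(volume_commands(f"{prefix}{index}"))
    return commands
```

Printing `reproduction_commands()` gives the 600 create/start pairs for step 1, and `volume_commands("deadpool", "rel")` gives the pair for the extra volume in step 2, with the same brick layout as shown in the volume info. The commands are only generated as strings so they can be reviewed before being run on a test cluster.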
Build: 3.12.2-13

Followed the steps mentioned in the description: created the (n+1)th volume manually after creating n volumes using the script. All the processes (brick processes and the self-heal daemon process) are running, and there are no warning messages in the glusterd log.

Hence marking the bug as verified.
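
The "no warning messages" check can also be done mechanically. The sketch below is one possible way, not the procedure used for this verification: it scans log lines for the `socket_connect` warning quoted in the description and reports the gaps between occurrences. The regular expression is written against the log lines shown in this report, and the function names are made up for illustration.

```python
import re
from datetime import datetime
from typing import Iterable, List

# Matches the glustershd connection warning as it appears in this report.
WARNING_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] W "
    r"\[socket\.c:\d+:socket_connect\] 0-glustershd: "
    r"Ignore failed connection attempt"
)


def shd_warning_times(lines: Iterable[str]) -> List[datetime]:
    """Return the timestamps of all matching warning lines, in input order."""
    times: List[datetime] = []
    for line in lines:
        match = WARNING_RE.match(line.strip())
        if match:
            times.append(datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f"))
    return times


def warning_intervals(times: List[datetime]) -> List[int]:
    """Return the gaps between consecutive warnings, rounded to whole seconds."""
    return [round((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]
```

An empty result from `shd_warning_times` on the glusterd log corresponds to the verified state; on the log excerpt from the description, `warning_intervals` shows the 7-second spacing of the warnings.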
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2607