Bug 1581184 - After creating and starting 601 volumes, self heal daemon went down and seeing continuous warning messages in glusterd log
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: 3.4
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.4.0
Assigned To: Sanju
QA Contact: Bala Konda Reddy M
Depends On:
Blocks: 1503137 1589253
Reported: 2018-05-22 05:56 EDT by Bala Konda Reddy M
Modified: 2018-09-04 02:49 EDT
CC: 8 users

See Also:
Fixed In Version: glusterfs-3.12.2-13
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1589253
Environment:
Last Closed: 2018-09-04 02:48:11 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers:
  Tracker ID: Red Hat Product Errata RHSA-2018:2607
  Priority: None
  Status: None
  Summary: None
  Last Updated: 2018-09-04 02:49 EDT

Description Bala Konda Reddy M 2018-05-22 05:56:22 EDT
Description of problem:
--------------------------------------------------------------------
On a three-node cluster, created and started 600 volumes of type 2x3 (distributed-replicate). All the bricks and the self-heal daemon were running properly. Then created and started one more 2x3 volume; the self-heal daemon stopped running, and glusterd logs the following warning every 7 seconds:
---------------------------------------------------------------------
[2018-05-22 09:10:54.352926] W [socket.c:3266:socket_connect] 0-glustershd: Ignore failed connection attempt on /var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or directory)
[2018-05-22 09:11:01.354185] W [socket.c:3266:socket_connect] 0-glustershd: Ignore failed connection attempt on /var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or directory)
[2018-05-22 09:11:08.355858] W [socket.c:3266:socket_connect] 0-glustershd: Ignore failed connection attempt on /var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or directory)
[2018-05-22 09:11:15.358315] W [socket.c:3266:socket_connect] 0-glustershd: Ignore failed connection attempt on /var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or directory)
[2018-05-22 09:11:22.360205] W [socket.c:3266:socket_connect] 0-glustershd: Ignore failed connection attempt on /var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or directory)


Version-Release number of selected component (if applicable):
3.12.2-9

How reproducible:
1/1

Steps to Reproduce:
1. On a three-node cluster, created 600 volumes of type replicate (2x3) and started them using a script (a minimal sketch of such a script is shown after these steps)
2. Created a new replicate (2x3) volume and started it
3. The volume started successfully
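
A minimal sketch of the kind of volume-creation script referred to in step 1; HOST1..HOST3 and the brick paths are placeholders, not the actual hosts or paths used in this setup:

# Sketch only: replace HOST1..HOST3 and the brick paths with real values.
# Each iteration creates and starts one 2x3 distributed-replicate volume.
for i in $(seq 1 600); do
    gluster volume create vol_${i} replica 3 \
        HOST1:/bricks/brick0/vol_${i} HOST2:/bricks/brick0/vol_${i} HOST3:/bricks/brick0/vol_${i} \
        HOST1:/bricks/brick1/vol_${i} HOST2:/bricks/brick1/vol_${i} HOST3:/bricks/brick1/vol_${i}
    gluster volume start vol_${i}
done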

Actual results:
The self-heal daemon went down, and glusterd logs the warning message below every 7 seconds:

[2018-05-22 08:48:09.064406] W [socket.c:3266:socket_connect] 0-glustershd: Ignore failed connection attempt on /var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or directory)
[2018-05-22 08:48:16.065553] W [socket.c:3266:socket_connect] 0-glustershd: Ignore failed connection attempt on /var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or directory)
[2018-05-22 08:48:23.066968] W [socket.c:3266:socket_connect] 0-glustershd: Ignore failed connection attempt on /var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or directory)
[2018-05-22 08:48:30.068186] W [socket.c:3266:socket_connect] 0-glustershd: Ignore failed connection attempt on /var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or directory)
[2018-05-22 08:48:37.069355] W [socket.c:3266:socket_connect] 0-glustershd: Ignore failed connection attempt on /var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or directory)

Expected results:
The self-heal daemon should remain running after the new volume is started.

Additional info:

[root@dhcp37-214 ~]# gluster vol info deadpool
 
Volume Name: deadpool
Type: Distributed-Replicate
Volume ID: 25cf7f2f-3369-4ffc-8349-ce7c146b9ff2
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.37.214:/bricks/brick0/rel
Brick2: 10.70.37.178:/bricks/brick0/rel
Brick3: 10.70.37.46:/bricks/brick0/rel
Brick4: 10.70.37.214:/bricks/brick1/rel
Brick5: 10.70.37.178:/bricks/brick1/rel
Brick6: 10.70.37.46:/bricks/brick1/rel
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
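
For reference, the self-heal daemon state can be checked from the gluster CLI and from the process list; a minimal sketch using the 'deadpool' volume above (exact output format varies by build):

# Self-heal Daemon entries should show Online = Y in the status output
gluster volume status deadpool | grep -i 'self-heal'

# Check for a running glustershd process on this node
pgrep -af glustershd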
Comment 16 Bala Konda Reddy M 2018-07-10 10:13:07 EDT
Build: 3.12.2-13

Followed the steps mentioned in the description: created the (n+1)th volume manually after creating n volumes using the script. All the processes (brick processes and the self-heal daemon) are running, and there are no warning messages in the glusterd log (one way to spot-check this is sketched below).

Hence, marking the bug as verified.
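
A minimal sketch of such a spot-check on the fixed build; 'vol_601' stands in for the newly created (n+1)th volume, and the log path is the default glusterd log location:

# Confirm the self-heal daemon is listed as online for the new volume
gluster volume status vol_601 | grep -i 'self-heal'

# Confirm glusterd is no longer logging the socket_connect warning
grep -c 'Ignore failed connection attempt' /var/log/glusterfs/glusterd.log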
Comment 17 errata-xmlrpc 2018-09-04 02:48:11 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607
