Bug 1337495

Summary: [Volume Scale] gluster node randomly going to Disconnected state after scaling to more than 290 gluster volumes
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Prasanth <pprakash>
Component: glusterd
Assignee: Atin Mukherjee <amukherj>
Status: CLOSED ERRATA
QA Contact: Prasanth <pprakash>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.1
CC: annair, asrivast, pousley, pprakash, rcyriac, rhinduja, rhs-bugs, storage-qa-internal, vbellur
Target Milestone: ---
Target Release: RHGS 3.2.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.8.4-1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-03-23 05:32:01 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1351522

Description Prasanth 2016-05-19 10:21:20 UTC
Description of problem:

[Volume Scale for Aplo] 
gluster node randomly going to Disconnected state after scaling to more than 290 gluster volumes. This in turn might have affected the creation of subsequent volumes after 290.

Version-Release number of selected component (if applicable):
glusterfs-3.7.9-5.el7rhgs.x86_64
glusterfs-server-3.7.9-5.el7rhgs.x86_64

How reproducible: Mostly


Steps to Reproduce:
1. Set up a gluster cluster of 4 RHGS 3.1.3 nodes with glusterd.service MemoryLimit=32G (see the drop-in sketch after these steps)
2. Using heketi-cli, create and start around 300 gluster volumes in a loop:
for i in {1..300}; do heketi-cli volume create --name=vol$i --size=10 --durability="replicate" --replica=3; done
3. Check the command output and the heketi logs
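
For reference, a minimal sketch of the MemoryLimit setting from step 1, assuming the drop-in path shown in the systemctl output further below (/etc/systemd/system/glusterd.service.d/50-MemoryLimit.conf):

# /etc/systemd/system/glusterd.service.d/50-MemoryLimit.conf
[Service]
MemoryLimit=32G

# reload unit files and restart glusterd so the cap takes effect
systemctl daemon-reload
systemctl restart glusterd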


Actual results: Starting vol291 after its creation took a long time. During this window, # gluster pool list showed one of the nodes in the Disconnected state even though glusterd was running on it.

------------
[root@dhcp42-85 ~]#  systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/glusterd.service.d
           └─50-MemoryLimit.conf
   Active: active (running) since Wed 2016-05-18 19:35:20 IST; 5h 14min ago
  Process: 15144 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 15145 (glusterd)
   Memory: 29.4G (limit: 32.0G)
   CGroup: /system.slice/glusterd.service
------------
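
As a side note, the memory figures in the status output above can also be read directly from systemd; a small check, assuming the MemoryCurrent/MemoryLimit unit properties exposed by systemd on RHEL 7:

# compare glusterd's current cgroup memory usage against its configured limit
systemctl show glusterd -p MemoryCurrent -p MemoryLimit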

########
[root@dhcp43-158 ~]# gluster pool list
UUID                                    Hostname        State
4b494dd7-09e6-4d7d-8834-218534548912    10.70.42.222    Connected 
36c62d2f-7baa-4138-8697-9509bf249d47    10.70.42.85     Disconnected 
7fb3cbba-b377-4657-9a2a-d21f9c115388    10.70.43.162    Connected 
3b6f62bd-4df1-40b1-9ec5-fed8018d7816    localhost       Connected 

[root@dhcp43-158 ~]# gluster peer status
Number of Peers: 3

Hostname: 10.70.42.222
Uuid: 4b494dd7-09e6-4d7d-8834-218534548912
State: Peer in Cluster (Connected)
Other names:
dhcp42-222.lab.eng.blr.redhat.com

Hostname: 10.70.42.85
Uuid: 36c62d2f-7baa-4138-8697-9509bf249d47
State: Peer in Cluster (Disconnected)

Hostname: 10.70.43.162
Uuid: 7fb3cbba-b377-4657-9a2a-d21f9c115388
State: Peer in Cluster (Connected)

[root@dhcp42-85 ~]# gluster pool list
Error : Request timed out

However, the other nodes were showing it as Connected.
########

Expected results: A gluster node should not go into the Disconnected state while the glusterd service is up and running.

Comment 2 Atin Mukherjee 2016-05-20 03:48:50 UTC
The RCA for this is the same as for BZ 1336267.

Comment 4 Atin Mukherjee 2016-07-08 04:22:42 UTC
http://review.gluster.org/#/c/14849/ fixes this issue too.

Comment 6 Atin Mukherjee 2016-09-17 14:25:51 UTC
Upstream mainline : http://review.gluster.org/14849
Upstream 3.8 : http://review.gluster.org/14860

The fix is available in rhgs-3.2.0 as part of the rebase to GlusterFS 3.8.4.

Comment 12 Prasanth 2017-02-28 07:28:46 UTC
The reported issue appears to be fixed in glusterfs-3.8.4: I was able to scale gluster volumes using heketi-cli well beyond 300 volumes, and the gluster nodes no longer go into the Disconnected state.
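
A rough sketch of the kind of loop used for this scale-out verification, reusing the create command from the reproduction steps; the inline peer-state check is an assumption added for illustration:

for i in {1..500}; do
    heketi-cli volume create --name=vol$i --size=10 --durability="replicate" --replica=3
    # bail out early if any peer drops out of the Connected state
    if gluster pool list | grep -q Disconnected; then
        echo "peer disconnected after creating vol$i"
        break
    fi
done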

######################
# gluster --version
glusterfs 3.8.4 built on Feb 20 2017 03:15:38
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.


# heketi-cli volume list |wc -l
500


# gluster volume list |wc -l
500


# gluster peer status
Number of Peers: 2

Hostname: dhcp46-150.lab.eng.blr.redhat.com
Uuid: c4bdf1ad-04ab-4301-b9fe-f144272079ef
State: Peer in Cluster (Connected)

Hostname: 10.70.47.163
Uuid: fcd44049-a3b9-4f85-851c-79915812cf3f
State: Peer in Cluster (Connected)


# gluster volume info vol291
 
Volume Name: vol291
Type: Replicate
Volume ID: 48eeed36-e50e-429b-b474-10e4e336ffca
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.46.150:/var/lib/heketi/mounts/vg_1dae8c1c2feb1c16338a0440f64bcfed/brick_6f278e35a09ea7cff70b008784cb99c1/brick
Brick2: 10.70.47.161:/var/lib/heketi/mounts/vg_cf48e4fe475f69149d157bbfae86db75/brick_5cc9403c371cff1f2c4b6504bca5f2e9/brick
Brick3: 10.70.47.163:/var/lib/heketi/mounts/vg_94e77f6c32ac54b0c819ceee0899981f/brick_a6fb045df6a79ee353031f9da40309c5/brick
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on


# gluster volume info vol500
 
Volume Name: vol500
Type: Replicate
Volume ID: 72c9be57-3284-4e12-8449-84889a782c23
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.47.163:/var/lib/heketi/mounts/vg_814d6fa82429363cf09aaafe5ba7d850/brick_1f6b1df64468e950ee64aebe57f498d9/brick
Brick2: 10.70.46.150:/var/lib/heketi/mounts/vg_60357bad265972bb79fd3155e6d473fa/brick_d14fc99faef482004d606480d405df7d/brick
Brick3: 10.70.47.161:/var/lib/heketi/mounts/vg_893e765b342697a5086bf56d58332501/brick_3f4e9c68a58d6435ab377d85d08f90ff/brick
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on

# gluster pool list
UUID                                    Hostname                                State
c4bdf1ad-04ab-4301-b9fe-f144272079ef    dhcp46-150.lab.eng.blr.redhat.com       Connected 
fcd44049-a3b9-4f85-851c-79915812cf3f    10.70.47.163                            Connected 
a614a3d8-478f-48d4-8542-e6d9c3b526ad    localhost                               Connected 
######################

Comment 15 errata-xmlrpc 2017-03-23 05:32:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html