Bug 1048042

Summary: gluster commands fail as glusterd doesn't detect network failure, when all outbound traffic to other glusterd's are dropped
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: SATHEESARAN <sasundar>
Component: glusterfs
Assignee: krishnan parthasarathi <kparthas>
Status: CLOSED NOTABUG
QA Contact: SATHEESARAN <sasundar>
Severity: high
Docs Contact:
Priority: high
Version: 2.1
CC: grajaiya, kparthas, nsathyan, vbellur
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard: glusterd
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-08-03 13:49:30 UTC
Type: Bug
Attachments:
Description Flags
sosreports none

Description SATHEESARAN 2014-01-03 01:29:13 UTC
Description of problem:
-----------------------
gluster CLI commands fail with 'Connection failed. Please check if gluster daemon is operational' because glusterd does not detect the network failure when all outbound network traffic from that glusterd to all other glusterd instances is dropped.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
glusterfs-3.4.0.51geo.el6rhs [ hotfix for RHSS 2.1 Update1 ]

How reproducible:
-----------------
Every time

Steps to Reproduce:
-------------------
1. Create a trusted storage pool of 'N' RHSS nodes
2. On one of the RHSS nodes, drop all outbound glusterd traffic
3. Execute any gluster CLI command on that node

Actual results:
---------------
All gluster CLI commands fail with 'Connection failed. Please check if gluster daemon is operational', even though glusterd is up and running

Expected results:
-----------------
1. glusterd should detect the above scenario as a network disconnect, with the help of a heartbeat mechanism

2. gluster CLI commands should not fail with 'Connection failed. Please check if gluster daemon is operational' while glusterd is up and running
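
The heartbeat-based detection asked for in expected result 1 could look roughly like the sketch below. This is not glusterd's implementation: the class `HeartbeatMonitor`, its methods, the peer names and the 10-second timeout are all hypothetical and only illustrate the idea of marking a peer as disconnected once it has stopped answering for longer than a timeout.

```python
import time


class HeartbeatMonitor:
    """Tracks when each peer last answered a heartbeat (hypothetical helper)."""

    def __init__(self, peers, timeout_s=10.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        now = clock()
        # Treat every peer as freshly seen when monitoring starts.
        self.last_seen = {peer: now for peer in peers}

    def record_reply(self, peer):
        """Call whenever a heartbeat reply arrives from `peer`."""
        self.last_seen[peer] = self.clock()

    def disconnected_peers(self):
        """Return the peers whose last reply is older than the timeout."""
        now = self.clock()
        return sorted(peer for peer, seen in self.last_seen.items()
                      if now - seen > self.timeout_s)


if __name__ == "__main__":
    # A fake clock keeps the demonstration deterministic.
    fake_now = [0.0]
    monitor = HeartbeatMonitor(["node-a", "node-b"], timeout_s=10.0,
                               clock=lambda: fake_now[0])
    fake_now[0] = 8.0
    monitor.record_reply("node-a")       # node-a answers, node-b stays silent
    fake_now[0] = 12.0
    print(monitor.disconnected_peers())  # ['node-b']
```

The clock is injected so the timeout logic can be exercised without waiting; a real daemon would call `record_reply` from its RPC layer and poll `disconnected_peers` periodically.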

Additional info:
----------------

SETUP INFO
===========
1. RHSS 2.1 U1 ISO - RHSS-2.1-20131122.0-RHS-x86_64-DVD1.iso
2. Packages - 
   glusterfs-3.4.0.51geo.el6rhs - http://download.devel.redhat.com/brewroot/packages/glusterfs/3.4.0.51geo/1.el6rhs/x86_64/
3. Provisioning - 2 RHSS VMs through Beaker
4. A 2-node cluster was created
(i.e.) gluster peer probe <RHSS-NODE>
5. No volumes were created

STEPS
=====
1. Stop all outbound glusterd traffic from one of the RHSS nodes using iptables
(i.e.) iptables -I OUTPUT -p tcp --dport 24007 -j DROP

The above command drops all outbound TCP packets destined for port 24007 (the port on which every glusterd listens)

2. Execute any gluster CLI command from the same node
(e.g.) gluster volume status
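
When running the steps above, it can help to check whether port 24007 on a given host is reachable at all. The following is a minimal sketch of such a check, not part of gluster: the function `probe_tcp` and the 3-second timeout are assumptions made for illustration, and only the port number comes from this report. A DROP rule typically shows up as a timeout, because the connection attempt is silently discarded, whereas a stopped daemon typically yields an immediate refusal.

```python
import socket


def probe_tcp(host, port=24007, timeout_s=3.0):
    """Classify a TCP connect attempt as 'open', 'refused', 'timeout' or 'error'."""
    try:
        # create_connection raises on failure; the socket is closed on exit.
        with socket.create_connection((host, port), timeout=timeout_s):
            return "open"
    except ConnectionRefusedError:
        # Nothing is listening on the port (for example, the daemon is stopped).
        return "refused"
    except socket.timeout:
        # No answer at all, as is typical when packets are dropped by a firewall.
        return "timeout"
    except OSError:
        # Any other network error (unreachable host, name resolution, ...).
        return "error"


if __name__ == "__main__":
    # Probe the local management port; the result depends on the machine.
    print(probe_tcp("127.0.0.1"))
```

Running the probe against both the local address and a peer's address, before and after inserting the iptables rule, shows which connections the rule actually affects.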

CONSOLE LOGS
=============
[Thu Jan  2 23:00:33 UTC 2014 root.37.7:~ ] # iptables -I OUTPUT -p tcp --dport 24007 -j DROP

[Thu Jan  2 23:00:40 UTC 2014 root.37.7:~ ] # iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
DROP       tcp  --  anywhere             anywhere            tcp dpt:24007 

[Thu Jan  2 23:01:12 UTC 2014 root.37.7:~ ] # gluster peer status
Connection failed. Please check if gluster daemon is operational.

[Thu Jan  2 23:03:14 UTC 2014 root.37.7:~ ] # service glusterd status
glusterd (pid  6797) is running...

[Thu Jan  2 23:04:34 UTC 2014 root.37.7:~ ] # gluster volume status
Connection failed. Please check if gluster daemon is operational.

Comment 1 SATHEESARAN 2014-01-03 01:52:00 UTC
Created attachment 844788
sosreports

sosreports from 2 RHSS Nodes

Comment 2 Vivek Agarwal 2014-02-20 08:36:30 UTC
adding 3.0 flag and removing 2.1.z

Comment 5 SATHEESARAN 2015-08-03 13:49:30 UTC
This bug is no longer reproducible; closing it.