Bug 1473168

Summary: [Gluster-block] VM hangs, if duplicate IPs are given in 'gluster-block create'
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Sweta Anandpara <sanandpa>
Component: gluster-block
Assignee: Prasanna Kumar Kalever <prasanna.kalever>
Status: CLOSED UPSTREAM
QA Contact: Rahul Hinduja <rhinduja>
Severity: low
Docs Contact:
Priority: medium
Version: cns-3.9
CC: atumball, kramdoss, pkarampu, rhs-bugs, storage-qa-internal
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-11-19 08:20:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Sweta Anandpara 2017-07-20 07:33:52 UTC
Description of problem:
=======================
Had a 1*3 volume and executed 'gluster-block create <volname>/<blockname> ha 3 <IP1>,<IP2>,<IP1> <size>'. Note that 'IP1' was given twice.

The first time, a VM core was generated [Bug 1473162]. Subsequent attempts to reproduce it (tried three times, on different nodes) resulted in a VM hang. I am assuming the code checks only the 'number' of addresses given, with no check for the validity of the addresses or for duplicates. I still do not understand the reason for the hang, but it would be good to sanity-check the IPs at the CLI level itself.
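As a rough illustration only (not how gluster-block is implemented), the kind of duplicate check meant here could look like the shell sketch below, using the addresses from this run; the actual fix would of course live inside the gluster-block CLI itself:

# Hypothetical pre-flight check: refuse to proceed when the same address
# appears more than once in the comma-separated server list.
SERVERS="10.70.47.116,10.70.47.117,10.70.47.116"
if [ -n "$(echo "$SERVERS" | tr ',' '\n' | sort | uniq -d)" ]; then
    echo "Error: duplicate host address in '$SERVERS'" >&2
    exit 1
fi
gluster-block create nash/nb55 ha 3 "$SERVERS" 1M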

I am guessing that this might not actually be hit in a CNS environment, if heketi performs the check or has no way of getting the IPs wrong. Will need to confirm this.


Version-Release number of selected component (if applicable):
============================================================
glusterfs-3.8.4-33 and gluster-block-0.2.1-6


How reproducible:
================
3/3 (hit in all three attempts)


Steps to Reproduce:
===================
1. Create a 1*3 volume on a 3-node cluster setup.
2. Create a block with 'ha 3', passing the addresses of node1, node2 and node1 again (exact commands below).
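For reference, these are the commands from this run (volume, block name and addresses taken from the 'Additional info' section below); the volume-create line is reconstructed from the brick layout in 'gluster v info nash' and is only indicative:

# Replica-3 volume across the three nodes (bricks as listed in 'gluster v info nash')
gluster volume create nash replica 3 10.70.47.115:/bricks/brick4/nash0 \
    10.70.47.116:/bricks/brick4/nash1 10.70.47.117:/bricks/brick4/nash2
gluster volume start nash
# ha 3 block create with 10.70.47.116 listed twice -- this is the command that hangs the VM
gluster-block create nash/nb55 ha 3 10.70.47.116,10.70.47.117,10.70.47.116 1M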


Actual results:
==============
The command exits after a long time with 'Broken pipe'. Logging in to the console shows an unresponsive CLI. After a forced reboot, the VM comes back up.


Expected results:
================
The block create command should validate the addresses before proceeding with the actual block creation.


Additional info:
================
Did not get much information from dmesg or /var/log/messages.

[root@dhcp47-116 abrt]# gluster-block create nash/nb55 ha 3 10.70.47.116,10.70.47.117,10.70.47.116 1M
packet_write_wait: Connection to 10.70.47.116 port 22: Broken pipe
bash-4.3$ ssh root.47.116
^C
bash-4.3$ 

[root@dhcp47-115 ~]# cat /mnt/nash/block-meta/nb55
VOLUME: nash
GBID: 5115979d-9a8d-4ea8-bcae-23280e9c6e2f
SIZE: 1048576
HA: 3
ENTRYCREATE: INPROGRESS
ENTRYCREATE: SUCCESS
10.70.47.116: CONFIGINPROGRESS
10.70.47.116: CONFIGINPROGRESS
10.70.47.117: CONFIGINPROGRESS
10.70.47.116: CONFIGSUCCESS
10.70.47.117: CONFIGSUCCESS
10.70.47.116: CONFIGFAIL
10.70.47.117: CLEANUPINPROGRESS
[root@dhcp47-115 ~]#
[root@dhcp47-115 ~]# rpm -qa | grep gluster
glusterfs-cli-3.8.4-33.el7rhgs.x86_64
glusterfs-rdma-3.8.4-33.el7rhgs.x86_64
python-gluster-3.8.4-33.el7rhgs.noarch
vdsm-gluster-4.17.33-1.1.el7rhgs.noarch
glusterfs-client-xlators-3.8.4-33.el7rhgs.x86_64
glusterfs-fuse-3.8.4-33.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-events-3.8.4-33.el7rhgs.x86_64
gluster-block-0.2.1-6.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7.x86_64
gluster-nagios-addons-0.2.9-1.el7rhgs.x86_64
samba-vfs-glusterfs-4.6.3-3.el7rhgs.x86_64
glusterfs-3.8.4-33.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-26.el7rhgs.x86_64
glusterfs-api-3.8.4-33.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-33.el7rhgs.x86_64
glusterfs-libs-3.8.4-33.el7rhgs.x86_64
glusterfs-server-3.8.4-33.el7rhgs.x86_64
[root@dhcp47-115 ~]# 
[root@dhcp47-115 ~]# gluster peer status
Number of Peers: 5

Hostname: dhcp47-121.lab.eng.blr.redhat.com
Uuid: 49610061-1788-4cbc-9205-0e59fe91d842
State: Peer in Cluster (Connected)
Other names:
10.70.47.121

Hostname: dhcp47-113.lab.eng.blr.redhat.com
Uuid: a0557927-4e5e-4ff7-8dce-94873f867707
State: Peer in Cluster (Connected)

Hostname: dhcp47-114.lab.eng.blr.redhat.com
Uuid: c0dac197-5a4d-4db7-b709-dbf8b8eb0896
State: Peer in Cluster (Connected)
Other names:
10.70.47.114

Hostname: dhcp47-116.lab.eng.blr.redhat.com
Uuid: a96e0244-b5ce-4518-895c-8eb453c71ded
State: Peer in Cluster (Disconnected)
Other names:
10.70.47.116

Hostname: dhcp47-117.lab.eng.blr.redhat.com
Uuid: 17eb3cef-17e7-4249-954b-fc19ec608304
State: Peer in Cluster (Connected)
Other names:
10.70.47.117
[root@dhcp47-115 ~]# 
[root@dhcp47-115 ~]# 
[root@dhcp47-115 ~]# gluster v info nash
 
Volume Name: nash
Type: Replicate
Volume ID: f1ea3d3e-c536-4f36-b61f-cb9761b8a0a6
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.47.115:/bricks/brick4/nash0
Brick2: 10.70.47.116:/bricks/brick4/nash1
Brick3: 10.70.47.117:/bricks/brick4/nash2
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
performance.open-behind: off
performance.readdir-ahead: off
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
server.allow-insecure: on
cluster.brick-multiplex: disable
cluster.enable-shared-storage: enable
[root@dhcp47-115 ~]# 
[root@dhcp47-115 ~]# 
[root@dhcp47-115 ~]#

Comment 3 Sweta Anandpara 2017-07-20 08:37:10 UTC
Not proposing this as a blocker for RHGS 3.3, as I have received input from CNS QE that heketi will not invoke the 'gluster-block create' command with duplicate IPs.

Comment 10 Amar Tumballi 2018-11-19 08:20:21 UTC
Moving this to 'UPSTREAM', as this is not the supported use case and the tools that create block files do not invoke the CLI this way.