Bug 1493037

Summary: [Gluster-block]: Block creation failed in remote create after n successful blocks
Product: Red Hat Enterprise Linux 7
Component: python-rtslib
Version: 7.0
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Target Milestone: pre-dev-freeze
Target Release: 7.5
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Reporter: Sweta Anandpara <sanandpa>
Assignee: Maurizio Lombardi <mlombard>
QA Contact: Martin Hoyer <mhoyer>
Docs Contact:
CC: agrover, amukherj, coughlan, lmiksik, mchristi, mhoyer, mlombard, pprakash, prasanna.kalever, rhandlin, rhs-bugs, salmy, storage-qa-internal
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1544435 (view as bug list)
Environment:
Last Closed: 2018-04-10 19:06:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1544435

Description Sweta Anandpara 2017-09-19 08:50:30 UTC
Description of problem:
======================
Hit this while trying to validate BZ 1482057.
Had a 2x3 volume with brick-mux enabled and all the required volume options set, and executed the command below:
for i in {0..400}; do echo "=== $i ==="; time gluster-block create ozone/oz$i ha 3 <ip1>,<ip2>,<ip3> 10Kib; done

=== 171 ===
IQN: iqn.2016-12.org.gluster-block:e5da5fab-a1bc-418a-9b7c-59c3a04d5b3c
PORTAL(S):  10.70.37.78:3260 10.70.37.86:3260 10.70.37.94:3260
RESULT: SUCCESS

real    1m57.952s
user    0m0.004s
sys    0m0.018s
=== 172 ===
Did not receive any response from gluster-block daemon. Please check log
files to find the reason

real    5m5.643s
user    0m0.005s
sys    0m0.032s

Block creation succeeded for roughly the first 170 blocks and then failed. The gluster-blockd logs show that the remote create failed on one of the peer nodes.
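The failing iteration only reports "Please check log files to find the reason", so the first place to look is the gluster-blockd log on each peer node. A minimal sketch of how one might collect that, assuming the default RHGS log location under /var/log/gluster-block and the gluster-blockd systemd unit (both are assumptions, not taken from this report):

# run on each of the three peer nodes
journalctl -u gluster-blockd --since today | tail -n 100        # daemon messages via systemd
tail -n 200 /var/log/gluster-block/gluster-blockd.log           # assumed default daemon log path
tail -n 200 /var/log/gluster-block/gluster-block-cli.log        # assumed CLI-side log, on the node running the loop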

Not attaching any logs/sosreports for now, as the entire setup has been left in the same state and is presently being looked into by the dev team.



Version-Release number of selected component (if applicable):
===========================================================
gluster-block-0.2.1-12
tcmu-runner-1.2.0-15


How reproducible:
=================
1:1
Hit a similar issue while verifying BZ 1482057, but cannot say that for sure.


Steps to Reproduce:
==================
Create ~400 blocks with ha 3 in a loop from one of the nodes (see the sketch below).
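A minimal reproduction sketch of that loop, using the same command as in the description; the volume name, block-name prefix, and node IPs are placeholders for the test setup:

# run from one of the trusted-pool nodes
for i in {0..400}; do
    echo "=== $i ==="
    time gluster-block create ozone/oz$i ha 3 <ip1>,<ip2>,<ip3> 10Kib
done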


Additional info:
==================
[root@dhcp37-86 ~]# rpm -qa | grep gluster
gluster-nagios-addons-0.2.9-1.el7rhgs.x86_64
gluster-block-0.2.1-12.el7rhgs.x86_64
python-gluster-3.8.4-44.el7rhgs.noarch
glusterfs-libs-3.8.4-44.el7rhgs.x86_64
glusterfs-3.8.4-44.el7rhgs.x86_64
glusterfs-server-3.8.4-44.el7rhgs.x86_64
glusterfs-events-3.8.4-44.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.2.el7rhgs.noarch
glusterfs-client-xlators-3.8.4-44.el7rhgs.x86_64
glusterfs-api-3.8.4-44.el7rhgs.x86_64
glusterfs-rdma-3.8.4-44.el7rhgs.x86_64
glusterfs-cli-3.8.4-44.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-fuse-3.8.4-44.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7_4.3.x86_64
glusterfs-geo-replication-3.8.4-44.el7rhgs.x86_64
[root@dhcp37-86 ~]# rpm -qa | grep tcmu-runner
tcmu-runner-1.2.0-15.el7rhgs.x86_64

[root@dhcp37-86 ~]# 
[root@dhcp37-86 ~]# gluster v status
^C
[root@dhcp37-86 ~]# gluster v info
 
Volume Name: ozone
Type: Distributed-Replicate
Volume ID: f8873304-b5d3-4460-878d-84ce90b835c1
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.37.78:/bricks/brick0/ozone0
Brick2: 10.70.37.86:/bricks/brick0/ozone1
Brick3: 10.70.37.94:/bricks/brick0/ozone2
Brick4: 10.70.37.78:/bricks/brick1/ozone3
Brick5: 10.70.37.86:/bricks/brick1/ozone4
Brick6: 10.70.37.94:/bricks/brick1/ozone5
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
performance.open-behind: off
performance.readdir-ahead: off
performance.strict-o-direct: on
network.remote-dio: disable
cluster.eager-lock: disable
cluster.quorum-type: auto
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
features.shard-block-size: 64MB
user.cifs: off
server.allow-insecure: on
cluster.brick-multiplex: enable
[root@dhcp37-86 ~]#

Comment 22 Martin Hoyer 2018-02-28 13:15:52 UTC
Regression tests with python-rtslib-2.1.fb63-5.el7 passed.
Moving to VERIFIED as per comment #21.

Comment 25 errata-xmlrpc 2018-04-10 19:06:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:1036