Bug 1493037

Summary: [Gluster-block]: Block creation failed in remote create after n successful blocks
Product: Red Hat Enterprise Linux 7
Component: python-rtslib
Version: 7.0
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Target Milestone: pre-dev-freeze
Target Release: 7.5
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Reporter: Sweta Anandpara <sanandpa>
Assignee: Maurizio Lombardi <mlombard>
QA Contact: Martin Hoyer <mhoyer>
Docs Contact:
CC: agrover, amukherj, coughlan, lmiksik, mchristi, mhoyer, mlombard, pprakash, prasanna.kalever, rhandlin, rhs-bugs, salmy, storage-qa-internal
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1544435 (view as bug list)
Environment:
Last Closed: 2018-04-10 19:06:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1544435

Description Sweta Anandpara 2017-09-19 08:50:30 UTC
Description of problem:
======================
Hit this while trying to validate BZ 1482057.
Had a 2x3 volume with brick-mux enabled and all the required volume options set, and executed the command below:
for i in {0..400}; do echo "=== $i ==="; time gluster-block create ozone/oz$i ha 3 <ip1>,<ip2>,<ip3> 10Kib; done

=== 171 ===
IQN: iqn.2016-12.org.gluster-block:e5da5fab-a1bc-418a-9b7c-59c3a04d5b3c
PORTAL(S):  10.70.37.78:3260 10.70.37.86:3260 10.70.37.94:3260
RESULT: SUCCESS

real    1m57.952s
user    0m0.004s
sys    0m0.018s
=== 172 ===
Did not receive any response from gluster-block daemon. Please check log
files to find the reason

real    5m5.643s
user    0m0.005s
sys    0m0.032s

Block creation succeeded for roughly the first 170 blocks and then failed. The gluster-blockd logs show that the remote create failed on one of the peer nodes.
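The failing iteration only reports "Please check log files to find the reason", so the first place to look is the gluster-blockd log on each peer node. A minimal sketch of how one might collect that, assuming the default RHGS log location under /var/log/gluster-block and the gluster-blockd systemd unit (both are assumptions, not taken from this report):

# run on each of the three peer nodes
journalctl -u gluster-blockd --since today | tail -n 100        # daemon messages via systemd
tail -n 200 /var/log/gluster-block/gluster-blockd.log           # assumed default daemon log path
tail -n 200 /var/log/gluster-block/gluster-block-cli.log        # assumed CLI-side log, on the node running the loop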

Not attaching any logs/sosreports for now, as the entire setup has been left in the same state and is presently being looked into by the dev team.



Version-Release number of selected component (if applicable):
===========================================================
gluster-block-0.2.1-12
tcmu-runner-1.2.0-15


How reproducible:
=================
1:1
Hit a similar issue while verifying BZ 1482057, but cannot say that for sure.


Steps to Reproduce:
==================
Create ~400 blocks with ha 3 in a loop from one of the nodes (see the sketch below).
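A minimal reproduction sketch of that loop, using the same command as in the description; the volume name, block-name prefix, and node IPs are placeholders for the test setup:

# run from one of the trusted-pool nodes
for i in {0..400}; do
    echo "=== $i ==="
    time gluster-block create ozone/oz$i ha 3 <ip1>,<ip2>,<ip3> 10Kib
done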


Additional info:
==================
[root@dhcp37-86 ~]# rpm -qa | grep gluster
gluster-nagios-addons-0.2.9-1.el7rhgs.x86_64
gluster-block-0.2.1-12.el7rhgs.x86_64
python-gluster-3.8.4-44.el7rhgs.noarch
glusterfs-libs-3.8.4-44.el7rhgs.x86_64
glusterfs-3.8.4-44.el7rhgs.x86_64
glusterfs-server-3.8.4-44.el7rhgs.x86_64
glusterfs-events-3.8.4-44.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.2.el7rhgs.noarch
glusterfs-client-xlators-3.8.4-44.el7rhgs.x86_64
glusterfs-api-3.8.4-44.el7rhgs.x86_64
glusterfs-rdma-3.8.4-44.el7rhgs.x86_64
glusterfs-cli-3.8.4-44.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-fuse-3.8.4-44.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7_4.3.x86_64
glusterfs-geo-replication-3.8.4-44.el7rhgs.x86_64
[root@dhcp37-86 ~]# rpm -qa | grep tcmu-runner
tcmu-runner-1.2.0-15.el7rhgs.x86_64

[root@dhcp37-86 ~]# 
[root@dhcp37-86 ~]# gluster v status
^C
[root@dhcp37-86 ~]# gluster v info
 
Volume Name: ozone
Type: Distributed-Replicate
Volume ID: f8873304-b5d3-4460-878d-84ce90b835c1
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.37.78:/bricks/brick0/ozone0
Brick2: 10.70.37.86:/bricks/brick0/ozone1
Brick3: 10.70.37.94:/bricks/brick0/ozone2
Brick4: 10.70.37.78:/bricks/brick1/ozone3
Brick5: 10.70.37.86:/bricks/brick1/ozone4
Brick6: 10.70.37.94:/bricks/brick1/ozone5
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
performance.open-behind: off
performance.readdir-ahead: off
performance.strict-o-direct: on
network.remote-dio: disable
cluster.eager-lock: disable
cluster.quorum-type: auto
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
features.shard-block-size: 64MB
user.cifs: off
server.allow-insecure: on
cluster.brick-multiplex: enable
[root@dhcp37-86 ~]#

Comment 22 Martin Hoyer 2018-02-28 13:15:52 UTC
Regression tests with python-rtslib-2.1.fb63-5.el7 passed.
Moving to VERIFIED as per comment #21.

Comment 25 errata-xmlrpc 2018-04-10 19:06:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:1036