Bug 1454313 - gluster-block is not working as expected when shard is enabled
Summary: gluster-block is not working as expected when shard is enabled
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: sharding
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.3.0
Assignee: Pranith Kumar K
QA Contact: Sweta Anandpara
URL:
Whiteboard:
Depends On:
Blocks: 1417151 1455301 1456225
 
Reported: 2017-05-22 12:46 UTC by Pranith Kumar K
Modified: 2017-09-21 04:58 UTC
CC List: 7 users

Fixed In Version: glusterfs-3.8.4-27
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1455301
Environment:
Last Closed: 2017-09-21 04:45:37 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:2774 0 normal SHIPPED_LIVE glusterfs bug fix and enhancement update 2017-09-21 08:16:29 UTC

Description Pranith Kumar K 2017-05-22 12:46:28 UTC
Description of problem:
Because gluster-block stores its metadata on the same volume as the data, and metadata updates are multi-client writes, 'gluster-block create' hangs and goes into a loop before it eventually dies.
The reason is that the actual file size and the file size seen on the mount differ, so gluster-block cannot tell whether the operation succeeded (an illustrative size check is sketched at the end of this description).

[root@localhost block-meta]# ls -l /brick1/
block-meta/  block-store/ .glusterfs/  .shard/      .trashcan/   
[root@localhost block-meta]# ls -l /brick1/block-meta/1
-rw-------. 2 root root 52304 May 20 19:36 /brick1/block-meta/1 <<<---- true size.
[root@localhost block-meta]# ls -l 1
-rw-------. 1 root root 101 May 20 19:36 1 <<----- has truncated size.

Either the metadata needs to be moved to a separate volume, or shard should not be enabled on the volume used for gluster-block.
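
For illustration, here is the kind of size check that breaks in this situation. This is not gluster-block's actual code; the helper name, path, and expected size are made up (the size is taken from the ls output above):

/* Illustrative sketch only -- not gluster-block source. A caller that
 * trusts st_size from the mount gets stuck: with the shard bug, the size
 * reported on the mount stays at the largest single write (101 bytes here)
 * while the file on the brick has grown to ~52KB. */
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if the metadata file on the mount has at
 * least the size expected after the last update, 0 otherwise. */
static int metadata_size_ok(const char *path_on_mount, off_t expected_size)
{
        struct stat st;

        if (stat(path_on_mount, &st) < 0) {
                perror("stat");
                return 0;
        }
        return st.st_size >= expected_size;
}

int main(void)
{
        /* Path and size are placeholders from the example above. */
        if (!metadata_size_ok("/mnt/block-meta/1", 52304))
                fprintf(stderr, "size on mount smaller than expected; "
                        "cannot tell whether the update succeeded\n");
        return 0;
}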

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Pranith Kumar K 2017-05-24 17:24:19 UTC
Found the root cause:

When a file is opened with O_APPEND, the offset is ignored and the write buffer is always appended to the end of the file. Shard, however, does not ignore the offset when the fd has O_APPEND. This leaves the reported size stuck at 101 bytes, because that is the largest single write that comes in on the file (a standalone demo of the O_APPEND behaviour follows the gdb session below):

Thread 2 "gluster-blockd" hit Breakpoint 1, shard_writev (frame=0x61200005391c, 
    this=0x61f00001a4c0, fd=0x61100000b21c, vector=0x60800000cee0, count=1, offset=0, 
    flags=0, iobref=0x60d00001d7c0, xdata=0x0) at shard.c:4827
4827	        shard_common_inode_write_begin (frame, this, GF_FOP_WRITE, fd, vector,
Missing separate debuginfos, use: dnf debuginfo-install json-c-0.12-7.fc24.x86_64 libacl-2.2.52-11.fc24.x86_64 libattr-2.4.47-16.fc24.x86_64 libstdc++-6.2.1-2.fc25.x86_64 sssd-client-1.14.2-1.fc25.x86_64
(gdb) dis 1
(gdb) c
Continuing.
[Switching to Thread 0x7fffe565a700 (LWP 9037)]

Thread 10 "gluster-blockd" hit Breakpoint 2, trace_writev_cbk (frame=0x612000053c1c, 
    cookie=0x61200005391c, this=0x61f0000196c0, op_ret=101, op_errno=0, 
    prebuf=0x61b00001a68c, postbuf=0x61b00001a6fc, xdata=0x611000052d9c) at trace.c:232
232	        char         preopstr[4096]  = {0, };
(gdb) p postbuf.ia_size
$1 = 101
(gdb) en 1
(gdb) c
Continuing.

Thread 10 "gluster-blockd" hit Breakpoint 1, shard_writev (frame=0x61200002841c, 
    this=0x61f00001a4c0, fd=0x61100003cf9c, vector=0x608000020be0, count=1, offset=0, 
    flags=0, iobref=0x60d00003d530, xdata=0x0) at shard.c:4827
4827	        shard_common_inode_write_begin (frame, this, GF_FOP_WRITE, fd, vector,
(gdb) c
Continuing.
[Switching to Thread 0x7fffe0f08700 (LWP 9038)]

Thread 11 "gluster-blockd" hit Breakpoint 2, trace_writev_cbk (frame=0x61200002871c, 
    cookie=0x61200002841c, this=0x61f0000196c0, op_ret=21, op_errno=0, prebuf=0x61b00000cd8c, 
    postbuf=0x61b00000cdfc, xdata=0x611000064bdc) at trace.c:232
232	        char         preopstr[4096]  = {0, };
(gdb) p postbuf.ia_size
$2 = 101
(gdb) c
Continuing.
[New Thread 0x7fffe04e8700 (LWP 9040)]
[New Thread 0x7fffdfcc4700 (LWP 9041)]
[New Thread 0x7fffdf490700 (LWP 9042)]

Thread 11 "gluster-blockd" hit Breakpoint 1, shard_writev (frame=0x61200003dd1c, 
    this=0x61f00001a4c0, fd=0x61100009479c, vector=0x608000032a60, count=1, offset=0, 
    flags=0, iobref=0x60d00006c800, xdata=0x0) at shard.c:4827
4827	        shard_common_inode_write_begin (frame, this, GF_FOP_WRITE, fd, vector,
(gdb) c
Continuing.
[Switching to Thread 0x7fffe565a700 (LWP 9037)]

Thread 10 "gluster-blockd" hit Breakpoint 2, trace_writev_cbk (frame=0x61200003e01c, 
    cookie=0x61200003dd1c, this=0x61f0000196c0, op_ret=33, op_errno=0, prebuf=0x61b00002b78c, 
    postbuf=0x61b00002b7fc, xdata=0x61100007e5dc) at trace.c:232
232	        char         preopstr[4096]  = {0, };
(gdb) p postbuf.ia_size
$3 = 101
(gdb) q
A debugging session is active.

	Inferior 1 [process 9024] will be killed.
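
For reference, the O_APPEND behaviour described above can be reproduced with a small standalone program (illustrative only, not gluster code; the scratch path is arbitrary):

/* With O_APPEND the kernel moves the file offset to EOF before every
 * write(), so the offset set by the caller is ignored and data is always
 * appended. Shard, by contrast, honoured the incoming offset, which is why
 * the reported size stayed pinned at the largest single write. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
        const char *path = "/tmp/o_append_demo";  /* scratch file */
        int fd = open(path, O_CREAT | O_WRONLY | O_APPEND | O_TRUNC, 0600);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        write(fd, "first-write-", 12);

        /* Seek back to offset 0; the next write must still land at EOF. */
        lseek(fd, 0, SEEK_SET);
        write(fd, "second-write", 12);

        struct stat st;
        fstat(fd, &st);
        /* Prints 24, not 12: the second write was appended, not overwritten. */
        printf("file size: %lld\n", (long long)st.st_size);

        close(fd);
        unlink(path);
        return 0;
}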

After fixing the issue, the sizes on the brick and on the mount match:

[root@localhost r3]# gluster-block create r3/12 ha 3 192.168.122.61,192.168.122.123,192.168.122.113 1GiB
IQN: iqn.2016-12.org.gluster-block:1aef8052-2547-482e-9316-e41ba0e4b289
PORTAL(S):  192.168.122.61:3260 192.168.122.123:3260 192.168.122.113:3260
RESULT: SUCCESS
[root@localhost r3]# ls -l /brick1/block-meta/12
-rw-------. 2 root root 315 May 24 22:52 /brick1/block-meta/12
[root@localhost r3]# ls -l /mnt/block-meta/12
-rw-------. 1 root root 315 May 24 22:52 /mnt/block-meta/12

Comment 3 Pranith Kumar K 2017-05-24 17:33:20 UTC
https://review.gluster.org/17387

Comment 9 surabhi 2017-06-20 12:22:47 UTC
Gluster-block create and delete work fine. In this particular case the create was failing due to an invalid host. Now that the issue is fixed, the create no longer fails. Also verified that delete completes successfully.

With multiple deletes we hit a vmcore issue, but that is tracked in a separate bug. So marking this bug as Verified.

Comment 10 surabhi 2017-06-20 12:23:22 UTC
Apologies for the wrong bug update.

Comment 11 Sweta Anandpara 2017-07-07 09:24:59 UTC
Tested and verified this on the builds glusterfs-3.8.4-31 and gluster-block-0.2.1-4.

Gluster-block create works; no issues or hangs are seen while executing this command. The parent volume (in which blocks are created) has all the required options set using the 'gluster volume set <volname> group' command.

Also, the backend data shows the .shard folder and the required shards. Moving this bug to Verified in RHGS 3.3.

[root@dhcp47-115 ~]# gluster-block list nash
nb1
nb2
nb3
[root@dhcp47-115 ~]# gluster-block create nash/nb4 
Inadequate arguments for create:
gluster-block create <volname/blockname> [ha <count>] [auth enable|disable] <HOST1[,HOST2,...]> <size> [--json*]
[root@dhcp47-115 ~]# gluster-block create nash/nb4 ha 1 10.70.47.115 20M
IQN: iqn.2016-12.org.gluster-block:2cb06c34-3c9d-493d-9511-fc061385b808
PORTAL(S):  10.70.47.115:3260
RESULT: SUCCESS
[root@dhcp47-115 ~]# cd -
/bricks/brick4/nash0/.shard
[root@dhcp47-115 .shard]# ls -l | wc -l
277
[root@dhcp47-115 ~]# gluster v info nash
 
Volume Name: nash
Type: Replicate
Volume ID: f1ea3d3e-c536-4f36-b61f-cb9761b8a0a6
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.47.115:/bricks/brick4/nash0
Brick2: 10.70.47.116:/bricks/brick4/nash1
Brick3: 10.70.47.117:/bricks/brick4/nash2
Options Reconfigured:
server.allow-insecure: on
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: enable
performance.readdir-ahead: off
performance.open-behind: off
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: disable
cluster.enable-shared-storage: enable
[root@dhcp47-115 ~]# 
[root@dhcp47-115 ~]# rpm -qa | grep gluster
glusterfs-cli-3.8.4-31.el7rhgs.x86_64
gluster-block-0.2.1-4.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-3.2.0-10.el7.x86_64
glusterfs-libs-3.8.4-31.el7rhgs.x86_64
glusterfs-events-3.8.4-31.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.1.el7rhgs.noarch
glusterfs-api-3.8.4-31.el7rhgs.x86_64
python-gluster-3.8.4-31.el7rhgs.noarch
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
gluster-nagios-addons-0.2.9-1.el7rhgs.x86_64
samba-vfs-glusterfs-4.6.3-3.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-31.el7rhgs.x86_64
glusterfs-server-3.8.4-31.el7rhgs.x86_64
glusterfs-rdma-3.8.4-31.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-26.el7rhgs.x86_64
glusterfs-3.8.4-31.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-31.el7rhgs.x86_64
glusterfs-fuse-3.8.4-31.el7rhgs.x86_64
[root@dhcp47-115 ~]#

Comment 13 errata-xmlrpc 2017-09-21 04:45:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774


