861423 – Glusterfs mount on distributed striped replicated volume fails with E [stripe-helpers.c:268:stripe_ctx_handle] <volname> Failed to get stripe-size


Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	replicate
Sub Component:
Version:	2.0
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Pranith Kumar K
QA Contact:	Ben Turner
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2012-09-28 15:04 UTC by Ben Turner
Modified:	2016-09-19 22:06 UTC
CC List:	11 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:
Dependent Products:

Attachments
rpc.sh (9.86 KB, application/x-shellscript) 2012-09-28 15:04 UTC, Ben Turner

Description Ben Turner 2012-09-28 15:04:17 UTC

Created attachment 618630
rpc.sh

Description of problem:

During RPC test on the glusterfs mounted volume:

Volume Name: distributed-striped-replicated-volume
Type: Distributed-Striped-Replicate
Volume ID: d38dd365-a6ad-47bd-b7d5-3b64a61f3c1a
Status: Started
Number of Bricks: 2 x 2 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: rhsauto001.lab.eng.blr.redhat.com:/brick1
Brick2: rhsauto002.lab.eng.blr.redhat.com:/brick1
Brick3: rhsauto003.lab.eng.blr.redhat.com:/brick1
Brick4: rhsauto004.lab.eng.blr.redhat.com:/brick1
Brick5: rhsauto005.lab.eng.blr.redhat.com:/brick1
Brick6: rhsauto006.lab.eng.blr.redhat.com:/brick1
Brick7: rhsauto007.lab.eng.blr.redhat.com:/brick1
Brick8: rhsauto008.lab.eng.blr.redhat.com:/brick1

We see the error:

[2012-09-28 10:27:33.200308] E [stripe-helpers.c:268:stripe_ctx_handle] 0-distributed-striped-replicated-volume-stripe-1: Failed to get stripe-size

This happens when running the automated RPC sanity test (attached to this BZ). On inspection after the test had failed, the filesystem was found to be unmounted. After a remount I reran the test, which completed successfully, but I still see the error message in the logs:

[2012-09-28 20:23:31.731599] E [stripe-helpers.c:268:stripe_ctx_handle] 0-distributed-striped-replicated-volume-stripe-0: Failed to get stripe-size

Version-Release number of selected component (if applicable):

kernel-2.6.32-279.el6.x86_64
glusterfs-3.3.0rhs-28.el6rhs.x86_64
glusterfs-server-3.3.0rhs-28.el6rhs.x86_64

How reproducible:

After I first hit this I was able to reproduce the error, but it appears to be either a spurious error message or a silent failure. I will attempt to tear down the setup and reproduce from scratch.

Steps to Reproduce:
1.  Create a 2 x 2 x 2 = 8 distributed striped replicated volume.
2.  Mount it on a client with mount -t glusterfs.
3.  Run the script attached to this bugzilla against the mount point.
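
The steps above can be sketched as shell commands. This is only a sketch, not the reporter's exact procedure: the volume name testvol, the mount point /mnt/testvol and the helper build_create_cmd are hypothetical, while the host names and brick path are taken from the volume info in the description. It assumes the "gluster volume create NAME stripe N replica N transport tcp BRICK..." form of the 3.3 CLI. The helper only prints the command so that it can be reviewed before anything is run.

```shell
#!/bin/bash
# Print (do not run) a "gluster volume create" command for a
# stripe-2 / replica-2 volume. With 8 bricks this gives 2 x 2 x 2 = 8.
# Usage: build_create_cmd VOLNAME BRICK_PATH HOST...
build_create_cmd() {
    local vol=$1
    local brick=$2
    shift 2
    local cmd="gluster volume create $vol stripe 2 replica 2 transport tcp"
    local host
    for host in "$@"; do
        cmd="$cmd $host:$brick"
    done
    printf '%s\n' "$cmd"
}

# Example (hypothetical volume name and mount point):
#   build_create_cmd testvol /brick1 rhsauto00{1..8}.lab.eng.blr.redhat.com
#   gluster volume start testvol
#   mount -t glusterfs rhsauto001.lab.eng.blr.redhat.com:/testvol /mnt/testvol
#   then run the attached script against /mnt/testvol
```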
  
Actual results:

[2012-09-28 10:27:33.200308] E [stripe-helpers.c:268:stripe_ctx_handle] 0-distributed-striped-replicated-volume-stripe-1: Failed to get stripe-size

Expected results:

Normal operation.

Additional info:

Comment 1 Ben Turner 2012-09-28 15:08:17 UTC

The full log from the first event:


[2012-09-28 09:59:45.341277] I [afr-common.c:1965:afr_set_root_inode_on_first_lookup] 0-distributed-striped-replicated-volume-replicate-3: added root inode
[2012-09-28 10:27:33.200159] I [dict.c:317:dict_get] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/replicate.so(afr_mknod_unwind+0x131) [0x7f444d5589e1] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/stripe.so(stripe_mknod_ifreg_cbk+0x13d) [0x7f444d33a81d] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/stripe.so(stripe_ctx_handle+0x90) [0x7f444d341070]))) 0-dict: !this || key=trusted.distributed-striped-replicated-volume-stripe-1.stripe-size
[2012-09-28 10:27:33.200308] E [stripe-helpers.c:268:stripe_ctx_handle] 0-distributed-striped-replicated-volume-stripe-1: Failed to get stripe-size
[2012-09-28 11:13:54.038196] I [dict.c:317:dict_get] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/replicate.so(afr_mknod_unwind+0x131) [0x7f444d5589e1] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/stripe.so(stripe_mknod_ifreg_cbk+0x13d) [0x7f444d33a81d] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/stripe.so(stripe_ctx_handle+0x90) [0x7f444d341070]))) 0-dict: !this || key=trusted.distributed-striped-replicated-volume-stripe-0.stripe-size
[2012-09-28 11:13:54.039087] E [stripe-helpers.c:268:stripe_ctx_handle] 0-distributed-striped-replicated-volume-stripe-0: Failed to get stripe-size
[2012-09-28 11:14:00.814430] I [dict.c:317:dict_get] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/replicate.so(afr_mknod_unwind+0x131) [0x7f444d5589e1] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/stripe.so(stripe_mknod_ifreg_cbk+0x13d) [0x7f444d33a81d] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/stripe.so(stripe_ctx_handle+0x90) [0x7f444d341070]))) 0-dict: !this || key=trusted.distributed-striped-replicated-volume-stripe-1.stripe-size
[2012-09-28 11:14:00.814474] E [stripe-helpers.c:268:stripe_ctx_handle] 0-distributed-striped-replicated-volume-stripe-1: Failed to get stripe-size
[2012-09-28 11:14:28.559001] W [client3_1-fops.c:592:client3_1_unlink_cbk] 0-distributed-striped-replicated-volume-client-2: remote operation failed: No such file or directory
[2012-09-28 11:14:28.559054] W [client3_1-fops.c:592:client3_1_unlink_cbk] 0-distributed-striped-replicated-volume-client-3: remote operation failed: No such file or directory
[2012-09-28 11:14:29.434342] I [afr-self-heal-entry.c:2333:afr_sh_entry_fix] 0-distributed-striped-replicated-volume-replicate-1: /run23315/linux-2.6.31.1/include: Performing conservative merge
[2012-09-28 11:17:05.064991] W [client3_1-fops.c:592:client3_1_unlink_cbk] 0-distributed-striped-replicated-volume-client-3: remote operation failed: No such file or directory
[2012-09-28 11:17:05.065107] W [client3_1-fops.c:592:client3_1_unlink_cbk] 0-distributed-striped-replicated-volume-client-2: remote operation failed: No such file or directory
[2012-09-28 11:17:59.081884] W [client3_1-fops.c:592:client3_1_unlink_cbk] 0-distributed-striped-replicated-volume-client-2: remote operation failed: No such file or directory
[2012-09-28 11:18:29.460894] I [dict.c:317:dict_get] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/replicate.so(afr_mknod_unwind+0x131) [0x7f444d5589e1] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/stripe.so(stripe_mknod_ifreg_cbk+0x13d) [0x7f444d33a81d] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/stripe.so(stripe_ctx_handle+0x90) [0x7f444d341070]))) 0-dict: !this || key=trusted.distributed-striped-replicated-volume-stripe-0.stripe-size
[2012-09-28 11:18:29.461060] E [stripe-helpers.c:268:stripe_ctx_handle] 0-distributed-striped-replicated-volume-stripe-0: Failed to get stripe-size
[2012-09-28 11:18:30.058970] I [dict.c:317:dict_get] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/replicate.so(afr_mknod_unwind+0x131) [0x7f444d5589e1] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/stripe.so(stripe_mknod_ifreg_cbk+0x13d) [0x7f444d33a81d] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/stripe.so(stripe_ctx_handle+0x90) [0x7f444d341070]))) 0-dict: !this || key=trusted.distributed-striped-replicated-volume-stripe-0.stripe-size
[2012-09-28 11:18:30.059026] E [stripe-helpers.c:268:stripe_ctx_handle] 0-distributed-striped-replicated-volume-stripe-0: Failed to get stripe-size

The log when it is silently failing after remount:

[2012-09-28 20:13:17.749555] I [dict.c:317:dict_get] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/replicate.so(afr_mknod_unwind+0x131) [0x7f6461a989e1] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/stripe.so(stripe_mknod_ifreg_cbk+0x13d) [0x7f646187a81d] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/stripe.so(stripe_ctx_handle+0x90) [0x7f6461881070]))) 0-dict: !this || key=trusted.distributed-striped-replicated-volume-stripe-1.stripe-size
[2012-09-28 20:13:17.749663] E [stripe-helpers.c:268:stripe_ctx_handle] 0-distributed-striped-replicated-volume-stripe-1: Failed to get stripe-size
[2012-09-28 20:13:21.984746] W [client3_1-fops.c:592:client3_1_unlink_cbk] 0-distributed-striped-replicated-volume-client-3: remote operation failed: No such file or directory
[2012-09-28 20:13:21.984871] W [client3_1-fops.c:592:client3_1_unlink_cbk] 0-distributed-striped-replicated-volume-client-2: remote operation failed: No such file or directory
[2012-09-28 20:13:21.987613] W [client3_1-fops.c:592:client3_1_unlink_cbk] 0-distributed-striped-replicated-volume-client-2: remote operation failed: No such file or directory
[2012-09-28 20:13:21.987670] W [client3_1-fops.c:592:client3_1_unlink_cbk] 0-distributed-striped-replicated-volume-client-3: remote operation failed: No such file or directory
[2012-09-28 20:13:21.995399] W [client3_1-fops.c:592:client3_1_unlink_cbk] 0-distributed-striped-replicated-volume-client-2: remote operation failed: No such file or directory
[2012-09-28 20:13:21.995474] W [client3_1-fops.c:592:client3_1_unlink_cbk] 0-distributed-striped-replicated-volume-client-3: remote operation failed: No such file or directory
[2012-09-28 20:13:22.000752] W [client3_1-fops.c:592:client3_1_unlink_cbk] 0-distributed-striped-replicated-volume-client-6: remote operation failed: No such file or directory
[2012-09-28 20:13:22.000809] W [client3_1-fops.c:592:client3_1_unlink_cbk] 0-distributed-striped-replicated-volume-client-7: remote operation failed: No such file or directory
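
To see how often each stripe subvolume reports the error, the client log can be summarised with a small filter. This is a sketch under assumptions: the helper name count_stripe_size_errors is hypothetical, and it relies on the log line layout shown above, where the fifth whitespace-separated field is the subvolume name followed by a colon.

```shell
#!/bin/bash
# Count "Failed to get stripe-size" errors per stripe subvolume.
# Reads a glusterfs client log on stdin and prints "<subvolume> <count>",
# one line per subvolume, sorted by name.
count_stripe_size_errors() {
    grep -F 'Failed to get stripe-size' \
        | awk '{ sub(/:$/, "", $5); count[$5]++ }
               END { for (s in count) print s, count[s] }' \
        | sort
}

# Example (the log file path is an assumption; use the client's mount log):
#   count_stripe_size_errors < /var/log/glusterfs/mnt-testvol.log
```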

Comment 3 Mark 2012-10-29 17:12:35 UTC

I am able to reproduce with a six node cluster, stripe 3 replicate 2. 




Volume Name: gv0
Type: Striped-Replicate
Volume ID: 99d0dd58-2976-4b0a-b831-421a77fd4e76
Status: Started
Number of Bricks: 1 x 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: hc1:/export/brick0
Brick2: hc2:/export/brick0
Brick3: hc3:/export/brick0
Brick4: hc4:/export/brick0
Brick5: hc5:/export/brick0
Brick6: hc6:/export/brick0


Running 3.3.1:

glusterfs-geo-replication-3.3.1-1.el6.x86_64
glusterfs-server-3.3.1-1.el6.x86_64
glusterfs-fuse-3.3.1-1.el6.x86_64
glusterfs-3.3.1-1.el6.x86_64



error snippet in log:
[2012-10-29 10:53:45.765901] E [stripe-helpers.c:268:stripe_ctx_handle] 0-gv0-stripe-0: Failed to get stripe-size
[2012-10-29 10:53:45.766091] I [dict.c:317:dict_get] (-->/usr/lib64/glusterfs/3.3.1/xlator/cluster/replicate.so(afr_create_unwind+0x13c) [0x7fb99dfccbbc] (-->/usr/lib64/glusterfs/3.3.1/xlator/cluster/stripe.so(stripe_create_cbk+0x60b) [0x7fb99ddb077b] (-->/usr/lib64/glusterfs/3.3.1/xlator/cluster/stripe.so(stripe_ctx_handle+0x90) [0x7fb99ddb5070]))) 0-dict: !this || key=trusted.gv0-stripe-0.stripe-size
[2012-10-29 10:53:45.766121] E [stripe-helpers.c:268:stripe_ctx_handle] 0-gv0-stripe-0: Failed to get stripe-size
[2012-10-29 10:53:48.651417] W [fuse-bridge.c:2025:fuse_writev_cbk] 0-glusterfs-fuse: 221: WRITE => -1 (Invalid argument)

Comment 5 wesley 2012-12-12 19:13:25 UTC

Running GlusterFS 3.3.1

I see a similar error. The earlier workaround from Bug 842752 (modifying afr-dir-write.c) does not resolve the issue.

log snippet:
+------------------------------------------------------------------------------+
[2012-12-12 14:11:25.900073] E [stripe-helpers.c:268:stripe_ctx_handle] 0-d2s2r2-stripe-0: Failed to get stripe-size
[2012-12-12 14:11:25.918661] E [stripe.c:3051:stripe_ftruncate] 0-d2s2r2-stripe-0: no stripe count
[2012-12-12 14:11:25.918744] W [fuse-bridge.c:459:fuse_truncate_cbk] 0-glusterfs-fuse: 46: FTRUNCATE() ERR => -1 (Invalid argument)

Comment 6 wesley 2012-12-12 19:38:24 UTC

Sorry, my config is below:

Volume Name: d2s2r2
Type: Distributed-Striped-Replicate
Volume ID: e2c44bac-65c1-429b-8569-982e661f3019
Status: Started
Number of Bricks: 2 x 2 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: 10.191.10.111:/exp/brick1/d2s2r2
Brick2: 10.191.10.112:/exp/brick1/d2s2r2
Brick3: 10.191.10.113:/exp/brick1/d2s2r2
Brick4: 10.191.10.114:/exp/brick1/d2s2r2
Brick5: 10.191.10.115:/exp/brick1/d2s2r2
Brick6: 10.191.10.116:/exp/brick1/d2s2r2
Brick7: 10.191.10.117:/exp/brick1/d2s2r2
Brick8: 10.191.10.118:/exp/brick1/d2s2r2
Options Reconfigured:
cluster.data-self-heal: off
cluster.stripe-block-size: 131072
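
For reference, a stripe-block-size of 131072 bytes is 128 KiB. The arithmetic below shows which stripe child a given byte offset would land on, assuming plain round-robin striping (block number = offset / block size, child = block number modulo stripe count). That layout rule is an assumption for illustration; the stripe translator's actual logic is not shown in this report, and the helper name is hypothetical.

```shell
#!/bin/bash
# Map a byte offset to a stripe child index, ASSUMING round-robin striping.
# Usage: stripe_child_for_offset OFFSET BLOCK_SIZE STRIPE_COUNT
stripe_child_for_offset() {
    local offset=$1
    local block_size=$2
    local stripe_count=$3
    echo $(( (offset / block_size) % stripe_count ))
}

# Example with the settings above (block size 131072, stripe count 2):
#   stripe_child_for_offset 0 131072 2        # first block
#   stripe_child_for_offset 131072 131072 2   # second block
```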

Comment 7 wesley 2012-12-13 17:38:28 UTC

While the rpc.sh script still fails:
[root@devnet2 BMs] $ ./rpc-coverage.sh /mnt/gluster/d2s2r2/
removed `/mnt/gluster/d2s2r2//coverage/dir/file'
removed directory: `/mnt/gluster/d2s2r2//coverage/dir'
removed directory: `/mnt/gluster/d2s2r2//coverage'
open: failed.

Something interesting, and possibly helpful for those debugging the code:

My gluster striped-replication volume is mounted on /mnt/gluster/d2s2r2/

If I perform the following commands, here is my log output:

#touch ls -l /mnt/gluster/d2s2r2/test9 (dd the same)
-rw-r--r-- 1 root root 0 Dec 13 12:31 /mnt/gluster/d2s2r2/test

log:
[2012-12-13 12:31:30.326183] E [stripe-helpers.c:268:stripe_ctx_handle] 0-d2s2r2-stripe-0: Failed to get stripe-size

#ls -l /mnt/gluster/d2s2r2/test
-rw-r--r-- 1 root root 0 Dec 13 12:31 /mnt/gluster/d2s2r2/test

If I then write some zero bytes to the file:
$ dd if=/dev/zero of=/mnt/gluster/d2s2r2/test bs=2048 count=1000
1000+0 records in
1000+0 records out
2048000 bytes (2.0 MB) copied, 0.37201 s, 5.5 MB/s
ls -l /mnt/gluster/d2s2r2/test
-rw-r--r-- 1 root root 2048000 Dec 13 12:32 /mnt/gluster/d2s2r2/test

log:
Nothing logged (file was written to)

I can read from the file as well
dd if=/mnt/gluster/d2s2r2/test of=/dev/null bs=2048
1000+0 records in
1000+0 records out
2048000 bytes (2.0 MB) copied, 0.102214 s, 20.0 MB/s

log:
Nothing logged (file was read from)

Why doesn't gluster pick up the stripe-size on the first write? For now I have to "touch" files before writing to them to work around this bug.
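
The touch-then-write sequence described in this comment can be wrapped in a helper. This is a sketch of the commenter's workaround only, not a fix, and the function name touch_then_write is hypothetical. Note that the touch itself still logged the stripe-size error in the session above.

```shell
#!/bin/bash
# Create the file first, then write zeros to it, mirroring the sequence in
# the comment above (touch, then dd).
# Usage: touch_then_write PATH BLOCK_SIZE COUNT
touch_then_write() {
    local path=$1
    local bs=$2
    local count=$3
    touch "$path" || return 1
    # dd prints its transfer summary on stderr; silence it here.
    dd if=/dev/zero of="$path" bs="$bs" count="$count" 2>/dev/null
}

# Example, matching the dd invocation in the comment:
#   touch_then_write /mnt/gluster/d2s2r2/test 2048 1000
```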

Comment 8 vpshastry 2013-01-22 13:00:18 UTC

Because posix_mknod does not send back the filled xattr dict while unwinding, the stripe translator cannot get the stripe-size from the xattr. I think the patch http://review.gluster.org/3904 will most likely fix the issue.
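
The key that stripe fails to look up appears in the dict_get log lines as trusted.<volname>-stripe-<index>.stripe-size. The helper below builds that key so it can be passed to getfattr on a brick. This is a sketch: the function name is hypothetical, and whether the xattr is stored on the brick file under exactly this name is an assumption based on the key printed in the logs.

```shell
#!/bin/bash
# Build the stripe-size xattr key in the form seen in the dict_get log lines:
#   trusted.<volname>-stripe-<index>.stripe-size
# Usage: stripe_size_xattr_key VOLNAME STRIPE_INDEX
stripe_size_xattr_key() {
    printf 'trusted.%s-stripe-%s.stripe-size\n' "$1" "$2"
}

# Example, run as root on a brick server (the brick file path is hypothetical):
#   getfattr -n "$(stripe_size_xattr_key d2s2r2 0)" -e hex /exp/brick1/d2s2r2/test
```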

Comment 9 Mark 2013-02-21 15:17:43 UTC

Has anyone reproduced the issue under the 3.4 alpha?

Comment 10 Mukil 2013-03-20 15:41:50 UTC

I'm hitting the same issue with 3.4.0alpha. Here is my volume config:

Volume Name: vmstore
Type: Stripe
Volume ID: 3a2a4208-9e62-4e3e-a9e1-bf68cd694ca2
Status: Started
Number of Bricks: 1 x 12 = 12
Transport-type: tcp
Bricks:
Brick1: 10.1.17.43:/export/store
Brick2: 10.1.17.44:/export/store
Brick3: 10.1.17.45:/export/store
Brick4: 10.1.17.46:/export/store
Brick5: 10.1.17.47:/export/store
Brick6: 10.1.17.48:/export/store
Brick7: 10.1.17.49:/export/store
Brick8: 10.1.17.50:/export/store
Brick9: 10.1.17.51:/export/store
Brick10: 10.1.17.53:/export/store
Brick11: 10.1.17.54:/export/store
Brick12: 10.1.17.55:/export/store


Error snippet from client log:


[2013-03-20 15:29:24.235601] E [stripe-helpers.c:355:stripe_ctx_handle] 0-vmstore-stripe-0: Failed to get stripe-size
[2013-03-20 15:29:24.255642] I [dict.c:370:dict_get] (-->/usr/local/lib/glusterfs/3.4.0alpha/xlator/protocol/client.so(client3_3_mknod_cbk+0x825) [0x7fc8f2db3255] (-->/usr/local/lib/glusterfs/3.4.0alpha/xlator/cluster/stripe.so(stripe_mknod_ifreg_cbk+0x120) [0x7fc8f2b7d6b0] (-->/usr/local/lib/glusterfs/3.4.0alpha/xlator/cluster/stripe.so(stripe_ctx_handle+0x90) [0x7fc8f2b84d90]))) 0-dict: !this || key=trusted.vmstore-stripe-0.stripe-size

Interestingly enough, though, this doesn't seem to make the fuse mount point unavailable (unlike versions < 3.4); it is just slower than normal.

Comment 11 Mukil 2013-03-20 15:46:33 UTC

Spoke too soon. It does make the mount point unavailable just like with the earlier version.

Comment 12 Ben Turner 2013-09-20 21:00:33 UTC

I am still seeing this on 2.1 bits.

Comment 13 Vivek Agarwal 2015-03-23 07:40:32 UTC

The product version of Red Hat Storage on which this issue was reported has reached End Of Life (EOL) [1], hence this bug report is being closed. If the issue is still observed on a current version of Red Hat Storage, please file a new bug report on the current version.

[1] https://rhn.redhat.com/errata/RHSA-2014-0821.html
