Bug 861423
Summary: | Glusterfs mount on distributed striped replicated volume fails with E [stripe-helpers.c:268:stripe_ctx_handle] <volname> Failed to get stripe-size | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Ben Turner <bturner>
Component: | replicate | Assignee: | Pranith Kumar K <pkarampu>
Status: | CLOSED WONTFIX | QA Contact: | Ben Turner <bturner>
Severity: | medium | Docs Contact: |
Priority: | medium | |
Version: | 2.0 | CC: | ar1666, bug, fabella.wesley, mark.a.sloan, mukilk, rhs-bugs, sdharane, servicedesk, shmohan, storage-qa-internal, vbellur
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
|
The full log from the first event:

[2012-09-28 09:59:45.341277] I [afr-common.c:1965:afr_set_root_inode_on_first_lookup] 0-distributed-striped-replicated-volume-replicate-3: added root inode
[2012-09-28 10:27:33.200159] I [dict.c:317:dict_get] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/replicate.so(afr_mknod_unwind+0x131) [0x7f444d5589e1] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/stripe.so(stripe_mknod_ifreg_cbk+0x13d) [0x7f444d33a81d] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/stripe.so(stripe_ctx_handle+0x90) [0x7f444d341070]))) 0-dict: !this || key=trusted.distributed-striped-replicated-volume-stripe-1.stripe-size
[2012-09-28 10:27:33.200308] E [stripe-helpers.c:268:stripe_ctx_handle] 0-distributed-striped-replicated-volume-stripe-1: Failed to get stripe-size
[2012-09-28 11:13:54.038196] I [dict.c:317:dict_get] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/replicate.so(afr_mknod_unwind+0x131) [0x7f444d5589e1] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/stripe.so(stripe_mknod_ifreg_cbk+0x13d) [0x7f444d33a81d] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/stripe.so(stripe_ctx_handle+0x90) [0x7f444d341070]))) 0-dict: !this || key=trusted.distributed-striped-replicated-volume-stripe-0.stripe-size
[2012-09-28 11:13:54.039087] E [stripe-helpers.c:268:stripe_ctx_handle] 0-distributed-striped-replicated-volume-stripe-0: Failed to get stripe-size
[2012-09-28 11:14:00.814430] I [dict.c:317:dict_get] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/replicate.so(afr_mknod_unwind+0x131) [0x7f444d5589e1] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/stripe.so(stripe_mknod_ifreg_cbk+0x13d) [0x7f444d33a81d] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/stripe.so(stripe_ctx_handle+0x90) [0x7f444d341070]))) 0-dict: !this || key=trusted.distributed-striped-replicated-volume-stripe-1.stripe-size
[2012-09-28 11:14:00.814474] E [stripe-helpers.c:268:stripe_ctx_handle] 0-distributed-striped-replicated-volume-stripe-1: Failed to get stripe-size
[2012-09-28 11:14:28.559001] W [client3_1-fops.c:592:client3_1_unlink_cbk] 0-distributed-striped-replicated-volume-client-2: remote operation failed: No such file or directory
[2012-09-28 11:14:28.559054] W [client3_1-fops.c:592:client3_1_unlink_cbk] 0-distributed-striped-replicated-volume-client-3: remote operation failed: No such file or directory
[2012-09-28 11:14:29.434342] I [afr-self-heal-entry.c:2333:afr_sh_entry_fix] 0-distributed-striped-replicated-volume-replicate-1: /run23315/linux-2.6.31.1/include: Performing conservative merge
[2012-09-28 11:17:05.064991] W [client3_1-fops.c:592:client3_1_unlink_cbk] 0-distributed-striped-replicated-volume-client-3: remote operation failed: No such file or directory
[2012-09-28 11:17:05.065107] W [client3_1-fops.c:592:client3_1_unlink_cbk] 0-distributed-striped-replicated-volume-client-2: remote operation failed: No such file or directory
[2012-09-28 11:17:59.081884] W [client3_1-fops.c:592:client3_1_unlink_cbk] 0-distributed-striped-replicated-volume-client-2: remote operation failed: No such file or directory
[2012-09-28 11:18:29.460894] I [dict.c:317:dict_get] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/replicate.so(afr_mknod_unwind+0x131) [0x7f444d5589e1] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/stripe.so(stripe_mknod_ifreg_cbk+0x13d) [0x7f444d33a81d] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/stripe.so(stripe_ctx_handle+0x90) [0x7f444d341070]))) 0-dict: !this || key=trusted.distributed-striped-replicated-volume-stripe-0.stripe-size
[2012-09-28 11:18:29.461060] E [stripe-helpers.c:268:stripe_ctx_handle] 0-distributed-striped-replicated-volume-stripe-0: Failed to get stripe-size
[2012-09-28 11:18:30.058970] I [dict.c:317:dict_get] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/replicate.so(afr_mknod_unwind+0x131) [0x7f444d5589e1] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/stripe.so(stripe_mknod_ifreg_cbk+0x13d) [0x7f444d33a81d] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/stripe.so(stripe_ctx_handle+0x90) [0x7f444d341070]))) 0-dict: !this || key=trusted.distributed-striped-replicated-volume-stripe-0.stripe-size
[2012-09-28 11:18:30.059026] E [stripe-helpers.c:268:stripe_ctx_handle] 0-distributed-striped-replicated-volume-stripe-0: Failed to get stripe-size

The log when it is silently failing after remount:

[2012-09-28 20:13:17.749555] I [dict.c:317:dict_get] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/replicate.so(afr_mknod_unwind+0x131) [0x7f6461a989e1] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/stripe.so(stripe_mknod_ifreg_cbk+0x13d) [0x7f646187a81d] (-->/usr/lib64/glusterfs/3.3.0rhs/xlator/cluster/stripe.so(stripe_ctx_handle+0x90) [0x7f6461881070]))) 0-dict: !this || key=trusted.distributed-striped-replicated-volume-stripe-1.stripe-size
[2012-09-28 20:13:17.749663] E [stripe-helpers.c:268:stripe_ctx_handle] 0-distributed-striped-replicated-volume-stripe-1: Failed to get stripe-size
[2012-09-28 20:13:21.984746] W [client3_1-fops.c:592:client3_1_unlink_cbk] 0-distributed-striped-replicated-volume-client-3: remote operation failed: No such file or directory
[2012-09-28 20:13:21.984871] W [client3_1-fops.c:592:client3_1_unlink_cbk] 0-distributed-striped-replicated-volume-client-2: remote operation failed: No such file or directory
[2012-09-28 20:13:21.987613] W [client3_1-fops.c:592:client3_1_unlink_cbk] 0-distributed-striped-replicated-volume-client-2: remote operation failed: No such file or directory
[2012-09-28 20:13:21.987670] W [client3_1-fops.c:592:client3_1_unlink_cbk] 0-distributed-striped-replicated-volume-client-3: remote operation failed: No such file or directory
[2012-09-28 20:13:21.995399] W [client3_1-fops.c:592:client3_1_unlink_cbk] 0-distributed-striped-replicated-volume-client-2: remote operation failed: No such file or directory
[2012-09-28 20:13:21.995474] W [client3_1-fops.c:592:client3_1_unlink_cbk] 0-distributed-striped-replicated-volume-client-3: remote operation failed: No such file or directory
[2012-09-28 20:13:22.000752] W [client3_1-fops.c:592:client3_1_unlink_cbk] 0-distributed-striped-replicated-volume-client-6: remote operation failed: No such file or directory
[2012-09-28 20:13:22.000809] W [client3_1-fops.c:592:client3_1_unlink_cbk] 0-distributed-striped-replicated-volume-client-7: remote operation failed: No such file or directory

I am able to reproduce with a six node cluster, stripe 3 replicate 2.

Volume Name: gv0
Type: Striped-Replicate
Volume ID: 99d0dd58-2976-4b0a-b831-421a77fd4e76
Status: Started
Number of Bricks: 1 x 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: hc1:/export/brick0
Brick2: hc2:/export/brick0
Brick3: hc3:/export/brick0
Brick4: hc4:/export/brick0
Brick5: hc5:/export/brick0
Brick6: hc6:/export/brick0

running 3.3.1
glusterfs-geo-replication-3.3.1-1.el6.x86_64
glusterfs-server-3.3.1-1.el6.x86_64
glusterfs-fuse-3.3.1-1.el6.x86_64
glusterfs-3.3.1-1.el6.x86_64

error snippet in log:
[2012-10-29 10:53:45.765901] E [stripe-helpers.c:268:stripe_ctx_handle] 0-gv0-stripe-0: Failed to get stripe-size
[2012-10-29 10:53:45.766091] I [dict.c:317:dict_get] (-->/usr/lib64/glusterfs/3.3.1/xlator/cluster/replicate.so(afr_create_unwind+0x13c) [0x7fb99dfccbbc] (-->/usr/lib64/glusterfs/3.3.1/xlator/cluster/stripe.so(stripe_create_cbk+0x60b) [0x7fb99ddb077b] (-->/usr/lib64/glusterfs/3.3.1/xlator/cluster/stripe.so(stripe_ctx_handle+0x90) [0x7fb99ddb5070]))) 0-dict: !this || key=trusted.gv0-stripe-0.stripe-size
[2012-10-29 10:53:45.766121] E [stripe-helpers.c:268:stripe_ctx_handle] 0-gv0-stripe-0: Failed to get stripe-size
[2012-10-29 10:53:48.651417] W [fuse-bridge.c:2025:fuse_writev_cbk] 0-glusterfs-fuse: 221: WRITE => -1 (Invalid argument)

Running GlusterFS 3.3.1. Similar error (previous work-around "Bug 842752" to modify afr-dir-write.c file does not resolve issue!)
log snippet:
+------------------------------------------------------------------------------+
[2012-12-12 14:11:25.900073] E [stripe-helpers.c:268:stripe_ctx_handle] 0-d2s2r2-stripe-0: Failed to get stripe-size
[2012-12-12 14:11:25.918661] E [stripe.c:3051:stripe_ftruncate] 0-d2s2r2-stripe-0: no stripe count
[2012-12-12 14:11:25.918744] W [fuse-bridge.c:459:fuse_truncate_cbk] 0-glusterfs-fuse: 46: FTRUNCATE() ERR => -1 (Invalid argument)

Sorry, my config is below:

Volume Name: d2s2r2
Type: Distributed-Striped-Replicate
Volume ID: e2c44bac-65c1-429b-8569-982e661f3019
Status: Started
Number of Bricks: 2 x 2 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: 10.191.10.111:/exp/brick1/d2s2r2
Brick2: 10.191.10.112:/exp/brick1/d2s2r2
Brick3: 10.191.10.113:/exp/brick1/d2s2r2
Brick4: 10.191.10.114:/exp/brick1/d2s2r2
Brick5: 10.191.10.115:/exp/brick1/d2s2r2
Brick6: 10.191.10.116:/exp/brick1/d2s2r2
Brick7: 10.191.10.117:/exp/brick1/d2s2r2
Brick8: 10.191.10.118:/exp/brick1/d2s2r2
Options Reconfigured:
cluster.data-self-heal: off
cluster.stripe-block-size: 131072

Meanwhile, the rpc.sh script still fails:

[root@devnet2 BMs] $ ./rpc-coverage.sh /mnt/gluster/d2s2r2/
removed `/mnt/gluster/d2s2r2//coverage/dir/file'
removed directory: `/mnt/gluster/d2s2r2//coverage/dir'
removed directory: `/mnt/gluster/d2s2r2//coverage'
open: failed.
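
For readers matching the client-N, replicate-N and stripe-N names in the logs against a brick list such as the one above, here is a minimal sketch of how a 2 x 2 x 2 = 8 layout could be grouped. It assumes that consecutive bricks form a replica set and consecutive replica sets form a stripe set. That assumption is consistent with the logs in this report (client-2 and client-3 show up alongside replicate-1) but is not confirmed anywhere in the thread, and the function name `group_bricks` is made up for illustration.

```python
def group_bricks(bricks, stripe, replica):
    """Group a flat brick list into stripe sets of replica sets.

    Assumption (not confirmed by this report): bricks are grouped in the
    order they are listed, replica sets first, then stripe sets. Each
    resulting stripe set is one distribute subvolume.
    """
    per_subvol = stripe * replica
    if len(bricks) % per_subvol != 0:
        raise ValueError("brick count must be a multiple of stripe * replica")
    # Consecutive bricks become replica sets (replicate-0, replicate-1, ...).
    replica_sets = [bricks[i:i + replica] for i in range(0, len(bricks), replica)]
    # Consecutive replica sets become stripe sets (stripe-0, stripe-1, ...).
    return [replica_sets[i:i + stripe] for i in range(0, len(replica_sets), stripe)]


# Brick list taken from the d2s2r2 volume info quoted in this report.
bricks = [f"10.191.10.{n}:/exp/brick1/d2s2r2" for n in range(111, 119)]
layout = group_bricks(bricks, stripe=2, replica=2)

for s, stripe_set in enumerate(layout):
    for r, replica_set in enumerate(stripe_set):
        print(f"stripe-{s} replicate-{s * len(stripe_set) + r}: {replica_set}")
```

Under this grouping, an error logged by stripe-0 would concern the first four bricks and one logged by stripe-1 the last four.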
Something interesting, and hopefully helpful for those debugging the code:

My gluster striped-replication volume is mounted on /mnt/gluster/d2s2r2/. If I perform the following commands, here is my log output:

#touch ls -l /mnt/gluster/d2s2r2/test9 (dd the same)
-rw-r--r-- 1 root root 0 Dec 13 12:31 /mnt/gluster/d2s2r2/test
log:
[2012-12-13 12:31:30.326183] E [stripe-helpers.c:268:stripe_ctx_handle] 0-d2s2r2-stripe-0: Failed to get stripe-size

#ls -l /mnt/gluster/d2s2r2/test
-rw-r--r-- 1 root root 0 Dec 13 12:31 /mnt/gluster/d2s2r2/test

If I then write some zero bytes:

$ dd if=/dev/zero of=/mnt/gluster/d2s2r2/test bs=2048 count=1000
1000+0 records in
1000+0 records out
2048000 bytes (2.0 MB) copied, 0.37201 s, 5.5 MB/s
ls -l /mnt/gluster/d2s2r2/test
-rw-r--r-- 1 root root 2048000 Dec 13 12:32 /mnt/gluster/d2s2r2/test
log: Nothing logged (file was written to)

I can read from the file as well:

dd if=/mnt/gluster/d2s2r2/test of=/dev/null bs=2048
1000+0 records in
1000+0 records out
2048000 bytes (2.0 MB) copied, 0.102214 s, 20.0 MB/s
log: Nothing logged (file was read from)

Why doesn't gluster pick up stripe-size on the first write? Now I have to "touch" files to work around this bug.

Because posix_mknod is not sending the filled xattr while unwinding, the stripe translator could not get the stripe-size from the xattr. So I think the patch http://review.gluster.org/3904 would most likely fix the issue.

Has anyone reproduced the issue under the 3.4 alpha?

I'm hitting the same issue with 3.4.0alpha.
Here is my volume config:

Volume Name: vmstore
Type: Stripe
Volume ID: 3a2a4208-9e62-4e3e-a9e1-bf68cd694ca2
Status: Started
Number of Bricks: 1 x 12 = 12
Transport-type: tcp
Bricks:
Brick1: 10.1.17.43:/export/store
Brick2: 10.1.17.44:/export/store
Brick3: 10.1.17.45:/export/store
Brick4: 10.1.17.46:/export/store
Brick5: 10.1.17.47:/export/store
Brick6: 10.1.17.48:/export/store
Brick7: 10.1.17.49:/export/store
Brick8: 10.1.17.50:/export/store
Brick9: 10.1.17.51:/export/store
Brick10: 10.1.17.53:/export/store
Brick11: 10.1.17.54:/export/store
Brick12: 10.1.17.55:/export/store

Error snippet from client log:
[2013-03-20 15:29:24.235601] E [stripe-helpers.c:355:stripe_ctx_handle] 0-vmstore-stripe-0: Failed to get stripe-size
[2013-03-20 15:29:24.255642] I [dict.c:370:dict_get] (-->/usr/local/lib/glusterfs/3.4.0alpha/xlator/protocol/client.so(client3_3_mknod_cbk+0x825) [0x7fc8f2db3255] (-->/usr/local/lib/glusterfs/3.4.0alpha/xlator/cluster/stripe.so(stripe_mknod_ifreg_cbk+0x120) [0x7fc8f2b7d6b0] (-->/usr/local/lib/glusterfs/3.4.0alpha/xlator/cluster/stripe.so(stripe_ctx_handle+0x90) [0x7fc8f2b84d90]))) 0-dict: !this || key=trusted.vmstore-stripe-0.stripe-size

Interestingly enough, though, this doesn't seem to make the fuse mount point unavailable (unlike versions < 3.4) - just slower than normal.

Spoke too soon. It does make the mount point unavailable just like with the earlier version.

I am still seeing this on 2.1 bits.

The product version of Red Hat Storage on which this issue was reported has reached End Of Life (EOL) [1], hence this bug report is being closed. If the issue is still observed on a current version of Red Hat Storage, please file a new bug report on the current version.

[1] https://rhn.redhat.com/errata/RHSA-2014-0821.html |
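
The work-around one commenter describes earlier in the thread (create the file with touch first, then write to it) can be sketched as follows. This only illustrates that observation and is not a fix: the helper name `write_precreated` is hypothetical, and whether pre-creating the file avoids the write failure on a given GlusterFS version is an assumption based solely on that comment.

```python
import os
import tempfile
from pathlib import Path


def write_precreated(path, data):
    """Create an empty file first, then reopen it and write the data.

    Mirrors the reported 'touch, then dd' sequence. It does not address
    the missing stripe-size xattr itself, and it does not truncate a
    file that already has content.
    """
    target = Path(path)
    target.touch(exist_ok=True)      # step 1: create the file, like `touch`
    with open(target, "r+b") as fh:  # step 2: reopen the existing file
        fh.write(data)
        fh.flush()
        os.fsync(fh.fileno())
    return target.stat().st_size


if __name__ == "__main__":
    # Demonstrated on a local temporary directory; on an affected client
    # the path would sit under the GlusterFS mount point instead.
    with tempfile.TemporaryDirectory() as tmp:
        size = write_precreated(os.path.join(tmp, "test"), b"\0" * 2048 * 1000)
        print(f"wrote {size} bytes")
```

Splitting creation and writing into two steps keeps the data write away from the create path, which is where the "Failed to get stripe-size" message is logged in the traces above.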
Created attachment 618630: rpc.sh

Description of problem:

During the RPC test on the glusterfs-mounted volume:

Volume Name: distributed-striped-replicated-volume
Type: Distributed-Striped-Replicate
Volume ID: d38dd365-a6ad-47bd-b7d5-3b64a61f3c1a
Status: Started
Number of Bricks: 2 x 2 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: rhsauto001.lab.eng.blr.redhat.com:/brick1
Brick2: rhsauto002.lab.eng.blr.redhat.com:/brick1
Brick3: rhsauto003.lab.eng.blr.redhat.com:/brick1
Brick4: rhsauto004.lab.eng.blr.redhat.com:/brick1
Brick5: rhsauto005.lab.eng.blr.redhat.com:/brick1
Brick6: rhsauto006.lab.eng.blr.redhat.com:/brick1
Brick7: rhsauto007.lab.eng.blr.redhat.com:/brick1
Brick8: rhsauto008.lab.eng.blr.redhat.com:/brick1

we see the following error when running the RPC automated sanity test (attached to the BZ):

[2012-09-28 10:27:33.200308] E [stripe-helpers.c:268:stripe_ctx_handle] 0-distributed-striped-replicated-volume-stripe-1: Failed to get stripe-size

Upon inspection after the test had failed, the filesystem was unmounted. After a remount I reran the test, which ran successfully, but I still see the error message in the logs:

[2012-09-28 20:23:31.731599] E [stripe-helpers.c:268:stripe_ctx_handle] 0-distributed-striped-replicated-volume-stripe-0: Failed to get stripe-size

Version-Release number of selected component (if applicable):
kernel-2.6.32-279.el6.x86_64
glusterfs-3.3.0rhs-28.el6rhs.x86_64
glusterfs-server-3.3.0rhs-28.el6rhs.x86_64

How reproducible:
After I hit this I was able to reproduce the error, but it appears either to be an erroneous error message or to be failing silently. I will attempt to tear down and reproduce.

Steps to Reproduce:
1. Create a 2 x 2 x 2 = 8 volume.
2. Mount it with -t glusterfs on a client.
3. Run the script attached to this bugzilla.

Actual results:
[2012-09-28 10:27:33.200308] E [stripe-helpers.c:268:stripe_ctx_handle] 0-distributed-striped-replicated-volume-stripe-1: Failed to get stripe-size

Expected results:
Normal operation.

Additional info:
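
To check whether the error is still being logged after a remount, as described above, the client log can be scanned for the message. The sketch below is a hypothetical helper (`count_stripe_size_errors` is not part of GlusterFS). It assumes the line layout seen in this report, `[timestamp] LEVEL [file:line:function] translator: message`, and counts the failures per stripe translator; the sample lines are copied from this report.

```python
import re
from collections import Counter

# Matches lines shaped like the ones quoted in this report, for example:
# [2012-09-28 10:27:33.200308] E [stripe-helpers.c:268:stripe_ctx_handle] 0-vol-stripe-1: Failed to get stripe-size
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<src>[^\]]+)\]\s+(?P<xlator>\S+):\s+(?P<msg>.*)$"
)


def count_stripe_size_errors(lines):
    """Count 'Failed to get stripe-size' errors per translator name."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # not in the assumed layout (e.g. dict_get backtraces)
        if match["level"] == "E" and "Failed to get stripe-size" in match["msg"]:
            counts[match["xlator"]] += 1
    return counts


sample = [
    "[2012-09-28 10:27:33.200308] E [stripe-helpers.c:268:stripe_ctx_handle] "
    "0-distributed-striped-replicated-volume-stripe-1: Failed to get stripe-size",
    "[2012-09-28 20:23:31.731599] E [stripe-helpers.c:268:stripe_ctx_handle] "
    "0-distributed-striped-replicated-volume-stripe-0: Failed to get stripe-size",
    "[2012-09-28 20:13:22.000752] W [client3_1-fops.c:592:client3_1_unlink_cbk] "
    "0-distributed-striped-replicated-volume-client-6: remote operation failed: "
    "No such file or directory",
]

for xlator, n in sorted(count_stripe_size_errors(sample).items()):
    print(xlator, n)
```

Counting per translator makes it easier to see whether both stripe sets are affected or only one, which the raw log does not show at a glance.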