+++ This bug was initially created as a clone of Bug #1472758 +++
+++ This bug was initially created as a clone of Bug #1472757 +++

Description of problem:

I created a VM using disks stored on a gluster volume and ran the sysbench test on it. This caused I/O errors, and the disk was remounted read-only.

# sysbench prepare --test=oltp --mysql-table-engine=innodb --mysql-password=pwd --oltp-table-size=500000000 --oltp-dist-type=gaussian

Errors in dmesg:

[ 2838.983763] blk_update_request: I/O error, dev vdb, sector 0
[ 2842.932506] blk_update_request: I/O error, dev vdb, sector 524722736
[ 2842.932577] Aborting journal on device vdb-8.
[ 2842.933882] EXT4-fs error (device vdb): ext4_journal_check_start:56: Detected aborted journal
[ 2842.934009] EXT4-fs (vdb): Remounting filesystem read-only

In the host's gluster logs:

mount log:
[2017-07-19 07:42:47.219501] W [MSGID: 114031] [client-rpc-fops.c:2938:client3_3_lookup_cbk] 0-glusterlocal1-client-0: remote operation failed. Path: /.shard/c37d9820-0fd8-4d8e-af67-a9a54e5a99af.843 (00000000-0000-0000-0000-000000000000) [No data available]
[2017-07-19 07:42:47.219587] E [MSGID: 133010] [shard.c:1725:shard_common_lookup_shards_cbk] 0-glusterlocal1-shard: Lookup on shard 843 failed. Base file gfid = c37d9820-0fd8-4d8e-af67-a9a54e5a99af [No data available]

brick log:
[2017-07-19 07:42:16.094979] E [MSGID: 113020] [posix.c:1361:posix_mknod] 0-glusterlocal1-posix: setting gfid on /rhgs/bricks/gv1/.shard/c37d9820-0fd8-4d8e-af67-a9a54e5a99af.842 failed
[2017-07-19 07:42:47.218982] E [MSGID: 113002] [posix.c:253:posix_lookup] 0-glusterlocal1-posix: buf->ia_gfid is null for /rhgs/bricks/gv1/.shard/c37d9820-0fd8-4d8e-af67-a9a54e5a99af.843 [No data available]

gluster vol info:

Volume Name: glusterlocal1
Type: Distribute
Volume ID: 3b0d4b90-10a4-4a91-80c1-27d051daf731
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: server1:/rhgs/bricks/gv1
Options Reconfigured:
performance.strict-o-direct: on
storage.owner-gid: 107
storage.owner-uid: 107
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: off
performance.low-prio-threads: 32
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Create an image on the gluster volume mount point using qemu-img:
   qemu-img create -f qcow2 -o preallocation=off /mnt/glusterlocal1/vm1boot.img 500G
2. Start the VM with the image created in Step 1 attached as an additional device.
3. Install MariaDB and sysbench; configure the database to live on a filesystem on the device from Step 1.
4. Run sysbench prepare (see the consolidated sketch below).

Actual results:
Fails to create data.
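A consolidated reproduction sketch follows. It is illustrative rather than verbatim from the report: the guest name "vm1", the data-image path, the target device "vdb", and the use of libvirt's virsh are assumptions; the sysbench options mirror the command above.

# On the host, with the volume FUSE-mounted at /mnt/glusterlocal1:
qemu-img create -f qcow2 -o preallocation=off /mnt/glusterlocal1/vm1data.img 500G
virsh attach-disk vm1 /mnt/glusterlocal1/vm1data.img vdb --driver qemu --subdriver qcow2 --persistent

# Inside the guest: mkfs.ext4 /dev/vdb, mount it, point the MariaDB
# datadir at the mount, then:
sysbench --test=oltp --mysql-table-engine=innodb --mysql-password=pwd --oltp-table-size=500000000 --oltp-dist-type=gaussian prepare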
--- Additional comment from Worker Ant on 2017-07-19 09:35:03 EDT ---

REVIEW: https://review.gluster.org/17821 (storage/posix: Use the ret value of posix_gfid_heal()) posted (#2) for review on master by Krutika Dhananjay (kdhananj)

--- Additional comment from Worker Ant on 2017-07-24 04:41:50 EDT ---

REVIEW: https://review.gluster.org/17821 (storage/posix: Use the ret value of posix_gfid_heal()) posted (#3) for review on master by Krutika Dhananjay (kdhananj)

--- Additional comment from Worker Ant on 2017-07-24 21:15:56 EDT ---

COMMIT: https://review.gluster.org/17821 committed in master by Pranith Kumar Karampuri (pkarampu)
------
commit 669868d23eaeba42809fca7be134137c607d64ed
Author: Krutika Dhananjay <kdhananj>
Date:   Wed Jul 19 16:14:59 2017 +0530

    storage/posix: Use the ret value of posix_gfid_heal()

    ... to make the change in commit acf8cfdf truly useful.

    Without this, a race between entry creation fops and lookup
    at posix layer can cause lookups to fail with ENODATA, as
    opposed to ENOENT.

    Change-Id: I44a226872283a25f1f4812f03f68921c5eb335bb
    BUG: 1472758
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: https://review.gluster.org/17821
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>

--- Additional comment from Johan Bernhardsson on 2017-07-31 04:34:40 EDT ---

This also seems to affect disperse volumes. Attaching logs from when we try to copy an image with qemu-img from one storage to another.

We get this on VM storage configured like this:

Volume Name: fs02
Type: Disperse
Volume ID: 7f3d96e7-8d1e-48b8-bad0-dc5b3de13b38
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: vbgsan01:/gluster/fs02/fs02
Brick2: vbgsan02:/gluster/fs02/fs02
Brick3: vbgsan03:/gluster/fs02/fs02

From one brick log:

[2017-07-30 15:29:43.815362] E [MSGID: 113002] [posix.c:266:posix_lookup] 0-fs02-posix: buf->ia_gfid is null for /gluster/fs02/fs02/.shard/fea7c5e9-f1a7-4684-bd5c-08383af3a2fb.5079 [No data available]
[2017-07-30 15:29:43.815527] E [MSGID: 115050] [server-rpc-fops.c:156:server_lookup_cbk] 0-fs02-server: 400238: LOOKUP /.shard/fea7c5e9-f1a7-4684-bd5c-08383af3a2fb.5079 (be318638-e8a0-4c6d-977d-7a937aa84806/fea7c5e9-f1a7-4684-bd5c-08383af3a2fb.5079) ==> (No data available) [No data available]
[2017-07-30 15:29:46.837927] W [MSGID: 113096] [posix-handle.c:761:posix_handle_hard] 0-fs02-posix: link /gluster/fs02/fs02/.shard/fea7c5e9-f1a7-4684-bd5c-08383af3a2fb.5180 -> /gluster/fs02/fs02/.glusterfs/1e/2e/1e2e59bb-f8e3-41d1-93c6-db795bbc96d4failed [File exists]
[2017-07-30 15:29:46.837962] E [MSGID: 113020] [posix.c:1402:posix_mknod] 0-fs02-posix: setting gfid on /gluster/fs02/fs02/.shard/fea7c5e9-f1a7-4684-bd5c-08383af3a2fb.5180 failed
[2017-07-30 15:29:51.418415] E [MSGID: 113002] [posix.c:266:posix_lookup] 0-fs02-posix: buf->ia_gfid is null for /gluster/fs02/fs02/.shard/fea7c5e9-f1a7-4684-bd5c-08383af3a2fb.5315 [No data available]
[2017-07-30 15:29:51.418471] E [MSGID: 115050] [server-rpc-fops.c:156:server_lookup_cbk] 0-fs02-server: 414456: LOOKUP /.shard/fea7c5e9-f1a7-4684-bd5c-08383af3a2fb.5315 (be318638-e8a0-4c6d-977d-7a937aa84806/fea7c5e9-f1a7-4684-bd5c-08383af3a2fb.5315) ==> (No data available) [No data available]

From mount log:

[2017-07-30 13:09:35.041843] I [MSGID: 109066] [dht-rename.c:1569:dht_rename] 0-fs02-dht: renaming /0924ff77-ef51-435b-b90d-50bfbf2e8de7/images/cab4e8d0-a82a-4048-b0e8-a9c1bd2e38bf/797faedf-b7e2-4b3c-bff6-82264efa11f5.meta.new (hash=fs02-disperse-0/cache=fs02-disperse-0) => /0924ff77-ef51-435b-b90d-50bfbf2e8de7/images/cab4e8d0-a82a-4048-b0e8-a9c1bd2e38bf/797faedf-b7e2-4b3c-bff6-82264efa11f5.meta (hash=fs02-disperse-0/cache=fs02-disperse-0)
[2017-07-30 13:09:35.841507] I [MSGID: 109066] [dht-rename.c:1569:dht_rename] 0-fs02-dht: renaming /0924ff77-ef51-435b-b90d-50bfbf2e8de7/images/cab4e8d0-a82a-4048-b0e8-a9c1bd2e38bf/4c3cc823-e977-4fa1-b233-718c445d632e.meta.new (hash=fs02-disperse-0/cache=fs02-disperse-0) => /0924ff77-ef51-435b-b90d-50bfbf2e8de7/images/cab4e8d0-a82a-4048-b0e8-a9c1bd2e38bf/4c3cc823-e977-4fa1-b233-718c445d632e.meta (hash=fs02-disperse-0/cache=<nul>)
[2017-07-30 13:36:57.236163] W [MSGID: 114031] [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-fs02-client-0: remote operation failed. Path: /.shard/3997817a-4678-4e75-8131-438db9faca9a.1300 (00000000-0000-0000-0000-000000000000) [No data available]
[2017-07-30 13:36:57.237191] W [MSGID: 114031] [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-fs02-client-2: remote operation failed. Path: /.shard/3997817a-4678-4e75-8131-438db9faca9a.1300 (00000000-0000-0000-0000-000000000000) [No data available]
[2017-07-30 13:36:57.237220] W [MSGID: 122053] [ec-common.c:121:ec_check_status] 0-fs02-disperse-0: Operation failed on 1 of 3 subvolumes.(up=111, mask=111, remaining=000, good=101, bad=010)
[2017-07-30 13:36:57.237229] W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 0-fs02-disperse-0: Heal failed [Invalid argument]
[2017-07-30 13:36:57.248807] W [MSGID: 114031] [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-fs02-client-0: remote operation failed. Path: /.shard/3997817a-4678-4e75-8131-438db9faca9a.1301 (00000000-0000-0000-0000-000000000000) [No data available]
[2017-07-30 13:36:57.248973] W [MSGID: 114031] [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-fs02-client-1: remote operation failed. Path: /.shard/3997817a-4678-4e75-8131-438db9faca9a.1301 (00000000-0000-0000-0000-000000000000) [No data available]
[2017-07-30 13:36:57.249502] W [MSGID: 122053] [ec-common.c:121:ec_check_status] 0-fs02-disperse-0: Operation failed on 1 of 3 subvolumes.(up=111, mask=111, remaining=000, good=011, bad=100)
[2017-07-30 13:36:57.249524] W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 0-fs02-disperse-0: Heal failed [Invalid argument]
[2017-07-30 13:36:57.249535] E [MSGID: 133010] [shard.c:1725:shard_common_lookup_shards_cbk] 0-fs02-shard: Lookup on shard 1301 failed. Base file gfid = 3997817a-4678-4e75-8131-438db9faca9a [No data available]
[2017-07-30 13:36:57.249585] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 84787: READ => -1 gfid=3997817a-4678-4e75-8131-438db9faca9a fd=0x7f0cfa192210 (No data available)
Could this affect writes as well? I see a similar problem when running:

dd if=/dev/zero of=/pathtoglusterfsmount/testfile bs=1M count=1000 oflag=direct

It fails horribly on write. If we enable write-behind and set it to a high value, it works. Also, it only happens when shard is enabled.
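For reference, the write-behind workaround described above would amount to something like the following volume options (volume name is illustrative; note this only masks the symptom, it does not address the underlying posix bug):

gluster volume set glusterlocal1 performance.write-behind on
gluster volume set glusterlocal1 performance.write-behind-window-size 8MB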
REVIEW: https://review.gluster.org/18009 (storage/posix: Use the ret value of posix_gfid_heal()) posted (#1) for review on release-3.11 by Krutika Dhananjay (kdhananj)
(In reply to Johan Bernhardsson from comment #1)
> Could this affect writes as well? I see a similar problem when running:
>
> dd if=/dev/zero of=/pathtoglusterfsmount/testfile bs=1M count=1000 oflag=direct
>
> It fails horribly on write. If we enable write-behind and set it to a high
> value, it works. Also, it only happens when shard is enabled.

Yes, it can fail WRITEs too. It is more prominent with shard, but the bug itself is in the posix translator, not in shard.
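To illustrate the failure mode the fix addresses, here is a minimal standalone C sketch. It is NOT the actual GlusterFS source: gfid_heal() and read_gfid() are simplified stand-ins for posix_gfid_heal() and the gfid-xattr read inside posix_lookup().

#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for posix_gfid_heal(): reports -ENOENT while the entry is
   not yet fully created on disk. */
static int gfid_heal(int entry_complete)
{
        return entry_complete ? 0 : -ENOENT;
}

/* Stand-in for reading the gfid xattr: on a half-created entry the
   xattr is missing, which surfaces as ENODATA ("No data available"). */
static int read_gfid(int entry_complete)
{
        return entry_complete ? 0 : -ENODATA;
}

static int lookup(int entry_complete, int honour_heal_ret)
{
        int ret = gfid_heal(entry_complete);

        /* The fix: stop discarding the heal result. A lookup racing
           with entry creation then returns ENOENT (retriable by the
           client) instead of falling through to the xattr read and
           failing with ENODATA. */
        if (honour_heal_ret && ret < 0)
                return ret;

        return read_gfid(entry_complete);
}

int main(void)
{
        printf("old behaviour: %s\n", strerror(-lookup(0, 0)));
        printf("new behaviour: %s\n", strerror(-lookup(0, 1)));
        return 0;
}

Compiled and run, the old path prints "No data available" (the error seen in the logs above) while the fixed path prints "No such file or directory".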
COMMIT: https://review.gluster.org/18009 committed in release-3.11 by Shyamsundar Ranganathan (srangana)
------
commit 884a1be3aaebbe6dfaaab343452f937bfa92cb99
Author: Krutika Dhananjay <kdhananj>
Date:   Wed Jul 19 16:14:59 2017 +0530

    storage/posix: Use the ret value of posix_gfid_heal()

    ... to make the change in commit acf8cfdf truly useful.

    Without this, a race between entry creation fops and lookup
    at posix layer can cause lookups to fail with ENODATA, as
    opposed to ENOENT.

    Backport of:
    > Change-Id: I44a226872283a25f1f4812f03f68921c5eb335bb
    > Reviewed-on: https://review.gluster.org/17821
    > BUG: 1472758
    > cherry-picked from 669868d23eaeba42809fca7be134137c607d64ed

    Change-Id: I44a226872283a25f1f4812f03f68921c5eb335bb
    BUG: 1479692
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: https://review.gluster.org/18009
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra Bhat <raghavendra>
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.11.3, please open a new bug report.

glusterfs-3.11.3 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-August/000081.html
[2] https://www.gluster.org/pipermail/gluster-users/