Bug 1479692

Summary: Running sysbench on vm disk from plain distribute gluster volume causes disk corruption
Product: [Community] GlusterFS
Reporter: Krutika Dhananjay <kdhananj>
Component: posix
Assignee: Krutika Dhananjay <kdhananj>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 3.11
CC: bugs, johan, kdhananj, pkarampu, rhinduja, rhs-bugs, sabose, storage-qa-internal
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.11.3
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1472758
Environment:
Last Closed: 2017-08-24 14:46:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1472757, 1472758, 1479717, 1480193, 1583464
Bug Blocks: 1458846

Description Krutika Dhananjay 2017-08-09 08:12:07 UTC
+++ This bug was initially created as a clone of Bug #1472758 +++

+++ This bug was initially created as a clone of Bug #1472757 +++

Description of problem:

I created a VM using disks stored on a gluster volume and ran the sysbench test inside it. This caused I/O errors, and the disk was remounted read-only.

# sysbench prepare --test=oltp --mysql-table-engine=innodb --mysql-password=pwd --oltp-table-size=500000000 --oltp-dist-type=gaussian

- Errors in dmesg
[ 2838.983763] blk_update_request: I/O error, dev vdb, sector 0
[ 2842.932506] blk_update_request: I/O error, dev vdb, sector 524722736
[ 2842.932577] Aborting journal on device vdb-8.
[ 2842.933882] EXT4-fs error (device vdb): ext4_journal_check_start:56: Detected aborted journal
[ 2842.934009] EXT4-fs (vdb): Remounting filesystem read-only

On the host's gluster logs:
mount log:
[2017-07-19 07:42:47.219501] W [MSGID: 114031] [client-rpc-fops.c:2938:client3_3_lookup_cbk] 0-glusterlocal1-client-0: remote operation failed. Path: /.shard/c37d9820-0fd8-4d8e-af67-a9a54e5a99af.843 (00000000-0000-0000-0000-000000000000) [No data available]
[2017-07-19 07:42:47.219587] E [MSGID: 133010] [shard.c:1725:shard_common_lookup_shards_cbk] 0-glusterlocal1-shard: Lookup on shard 843 failed. Base file gfid = c37d9820-0fd8-4d8e-af67-a9a54e5a99af [No data available]

brick log:
[2017-07-19 07:42:16.094979] E [MSGID: 113020] [posix.c:1361:posix_mknod] 0-glusterlocal1-posix: setting gfid on /rhgs/bricks/gv1/.shard/c37d9820-0fd8-4d8e-af67-a9a54e5a99af.842 failed
[2017-07-19 07:42:47.218982] E [MSGID: 113002] [posix.c:253:posix_lookup] 0-glusterlocal1-posix: buf->ia_gfid is null for /rhgs/bricks/gv1/.shard/c37d9820-0fd8-4d8e-af67-a9a54e5a99af.843 [No data available]

gluster vol info:
Volume Name: glusterlocal1
Type: Distribute
Volume ID: 3b0d4b90-10a4-4a91-80c1-27d051daf731
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: server1:/rhgs/bricks/gv1
Options Reconfigured:
performance.strict-o-direct: on
storage.owner-gid: 107
storage.owner-uid: 107
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: off
performance.low-prio-threads: 32
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on



Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Create an image on the gluster volume mount point: qemu-img create -f qcow2 -o preallocation=off /mnt/glusterlocal1/vm1boot.img 500G
2. Start the VM with the image created in Step 1 attached as an additional device
3. Install MariaDB and sysbench. Configure the database to live on a filesystem backed by the device from Step 1
4. Run sysbench prepare
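The steps above can be sketched as one shell session. The VM name, the virtio target name, and the use of virsh attach-disk are illustrative assumptions, not from the report; only the qemu-img and sysbench invocations are taken from the description:

```shell
# Reproduction sketch, assuming a gluster volume FUSE-mounted at
# /mnt/glusterlocal1 and a libvirt-managed guest named "vm1".

# 1. Create a 500G sparse qcow2 image on the gluster mount.
qemu-img create -f qcow2 -o preallocation=off /mnt/glusterlocal1/vm1boot.img 500G

# 2. Attach it to the running VM as an additional disk (hypothetical
#    guest/target names; any equivalent attach mechanism works).
virsh attach-disk vm1 /mnt/glusterlocal1/vm1boot.img vdb \
      --driver qemu --subdriver qcow2 --live

# 3/4. Inside the guest: mkfs and mount the new disk, point MariaDB's
#      datadir at it, then run the failing sysbench phase.
sysbench prepare --test=oltp --mysql-table-engine=innodb \
         --mysql-password=pwd --oltp-table-size=500000000 \
         --oltp-dist-type=gaussian
```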

Actual results:
Fails to create data.

--- Additional comment from Worker Ant on 2017-07-19 09:35:03 EDT ---

REVIEW: https://review.gluster.org/17821 (storage/posix: Use the ret value of posix_gfid_heal()) posted (#2) for review on master by Krutika Dhananjay (kdhananj)

--- Additional comment from Worker Ant on 2017-07-24 04:41:50 EDT ---

REVIEW: https://review.gluster.org/17821 (storage/posix: Use the ret value of posix_gfid_heal()) posted (#3) for review on master by Krutika Dhananjay (kdhananj)

--- Additional comment from Worker Ant on 2017-07-24 21:15:56 EDT ---

COMMIT: https://review.gluster.org/17821 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit 669868d23eaeba42809fca7be134137c607d64ed
Author: Krutika Dhananjay <kdhananj>
Date:   Wed Jul 19 16:14:59 2017 +0530

    storage/posix: Use the ret value of posix_gfid_heal()
    
    ... to make the change in commit acf8cfdf truly useful.
    
    Without this, a race between entry creation fops and lookup
    at posix layer can cause lookups to fail with ENODATA, as
    opposed to ENOENT.
    
    Change-Id: I44a226872283a25f1f4812f03f68921c5eb335bb
    BUG: 1472758
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: https://review.gluster.org/17821
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>

--- Additional comment from Johan Bernhardsson on 2017-07-31 04:34:40 EDT ---

This also seems to affect disperse volumes. Attaching logs from when we tried to copy an image with qemu-img from one storage to another.

We get this on VM storage configured like this:
Volume Name: fs02
Type: Disperse
Volume ID: 7f3d96e7-8d1e-48b8-bad0-dc5b3de13b38
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: vbgsan01:/gluster/fs02/fs02
Brick2: vbgsan02:/gluster/fs02/fs02
Brick3: vbgsan03:/gluster/fs02/fs02


from one brick log:
[2017-07-30 15:29:43.815362] E [MSGID: 113002] [posix.c:266:posix_lookup] 0-fs02-posix: buf->ia_gfid is null for /gluster/fs02/fs02/.shard/fea7c5e9-f1a7-4684-bd5c-08383af3a2fb.5079 [No data available]
[2017-07-30 15:29:43.815527] E [MSGID: 115050] [server-rpc-fops.c:156:server_lookup_cbk] 0-fs02-server: 400238: LOOKUP /.shard/fea7c5e9-f1a7-4684-bd5c-08383af3a2fb.5079 (be318638-e8a0-4c6d-977d-7a937aa84806/fea7c5e9-f1a7-4684-bd5c-08383af3a2fb.5079) ==> (No data available) 
[No data available]
[2017-07-30 15:29:46.837927] W [MSGID: 113096] [posix-handle.c:761:posix_handle_hard] 0-fs02-posix: link /gluster/fs02/fs02/.shard/fea7c5e9-f1a7-4684-bd5c-08383af3a2fb.5180 -> /gluster/fs02/fs02/.glusterfs/1e/2e/1e2e59bb-f8e3-41d1-93c6-db795bbc96d4failed  [File exists]
[2017-07-30 15:29:46.837962] E [MSGID: 113020] [posix.c:1402:posix_mknod] 0-fs02-posix: setting gfid on /gluster/fs02/fs02/.shard/fea7c5e9-f1a7-4684-bd5c-08383af3a2fb.5180 failed
[2017-07-30 15:29:51.418415] E [MSGID: 113002] [posix.c:266:posix_lookup] 0-fs02-posix: buf->ia_gfid is null for /gluster/fs02/fs02/.shard/fea7c5e9-f1a7-4684-bd5c-08383af3a2fb.5315 [No data available]
[2017-07-30 15:29:51.418471] E [MSGID: 115050] [server-rpc-fops.c:156:server_lookup_cbk] 0-fs02-server: 414456: LOOKUP /.shard/fea7c5e9-f1a7-4684-bd5c-08383af3a2fb.5315 (be318638-e8a0-4c6d-977d-7a937aa84806/fea7c5e9-f1a7-4684-bd5c-08383af3a2fb.5315) ==> (No data available) 
[No data available]


From mount log:
[2017-07-30 13:09:35.041843] I [MSGID: 109066] [dht-rename.c:1569:dht_rename] 0-fs02-dht: renaming /0924ff77-ef51-435b-b90d-50bfbf2e8de7/images/cab4e8d0-a82a-4048-b0e8-a9c1bd2e38bf/797faedf-b7e2-4b3c-bff6-82264efa11f5.meta.new (hash=fs02-disperse-0/cache=fs02-disperse-0) =>
 /0924ff77-ef51-435b-b90d-50bfbf2e8de7/images/cab4e8d0-a82a-4048-b0e8-a9c1bd2e38bf/797faedf-b7e2-4b3c-bff6-82264efa11f5.meta (hash=fs02-disperse-0/cache=fs02-disperse-0)
[2017-07-30 13:09:35.841507] I [MSGID: 109066] [dht-rename.c:1569:dht_rename] 0-fs02-dht: renaming /0924ff77-ef51-435b-b90d-50bfbf2e8de7/images/cab4e8d0-a82a-4048-b0e8-a9c1bd2e38bf/4c3cc823-e977-4fa1-b233-718c445d632e.meta.new (hash=fs02-disperse-0/cache=fs02-disperse-0) =>
 /0924ff77-ef51-435b-b90d-50bfbf2e8de7/images/cab4e8d0-a82a-4048-b0e8-a9c1bd2e38bf/4c3cc823-e977-4fa1-b233-718c445d632e.meta (hash=fs02-disperse-0/cache=<nul>)
[2017-07-30 13:36:57.236163] W [MSGID: 114031] [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-fs02-client-0: remote operation failed. Path: /.shard/3997817a-4678-4e75-8131-438db9faca9a.1300 (00000000-0000-0000-0000-000000000000) [No data available]
[2017-07-30 13:36:57.237191] W [MSGID: 114031] [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-fs02-client-2: remote operation failed. Path: /.shard/3997817a-4678-4e75-8131-438db9faca9a.1300 (00000000-0000-0000-0000-000000000000) [No data available]
[2017-07-30 13:36:57.237220] W [MSGID: 122053] [ec-common.c:121:ec_check_status] 0-fs02-disperse-0: Operation failed on 1 of 3 subvolumes.(up=111, mask=111, remaining=000, good=101, bad=010)
[2017-07-30 13:36:57.237229] W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 0-fs02-disperse-0: Heal failed [Invalid argument]
[2017-07-30 13:36:57.248807] W [MSGID: 114031] [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-fs02-client-0: remote operation failed. Path: /.shard/3997817a-4678-4e75-8131-438db9faca9a.1301 (00000000-0000-0000-0000-000000000000) [No data available]
[2017-07-30 13:36:57.248973] W [MSGID: 114031] [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-fs02-client-1: remote operation failed. Path: /.shard/3997817a-4678-4e75-8131-438db9faca9a.1301 (00000000-0000-0000-0000-000000000000) [No data available]
[2017-07-30 13:36:57.249502] W [MSGID: 122053] [ec-common.c:121:ec_check_status] 0-fs02-disperse-0: Operation failed on 1 of 3 subvolumes.(up=111, mask=111, remaining=000, good=011, bad=100)
[2017-07-30 13:36:57.249524] W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 0-fs02-disperse-0: Heal failed [Invalid argument]
[2017-07-30 13:36:57.249535] E [MSGID: 133010] [shard.c:1725:shard_common_lookup_shards_cbk] 0-fs02-shard: Lookup on shard 1301 failed. Base file gfid = 3997817a-4678-4e75-8131-438db9faca9a [No data available]
[2017-07-30 13:36:57.249585] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 84787: READ => -1 gfid=3997817a-4678-4e75-8131-438db9faca9a fd=0x7f0cfa192210 (No data available)

Comment 1 Johan Bernhardsson 2017-08-09 08:50:20 UTC
Could this affect writes as well? I see a similar problem when running dd if=/dev/zero of=/pathtoglusterfsmount/testfile bs=1M count=1000 oflag=direct

The write fails horribly and dd aborts. If we enable write-behind and set it to a high value, it works.

Also, it only happens when shard is enabled.
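The workaround described above can be sketched as follows. The volume name and the 8MB window size are illustrative assumptions; performance.write-behind and performance.write-behind-window-size are gluster's standard volume option names:

```shell
# Workaround sketch (volume name "fs02" assumed, window size illustrative):
# turn write-behind on and enlarge its window so direct writes are
# aggregated before reaching the bricks.
gluster volume set fs02 performance.write-behind on
gluster volume set fs02 performance.write-behind-window-size 8MB

# Re-run the failing direct-I/O write against the FUSE mount:
dd if=/dev/zero of=/pathtoglusterfsmount/testfile bs=1M count=1000 oflag=direct
```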

Comment 2 Worker Ant 2017-08-09 09:11:13 UTC
REVIEW: https://review.gluster.org/18009 (storage/posix: Use the ret value of posix_gfid_heal()) posted (#1) for review on release-3.11 by Krutika Dhananjay (kdhananj)

Comment 3 Krutika Dhananjay 2017-08-09 09:13:39 UTC
(In reply to Johan Bernhardsson from comment #1)
> Could this affect writes as well? I see a similar problem when running dd
> if=/dev/zero of=/pathtoglusterfsmount/testfile bs=1M count=1000 oflag=direct
> 
> The write fails horribly and dd aborts. If we enable write-behind and set it
> to a high value, it works.
> 
> Also, it only happens when shard is enabled.

Yes, it can fail WRITEs too. It is more prominent with shard, but the bug itself is in the posix translator, not in shard.

Comment 4 Worker Ant 2017-08-12 13:43:10 UTC
COMMIT: https://review.gluster.org/18009 committed in release-3.11 by Shyamsundar Ranganathan (srangana) 
------
commit 884a1be3aaebbe6dfaaab343452f937bfa92cb99
Author: Krutika Dhananjay <kdhananj>
Date:   Wed Jul 19 16:14:59 2017 +0530

    storage/posix: Use the ret value of posix_gfid_heal()
    
    ... to make the change in commit acf8cfdf truly useful.
    
    Without this, a race between entry creation fops and lookup
    at posix layer can cause lookups to fail with ENODATA, as
    opposed to ENOENT.
    
    Backport of:
    > Change-Id: I44a226872283a25f1f4812f03f68921c5eb335bb
    > Reviewed-on: https://review.gluster.org/17821
    > BUG: 1472758
    > cherry-picked from 669868d23eaeba42809fca7be134137c607d64ed
    
    Change-Id: I44a226872283a25f1f4812f03f68921c5eb335bb
    BUG: 1479692
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: https://review.gluster.org/18009
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra Bhat <raghavendra>

Comment 5 Shyamsundar 2017-08-24 14:46:21 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.11.3, please open a new bug report.

glusterfs-3.11.3 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-August/000081.html
[2] https://www.gluster.org/pipermail/gluster-users/