Bug 1480193

Summary: Running sysbench on vm disk from plain distribute gluster volume causes disk corruption
Product: [Community] GlusterFS
Reporter: Krutika Dhananjay <kdhananj>
Component: posix
Assignee: bugs <bugs>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 3.8
CC: bugs, johan, kdhananj, pkarampu, rhinduja, rhs-bugs, sabose, storage-qa-internal
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.8.15
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1472758
Environment:
Last Closed: 2017-08-29 10:19:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1472757, 1472758, 1583464
Bug Blocks: 1458846, 1469558, 1479692, 1479717

Description Krutika Dhananjay 2017-08-10 11:27:03 UTC
+++ This bug was initially created as a clone of Bug #1472758 +++

+++ This bug was initially created as a clone of Bug #1472757 +++

Description of problem:

I created a VM using disks stored on a gluster volume and ran the sysbench test on it. This caused I/O errors, and the filesystem on the disk was remounted read-only.

# sysbench prepare --test=oltp --mysql-table-engine=innodb --mysql-password=pwd --oltp-table-size=500000000 --oltp-dist-type=gaussian

- Errors in dmesg
[ 2838.983763] blk_update_request: I/O error, dev vdb, sector 0
[ 2842.932506] blk_update_request: I/O error, dev vdb, sector 524722736
[ 2842.932577] Aborting journal on device vdb-8.
[ 2842.933882] EXT4-fs error (device vdb): ext4_journal_check_start:56: Detected aborted journal
[ 2842.934009] EXT4-fs (vdb): Remounting filesystem read-only

In the gluster logs on the host:
mount log:
[2017-07-19 07:42:47.219501] W [MSGID: 114031] [client-rpc-fops.c:2938:client3_3_lookup_cbk] 0-glusterlocal1-client-0: remote operation failed. Path: /.shard/c37d9820-0fd8-4d8e-af67-a9a54e5a99af.843 (00000000-0000-0000-0000-000000000000) [No data available]
[2017-07-19 07:42:47.219587] E [MSGID: 133010] [shard.c:1725:shard_common_lookup_shards_cbk] 0-glusterlocal1-shard: Lookup on shard 843 failed. Base file gfid = c37d9820-0fd8-4d8e-af67-a9a54e5a99af [No data available]

brick log:
[2017-07-19 07:42:16.094979] E [MSGID: 113020] [posix.c:1361:posix_mknod] 0-glusterlocal1-posix: setting gfid on /rhgs/bricks/gv1/.shard/c37d9820-0fd8-4d8e-af67-a9a54e5a99af.842 failed
[2017-07-19 07:42:47.218982] E [MSGID: 113002] [posix.c:253:posix_lookup] 0-glusterlocal1-posix: buf->ia_gfid is null for /rhgs/bricks/gv1/.shard/c37d9820-0fd8-4d8e-af67-a9a54e5a99af.843 [No data available]
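
For reference, a quick way to check on the brick whether the affected shard is actually missing its gfid xattr (a sketch; the first path is the one from the brick log above, and the ".1" shard is just an arbitrary healthy neighbour chosen for comparison):

# Dump all xattrs of the shard named in the log; a healthy file carries trusted.gfid.
getfattr -d -m . -e hex /rhgs/bricks/gv1/.shard/c37d9820-0fd8-4d8e-af67-a9a54e5a99af.843
# Compare with a shard that still looks up fine, e.g.:
getfattr -n trusted.gfid -e hex /rhgs/bricks/gv1/.shard/c37d9820-0fd8-4d8e-af67-a9a54e5a99af.1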

gluster vol info:
Volume Name: glusterlocal1
Type: Distribute
Volume ID: 3b0d4b90-10a4-4a91-80c1-27d051daf731
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: server1:/rhgs/bricks/gv1
Options Reconfigured:
performance.strict-o-direct: on
storage.owner-gid: 107
storage.owner-uid: 107
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: off
performance.low-prio-threads: 32
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
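
For anyone recreating the setup, a minimal sketch of building an equivalent single-brick volume (hostname, brick path and mount point are taken from the listing above; only a couple of the options are shown, the rest are set the same way):

# Create and tune the single-brick distribute volume
gluster volume create glusterlocal1 server1:/rhgs/bricks/gv1
gluster volume set glusterlocal1 features.shard on
gluster volume set glusterlocal1 performance.strict-o-direct on
gluster volume start glusterlocal1
# FUSE-mount it on the hypervisor
mount -t glusterfs server1:/glusterlocal1 /mnt/glusterlocal1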



Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Create an image on the gluster volume mount point: qemu-img create -f qcow2 -o preallocation=off /mnt/glusterlocal1/vm1boot.img 500G
2. Start the VM with the image created in Step 1 attached as an additional disk
3. Install MariaDB and sysbench. Configure the database to live on a filesystem created on the device from Step 1
4. Run sysbench prepare (a consolidated sketch of these steps follows the list)
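
A consolidated sketch of the reproduction (the gluster mount point and image path are from Step 1; the in-guest device name /dev/vdb and the MariaDB datadir are assumptions):

# On the hypervisor: create the backing image on the gluster FUSE mount
qemu-img create -f qcow2 -o preallocation=off /mnt/glusterlocal1/vm1boot.img 500G
# Attach the image to the VM as a second disk (virt-manager, virsh attach-disk, ...),
# then, inside the guest, put a filesystem on it and host the database there:
mkfs.ext4 /dev/vdb
mount /dev/vdb /var/lib/mysql
# Install MariaDB and sysbench, point the datadir at this filesystem, then run the
# sysbench prepare command from the description:
sysbench prepare --test=oltp --mysql-table-engine=innodb --mysql-password=pwd --oltp-table-size=500000000 --oltp-dist-type=gaussian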

Actual results:
sysbench prepare fails to create the test data; the guest hits I/O errors and the filesystem is remounted read-only.

--- Additional comment from Worker Ant on 2017-07-19 09:35:03 EDT ---

REVIEW: https://review.gluster.org/17821 (storage/posix: Use the ret value of posix_gfid_heal()) posted (#2) for review on master by Krutika Dhananjay (kdhananj)

--- Additional comment from Worker Ant on 2017-07-24 04:41:50 EDT ---

REVIEW: https://review.gluster.org/17821 (storage/posix: Use the ret value of posix_gfid_heal()) posted (#3) for review on master by Krutika Dhananjay (kdhananj)

--- Additional comment from Worker Ant on 2017-07-24 21:15:56 EDT ---

COMMIT: https://review.gluster.org/17821 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit 669868d23eaeba42809fca7be134137c607d64ed
Author: Krutika Dhananjay <kdhananj>
Date:   Wed Jul 19 16:14:59 2017 +0530

    storage/posix: Use the ret value of posix_gfid_heal()
    
    ... to make the change in commit acf8cfdf truly useful.
    
    Without this, a race between entry creation fops and lookup
    at posix layer can cause lookups to fail with ENODATA, as
    opposed to ENOENT.
    
    Change-Id: I44a226872283a25f1f4812f03f68921c5eb335bb
    BUG: 1472758
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: https://review.gluster.org/17821
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>

--- Additional comment from Johan Bernhardsson on 2017-07-31 04:34:40 EDT ---

This also seems to affect disperse volumes. Attaching logs from when we try to copy an image with qemu-img from one storage to another.

We get this on VM storage configured like this:
Volume Name: fs02
Type: Disperse
Volume ID: 7f3d96e7-8d1e-48b8-bad0-dc5b3de13b38
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: vbgsan01:/gluster/fs02/fs02
Brick2: vbgsan02:/gluster/fs02/fs02
Brick3: vbgsan03:/gluster/fs02/fs02


from one brick log:
[2017-07-30 15:29:43.815362] E [MSGID: 113002] [posix.c:266:posix_lookup] 0-fs02-posix: buf->ia_gfid is null for /gluster/fs02/fs02/.shard/fea7c5e9-f1a7-4684-bd5c-08383af3a2fb.5079 [No data available]
[2017-07-30 15:29:43.815527] E [MSGID: 115050] [server-rpc-fops.c:156:server_lookup_cbk] 0-fs02-server: 400238: LOOKUP /.shard/fea7c5e9-f1a7-4684-bd5c-08383af3a2fb.5079 (be318638-e8a0-4c6d-977d-7a937aa84806/fea7c5e9-f1a7-4684-bd5c-08383af3a2fb.5079) ==> (No data available) 
[No data available]
[2017-07-30 15:29:46.837927] W [MSGID: 113096] [posix-handle.c:761:posix_handle_hard] 0-fs02-posix: link /gluster/fs02/fs02/.shard/fea7c5e9-f1a7-4684-bd5c-08383af3a2fb.5180 -> /gluster/fs02/fs02/.glusterfs/1e/2e/1e2e59bb-f8e3-41d1-93c6-db795bbc96d4failed  [File exists]
[2017-07-30 15:29:46.837962] E [MSGID: 113020] [posix.c:1402:posix_mknod] 0-fs02-posix: setting gfid on /gluster/fs02/fs02/.shard/fea7c5e9-f1a7-4684-bd5c-08383af3a2fb.5180 failed
[2017-07-30 15:29:51.418415] E [MSGID: 113002] [posix.c:266:posix_lookup] 0-fs02-posix: buf->ia_gfid is null for /gluster/fs02/fs02/.shard/fea7c5e9-f1a7-4684-bd5c-08383af3a2fb.5315 [No data available]
[2017-07-30 15:29:51.418471] E [MSGID: 115050] [server-rpc-fops.c:156:server_lookup_cbk] 0-fs02-server: 414456: LOOKUP /.shard/fea7c5e9-f1a7-4684-bd5c-08383af3a2fb.5315 (be318638-e8a0-4c6d-977d-7a937aa84806/fea7c5e9-f1a7-4684-bd5c-08383af3a2fb.5315) ==> (No data available) 
[No data available]


From mount log:
[2017-07-30 13:09:35.041843] I [MSGID: 109066] [dht-rename.c:1569:dht_rename] 0-fs02-dht: renaming /0924ff77-ef51-435b-b90d-50bfbf2e8de7/images/cab4e8d0-a82a-4048-b0e8-a9c1bd2e38bf/797faedf-b7e2-4b3c-bff6-82264efa11f5.meta.new (hash=fs02-disperse-0/cache=fs02-disperse-0) =>
 /0924ff77-ef51-435b-b90d-50bfbf2e8de7/images/cab4e8d0-a82a-4048-b0e8-a9c1bd2e38bf/797faedf-b7e2-4b3c-bff6-82264efa11f5.meta (hash=fs02-disperse-0/cache=fs02-disperse-0)
[2017-07-30 13:09:35.841507] I [MSGID: 109066] [dht-rename.c:1569:dht_rename] 0-fs02-dht: renaming /0924ff77-ef51-435b-b90d-50bfbf2e8de7/images/cab4e8d0-a82a-4048-b0e8-a9c1bd2e38bf/4c3cc823-e977-4fa1-b233-718c445d632e.meta.new (hash=fs02-disperse-0/cache=fs02-disperse-0) =>
 /0924ff77-ef51-435b-b90d-50bfbf2e8de7/images/cab4e8d0-a82a-4048-b0e8-a9c1bd2e38bf/4c3cc823-e977-4fa1-b233-718c445d632e.meta (hash=fs02-disperse-0/cache=<nul>)
[2017-07-30 13:36:57.236163] W [MSGID: 114031] [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-fs02-client-0: remote operation failed. Path: /.shard/3997817a-4678-4e75-8131-438db9faca9a.1300 (00000000-0000-0000-0000-000000000000) [No data available]
[2017-07-30 13:36:57.237191] W [MSGID: 114031] [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-fs02-client-2: remote operation failed. Path: /.shard/3997817a-4678-4e75-8131-438db9faca9a.1300 (00000000-0000-0000-0000-000000000000) [No data available]
[2017-07-30 13:36:57.237220] W [MSGID: 122053] [ec-common.c:121:ec_check_status] 0-fs02-disperse-0: Operation failed on 1 of 3 subvolumes.(up=111, mask=111, remaining=000, good=101, bad=010)
[2017-07-30 13:36:57.237229] W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 0-fs02-disperse-0: Heal failed [Invalid argument]
[2017-07-30 13:36:57.248807] W [MSGID: 114031] [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-fs02-client-0: remote operation failed. Path: /.shard/3997817a-4678-4e75-8131-438db9faca9a.1301 (00000000-0000-0000-0000-000000000000) [No data available]
[2017-07-30 13:36:57.248973] W [MSGID: 114031] [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-fs02-client-1: remote operation failed. Path: /.shard/3997817a-4678-4e75-8131-438db9faca9a.1301 (00000000-0000-0000-0000-000000000000) [No data available]
[2017-07-30 13:36:57.249502] W [MSGID: 122053] [ec-common.c:121:ec_check_status] 0-fs02-disperse-0: Operation failed on 1 of 3 subvolumes.(up=111, mask=111, remaining=000, good=011, bad=100)
[2017-07-30 13:36:57.249524] W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 0-fs02-disperse-0: Heal failed [Invalid argument]
[2017-07-30 13:36:57.249535] E [MSGID: 133010] [shard.c:1725:shard_common_lookup_shards_cbk] 0-fs02-shard: Lookup on shard 1301 failed. Base file gfid = 3997817a-4678-4e75-8131-438db9faca9a [No data available]
[2017-07-30 13:36:57.249585] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 84787: READ => -1 gfid=3997817a-4678-4e75-8131-438db9faca9a fd=0x7f0cfa192210 (No data available)
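
When the disperse volume is in this state, the standard heal and xattr checks are a reasonable first look (a sketch; the volume name and shard path are taken from the vol info and brick log above):

# Pending/failed heals on the disperse volume
gluster volume heal fs02 info
# Check the gfid xattr of an affected shard directly on each brick
getfattr -n trusted.gfid -e hex /gluster/fs02/fs02/.shard/fea7c5e9-f1a7-4684-bd5c-08383af3a2fb.5180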

Comment 1 Worker Ant 2017-08-10 12:06:01 UTC
REVIEW: https://review.gluster.org/18015 (storage/posix: Use the ret value of posix_gfid_heal()) posted (#1) for review on release-3.8 by Krutika Dhananjay (kdhananj)

Comment 2 Worker Ant 2017-08-10 16:41:47 UTC
COMMIT: https://review.gluster.org/18015 committed in release-3.8 by Niels de Vos (ndevos) 
------
commit 982480ecbc0085d4e6bbbc976d2549abbd5f3e61
Author: Krutika Dhananjay <kdhananj>
Date:   Wed Jul 19 16:14:59 2017 +0530

    storage/posix: Use the ret value of posix_gfid_heal()
    
    ... to make the change in commit acf8cfdf truly useful.
    
    Without this, a race between entry creation fops and lookup
    at posix layer can cause lookups to fail with ENODATA, as
    opposed to ENOENT.
    
    Backport of:
    > Change-Id: I44a226872283a25f1f4812f03f68921c5eb335bb
    > Reviewed-on: https://review.gluster.org/17821
    > BUG: 1472758
    > cherry-picked from 669868d23eaeba42809fca7be134137c607d64ed
    
    Change-Id: I44a226872283a25f1f4812f03f68921c5eb335bb
    BUG: 1480193
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: https://review.gluster.org/18015
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra Bhat <raghavendra>
    Reviewed-by: Niels de Vos <ndevos>

Comment 3 Jiffin 2017-08-29 10:19:53 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.15, please open a new bug report.

glusterfs-3.8.15 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2017-August/000080.html
[2] https://www.gluster.org/pipermail/gluster-users/

Comment 4 Niels de Vos 2017-10-03 14:20:07 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.15, please open a new bug report.

glusterfs-3.8.15 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2017-August/000080.html
[2] https://www.gluster.org/pipermail/gluster-users/