1472757 – Running sysbench on vm disk from plain distribute gluster volume causes disk corruption

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	posix
Sub Component:
Version:	rhgs-3.2
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	RHGS 3.4.0
Assignee:	Krutika Dhananjay
QA Contact:	SATHEESARAN
Docs Contact:
URL:
Whiteboard:	rebase
Depends On:
Blocks:	1134318 1472758 1479692 1479717 1480193 1482376 1503134 1523608 1583464

Reported:	2017-07-19 11:13 UTC by Sahina Bose
Modified:	2018-09-04 06:36 UTC
CC List:	12 users
Fixed In Version:	glusterfs-3.12.2-1
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1472758 1583464
Environment:
Last Closed:	2018-09-04 06:34:19 UTC
Embargoed:
Dependent Products:

Attachments
mount log (568.72 KB, text/plain) 2017-07-19 11:24 UTC, Sahina Bose
brick log (2.90 MB, text/plain) 2017-07-19 11:25 UTC, Sahina Bose

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2018:2607	0	None	None	None	2018-09-04 06:36:23 UTC

Description Sahina Bose 2017-07-19 11:13:21 UTC

Description of problem:

I created a VM with its disks stored on a gluster volume and ran the sysbench test on it. This caused I/O errors, and the guest remounted the disk's filesystem read-only.

# sysbench prepare --test=oltp --mysql-table-engine=innodb --mysql-password=pwd --oltp-table-size=500000000 --oltp-dist-type=gaussian

- Errors in dmesg
[ 2838.983763] blk_update_request: I/O error, dev vdb, sector 0
[ 2842.932506] blk_update_request: I/O error, dev vdb, sector 524722736
[ 2842.932577] Aborting journal on device vdb-8.
[ 2842.933882] EXT4-fs error (device vdb): ext4_journal_check_start:56: Detected aborted journal
[ 2842.934009] EXT4-fs (vdb): Remounting filesystem read-only
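
To see roughly where on the virtual disk the failing requests landed, the sector numbers in the dmesg lines above can be converted to byte offsets. This is a minimal sketch, assuming the kernel block layer's 512-byte sector unit; the function name `failed_offsets` is illustrative and not part of any tool.

```python
import re

SECTOR_SIZE = 512  # the kernel block layer reports sectors in 512-byte units

# Matches lines such as:
# [ 2842.932506] blk_update_request: I/O error, dev vdb, sector 524722736
IO_ERROR_RE = re.compile(r"I/O error, dev (\S+), sector (\d+)")


def failed_offsets(dmesg_text):
    """Return (device, sector, byte_offset) for each I/O error line found."""
    results = []
    for match in IO_ERROR_RE.finditer(dmesg_text):
        device = match.group(1)
        sector = int(match.group(2))
        results.append((device, sector, sector * SECTOR_SIZE))
    return results


if __name__ == "__main__":
    sample = (
        "[ 2838.983763] blk_update_request: I/O error, dev vdb, sector 0\n"
        "[ 2842.932506] blk_update_request: I/O error, dev vdb, sector 524722736\n"
    )
    for device, sector, offset in failed_offsets(sample):
        print(f"{device}: sector {sector} -> byte offset {offset}")
```

The offsets are positions on the guest's virtual disk, not positions in the qcow2 image file on the gluster mount.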

On the host's gluster logs:
mount log:
[2017-07-19 07:42:47.219501] W [MSGID: 114031] [client-rpc-fops.c:2938:client3_3_lookup_cbk] 0-glusterlocal1-client-0: remote operation failed. Path: /.shard/c37d9820-0fd8-4d8e-af67-a9a54e5a99af.843 (00000000-0000-0000-0000-000000000000) [No data available]
[2017-07-19 07:42:47.219587] E [MSGID: 133010] [shard.c:1725:shard_common_lookup_shards_cbk] 0-glusterlocal1-shard: Lookup on shard 843 failed. Base file gfid = c37d9820-0fd8-4d8e-af67-a9a54e5a99af [No data available]

brick log:
[2017-07-19 07:42:16.094979] E [MSGID: 113020] [posix.c:1361:posix_mknod] 0-glusterlocal1-posix: setting gfid on /rhgs/bricks/gv1/.shard/c37d9820-0fd8-4d8e-af67-a9a54e5a99af.842 failed
[2017-07-19 07:42:47.218982] E [MSGID: 113002] [posix.c:253:posix_lookup] 0-glusterlocal1-posix: buf->ia_gfid is null for /rhgs/bricks/gv1/.shard/c37d9820-0fd8-4d8e-af67-a9a54e5a99af.843 [No data available]
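
The mount and brick log entries above can be correlated by base-file gfid and shard index. The following is a rough sketch that only matches the three message formats quoted in this report; the function name and the dictionary keys are hypothetical, chosen for illustration.

```python
import re

# Mount log: "... Lookup on shard 843 failed. Base file gfid = <uuid> ..."
LOOKUP_FAILED_RE = re.compile(
    r"Lookup on shard (\d+) failed\. Base file gfid = ([0-9a-f-]{36})"
)
# Brick log: "... setting gfid on <brick>/.shard/<uuid>.<N> failed"
SETGFID_FAILED_RE = re.compile(
    r"setting gfid on \S*/\.shard/([0-9a-f-]{36})\.(\d+) failed"
)
# Brick log: "... buf->ia_gfid is null for <brick>/.shard/<uuid>.<N> ..."
NULL_GFID_RE = re.compile(
    r"ia_gfid is null for \S*/\.shard/([0-9a-f-]{36})\.(\d+)"
)


def shards_by_failure(mount_log, brick_log):
    """Group (base_gfid, shard_index) pairs by the kind of failure logged."""
    return {
        "lookup_failed": {
            (gfid, int(idx)) for idx, gfid in LOOKUP_FAILED_RE.findall(mount_log)
        },
        "setgfid_failed": {
            (gfid, int(idx)) for gfid, idx in SETGFID_FAILED_RE.findall(brick_log)
        },
        "null_gfid": {
            (gfid, int(idx)) for gfid, idx in NULL_GFID_RE.findall(brick_log)
        },
    }
```

Intersecting `lookup_failed` with `null_gfid` lists the shards that the client could not look up because the brick-side file carried no gfid.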

gluster vol info:
Volume Name: glusterlocal1
Type: Distribute
Volume ID: 3b0d4b90-10a4-4a91-80c1-27d051daf731
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.70.40.33:/rhgs/bricks/gv1
Options Reconfigured:
performance.strict-o-direct: on
storage.owner-gid: 107
storage.owner-uid: 107
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: off
performance.low-prio-threads: 32
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
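
To check whether another volume matches the configuration reported here (plain distribute with sharding enabled), the `gluster vol info` text can be parsed into key/value pairs. This is a small sketch that assumes the "Key: value" layout shown above; both helper names are hypothetical.

```python
def parse_vol_info(text):
    """Parse 'Key: value' lines of `gluster vol info` output into a dict.

    Section headers with no value (for example 'Bricks:') are skipped.
    """
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so brick paths like host:/path survive.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info


def is_sharded_plain_distribute(info):
    """True if the volume is of type Distribute and has features.shard on."""
    return info.get("Type") == "Distribute" and info.get("features.shard") == "on"


if __name__ == "__main__":
    sample = (
        "Volume Name: glusterlocal1\n"
        "Type: Distribute\n"
        "Number of Bricks: 1\n"
        "Bricks:\n"
        "Brick1: 10.70.40.33:/rhgs/bricks/gv1\n"
        "Options Reconfigured:\n"
        "features.shard: on\n"
    )
    print(is_sharded_plain_distribute(parse_vol_info(sample)))
```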



Version-Release number of selected component (if applicable):
glusterfs-3.8.4-18.4.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create an image on the gluster volume mount point: qemu-img create -f qcow2 -o preallocation=off /mnt/glusterlocal1/vm1boot.img 500G
2. Start the VM with the image created in Step 1 attached as an additional device.
3. Install MariaDB and sysbench. Configure the database to reside on a filesystem created on the device from Step 1.
4. Run sysbench prepare.

Actual results:
sysbench prepare fails to create the data.

Additional info:

Comment 2 Sahina Bose 2017-07-19 11:24:09 UTC

Created attachment 1300982
mount log

Comment 3 Sahina Bose 2017-07-19 11:25:02 UTC

Created attachment 1300983
brick log

Comment 10 Atin Mukherjee 2017-09-19 05:06:33 UTC

upstream mainline patch : https://review.gluster.org/17821

Comment 15 SATHEESARAN 2018-07-06 08:01:25 UTC

Tested with glusterfs-3.8.4-54.13.el7rhgs 

1. Created a single node RHHI with RHV 4.2.5-1
2. The HE (hosted engine) VM was running with its storage on the HE storage domain, backed by a plain distribute volume.
3. Nine VMs were running on the vmstore SD (storage domain), which is backed by the 'vmstore' plain distribute volume.
4. Ran the sysbench OLTP workload on all the VMs as described in comment 0.
5. No problems were seen with the VMs.

Comment 16 SATHEESARAN 2018-07-06 08:26:19 UTC

Apologies, I verified this bug assuming it was targeted at RHGS 3.3.1.
Moving the state accordingly. Comment 15 does not hold true for RHGS 3.4.0.

Comment 17 SATHEESARAN 2018-08-23 19:23:05 UTC

Tested with the test steps mentioned in comment 15. Everything worked as expected.

Comment 18 errata-xmlrpc 2018-09-04 06:34:19 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607
