Bug 1379919 - VM errors out while booting from the image on gluster replica 3 volume with compound fops enabled
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: replicate
Version: rhgs-3.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.2.0
Assignee: Krutika Dhananjay
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On: 1378778 1387984 1388318
Blocks: 1351528
 
Reported: 2016-09-28 06:25 UTC by SATHEESARAN
Modified: 2017-03-23 05:49 UTC (History)

Fixed In Version: glusterfs-3.8.4-3
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
virt-gluster integration
Last Closed: 2017-03-23 05:49:28 UTC


Attachments
brick_log tar-ed (272.27 KB, text/plain), 2016-09-28 06:41 UTC, SATHEESARAN
fuse mount logs tar-ed (1.81 MB, application/x-gzip), 2016-09-28 06:42 UTC, SATHEESARAN
qemu_logs of the VM (2.40 MB, application/x-gzip), 2016-09-28 06:42 UTC, SATHEESARAN


Links
System: Red Hat Product Errata
ID: RHSA-2017:0486
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update
Last Updated: 2017-03-23 09:18:45 UTC

Description SATHEESARAN 2016-09-28 06:25:27 UTC
Description of problem:
-----------------------
Configured a sharded (shard-block-size=512MB) replica 3 gluster volume to store VM images. The VM errors out while booting from the image; sometimes even the OS installation errors out.

Version-Release number of selected component (if applicable):
--------------------------------------------------------------
RHEL 7.2
RHGS 3.2.0 interim build ( glusterfs-3.8.4-1.el7rhgs )
qemu-kvm-1.5.3-105.el7_2.7.x86_64
qemu-img-1.5.3-105.el7_2.7.x86_64
libvirt-1.2.17-13.el7_2.5.x86_64

How reproducible:
-----------------
Always ( 7/7 )

Steps to Reproduce:
-------------------
1. Create a gluster replica 3 volume.
2. Enable sharding with shard block size set to 512MB
3. Enable compound-fops on the volume
4. Create a raw image file on the volume using qemu-img + libgfapi
5. Install & boot a VM using libgfapi
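
Steps 1 to 4 can be sketched as a small Python helper that only assembles the command lines and does not run them. This is an illustration, not the reporter's exact procedure: the host names and brick paths are hypothetical, the explicit "volume start" step is an assumption (the report does not list it), and the option names (features.shard, features.shard-block-size, cluster.use-compound-fops) are given as I understand the Gluster volume options and should be checked against `gluster volume set help` on the installed build.

```python
from typing import List


def build_repro_commands(
    volume: str,
    bricks: List[str],
    image: str,
    size: str,
    compound_fops: bool = True,
) -> List[List[str]]:
    """Return argv lists for steps 1-4 of the report; nothing is executed."""
    # Assumption: a plain replica 3 volume with exactly three bricks.
    if len(bricks) != 3:
        raise ValueError("this sketch expects exactly three bricks")
    # Use the host part of the first brick as the libgfapi server (assumption).
    server = bricks[0].split(":", 1)[0]
    fops_value = "on" if compound_fops else "off"
    return [
        # Step 1: create the replica 3 volume.
        ["gluster", "volume", "create", volume, "replica", "3", *bricks],
        # Step 2: enable sharding with a 512MB shard block size.
        ["gluster", "volume", "set", volume, "features.shard", "on"],
        ["gluster", "volume", "set", volume, "features.shard-block-size", "512MB"],
        # Step 3: enable (or disable) compound fops.
        ["gluster", "volume", "set", volume, "cluster.use-compound-fops", fops_value],
        # Not listed in the report: the volume has to be started before use.
        ["gluster", "volume", "start", volume],
        # Step 4: create a raw image over libgfapi with qemu-img.
        ["qemu-img", "create", "-f", "raw",
         f"gluster://{server}/{volume}/{image}", size],
    ]


if __name__ == "__main__":
    # Hypothetical hosts and brick paths, for illustration only.
    demo_bricks = ["host1:/bricks/b1", "host2:/bricks/b1", "host3:/bricks/b1"]
    for argv in build_repro_commands("repvol", demo_bricks, "vm2.img", "30G"):
        print(" ".join(argv))
```

Passing compound_fops=False produces the same setup with compound fops off, which can be used to compare the failing and non-failing configurations.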

Actual results:
---------------
The VM errors out while booting from the installed OS. Sometimes the installation itself errors out.

Expected results:
-----------------
VMs should boot successfully

Additional Info
----------------
1. With compound fops turned off, I do not see this problem.
2. This issue is also seen with a fuse mount.

Comment 3 SATHEESARAN 2016-09-28 06:34:01 UTC
Errors as seen in one of the brick logs:

<snip>
[2016-09-28 09:28:29.748305] E [inodelk.c:304:__inode_unlock_lock] 0-repvol-locks:  Matching lock not found for unlock 0-9223372036854775807, by b47312255f7f0000 on 0x7fa408104500
[2016-09-28 09:28:29.748390] E [MSGID: 115053] [server-rpc-fops.c:313:server_finodelk_cbk] 0-repvol-server: 76262: FINODELK -2 (d8a6d740-b41b-417d-bb5c-4fa90cb43260) ==> (Invalid argument) [Invalid argument]
[2016-09-28 09:28:36.034630] E [inodelk.c:304:__inode_unlock_lock] 0-repvol-locks:  Matching lock not found for unlock 0-9223372036854775807, by 107212255f7f0000 on 0x7fa408104500
[2016-09-28 09:28:36.034702] E [MSGID: 136002] [decompounder.c:370:dc_finodelk_cbk] 0-repvol-decompounder: fop number 2 failed. Unwinding. [Invalid argument]
[2016-09-28 09:28:36.034740] E [MSGID: 115090] [server-rpc-fops.c:2087:server_compound_cbk] 0-repvol-server: 78096: COMPOUND-2 (d8a6d740-b41b-417d-bb5c-4fa90cb43260) ==> (Invalid argument) [Invalid argument]
[2016-09-28 09:29:14.755726] E [inodelk.c:304:__inode_unlock_lock] 0-repvol-locks:  Matching lock not found for unlock 0-9223372036854775807, by 287312255f7f0000 on 0x7fa408104500
[2016-09-28 09:29:14.755933] E [MSGID: 115090] [server-rpc-fops.c:2087:server_compound_cbk] 0-repvol-server: 80092: COMPOUND-2 (d8a6d740-b41b-417d-bb5c-4fa90cb43260) ==> (Invalid argument) [Invalid argument]
[2016-09-28 09:29:44.759190] E [inodelk.c:304:__inode_unlock_lock] 0-repvol-locks:  Matching lock not found for unlock 0-9223372036854775807, by 287312255f7f0000 on 0x7fa408104500
[2016-09-28 09:29:44.759254] E [MSGID: 115090] [server-rpc-fops.c:2087:server_compound_cbk] 0-repvol-server: 80327: COMPOUND-2 (d8a6d740-b41b-417d-bb5c-4fa90cb43260) ==> (Invalid argument) [Invalid argument]
The message "E [MSGID: 136002] [decompounder.c:370:dc_finodelk_cbk] 0-repvol-decompounder: fop number 2 failed. Unwinding. [Invalid argument]" repeated 2 times between [2016-09-28 09:28:36.034702] and [2016-09-28 09:29:44.759245]
[2016-09-28 09:30:39.254643] E [inodelk.c:304:__inode_unlock_lock] 0-repvol-locks:  Matching lock not found for unlock 0-9223372036854775807, by 287312255f7f0000 on 0x7fa408104500
[2016-09-28 09:30:39.254714] E [MSGID: 136002] [decompounder.c:370:dc_finodelk_cbk] 0-repvol-decompounder: fop number 2 failed. Unwinding. [Invalid argument]
[2016-09-28 09:30:39.254824] E [MSGID: 115090] [server-rpc-fops.c:2087:server_compound_cbk] 0-repvol-server: 80392: COMPOUND-2 (d8a6d740-b41b-417d-bb5c-4fa90cb43260) ==> (Invalid argument) [Invalid argument]
</snip>
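
To get a quick overview of a larger brick log, the error lines can be grouped by message ID and reporting function. The following is a minimal sketch; the regular expression is inferred only from the line layout visible in the excerpt above and is not an official description of the Gluster log format. The sample lines are copied from that excerpt.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional

# Layout assumed from the excerpt:
# [timestamp] LEVEL [MSGID: n] [file:line:function] domain: message
# The "[MSGID: n]" part is optional.
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"(?:\[MSGID: (?P<msgid>\d+)\] )?"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<domain>\S+): (?P<msg>.*)$"
)


def parse_line(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Split one log line into its fields; return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["msg"] = record["msg"].strip()
    return record


def count_errors(lines: Iterable[str]) -> Counter:
    """Count level-E lines, keyed by (msgid, function)."""
    counts: Counter = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None and record["level"] == "E":
            counts[(record["msgid"], record["func"])] += 1
    return counts


SAMPLE = [
    "[2016-09-28 09:28:29.748305] E [inodelk.c:304:__inode_unlock_lock] 0-repvol-locks:  Matching lock not found for unlock 0-9223372036854775807, by b47312255f7f0000 on 0x7fa408104500",
    "[2016-09-28 09:28:29.748390] E [MSGID: 115053] [server-rpc-fops.c:313:server_finodelk_cbk] 0-repvol-server: 76262: FINODELK -2 (d8a6d740-b41b-417d-bb5c-4fa90cb43260) ==> (Invalid argument) [Invalid argument]",
    "[2016-09-28 09:28:36.034702] E [MSGID: 136002] [decompounder.c:370:dc_finodelk_cbk] 0-repvol-decompounder: fop number 2 failed. Unwinding. [Invalid argument]",
]

if __name__ == "__main__":
    for key, count in count_errors(SAMPLE).items():
        print(key, count)
```

Summary lines of the form "The message ... repeated N times" do not match the pattern and are skipped, so the counts reflect only the individually logged lines.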

Comment 4 SATHEESARAN 2016-09-28 06:41:42 UTC
Created attachment 1205415 [details]
brick_log tar-ed

Comment 5 SATHEESARAN 2016-09-28 06:42:13 UTC
Created attachment 1205416 [details]
fuse mount logs tar-ed

Comment 6 SATHEESARAN 2016-09-28 06:42:55 UTC
Created attachment 1205417 [details]
qemu_logs of the VM

Comment 7 SATHEESARAN 2016-09-28 06:44:43 UTC
I missed some information that I think would be useful for debugging:

1. VM name is vm2
2. The image file that got corrupted is vm2.img
3. The size of the image file is 30G
4. The volume (repvol) has 2 images ( vm2.img and cvm2.img )

Comment 8 Krutika Dhananjay 2016-09-28 07:03:44 UTC
There's a file corruption issue with compound fops that I'm currently debugging - https://bugzilla.redhat.com/show_bug.cgi?id=1378778#c1

This *could* be related to the same issue. Will post what I find once I have the root cause.

Comment 13 Krutika Dhananjay 2016-10-24 04:51:40 UTC
Moving this bug to POST state as the upstream patch which was sent against BZ 1378778 is available for review at http://review.gluster.org/#/c/15654/

Comment 17 Atin Mukherjee 2016-10-24 14:13:55 UTC
upstream mainline : http://review.gluster.org/#/c/15654/
upstream 3.9 : http://review.gluster.org/#/c/15709/
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/87999/

Comment 19 SATHEESARAN 2016-10-25 10:31:05 UTC
Tested with RHGS 3.2.0 interim build ( glusterfs-3.8.4-3.el7rhgs )

0. Created a replica 3 volume and optimized for vm store
1. Created VM image file on the fuse mount
2. Installed RHEL 7.2 on the application VM

The VM booted successfully and there was no corruption of the VM images.
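
Whether a given build is at or past the Fixed In Version (glusterfs-3.8.4-3) can be checked with a simple comparison, sketched below. This is a simplification and not RPM's own version comparison: it assumes the build string starts with "glusterfs-", followed by a dotted numeric version and a release that begins with a number, and it ignores any suffix such as ".el7rhgs". The function names are hypothetical.

```python
import re
from typing import Tuple

# Assumed shape: glusterfs-<dotted numeric version>-<numeric release>[suffix]
BUILD_RE = re.compile(r"^glusterfs-(\d+(?:\.\d+)*)-(\d+)")


def build_key(build: str) -> Tuple[Tuple[int, ...], int]:
    """Turn e.g. 'glusterfs-3.8.4-1.el7rhgs' into ((3, 8, 4), 1)."""
    match = BUILD_RE.match(build)
    if match is None:
        raise ValueError(f"unrecognised build string: {build!r}")
    version = tuple(int(part) for part in match.group(1).split("."))
    return version, int(match.group(2))


def has_fix(build: str, fixed_in: str = "glusterfs-3.8.4-3") -> bool:
    """True if 'build' is at or after 'fixed_in' under the simplified ordering."""
    return build_key(build) >= build_key(fixed_in)


if __name__ == "__main__":
    # The two interim builds mentioned in this report.
    for name in ("glusterfs-3.8.4-1.el7rhgs", "glusterfs-3.8.4-3.el7rhgs"):
        print(name, has_fix(name))
```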

Comment 21 errata-xmlrpc 2017-03-23 05:49:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html

