Bug 958781 - KVM guest I/O errors with xfs backed gluster volumes
Summary: KVM guest I/O errors with xfs backed gluster volumes
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: GlusterFS
Classification: Community
Component: posix
Version: 3.4.1
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: GlusterFS Bugs list
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-05-02 11:58 UTC by JYundt
Modified: 2014-10-20 13:37 UTC
CC: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-10-20 13:37:41 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
Gluster Server Log (185.51 KB, text/x-log)
2013-05-02 11:58 UTC, JYundt
Gluster Client (KVM Host) Log (169.44 KB, text/x-log)
2013-05-02 11:59 UTC, JYundt
Strace of glusterd during read/write errors (826.43 KB, application/x-gzip)
2013-05-02 12:00 UTC, JYundt
Gluster Client (KVM Host) Log (3.4.0beta1) (179.17 KB, text/x-log)
2013-05-08 18:29 UTC, JYundt
Gluster Server Log (3.4.0beta1) (197.22 KB, text/x-log)
2013-05-08 18:30 UTC, JYundt
Strace of glusterd during read/write errors (3.4.0beta1) (1.03 MB, application/x-gzip)
2013-05-08 18:31 UTC, JYundt
Gluster Server Log (3.4.0beta3) (168.56 KB, text/x-log)
2013-06-12 03:48 UTC, JYundt
Gluster Client (KVM Host) Log (3.4.0beta3) (176.37 KB, text/x-log)
2013-06-12 03:48 UTC, JYundt
Strace of glusterd during read/write errors (3.4.0beta3) (1.50 MB, application/x-gzip)
2013-06-12 03:50 UTC, JYundt
gluster brick log with tracing enabled (3.4.0beta4) (1.15 MB, text/x-log)
2013-07-04 19:15 UTC, JYundt
gluster client (KVM host) log (3.4.0beta4) (121.18 KB, text/x-log)
2013-07-04 19:16 UTC, JYundt
Strace of glusterfsd during read/write errors (3.4.0beta4) (2.30 MB, text/plain)
2013-07-04 19:17 UTC, JYundt
gluster brick log with tracing enabled (3.4.0 GA) (1.11 MB, text/x-log)
2013-07-16 02:24 UTC, JYundt
gluster client (KVM host) log with tracing enabled (3.4.0 GA) (1.16 MB, text/x-log)
2013-07-16 02:25 UTC, JYundt
Strace of glusterfsd during read/write errors (3.4.0 GA) (2.22 MB, text/plain)
2013-07-16 02:26 UTC, JYundt
gluster brick log with tracing enabled (3.4.1 GA) (2.00 MB, text/x-log)
2013-10-15 23:38 UTC, JYundt
gluster client (KVM host) log with tracing enabled (3.4.1 GA) (167.56 KB, text/x-log)
2013-10-15 23:38 UTC, JYundt
Strace of glusterfsd during read/write errors (3.4.1 GA) (3.02 MB, application/gzip)
2013-10-15 23:40 UTC, JYundt

Description JYundt 2013-05-02 11:58:38 UTC
Created attachment 742661 [details]
Gluster Server Log

Description of problem:
Currently unable to use gluster storage pools (backed by xfs filesystems) with KVM. I'm able to add a VirtIO disk without issue; however, when I try to create (or mount) an ext4 filesystem from the KVM guest, I get errors. If I use a gluster volume that is _not_ backed by xfs (e.g. ext4), everything works as expected. I'm able to consistently reproduce on RHEL 6.4 x86_64 (2.6.32-358.6.1.el6.x86_64). Additionally, "cache mode" on the VirtIO disks is set to "none" (to allow for safe live migration).


Version-Release number of selected component (if applicable):
3.4.0alpha3

How reproducible:
100%


Steps to Reproduce:
1. Create an xfs filesystem on the gluster server (mkfs.xfs -i size=512 [LV name])
2. Mount the filesystem on the gluster server (mount [LV name] [mountpoint])
3. Create a gluster volume (gluster volume create xfs [xfs filesystem])
4. Start the gluster volume (gluster volume start xfs); a condensed shell version of steps 1-4 appears after the log excerpts below
5. Define the gluster volume on the KVM host/gluster client; I'm using virt-manager.
6. Add a VirtIO disk from the gluster storage pool (format=raw, cachemode=none) to the KVM guest; again, I'm using virt-manager.
7. On the KVM guest, partition the new VirtIO disk (fdisk /dev/[new VirtIO disk])
8. On the KVM guest, create an ext4 filesystem (mkfs.ext4 /dev/[new VirtIO disk]1)
9. On the KVM guest, mount the new ext4 filesystem (mount /dev/[new VirtIO disk]1 [mountpoint])
10. Get errors on the KVM guest:


[root@rhel6-guest ~]# mount /dev/vdb1 /mnt/ext4
mount: you must specify the filesystem type
[root@rhel6-guest ~]# 

Errors from KVM host/gluster client:

[2013-05-02 11:50:58.466222] W [client-rpc-fops.c:2697:client3_3_readv_cbk] 0-xfs-client-0: remote operation failed: Invalid argument
[2013-05-02 11:50:58.466258] W [fuse-bridge.c:2049:fuse_readv_cbk] 0-glusterfs-fuse: 9272: READ => -1 (Invalid argument)
[2013-05-02 11:50:58.485255] W [client-rpc-fops.c:873:client3_3_writev_cbk] 0-xfs-client-0: remote operation failed: Invalid argument
[2013-05-02 11:50:58.485301] W [fuse-bridge.c:2127:fuse_writev_cbk] 0-glusterfs-fuse: 9273: WRITE => -1 (Invalid argument)

Errors from gluster server:

[2013-05-02 11:50:58.462986] E [posix.c:1940:posix_readv] 0-xfs-posix: read failed on fd=0xc58228: Invalid argument
[2013-05-02 11:50:58.463024] I [server-rpc-fops.c:1489:server_readv_cbk] 0-xfs-server: 9013: READV 1 (50928256-564b-4d96-95b8-13ac2d638f10) ==> (Invalid argument)
[2013-05-02 11:50:58.482077] E [posix.c:2135:posix_writev] 0-xfs-posix: write failed: offset 32256, Invalid argument
[2013-05-02 11:50:58.482114] I [server-rpc-fops.c:1439:server_writev_cbk] 0-xfs-server: 9014: WRITEV 1 (50928256-564b-4d96-95b8-13ac2d638f10) ==> (Invalid argument)
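
For reference, steps 1-4 condensed into shell on the gluster server; the LV path, brick mountpoint and hostname are placeholders, not the reporter's actual values:

mkfs.xfs -i size=512 /dev/vg0/gluster_lv        # step 1
mount /dev/vg0/gluster_lv /bricks/xfs           # step 2
gluster volume create xfs server1:/bricks/xfs   # step 3
gluster volume start xfs                        # step 4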

  
Actual results:

Unable to mount/create ext4 filesystems on KVM guest with xfs backed gluster volumes.


Expected results:

Successfully create ext4 filesystems.

Additional info:

I'll upload straces, gluster server logs and gluster client logs shortly.

Related to gluster-users thread: http://www.gluster.org/pipermail/gluster-users/2013-April/035962.html

Comment 1 JYundt 2013-05-02 11:59:05 UTC
Created attachment 742662 [details]
Gluster Client (KVM Host) Log

Comment 2 JYundt 2013-05-02 12:00:11 UTC
Created attachment 742663 [details]
Strace of glusterd during read/write errors

Comment 3 JYundt 2013-05-08 18:27:23 UTC
I just updated my gluster server and gluster client (KVM host) to 3.4.0beta1.  Unfortunately, I'm getting the exact same errors.  I'll upload an strace, gluster server log and gluster client log shortly.

Comment 4 JYundt 2013-05-08 18:29:50 UTC
Created attachment 745334 [details]
Gluster Client (KVM Host) Log (3.4.0beta1)

Comment 5 JYundt 2013-05-08 18:30:16 UTC
Created attachment 745335 [details]
Gluster Server Log (3.4.0beta1)

Comment 6 JYundt 2013-05-08 18:31:06 UTC
Created attachment 745336 [details]
Strace of glusterd during read/write errors (3.4.0beta1)

Comment 7 JYundt 2013-06-12 03:46:49 UTC
Confirmed that I'm getting the same errors with 3.4.0beta3.  I'm attaching the gluster server brick log, gluster client log and strace of glusterd PID (on server).

Comment 8 JYundt 2013-06-12 03:48:08 UTC
Created attachment 759927 [details]
Gluster Server Log (3.4.0beta3)

Gluster Server (brick) log from 3.4.0beta3.

Comment 9 JYundt 2013-06-12 03:48:55 UTC
Created attachment 759928 [details]
Gluster Client (KVM Host) Log (3.4.0beta3)

Gluster client log from KVM host (3.4.0beta3).

Comment 10 JYundt 2013-06-12 03:50:20 UTC
Created attachment 759929 [details]
Strace of glusterd during read/write errors (3.4.0beta3)

Running strace -ttTf -o [file] -p [pid] against the glusterfsd process.
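
For reference, a fleshed-out version of that invocation (the output path is arbitrary; 'gluster volume status' lists the brick's glusterfsd PID):

gluster volume status xfs
strace -ttTf -o /tmp/glusterfsd.strace -p [pid of glusterfsd]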

Comment 11 JYundt 2013-06-12 17:21:49 UTC
Just for some clarification: the 3.4.0beta3 strace output is from glusterfsd (not glusterd). Sorry for any confusion.

Comment 12 JYundt 2013-07-04 19:13:27 UTC
Just confirmed that I'm getting the same behavior with 3.4.0beta4.  I increased the log level to "TRACE" on the one xfs brick, started an strace and duplicated the same errors.  I'll upload my strace and gluster logs shortly.

Comment 13 JYundt 2013-07-04 19:15:51 UTC
Created attachment 768972 [details]
gluster brick log with tracing enabled (3.4.0beta4)

Comment 14 JYundt 2013-07-04 19:16:57 UTC
Created attachment 768973 [details]
gluster client (KVM host) log (3.4.0beta4)

Comment 15 JYundt 2013-07-04 19:17:45 UTC
Created attachment 768974 [details]
Strace of glusterfsd during read/write errors (3.4.0beta4)

Comment 16 JYundt 2013-07-16 02:23:30 UTC
I was able to reproduce the same problem on 3.4.0 (GA).  This time I increased the log level to "TRACE" on both the server brick and the client.  I'll upload logs (and an strace of glusterfsd) shortly.

Comment 17 JYundt 2013-07-16 02:24:46 UTC
Created attachment 773982 [details]
gluster brick log with tracing enabled (3.4.0 GA)

Comment 18 JYundt 2013-07-16 02:25:51 UTC
Created attachment 773983 [details]
gluster client (KVM host) log with tracing enabled (3.4.0 GA)

Comment 19 JYundt 2013-07-16 02:26:35 UTC
Created attachment 773984 [details]
Strace of glusterfsd during read/write errors (3.4.0 GA)

Comment 20 JYundt 2013-10-15 23:34:38 UTC
I'm able to reproduce this on 3.4.1.

I'll upload another round of straces and gluster (server+client) logs shortly.

Comment 21 JYundt 2013-10-15 23:38:11 UTC
Created attachment 812713 [details]
gluster brick log with tracing enabled (3.4.1 GA)

diagnostics.brick-log-level = DEBUG

Comment 22 JYundt 2013-10-15 23:38:55 UTC
Created attachment 812714 [details]
gluster client (KVM host) log with tracing enabled (3.4.1 GA)

diagnostics.client-log-level = DEBUG

Comment 23 JYundt 2013-10-15 23:40:06 UTC
Created attachment 812715 [details]
Strace of glusterfsd during read/write errors (3.4.1 GA)

strace -ttTf -p [pid of glusterfsd] -o /tmp/glusterfsd_server_3.4.1_strace.20131015.out

Comment 24 JYundt 2013-10-28 00:55:05 UTC
I noticed that all disks in my gluster server(s) have 4K sectors. Using an older disk with 512 byte sectors, I did _not_ get any errors on my gluster client / KVM server. I switched back to using my newer 4K drives and manually set the XFS sector size (sectsz) to 512. With the sector size manually set to 512, everything worked as expected.

I think I might be hitting some sort of qemu/libvirt bug. However, all of the bugs I found that sound similar[1][2] have already been fixed in RHEL6.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=608548
[2] https://bugzilla.redhat.com/show_bug.cgi?id=748902
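
For reference, the two kinds of 4K drive can be told apart from sysfs (sdb and the LV below are placeholders). Recent mkfs.xfs versions default sectsz to the reported physical sector size, and forcing -s size=512 should only work on 512e drives, whose logical sector size is still 512:

cat /sys/block/sdb/queue/logical_block_size    # 512 on 512e drives, 4096 on 4K-native
cat /sys/block/sdb/queue/physical_block_size   # 4096 for both kinds of 4K drive
mkfs.xfs -f -i size=512 -s size=512 /dev/vg0/gluster_lv   # force sectsz=512, as described above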

Comment 25 Niels de Vos 2014-09-27 17:22:59 UTC
I'm unsure what would cause an EINVAL when writing to an image file on an XFS brick. The only thing I know of that can cause this is reading/writing past EOF. It is possible that the sparse file has not been completely created (similar to bug 1058663).

In the logs of the fuse client, we can observe that the write-behind translator is enabled:

 13: volume xfs-write-behind
 14:     type performance/write-behind
 15:     subvolumes xfs-dht
 16: end-volume


Because the problem from bug 1058663 only happens irregularly, I would be interested in any test results with glusterfs-3.5.1, or with write-behind disabled. Could you maybe provide that?
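
For reference, write-behind can be disabled per volume with a single option (assuming the volume is still named xfs):

gluster volume set xfs performance.write-behind off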


From attachment #812715 [details], strace of the brick:

Open test.img returns fd=14:

55581 19:30:48.112025 open("/gluster/xfs/test.img", O_RDWR|O_DIRECT|O_CLOEXEC <unfinished ...>
55539 19:30:48.112044 <... readv resumed> [{"\0\0A4\0\0\0\0\0\0\0\2\0\23\320\5\0\0\1J\0\0\0\f\0\5\363\227\0\0\0$"..., 120}], 1) = 120 <0.000026>
55581 19:30:48.112073 <... open resumed> ) = 14 <0.000034>

pread(), fstat() and fgetxattr() work; these are the last successful calls:

55545 19:30:59.579685 pread(14, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 0) = 4096 <0.000095>
55545 19:30:59.579837 fstat(14, {st_mode=S_IFREG|0600, st_size=1073741824, ...}) = 0 <0.000012>
55545 19:30:59.579902 fgetxattr(14, "trusted.gfid", "\x1ef\xcb\x10\x10\x1fM\x12\xbd\xb0\xb6*\xd3\xdc\xe6\xd9", 16) = 16 <0.000013>

pread() starts to return EINVAL (note the huge offset):

55581 19:30:59.581812 pread(14, 0x7f2b155be000, 4096, 1073380864) = -1 EINVAL (Invalid argument) <0.000012>
...
55545 19:30:59.587159 pread(14, 0x7f2b155be000, 4096, 1073380864) = -1 EINVAL (Invalid argument) <0.000012>

pwrite() starts to fail too (much smaller offset):

55581 19:31:04.623451 pwrite(14, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 32256 <unfinished ...>
...
55581 19:31:04.623497 <... pwrite resumed> ) = -1 EINVAL (Invalid argument) <0.000022>

Comment 26 Niels de Vos 2014-09-27 18:21:14 UTC
From 'man 2 write':

       EINVAL fd is attached to an object which is unsuitable for writing; or
              the file was opened with the O_DIRECT flag, and either the
              address specified in buf, the value specified in count, or the
              current file offset is not suitably aligned.

The offset could be after EOF, but O_DIRECT is used to open the file. Maybe there are some alignment issues...
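
Both failing offsets in the strace above (1073380864 and 32256) are multiples of 512 but not of 4096, which fits the 4K-sector observation in comment #24: with sectsz=4096, XFS requires O_DIRECT offsets, lengths and buffers to be 4096-aligned. A minimal way to demonstrate this on the brick itself (mountpoint and filename are placeholders):

dd if=/dev/zero of=/bricks/xfs/ddtest bs=512 count=1 oflag=direct    # misaligned O_DIRECT write
# expected on a sectsz=4096 brick: dd: error writing '/bricks/xfs/ddtest': Invalid argument
dd if=/dev/zero of=/bricks/xfs/ddtest bs=4096 count=1 oflag=direct   # 4096-aligned write succeeds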

Comment 27 JYundt 2014-09-27 22:06:39 UTC
(In reply to Niels de Vos from comment #25)
> Because the problem from bug 1058663 only happens irregular, I would be
> interested in any test results with glusterfs-3.5.1, or where write-behind
> is disabled. Could you maybe provide that?

Unfortunately I no longer have access to this hardware setup. 

Let me see if I can try to build out something similar.  (It might take me a little bit.)

Comment 28 JYundt 2014-09-27 22:08:19 UTC
(In reply to Niels de Vos from comment #26)

> The offset could be after EOF, but O_DIRECT is used to open the file. Maybe
> there are some alignment issues...

And FWIW, changing some of the mkfs.xfs attributes (mkfs.xfs -f -i size=512 -s size=512) fixed my problem.

Based on this information, do you still want me to try to reproduce with 3.5.1?
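
A quick way to verify the workaround took effect (brick mountpoint is a placeholder):

xfs_info /bricks/xfs | grep sectsz    # should now report sectsz=512 instead of the 4096 default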

Comment 29 Niels de Vos 2014-10-06 13:15:19 UTC
(In reply to JYundt from comment #28)
> (In reply to Niels de Vos from comment #26)
> 
> > The offset could be after EOF, but O_DIRECT is used to open the file. Maybe
> > there are some alignment issues...
> 
> And FWIW, changing some of the mkfs.xfs attributes (mkfs.xfs -f -i size=512
> -s size=512) fixed my problem.
> 
> Based on this information, do you still want me to try to reproduce with
> 3.5.1?

No, that is not needed. This definitely sounds like an alignment issue.

It would be interesting to know if the errors in the kvm-guest also happen when the filesystem of the brick is used directly to store the images.

When a volume gets mounted (over fuse or through the api), the bricks return the same values as the output of 'stat -f /bricks/dir/....'. If the block size differs between bricks, I would not be surprised if the result is unstable.

We would need at least the following info to understand more about this (commands sketched below):
1. Does the issue happen without gluster, directly on the filesystem of the brick?
2. Output of xfs_info for the brick's XFS filesystem
3. Output of dumpe2fs from the ext4 filesystem on the same storage/disk
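
Sketches of those commands, with placeholder paths standing in for the real brick mountpoint and ext4 device:

stat -f /bricks/xfs            # block size the brick reports back to clients
xfs_info /bricks/xfs           # item 2: XFS geometry, including sectsz
dumpe2fs -h /dev/vg0/ext4_lv   # item 3: print the ext4 superblock summary only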

If you cannot provide the details, or have no interest in digging further, let us know and we'll close the bug.

[ Comment #24 and comment #28 contain a fix/workaround. ]

Comment 30 JYundt 2014-10-20 13:20:04 UTC
(In reply to Niels de Vos from comment #29)
> (In reply to JYundt from comment #28)
> > (In reply to Niels de Vos from comment #26)
> > 
> If you can not provide the details, or have no interest in digging further,
> let us know and we'll close the bug.

I apologize, but due to the lack of hardware I don't think I'll be able to reproduce this in a timely fashion.

Do you want to close it in the meantime?

Comment 31 Niels de Vos 2014-10-20 13:37:41 UTC
Okay, yes, let's close this for now. Feel free to re-open this bug or file a new one when more details become available (also see comment #29).

