Bug 866459 - "Invalid Arguments" error messages seen in glustershd.log file when volume heal command is executed.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Pranith Kumar K
QA Contact: spandura
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-10-15 12:20 UTC by spandura
Modified: 2015-08-10 19:30 UTC (History)
3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-08-10 07:45:36 UTC
Embargoed:



Description spandura 2012-10-15 12:20:06 UTC
Description of problem:
==========================

"Invalid argument" error messages appear in the self-heal daemon log file (glustershd.log) when the "gluster volume heal <volume_name>" command is executed.

[2012-10-15 17:31:12.417749] W [client3_1-fops.c:2141:client3_1_rchecksum_cbk] 0-replicate-client-1: remote operation failed: Invalid argument
[2012-10-15 17:31:12.417794] E [afr-self-heal-algorithm.c:609:sh_diff_checksum_cbk] 0-replicate-replicate-0: checksum on <gfid:a62b08a4-c5f4-4245-a1c2-6645ecdb3093> failed on subvolume replicate-client-1 (Invalid argument)


Version-Release number of selected component (if applicable):
==============================================================
[10/15/12 - 17:41:05 root@rhs-client6 ~]# rpm -qa | grep gluster
glusterfs-geo-replication-3.3.0rhsvirt1-7.el6rhs.x86_64
vdsm-gluster-4.9.6-14.el6rhs.noarch
gluster-swift-plugin-1.0-5.noarch
gluster-swift-container-1.4.8-4.el6.noarch
org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch
glusterfs-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-server-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-rdma-3.3.0rhsvirt1-7.el6rhs.x86_64
gluster-swift-proxy-1.4.8-4.el6.noarch
gluster-swift-account-1.4.8-4.el6.noarch
gluster-swift-doc-1.4.8-4.el6.noarch
glusterfs-fuse-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-debuginfo-3.3.0rhsvirt1-7.el6rhs.x86_64
gluster-swift-1.4.8-4.el6.noarch
gluster-swift-object-1.4.8-4.el6.noarch
[10/15/12 - 17:41:10 root@rhs-client6 ~]# gluster --version
glusterfs 3.3.0rhsvirt1 built on Oct  8 2012 15:23:00

Steps to Reproduce:
====================
1. Create a pure replicate volume (1x2) with 2 servers and 1 brick on each server; this volume serves as the storage for the VMs. Start the volume.

2. Set up KVM to use the volume as the VM store.

3. Create 2 virtual machines (vm1 and vm2).

4. Power off server1 (one of the servers in the replicate pair).

5. Delete virtual machine vm2.

6. Perform the following operations while server1 is down:
   a. rhn_register vm1
   b. yum update on vm1
   c. create new virtual machine vm3
   d. start virtual machine vm3
   e. rhn_register vm3
   f. create new virtual machine vm4

7. Power on server1.

8. Perform I/O on the VMs while self-heal is in progress.

9. After some time, power off server1 and power it back on immediately.

10. Execute "gluster volume heal <volume_name>".

Actual results:
-----------------

[2012-10-15 17:26:06.705237] W [client3_1-fops.c:2141:client3_1_rchecksum_cbk] 0-replicate-client-0: remote operation failed: Invalid argument
[2012-10-15 17:26:06.705261] E [afr-self-heal-algorithm.c:609:sh_diff_checksum_cbk] 0-replicate-replicate-0: checksum on <gfid:1f087927-a5e4-42d9-88c6-bdaf3d06da22> failed on subvolume replicate-client-0 (Invalid argument)

[10/15/12 - 17:26:32 root@rhs-client7 ~]# stat /disk0/.glusterfs/1f/08/1f087927-a5e4-42d9-88c6-bdaf3d06da22 
  File: `/disk0/.glusterfs/1f/08/1f087927-a5e4-42d9-88c6-bdaf3d06da22'
  Size: 2097152   	Blocks: 2056       IO Block: 4096   regular file
Device: fd04h/64772d	Inode: 1074765955  Links: 2
Access: (0660/-rw-rw----)  Uid: (   36/    vdsm)   Gid: (   36/     kvm)
Access: 2012-10-15 17:26:06.627574826 +0530
Modify: 2012-10-15 17:22:27.228389724 +0530
Change: 2012-10-15 17:26:06.626370057 +0530

[10/15/12 - 17:26:49 root@rhs-client7 ~]# find /disk0 -inum 1074765955 -print
/disk0/.glusterfs/1f/08/1f087927-a5e4-42d9-88c6-bdaf3d06da22
/disk0/59615823-6bad-406c-b9f5-f02cfc8d6103/dom_md/leases

[10/15/12 - 17:27:00 root@rhs-client7 ~]# getfattr -d -e hex -m . /disk0/59615823-6bad-406c-b9f5-f02cfc8d6103/dom_md/leases
getfattr: Removing leading '/' from absolute path names
# file: disk0/59615823-6bad-406c-b9f5-f02cfc8d6103/dom_md/leases
trusted.afr.replicate-client-0=0x000000060000000000000000
trusted.afr.replicate-client-1=0x000000000000000000000000
trusted.gfid=0x1f087927a5e442d988c6bdaf3d06da22

###############################################################################
[2012-10-15 17:26:07.099239] W [client3_1-fops.c:2141:client3_1_rchecksum_cbk] 0-replicate-client-1: remote operation failed: Invalid argument
[2012-10-15 17:26:07.099283] E [afr-self-heal-algorithm.c:609:sh_diff_checksum_cbk] 0-replicate-replicate-0: checksum on <gfid:a62b08a4-c5f4-4245-a1c2-6645ecdb3093> failed on subvolume replicate-client-1 (Invalid argument)


[10/15/12 - 17:27:43 root@rhs-client7 ~]# stat -c %i /disk0/.glusterfs/a6/2b/a62b08a4-c5f4-4245-a1c2-6645ecdb3093 
1610612915

[10/15/12 - 17:27:59 root@rhs-client7 ~]# find /disk0/ -inum 1610612915 -print
/disk0/.glusterfs/a6/2b/a62b08a4-c5f4-4245-a1c2-6645ecdb3093
/disk0/59615823-6bad-406c-b9f5-f02cfc8d6103/images/1dc42d43-6e19-40f9-979c-db2fc7820d8c/6bbcbb32-cc52-4e33-9a1a-343026cfe7b7

[10/15/12 - 17:28:12 root@rhs-client7 ~]# getfattr -d -e hex -m . /disk0/59615823-6bad-406c-b9f5-f02cfc8d6103/images/1dc42d43-6e19-40f9-979c-db2fc7820d8c/6bbcbb32-cc52-4e33-9a1a-343026cfe7b7
getfattr: Removing leading '/' from absolute path names
# file: disk0/59615823-6bad-406c-b9f5-f02cfc8d6103/images/1dc42d43-6e19-40f9-979c-db2fc7820d8c/6bbcbb32-cc52-4e33-9a1a-343026cfe7b7
trusted.afr.replicate-client-0=0x0000649f0000000000000000
trusted.afr.replicate-client-1=0x000000000000000000000000
trusted.gfid=0xa62b08a4c5f44245a1c26645ecdb3093

###############################################################################
[2012-10-15 17:26:07.173198] W [client3_1-fops.c:2141:client3_1_rchecksum_cbk] 0-replicate-client-0: remote operation failed: Invalid argument
[2012-10-15 17:26:07.173232] E [afr-self-heal-algorithm.c:609:sh_diff_checksum_cbk] 0-replicate-replicate-0: checksum on <gfid:c8587f1f-7ff7-42b8-a001-d269ccf10611> failed on subvolume replicate-client-0 (Invalid argument)

[10/15/12 - 17:28:20 root@rhs-client7 ~]# stat -c %i /disk0/.glusterfs/c8/58/c8587f1f-7ff7-42b8-a001-d269ccf10611 
536871134

[10/15/12 - 17:28:43 root@rhs-client7 ~]# find /disk0/ -inum 536871134 -print
/disk0/.glusterfs/c8/58/c8587f1f-7ff7-42b8-a001-d269ccf10611
/disk0/59615823-6bad-406c-b9f5-f02cfc8d6103/images/80e6058e-48be-4fad-a886-1c7f97ff6ba0/fc29bef5-8024-43b8-a8b5-abc8b13a8e49

[10/15/12 - 17:28:53 root@rhs-client7 ~]# getfattr -d -e hex -m . /disk0/59615823-6bad-406c-b9f5-f02cfc8d6103/images/80e6058e-48be-4fad-a886-1c7f97ff6ba0/fc29bef5-8024-43b8-a8b5-abc8b13a8e49
getfattr: Removing leading '/' from absolute path names
# file: disk0/59615823-6bad-406c-b9f5-f02cfc8d6103/images/80e6058e-48be-4fad-a886-1c7f97ff6ba0/fc29bef5-8024-43b8-a8b5-abc8b13a8e49
trusted.afr.replicate-client-0=0x00044fc70000000000000000
trusted.afr.replicate-client-1=0x000000000000000000000000
trusted.gfid=0xc8587f1f7ff742b8a001d269ccf10611


Additional info:
====================

[10/15/12 - 17:49:23 root@rhs-client6 ~]# gluster volume info replicate
 
Volume Name: replicate
Type: Replicate
Volume ID: d24fc5c1-74a7-46de-99ff-f7d827408993
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: rhs-client6.lab.eng.blr.redhat.com:/disk0
Brick2: rhs-client7.lab.eng.blr.redhat.com:/disk0
Options Reconfigured:
storage.linux-aio: enable
cluster.eager-lock: on
performance.read-ahead: disable
performance.stat-prefetch: disable
performance.io-cache: disable
performance.quick-read: disable

Comment 1 spandura 2012-10-15 12:25:56 UTC
[2012-10-15 17:51:12.813975] E [afr-self-heal-algorithm.c:609:sh_diff_checksum_cbk] 0-replicate-replicate-0: checksum on <gfid:c8587f1f-7ff7-42b8-a001-d269ccf10611> failed on subvolume replicate-client-0 (Invalid argument)
[2012-10-15 17:51:12.918598] W [client3_1-fops.c:2141:client3_1_rchecksum_cbk] 0-replicate-client-1: remote operation failed: Invalid argument
[2012-10-15 17:51:12.918639] E [afr-self-heal-algorithm.c:609:sh_diff_checksum_cbk] 0-replicate-replicate-0: checksum on <gfid:91c4970e-f1d5-42bf-83d4-cae8766b061e> failed on subvolume replicate-client-1 (Invalid argument)

[10/15/12 - 17:54:46 root@rhs-client7 ~]# getfattr -d -e hex -m . /disk0/59615823-6bad-406c-b9f5-f02cfc8d6103/images/f6f2f7d5-a272-47a4-b343-9bfbb65a5b05/9a4e65b2-66a9-43a2-bcb7-4ea7100604d4
getfattr: Removing leading '/' from absolute path names
# file: disk0/59615823-6bad-406c-b9f5-f02cfc8d6103/images/f6f2f7d5-a272-47a4-b343-9bfbb65a5b05/9a4e65b2-66a9-43a2-bcb7-4ea7100604d4
trusted.afr.replicate-client-0=0x0004093e0000000000000000
trusted.afr.replicate-client-1=0x000000000000000000000000
trusted.gfid=0x91c4970ef1d542bf83d4cae8766b061e



[10/15/12 - 17:55:50 root@rhs-client6 ~]# getfattr -d -e hex -m . /disk0/59615823-6bad-406c-b9f5-f02cfc8d6103/images/f6f2f7d5-a272-47a4-b343-9bfbb65a5b05/9a4e65b2-66a9-43a2-bcb7-4ea7100604d4
getfattr: Removing leading '/' from absolute path names
# file: disk0/59615823-6bad-406c-b9f5-f02cfc8d6103/images/f6f2f7d5-a272-47a4-b343-9bfbb65a5b05/9a4e65b2-66a9-43a2-bcb7-4ea7100604d4
trusted.afr.replicate-client-0=0x000000010000000000000000
trusted.afr.replicate-client-1=0x000000010000000000000000
trusted.gfid=0x91c4970ef1d542bf83d4cae8766b061e

Comment 3 Amar Tumballi 2012-10-15 13:17:11 UTC
Does it depend on bug 827334?

Comment 4 Pranith Kumar K 2012-10-16 03:35:30 UTC
Amar,
   It should not. The issue does not happen always; most probably the remote_fd has gone bad and the server is the xlator that is sending this EINVAL. I am waiting for the logs to confirm that.

Pranith

Comment 6 Pranith Kumar K 2012-10-17 07:27:29 UTC
The invalid-argument error stems from the pread call in posix_rchecksum.
[2012-10-15 17:49:10.048155] W [posix.c:3901:posix_rchecksum] 0-replicate-posix: pread of 131072 bytes returned -1 (Invalid argument)
[2012-10-15 17:49:10.048236] I [server3_1-fops.c:1498:server_rchecksum_cbk] 0-replicate-server: 338279: RCHECKSUM 3 (c8587f1f-7ff7-42b8-a001-d269ccf10611)==> -1 (Invalid argument)

One of the reasons pread can return EINVAL is a negative offset, but that is not the case here. So I need to figure out what else could be the reason.

Comment 7 spandura 2012-10-18 07:35:32 UTC
Breakpoint 3, posix_rchecksum (frame=0x7fd08a7c5cb8, this=0x10e3120, fd=0x1103a10, offset=<value optimized out>, len=131072, xdata=<value optimized out>)
    at posix.c:3899
3899	                gf_log (this->name, GF_LOG_WARNING,
Missing separate debuginfos, use: debuginfo-install libgcc-4.4.6-3.el6.x86_64
(gdb) bt
#0  posix_rchecksum (frame=0x7fd08a7c5cb8, this=0x10e3120, fd=0x1103a10, offset=<value optimized out>, len=131072, xdata=<value optimized out>) at posix.c:3899
#1  0x00000037a5a1bb9c in default_rchecksum (frame=<value optimized out>, this=0x10e4940, fd=0x1103a10, offset=131072, len=131072, xdata=0x0) at defaults.c:1182
#2  0x00000037a5a1bb9c in default_rchecksum (frame=<value optimized out>, this=0x10e5a40, fd=0x1103a10, offset=131072, len=131072, xdata=0x0) at defaults.c:1182
#3  0x00007fd086bbe15c in iot_rchecksum_wrapper (frame=<value optimized out>, this=0x10e6bc0, fd=0x1103a10, offset=131072, len=131072, xdata=0x0) at io-threads.c:2315
#4  0x00000037a5a2ed3c in call_resume_wind (stub=0x7fd08a48068c) at call-stub.c:2698
#5  call_resume (stub=0x7fd08a48068c) at call-stub.c:4151
#6  0x00007fd086bc6f1b in iot_worker (data=0x10f8de0) at io-threads.c:131
#7  0x000000322da077f1 in start_thread () from /lib64/libpthread.so.0
#8  0x000000322d2e5ccd in clone () from /lib64/libc.so.6
(gdb) info registers 
rax            0xffffffffffffffff	-1
rbx            0x7fd08a7c5cb8	140533653331128
rcx            0xffffffffffffffa8	-88
rdx            0x16	22
rsi            0x7fd068002910	140533074766096
rdi            0x0	0
rbp            0x10e3120	0x10e3120
rsp            0x7fd08413ac40	0x7fd08413ac40
r8             0xc	12
r9             0x9	9
r10            0x20000	131072
r11            0x0	0
r12            0x7fd068002910	140533074766096
r13            0x1103a10	17840656
r14            0xffffffff	4294967295
r15            0x20000	131072
rip            0x7fd08740aee8	0x7fd08740aee8 <posix_rchecksum+632>
eflags         0x286	[ PF SF IF ]
cs             0x33	51
ss             0x2b	43
ds             0x0	0
es             0x0	0
fs             0x0	0
gs             0x0	0

Comment 8 Pranith Kumar K 2012-10-25 14:13:21 UTC
Thanks to a tip from Jeff Darcy, the problem seems to be caused by the combination of O_DIRECT and posix-aio. Here is a simpler test case that re-creates the problem.
Steps:
1) Same volume configuration as above.
2) dd if=/dev/urandom of=a bs=1M count=1
3) Kill one of the bricks in the replica pair.
4) dd if=/dev/urandom of=a bs=1M count=1
5) Bring the brick back. Self-heal keeps failing with EINVAL in rchecksum.

I made the following change in posix_rchecksum before pread and it works fine.
pranithk @ pranithk-laptop ~/workspace/rhs-glusterfs
19:34:18 :) ⚡ git diff
diff --git a/xlators/storage/posix/src/posix.c b/xlators/storage/posix/src/posix.c
index 5260ff3..cf2b5fb 100644
--- a/xlators/storage/posix/src/posix.c
+++ b/xlators/storage/posix/src/posix.c
@@ -3871,6 +3871,7 @@ posix_rchecksum (call_frame_t *frame, xlator_t *this,
         int              ret           = 0;
         int32_t          weak_checksum = 0;
         unsigned char    strong_checksum[MD5_DIGEST_LENGTH];
+        int flags = 0;
 
         VALIDATE_OR_GOTO (frame, out);
         VALIDATE_OR_GOTO (this, out);
@@ -3892,6 +3893,8 @@ posix_rchecksum (call_frame_t *frame, xlator_t *this,
                 goto out;
         }
 
+        flags = fcntl (pfd->fd, F_GETFL);
+        ret = fcntl (pfd->fd, F_SETFL, (flags & (~O_DIRECT)));
         _fd = pfd->fd;
 
         ret = pread (_fd, buf, len, offset);


The problem happens because self-heal performs rchecksum, readv, then writev in that order, but posix_rchecksum does not do the O_DIRECT flag manipulation that the posix readv/writev paths do before their I/O. As a result, the pread in rchecksum can return EINVAL if O_DIRECT was left set on the fd by a previous readv/writev iteration.

While this confirms the theory, I don't think this is an elegant solution. I will have to discuss this with a few people before sending the patch.

Comment 10 Vijay Bellur 2012-11-20 08:31:49 UTC
CHANGE: http://review.gluster.org/4134 (storage/posix: Make rchecksum O_DIRECT friendly) merged in master by Vijay Bellur (vbellur)

Comment 11 spandura 2012-11-29 10:53:33 UTC
Verified the bug with glusterfs-3.3.0rhsvirt1-8.el6rhs.x86_64.

