Description of problem: ========================== Invalid arguments error messages in self-heal daemon log file when "gluster volume heal <volume_name> " command is executed. [2012-10-15 17:31:12.417749] W [client3_1-fops.c:2141:client3_1_rchecksum_cbk] 0-replicate-client-1: remote operation failed: Invalid argument [2012-10-15 17:31:12.417794] E [afr-self-heal-algorithm.c:609:sh_diff_checksum_cbk] 0-replicate-replicate-0: checksum on <gfid:a62b08a4-c5f4-4245-a1c2-6645ecdb3093> failed on subvolume replicate-client-1 (Invalid argument) Version-Release number of selected component (if applicable): ============================================================== [10/15/12 - 17:41:05 root@rhs-client6 ~]# rpm -qa | grep gluster glusterfs-geo-replication-3.3.0rhsvirt1-7.el6rhs.x86_64 vdsm-gluster-4.9.6-14.el6rhs.noarch gluster-swift-plugin-1.0-5.noarch gluster-swift-container-1.4.8-4.el6.noarch org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch glusterfs-3.3.0rhsvirt1-7.el6rhs.x86_64 glusterfs-server-3.3.0rhsvirt1-7.el6rhs.x86_64 glusterfs-rdma-3.3.0rhsvirt1-7.el6rhs.x86_64 gluster-swift-proxy-1.4.8-4.el6.noarch gluster-swift-account-1.4.8-4.el6.noarch gluster-swift-doc-1.4.8-4.el6.noarch glusterfs-fuse-3.3.0rhsvirt1-7.el6rhs.x86_64 glusterfs-debuginfo-3.3.0rhsvirt1-7.el6rhs.x86_64 gluster-swift-1.4.8-4.el6.noarch gluster-swift-object-1.4.8-4.el6.noarch [10/15/12 - 17:41:10 root@rhs-client6 ~]# gluster --version glusterfs 3.3.0rhsvirt1 built on Oct 8 2012 15:23:00 Steps to Reproduce: ==================== 1.Create a pure replicate volume (1x2) with 2 servers and 1 brick on each server. This is the storage for the VM's. start the volume. 2.Set-up the KVM to use the volume as VM store. 3.Create 2 virtual machines (vm1 and vm2) 4.power off server1 (one of the server from each replicate pair) 5.delete the virtual machines vm2. 6.perform the following operations when server1 is down: a. rhn_register vm1, b. yum update on vm1 c. create new virtual machine vm3 d. start the virtual machine vm3 f. rhn_register vm3 g. create new virtual machine vm4 7. power on server1. 8. Perform IO on the VM's while self-heal is in progress. 9. After some time poweroff server1. and poweron it immediately. 10. execute "gluster volume heal <volume_name>" Actual results: ----------------- [2012-10-15 17:26:06.705237] W [client3_1-fops.c:2141:client3_1_rchecksum_cbk] 0-replicate-client-0: remote operation failed: Invalid argument [2012-10-15 17:26:06.705261] E [afr-self-heal-algorithm.c:609:sh_diff_checksum_cbk] 0-replicate-replicate-0: checksum on <gfid:1f087927-a5e4-42d9-88c6-bdaf3d06da22> failed on subvolume replicate-client-0 (Invalid argument) [10/15/12 - 17:26:32 root@rhs-client7 ~]# stat /disk0/.glusterfs/1f/08/1f087927-a5e4-42d9-88c6-bdaf3d06da22 File: `/disk0/.glusterfs/1f/08/1f087927-a5e4-42d9-88c6-bdaf3d06da22' Size: 2097152 Blocks: 2056 IO Block: 4096 regular file Device: fd04h/64772d Inode: 1074765955 Links: 2 Access: (0660/-rw-rw----) Uid: ( 36/ vdsm) Gid: ( 36/ kvm) Access: 2012-10-15 17:26:06.627574826 +0530 Modify: 2012-10-15 17:22:27.228389724 +0530 Change: 2012-10-15 17:26:06.626370057 +0530 [10/15/12 - 17:26:49 root@rhs-client7 ~]# find /disk0 -inum 1074765955 -print /disk0/.glusterfs/1f/08/1f087927-a5e4-42d9-88c6-bdaf3d06da22 /disk0/59615823-6bad-406c-b9f5-f02cfc8d6103/dom_md/leases [10/15/12 - 17:27:00 root@rhs-client7 ~]# getfattr -d -e hex -m . /disk0/59615823-6bad-406c-b9f5-f02cfc8d6103/dom_md/leases getfattr: Removing leading '/' from absolute path names # file: disk0/59615823-6bad-406c-b9f5-f02cfc8d6103/dom_md/leases trusted.afr.replicate-client-0=0x000000060000000000000000 trusted.afr.replicate-client-1=0x000000000000000000000000 trusted.gfid=0x1f087927a5e442d988c6bdaf3d06da22 ############################################################################### [2012-10-15 17:26:07.099239] W [client3_1-fops.c:2141:client3_1_rchecksum_cbk] 0-replicate-client-1: remote operation failed: Invalid argument [2012-10-15 17:26:07.099283] E [afr-self-heal-algorithm.c:609:sh_diff_checksum_cbk] 0-replicate-replicate-0: checksum on <gfid:a62b08a4-c5f4-4245-a1c2-6645ecdb3093> failed on subvolume replicate-client-1 (Invalid argument) [10/15/12 - 17:27:43 root@rhs-client7 ~]# stat -c %i /disk0/.glusterfs/a6/2b/a62b08a4-c5f4-4245-a1c2-6645ecdb3093 1610612915 [10/15/12 - 17:27:59 root@rhs-client7 ~]# find /disk0/ -inum 1610612915 -print /disk0/.glusterfs/a6/2b/a62b08a4-c5f4-4245-a1c2-6645ecdb3093 /disk0/59615823-6bad-406c-b9f5-f02cfc8d6103/images/1dc42d43-6e19-40f9-979c-db2fc7820d8c/6bbcbb32-cc52-4e33-9a1a-343026cfe7b7 [10/15/12 - 17:28:12 root@rhs-client7 ~]# getfattr -d -e hex -m . /disk0/59615823-6bad-406c-b9f5-f02cfc8d6103/images/1dc42d43-6e19-40f9-979c-db2fc7820d8c/6bbcbb32-cc52-4e33-9a1a-343026cfe7b7 getfattr: Removing leading '/' from absolute path names # file: disk0/59615823-6bad-406c-b9f5-f02cfc8d6103/images/1dc42d43-6e19-40f9-979c-db2fc7820d8c/6bbcbb32-cc52-4e33-9a1a-343026cfe7b7 trusted.afr.replicate-client-0=0x0000649f0000000000000000 trusted.afr.replicate-client-1=0x000000000000000000000000 trusted.gfid=0xa62b08a4c5f44245a1c26645ecdb3093 ############################################################################### [2012-10-15 17:26:07.173198] W [client3_1-fops.c:2141:client3_1_rchecksum_cbk] 0-replicate-client-0: remote operation failed: Invalid argument [2012-10-15 17:26:07.173232] E [afr-self-heal-algorithm.c:609:sh_diff_checksum_cbk] 0-replicate-replicate-0: checksum on <gfid:c8587f1f-7ff7-42b8-a001-d269ccf10611> failed on subvolume replicate-client-0 (Invalid argument) [10/15/12 - 17:28:20 root@rhs-client7 ~]# stat -c %i /disk0/.glusterfs/c8/58/c8587f1f-7ff7-42b8-a001-d269ccf10611 536871134 [10/15/12 - 17:28:43 root@rhs-client7 ~]# find /disk0/ -inum 536871134 -print /disk0/.glusterfs/c8/58/c8587f1f-7ff7-42b8-a001-d269ccf10611 /disk0/59615823-6bad-406c-b9f5-f02cfc8d6103/images/80e6058e-48be-4fad-a886-1c7f97ff6ba0/fc29bef5-8024-43b8-a8b5-abc8b13a8e49 [10/15/12 - 17:28:53 root@rhs-client7 ~]# getfattr -d -e hex -m . /disk0/59615823-6bad-406c-b9f5-f02cfc8d6103/images/80e6058e-48be-4fad-a886-1c7f97ff6ba0/fc29bef5-8024-43b8-a8b5-abc8b13a8e49 getfattr: Removing leading '/' from absolute path names # file: disk0/59615823-6bad-406c-b9f5-f02cfc8d6103/images/80e6058e-48be-4fad-a886-1c7f97ff6ba0/fc29bef5-8024-43b8-a8b5-abc8b13a8e49 trusted.afr.replicate-client-0=0x00044fc70000000000000000 trusted.afr.replicate-client-1=0x000000000000000000000000 trusted.gfid=0xc8587f1f7ff742b8a001d269ccf10611 Additional info: ==================== [10/15/12 - 17:49:23 root@rhs-client6 ~]# gluster volume info replicate Volume Name: replicate Type: Replicate Volume ID: d24fc5c1-74a7-46de-99ff-f7d827408993 Status: Started Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: rhs-client6.lab.eng.blr.redhat.com:/disk0 Brick2: rhs-client7.lab.eng.blr.redhat.com:/disk0 Options Reconfigured: storage.linux-aio: enable cluster.eager-lock: on performance.read-ahead: disable performance.stat-prefetch: disable performance.io-cache: disable performance.quick-read: disable
[2012-10-15 17:51:12.813975] E [afr-self-heal-algorithm.c:609:sh_diff_checksum_cbk] 0-replicate-replicate-0: checksum on <gfid:c8587f1f-7ff7-42b8-a001-d269ccf10611> failed on subvolume replicate-client-0 (Invalid argument) [2012-10-15 17:51:12.918598] W [client3_1-fops.c:2141:client3_1_rchecksum_cbk] 0-replicate-client-1: remote operation failed: Invalid argument [2012-10-15 17:51:12.918639] E [afr-self-heal-algorithm.c:609:sh_diff_checksum_cbk] 0-replicate-replicate-0: checksum on <gfid:91c4970e-f1d5-42bf-83d4-cae8766b061e> failed on subvolume replicate-client-1 (Invalid argument) [10/15/12 - 17:54:46 root@rhs-client7 ~]# getfattr -d -e hex -m . /disk0/59615823-6bad-406c-b9f5-f02cfc8d6103/images/f6f2f7d5-a272-47a4-b343-9bfbb65a5b05/9a4e65b2-66a9-43a2-bcb7-4ea7100604d4 getfattr: Removing leading '/' from absolute path names # file: disk0/59615823-6bad-406c-b9f5-f02cfc8d6103/images/f6f2f7d5-a272-47a4-b343-9bfbb65a5b05/9a4e65b2-66a9-43a2-bcb7-4ea7100604d4 trusted.afr.replicate-client-0=0x0004093e0000000000000000 trusted.afr.replicate-client-1=0x000000000000000000000000 trusted.gfid=0x91c4970ef1d542bf83d4cae8766b061e [10/15/12 - 17:55:50 root@rhs-client6 ~]# getfattr -d -e hex -m . /disk0/59615823-6bad-406c-b9f5-f02cfc8d6103/images/f6f2f7d5-a272-47a4-b343-9bfbb65a5b05/9a4e65b2-66a9-43a2-bcb7-4ea7100604d4 getfattr: Removing leading '/' from absolute path names # file: disk0/59615823-6bad-406c-b9f5-f02cfc8d6103/images/f6f2f7d5-a272-47a4-b343-9bfbb65a5b05/9a4e65b2-66a9-43a2-bcb7-4ea7100604d4 trusted.afr.replicate-client-0=0x000000010000000000000000 trusted.afr.replicate-client-1=0x000000010000000000000000 trusted.gfid=0x91c4970ef1d542bf83d4cae8766b061e
does it depend on 827334 ?
Amar, It should not. The issue does not happen always, most probably remote_fd must have gone bad and server must be the xl that is sending this EINVAL. I am waiting for the logs to confirm that. Pranith
The invalid argument error is stemming from pread of posix_rchecksum. [2012-10-15 17:49:10.048155] W [posix.c:3901:posix_rchecksum] 0-replicate-posix: pread of 131072 bytes returned -1 (Invalid argument) [2012-10-15 17:49:10.048236] I [server3_1-fops.c:1498:server_rchecksum_cbk] 0-replicate-server: 338279: RCHECKSUM 3 (c8587f1f-7ff7-42b8-a001-d269ccf10611)==> -1 (Invalid argument) One of the reasons pread can give EINVAL is if the offset is -ve. But it is not the case here. So I need to figure out what else could be the reason.
Breakpoint 3, posix_rchecksum (frame=0x7fd08a7c5cb8, this=0x10e3120, fd=0x1103a10, offset=<value optimized out>, len=131072, xdata=<value optimized out>) at posix.c:3899 3899 gf_log (this->name, GF_LOG_WARNING, Missing separate debuginfos, use: debuginfo-install libgcc-4.4.6-3.el6.x86_64 (gdb) bt #0 posix_rchecksum (frame=0x7fd08a7c5cb8, this=0x10e3120, fd=0x1103a10, offset=<value optimized out>, len=131072, xdata=<value optimized out>) at posix.c:3899 #1 0x00000037a5a1bb9c in default_rchecksum (frame=<value optimized out>, this=0x10e4940, fd=0x1103a10, offset=131072, len=131072, xdata=0x0) at defaults.c:1182 #2 0x00000037a5a1bb9c in default_rchecksum (frame=<value optimized out>, this=0x10e5a40, fd=0x1103a10, offset=131072, len=131072, xdata=0x0) at defaults.c:1182 #3 0x00007fd086bbe15c in iot_rchecksum_wrapper (frame=<value optimized out>, this=0x10e6bc0, fd=0x1103a10, offset=131072, len=131072, xdata=0x0) at io-threads.c:2315 #4 0x00000037a5a2ed3c in call_resume_wind (stub=0x7fd08a48068c) at call-stub.c:2698 #5 call_resume (stub=0x7fd08a48068c) at call-stub.c:4151 #6 0x00007fd086bc6f1b in iot_worker (data=0x10f8de0) at io-threads.c:131 #7 0x000000322da077f1 in start_thread () from /lib64/libpthread.so.0 #8 0x000000322d2e5ccd in clone () from /lib64/libc.so.6 (gdb) info registers rax 0xffffffffffffffff -1 rbx 0x7fd08a7c5cb8 140533653331128 rcx 0xffffffffffffffa8 -88 rdx 0x16 22 rsi 0x7fd068002910 140533074766096 rdi 0x0 0 rbp 0x10e3120 0x10e3120 rsp 0x7fd08413ac40 0x7fd08413ac40 r8 0xc 12 r9 0x9 9 r10 0x20000 131072 r11 0x0 0 r12 0x7fd068002910 140533074766096 r13 0x1103a10 17840656 r14 0xffffffff 4294967295 r15 0x20000 131072 rip 0x7fd08740aee8 0x7fd08740aee8 <posix_rchecksum+632> eflags 0x286 [ PF SF IF ] cs 0x33 51 ss 0x2b 43 ds 0x0 0 es 0x0 0 fs 0x0 0 gs 0x0 0
Thanks to the tip from Jeff Darcy. Problem seems to be because of O_DIRECT and posix-aio. Here is the simpler test case that re-creates the problem. Steps: 1) same volume configuration as above. 2) dd if=/dev/urandom of=a bs=1M count=1 3) kill one of the bricks in the replica pair 4) dd if=/dev/urandom of=a bs=1M count=1 5) bring back the brick. Self-heal will keep failing saying EINVAL in rchecksum. I made the following change in posix_rchecksum before pread and it works fine. pranithk @ pranithk-laptop ~/workspace/rhs-glusterfs 19:34:18 :) ⚡ git diff diff --git a/xlators/storage/posix/src/posix.c b/xlators/storage/posix/src/posix.c index 5260ff3..cf2b5fb 100644 --- a/xlators/storage/posix/src/posix.c +++ b/xlators/storage/posix/src/posix.c @@ -3871,6 +3871,7 @@ posix_rchecksum (call_frame_t *frame, xlator_t *this, int ret = 0; int32_t weak_checksum = 0; unsigned char strong_checksum[MD5_DIGEST_LENGTH]; + int flags = 0; VALIDATE_OR_GOTO (frame, out); VALIDATE_OR_GOTO (this, out); @@ -3892,6 +3893,8 @@ posix_rchecksum (call_frame_t *frame, xlator_t *this, goto out; } + flags = fcntl (pfd->fd, F_GETFL); + ret = fcntl (pfd->fd, F_SETFL, (flags & (~O_DIRECT))); _fd = pfd->fd; ret = pread (_fd, buf, len, offset); The problem happens because self-heal does rchecksum, readv, then writev in that order so before doing rchecksum necessary o-direct flag manipulation is not done, like the ones in posix-io readv/writev. So pread in rchecksum may give EINVAL if o-direct is set in the previous iteration of readv/writev. While this confirms the theory, I don't think this is an elegant solution. I will have to discuss this with a few people before sending the patch.
CHANGE: http://review.gluster.org/4134 (storage/posix: Make rchecksum O_DIRECT friendly) merged in master by Vijay Bellur (vbellur)
Verified the bug with glusterfs-3.3.0rhsvirt1-8.el6rhs.x86_64.