Bug 765516 (GLUSTER-3784)

Summary: [Red Hat SSA-3.2.4] 'untar'ing failed when rebalance is started.
Product: [Community] GlusterFS Reporter: M S Vishwanath Bhat <vbhat>
Component: access-control Assignee: shishir gowda <sgowda>
Status: CLOSED CURRENTRELEASE QA Contact: M S Vishwanath Bhat <vbhat>
Severity: medium Docs Contact:
Priority: medium    
Version: pre-release CC: gluster-bugs, mzywusko, nsathyan, vijay
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.4.0 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-07-24 17:11:40 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 817967    

Description M S Vishwanath Bhat 2011-11-07 04:12:03 UTC
Unable to upload the file, I have archived it...

Comment 1 M S Vishwanath Bhat 2011-11-07 07:09:39 UTC
Created a pure replicate volume with the rdma transport type, mounted it via FUSE, and started untarring the Linux kernel. Then added two more bricks, so that it became a distributed-replicate volume.

[root@client1 vishwa]# gluster volume info

Volume Name: hosdu
Type: Distributed-Replicate
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: rdma
Bricks:
Brick1: 10.1.10.21:/home/brick
Brick2: 10.1.10.24:/home/brick
Brick3: 10.1.10.21:/home/brick-added
Brick4: 10.1.10.24:/brick-added
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on


I then started a rebalance. Though the rebalance ran to completion successfully, the untar on the mountpoint failed with the following messages.

linux-2.6.39.4/virt/kvm/irq_comm.c
linux-2.6.39.4/virt/kvm/kvm_main.c
tar: linux-2.6.39.4/arch/microblaze/boot/dts: Directory renamed before its status could be extracted
tar: linux-2.6.39.4/arch/microblaze/boot: Directory renamed before its status could be extracted
tar: linux-2.6.39.4/arch/microblaze: Directory renamed before its status could be extracted
tar: linux-2.6.39.4/arch: Directory renamed before its status could be extracted
tar: linux-2.6.39.4: Directory renamed before its status could be extracted
tar: Exiting with failure status due to previous errors


I see a lot of layout mismatch messages in the client log.


[2011-11-06 23:38:50.232864] I [dht-common.c:523:dht_revalidate_cbk] 1-hosdu-dht: mismatching layouts for /linux-2.6.39.4
[2011-11-06 23:38:50.233064] I [dht-layout.c:682:dht_layout_dir_mismatch] 1-hosdu-dht: subvol: hosdu-replicate-0; inode layout - 0 - 4294967295; disk layout - 0 - 2147483646
[2011-11-06 23:38:50.233102] I [dht-common.c:523:dht_revalidate_cbk] 1-hosdu-dht: mismatching layouts for /linux-2.6.39.4
[2011-11-06 23:38:51.236372] I [dht-layout.c:682:dht_layout_dir_mismatch] 1-hosdu-dht: subvol: hosdu-replicate-1; inode layout - 0 - 0; disk layout - 2147483647 - 4294967295
[2011-11-06 23:38:51.236411] I [dht-common.c:523:dht_revalidate_cbk] 1-hosdu-dht: mismatching layouts for /linux-2.6.39.4
[2011-11-06 23:38:51.236608] I [dht-layout.c:682:dht_layout_dir_mismatch] 1-hosdu-dht: subvol: hosdu-replicate-0; inode layout - 0 - 4294967295; disk layout - 0 - 2147483646
[2011-11-06 23:38:51.236649] I [dht-common.c:523:dht_revalidate_cbk] 1-hosdu-dht: mismatching layouts for /linux-2.6.39.4
[2011-11-06 23:38:52.240227] I [dht-layout.c:682:dht_layout_dir_mismatch] 1-hosdu-dht: subvol: hosdu-replicate-1; inode layout - 0 - 0; disk layout - 2147483647 - 4294967295
[2011-11-06 23:38:52.240272] I [dht-common.c:523:dht_revalidate_cbk] 1-hosdu-dht: mismatching layouts for /linux-2.6.39.4
[2011-11-06 23:38:52.240567] I [dht-layout.c:682:dht_layout_dir_mismatch] 1-hosdu-dht: subvol: hosdu-replicate-0; inode layout - 0 - 4294967295; disk layout - 0 - 2147483646
[2011-11-06 23:38:52.240604] I [dht-common.c:523:dht_revalidate_cbk] 1-hosdu-dht: mismatching layouts for /linux-2.6.39.4
[2011-11-06 23:38:53.245176] I [dht-layout.c:682:dht_layout_dir_mismatch] 1-hosdu-dht: subvol: hosdu-replicate-1; inode layout - 0 - 0; disk layout - 2147483647 - 4294967295
[2011-11-06 23:38:53.245223] I [dht-common.c:523:dht_revalidate_cbk] 1-hosdu-dht: mismatching layouts for /linux-2.6.39.4
[2011-11-06 23:38:53.245427] I [dht-layout.c:682:dht_layout_dir_mismatch] 1-hosdu-dht: subvol: hosdu-replicate-0; inode layout - 0 - 4294967295; disk layout - 0 - 2147483646
[2011-11-06 23:38:53.245467] I [dht-common.c:523:dht_revalidate_cbk] 1-hosdu-dht: mismatching layouts for /linux-2.6.39.4
[2011-11-06 23:38:54.249253] I [dht-layout.c:682:dht_layout_dir_mismatch] 1-hosdu-dht: subvol: hosdu-replicate-1; inode layout - 0 - 0; disk layout - 2147483647 - 4294967295
[2011-11-06 23:38:54.249300] I [dht-common.c:523:dht_revalidate_cbk] 1-hosdu-dht: mismatching layouts for /linux-2.6.39.4
[2011-11-06 23:38:54.249483] I [dht-layout.c:682:dht_layout_dir_mismatch] 1-hosdu-dht: subvol: hosdu-replicate-0; inode layout - 0 - 4294967295; disk layout - 0 - 2147483646
[2011-11-06 23:38:54.249505] I [dht-common.c:523:dht_revalidate_cbk] 1-hosdu-dht: mismatching layouts for /linux-2.6.39.4
[2011-11-06 23:38:55.254771] I [dht-layout.c:682:dht_layout_dir_mismatch] 1-hosdu-dht: subvol: hosdu-replicate-1; inode layout - 0 - 0; disk layout - 2147483647 - 4294967295
[2011-11-06 23:38:55.254814] I [dht-common.c:523:dht_revalidate_cbk] 1-hosdu-dht: mismatching layouts for /linux-2.6.39.4
[2011-11-06 23:38:55.255033] I [dht-layout.c:682:dht_layout_dir_mismatch] 1-hosdu-dht: subvol: hosdu-replicate-0; inode layout - 0 - 4294967295; disk layout - 0 - 2147483646
[2011-11-06 23:38:55.255062] I [dht-common.c:523:dht_revalidate_cbk] 1-hosdu-dht: mismatching layouts for /linux-2.6.39.4
[2011-11-06 23:38:56.258631] I [dht-layout.c:682:dht_layout_dir_mismatch] 1-hosdu-dht: subvol: hosdu-replicate-1; inode layout - 0 - 0; disk layout - 2147483647 - 4294967295
[2011-11-06 23:38:56.258675] I [dht-common.c:523:dht_revalidate_cbk] 1-hosdu-dht: mismatching layouts for /linux-2.6.39.4
[2011-11-06 23:38:56.258870] I [dht-layout.c:682:dht_layout_dir_mismatch] 1-hosdu-dht: subvol: hosdu-replicate-0; inode layout - 0 - 4294967295; disk layout - 0 - 2147483646
[2011-11-06 23:38:56.258890] I [dht-common.c:523:dht_revalidate_cbk] 1-hosdu-dht: mismatching layouts for /linux-2.6.39.4
[2011-11-06 23:38:57.268218] I [dht-layout.c:682:dht_layout_dir_mismatch] 1-hosdu-dht: subvol: hosdu-replicate-1; inode layout - 0 - 0; disk layout - 2147483647 - 4294967295
[2011-11-06 23:38:57.268258] I [dht-common.c:523:dht_revalidate_cbk] 1-hosdu-dht: mismatching layouts for /linux-2.6.39.4
[2011-11-06 23:38:57.268508] I [dht-layout.c:682:dht_layout_dir_mismatch] 1-hosdu-dht: subvol: hosdu-replicate-0; inode layout - 0 - 4294967295; disk layout - 0 - 2147483646
[2011-11-06 23:38:57.268547] I [dht-common.c:523:dht_revalidate_cbk] 1-hosdu-dht: mismatching layouts for /linux-2.6.39.4
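
The repeated mismatch lines above reduce to two distinct layout pairs. The following is a minimal sketch for summarizing them, assuming the log format shown in this report. The function names (parse_mismatches, even_split) are hypothetical, and the even-split rule is an assumption inferred from the disk layout values in the log, not taken from the DHT source.

```python
import re
from collections import Counter

# Matches the dht_layout_dir_mismatch lines as they appear in this report.
MISMATCH_RE = re.compile(
    r"subvol: (?P<subvol>\S+); "
    r"inode layout - (?P<istart>\d+) - (?P<istop>\d+); "
    r"disk layout - (?P<dstart>\d+) - (?P<dstop>\d+)"
)


def parse_mismatches(lines):
    """Count each distinct (subvol, inode range, disk range) seen in the log."""
    seen = Counter()
    for line in lines:
        m = MISMATCH_RE.search(line)
        if m:
            key = (
                m["subvol"],
                (int(m["istart"]), int(m["istop"])),
                (int(m["dstart"]), int(m["dstop"])),
            )
            seen[key] += 1
    return seen


def even_split(count, top=0xFFFFFFFF):
    """Assumed rule: split [0, top] into `count` ranges; the last takes the remainder."""
    chunk = top // count
    ranges = []
    for i in range(count):
        start = i * chunk
        stop = top if i == count - 1 else start + chunk - 1
        ranges.append((start, stop))
    return ranges


# Two representative lines copied from the client log above.
sample = [
    "[2011-11-06 23:38:50.233064] I [dht-layout.c:682:dht_layout_dir_mismatch] "
    "1-hosdu-dht: subvol: hosdu-replicate-0; inode layout - 0 - 4294967295; "
    "disk layout - 0 - 2147483646",
    "[2011-11-06 23:38:51.236372] I [dht-layout.c:682:dht_layout_dir_mismatch] "
    "1-hosdu-dht: subvol: hosdu-replicate-1; inode layout - 0 - 0; "
    "disk layout - 2147483647 - 4294967295",
]

for (subvol, inode, disk), n in parse_mismatches(sample).items():
    print(f"{subvol}: cached {inode} vs on-disk {disk} ({n} occurrence(s))")

# Compare against an even two-way split of the 32-bit hash space.
print(even_split(2))
```

On these lines, the cached (inode) layout gives hosdu-replicate-0 the whole hash range and hosdu-replicate-1 none of it, while the on-disk layout matches an even two-way split. That appears consistent with the client still holding the pre-add-brick layout after the rebalance rewrote it on disk.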


I have attached the client log. The machines are Red Hat SSA, running glusterfs-3.2.4.

Comment 2 Amar Tumballi 2011-11-08 01:36:01 UTC
I believe this is the patch in master that fixed a similar issue:

-----
commit 6b02f2ac6a3889af0b0e1cdb4402352379b37539
Author: Amar Tumballi <amar>
Date:   Thu Apr 21 03:43:20 2011 +0000

    cluster/distribute: corrected layout mismatch handling logic
    
    Signed-off-by: Amar Tumballi <amar>
    Signed-off-by: Anand Avati <avati>
    
    BUG: 2281 (I/O operations exit when add-brick is done)
    URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2281

-----

As it will not apply directly now, can you test with the patch http://review.gluster.com/613 applied on top of the source you have? If the issue is not reproducible with it, we should consider reviewing and committing the patch.

Comment 3 Amar Tumballi 2012-02-22 04:00:31 UTC
Not sure if we are going to fix it in the 3.2.x branch. Please make sure this issue is not present in master. Shylesh/MS, can you confirm that the behavior doesn't happen on master (3.3.0qa23 onwards)?

Comment 4 shishir gowda 2012-02-24 06:32:20 UTC
I was not able to reproduce this issue on 3.3.0qa24 (using a tcp connection, not rdma).
Please reopen the bug if this is not fixed on rdma.

Comment 5 M S Vishwanath Bhat 2012-06-01 14:00:47 UTC
In the release-3.3 branch, I couldn't verify with a 2-node replicate volume changed to a 2x2 distributed-replicate volume, as there is a known issue with ongoing I/O during a volume type change.

But I verified with a 2-node distribute volume changed to a 4-node distribute volume, so moving to verified.