Bug 981837

Summary: Dist-geo-rep: files fail to sync; log says 'disk layout missing', the aux mount is unmounted, the worker dies, and each new worker hits the same problem
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Rachana Patel <racpatel>
Component: geo-replication
Assignee: Venky Shankar <vshankar>
Status: CLOSED ERRATA
QA Contact: amainkar
Severity: high
Priority: high
Version: 2.1
CC: aavati, amarts, csaba, rhs-bugs, surs, vkoppad, vshankar
Keywords: TestBlocker
Hardware: x86_64
OS: Linux
Fixed In Version: glusterfs-3.4.0.12rhs.beta6-1
Doc Type: Bug Fix
Last Closed: 2013-09-23 22:29:51 UTC
Type: Bug

Description Rachana Patel 2013-07-06 07:26:09 UTC
Description of problem:
Dist-geo-rep: files fail to sync. The log says 'disk layout missing', the aux mount is unmounted, and the worker dies. A new worker starts and hits the same problem.

Version-Release number of selected component (if applicable):
3.4.0.12rhs.beta2-1.el6rhs.x86_64

How reproducible:


Steps to Reproduce:
1. Master was a DHT volume (1x6); slave was a DHT volume (1x3).
2. Created more than 80GB of data on the master volume.
3. Created a geo-rep session between master and slave.
Kept checking for more than 12 hours. The session stays in a stable state and xsync prepares the changelog, but before it syncs a single file or directory the worker process dies. A new worker process then starts and goes through the same steps.


From the log:
 ssh%3A%2F%2Froot%4010.70.43.147%3Agluster%3A%2F%2F127.0.0.1%3Aslave1.%2Frhs%2Fbrick1%2Fm2.gluster.log
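
The log file name above is percent-encoded. As a small aid for reading it, the sketch below decodes it with Python's standard `urllib.parse.unquote`; the variable names are only illustrative.

```python
from urllib.parse import unquote

# Log file name exactly as quoted in this report.
encoded = (
    "ssh%3A%2F%2Froot%4010.70.43.147%3Agluster%3A%2F%2F127.0.0.1"
    "%3Aslave1.%2Frhs%2Fbrick1%2Fm2.gluster.log"
)

# %3A is ':', %2F is '/', %40 is '@'.
decoded = unquote(encoded)
print(decoded)
```

The decoded name shows the slave URL (root@10.70.43.147, volume slave1) and the master brick path (/rhs/brick1/m2) that this worker's log belongs to.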

....

[2013-07-05 21:23:09.237905] I [dht-layout.c:722:dht_layout_dir_mismatch] 0-master1-dht: /d1 - disk layout missing
[2013-07-05 21:23:09.237951] I [dht-common.c:650:dht_revalidate_cbk] 0-master1-dht: mismatching layouts for /d1
[2013-07-05 21:23:17.241236] I [dht-layout.c:722:dht_layout_dir_mismatch] 0-master1-dht: /d1 - disk layout missing
[2013-07-05 21:23:17.241288] I [dht-common.c:650:dht_revalidate_cbk] 0-master1-dht: mismatching layouts for /d1
[2013-07-05 21:23:21.647976] I [dht-layout.c:722:dht_layout_dir_mismatch] 0-master1-dht: /d1 - disk layout missing
[2013-07-05 21:23:21.648215] I [dht-common.c:650:dht_revalidate_cbk] 0-master1-dht: mismatching layouts for /d1
[2013-07-06 00:20:01.902038] I [fuse-bridge.c:5468:fuse_thread_proc] 0-fuse: unmounting /tmp/gsyncd-aux-mount-eo3siJ
[2013-07-06 00:20:01.968151] W [glusterfsd.c:1030:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x3d9d0e890d] (-->/lib64/libpthread.so.0() [0x3d9d807851] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd) [0x40528d]))) 0-: received signum (15), shutting down
[2013-07-06 00:20:01.969834] I [fuse-bridge.c:6144:fini] 0-fuse: Unmounting '/tmp/gsyncd-aux-mount-eo3siJ'.
[2013-07-06 00:20:14.982494] I [glusterfsd.c:1938:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.4.0.12rhs.beta2 (/usr/sbin/glusterfs --aux-gfid-mount --log-file=/var/log/glusterfs/geo-replication/master1/ssh%3A%2F%2Froot%4010.70.43.147%3Agluster%3A%2F%2F127.0.0.1%3Aslave1.%2Frhs%2Fbrick1%2Fm2.gluster.log --volfile-server=localhost --volfile-id=master1 --client-pid=-1 /tmp/gsyncd-aux-mount-T7qe89)

....


Each time, the same cycle repeats:
1. xsync prepares the changelog
2. 'mismatching layouts for /d1' is logged
3. the aux mount is unmounted
4. the worker dies
5. a new worker starts
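
To see how often this cycle occurs in a given client log, something like the following could be used. It is only a sketch: the regular expressions are written against the log lines quoted above and may not match other glusterfs versions, and `summarize` is a hypothetical helper, not part of gsyncd or glusterfs.

```python
import re
from collections import Counter

# Patterns derived from the log lines quoted in this report (assumption:
# other builds may word or format these messages differently).
LAYOUT_MISSING = re.compile(
    r"dht_layout_dir_mismatch\] \S+: (?P<path>.+) - disk layout missing"
)
AUX_UNMOUNT = re.compile(
    r"fuse_thread_proc\] \S+: unmounting (?P<mount>\S+)"
)

def summarize(lines):
    """Count 'disk layout missing' hits per path and collect aux unmounts."""
    missing = Counter()
    unmounts = []
    for line in lines:
        m = LAYOUT_MISSING.search(line)
        if m:
            missing[m.group("path")] += 1
            continue
        m = AUX_UNMOUNT.search(line)
        if m:
            unmounts.append(m.group("mount"))
    return missing, unmounts

# Sample lines copied from the log excerpt in this report.
sample = [
    "[2013-07-05 21:23:09.237905] I [dht-layout.c:722:dht_layout_dir_mismatch] 0-master1-dht: /d1 - disk layout missing",
    "[2013-07-05 21:23:17.241236] I [dht-layout.c:722:dht_layout_dir_mismatch] 0-master1-dht: /d1 - disk layout missing",
    "[2013-07-05 21:23:21.647976] I [dht-layout.c:722:dht_layout_dir_mismatch] 0-master1-dht: /d1 - disk layout missing",
    "[2013-07-06 00:20:01.902038] I [fuse-bridge.c:5468:fuse_thread_proc] 0-fuse: unmounting /tmp/gsyncd-aux-mount-eo3siJ",
    "[2013-07-06 00:20:01.969834] I [fuse-bridge.c:6144:fini] 0-fuse: Unmounting '/tmp/gsyncd-aux-mount-eo3siJ'.",
]

missing, unmounts = summarize(sample)
print(dict(missing), unmounts)
```

To run it against a real log, pass an open file object instead of `sample`, since `summarize` accepts any iterable of lines.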

Actual results:


Expected results:


Additional info:

Comment 3 Amar Tumballi 2013-07-06 17:32:39 UTC
Shishir, this looks similar to https://code.engineering.redhat.com/gerrit/9801 

Please have a look and confirm whether that's the case.

Comment 4 Vijaykumar Koppad 2013-07-12 09:17:22 UTC
I was able to reproduce the issue in the build glusterfs-3.4.0.12rhs.beta3-1.el6rhs.x86_64. 

The master volume was a dist-rep volume. 
Steps followed:
1. On the master, kill one brick of a replica pair.
2. Start creating and deleting files (around 10000) in a loop.
3. After a few iterations, bring the brick back. 
4. Remove all the files from the mount point. A few directories are not removed. 
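
The file churn in step 2 could be driven by a script along these lines. This is a minimal sketch, not the script actually used: `churn` and its parameters are hypothetical, the demo runs against a temporary directory rather than a real gluster mount point, and killing and restoring the brick (steps 1 and 3) is not covered.

```python
import os
import tempfile

def churn(mount_point, files_per_pass=10000, passes=3):
    """Create files_per_pass files under mount_point, delete them, and
    repeat for the given number of passes. Returns the number created."""
    created = 0
    for p in range(passes):
        names = [
            os.path.join(mount_point, f"file_{p}_{i}")
            for i in range(files_per_pass)
        ]
        for name in names:
            with open(name, "w") as fh:
                fh.write("x")
            created += 1
        for name in names:
            os.remove(name)
    return created

# Demo on a scratch directory with small counts; on a real setup the
# argument would be the master volume's mount point.
with tempfile.TemporaryDirectory() as scratch:
    created = churn(scratch, files_per_pass=100, passes=2)
    leftover = os.listdir(scratch)
print(created, leftover)
```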


These were the logs I got in the client log file. 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[2013-07-12 07:32:40.255481] I [dht-layout.c:722:dht_layout_dir_mismatch] 0-master-dht: /level05/level15/level25/level35/level45/level55/level65/level75/level85 - disk layout missing
[2013-07-12 07:32:40.255902] I [dht-common.c:657:dht_revalidate_cbk] 0-master-dht: mismatching layouts for /level05/level15/level25/level35/level45/level55/level65/level75/level85
[2013-07-12 08:28:07.026409] I [dht-layout.c:722:dht_layout_dir_mismatch] 0-master-dht: /level05/level15/level25/level35/level45/level55/level65/level75/level85 - disk layout missing
[2013-07-12 08:28:07.026495] I [dht-common.c:657:dht_revalidate_cbk] 0-master-dht: mismatching layouts for /level05/level15/level25/level35/level45/level55/level65/level75/level85
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
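
One way to follow up on a 'disk layout missing' message is to check on each brick whether the directory carries the DHT layout extended attribute. The sketch below assumes the attribute is named `trusted.glusterfs.dht` (the usual default, not confirmed by this report), that it runs on Linux (`os.getxattr` is Linux-only), and that it is run as root directly on the brick directory. `read_layout` is a hypothetical helper.

```python
import os
import tempfile

# Assumption: default name of the DHT layout xattr on brick directories.
DHT_XATTR = "trusted.glusterfs.dht"

def read_layout(brick_dir):
    """Return the raw layout xattr bytes for brick_dir, or None when the
    attribute is absent or cannot be read (any OSError is treated alike)."""
    try:
        return os.getxattr(brick_dir, DHT_XATTR)
    except OSError:
        return None

# Demo on a fresh temporary directory, which has no layout xattr.
with tempfile.TemporaryDirectory() as scratch:
    layout = read_layout(scratch)
print(layout)
```

On a real setup the argument would be the directory's path under each brick (for example the brick-side path of /d1). A None result on one brick but not on the others would be consistent with the messages above.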

Comment 6 Amar Tumballi 2013-07-23 10:15:59 UTC
https://code.engineering.redhat.com/gerrit/10518

Comment 7 Rachana Patel 2013-08-22 10:23:47 UTC
Verified with -3.4.0.20rhs-2.el6rhs.x86_64.

Not able to reproduce, hence moving to verified.

Comment 8 Scott Haines 2013-09-23 22:29:51 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html