Bug 981837 - Dist-geo-rep: fail to sync files, log says- 'disk layout missing' so it is umounting aux_mount, worker dies and with new worker again same problem
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: geo-replication
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Venky Shankar
QA Contact: amainkar
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-07-06 07:26 UTC by Rachana Patel
Modified: 2015-04-20 11:56 UTC (History)

Fixed In Version: glusterfs-3.4.0.12rhs.beta6-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-09-23 22:29:51 UTC
Embargoed:



Description Rachana Patel 2013-07-06 07:26:09 UTC
Description of problem:
Dist-geo-rep: files fail to sync. The log reports 'disk layout missing', after which the aux mount is unmounted and the worker dies. Each new worker then hits the same problem.

Version-Release number of selected component (if applicable):
3.4.0.12rhs.beta2-1.el6rhs.x86_64

How reproducible:


Steps to Reproduce:
1. The master was a DHT volume (1x6) and the slave was a DHT volume (1x3).
2. Created more than 80 GB of data on the master volume.
3. Created a geo-rep session between master and slave.
Kept checking for more than 12 hours. The session is in the stable state and xsync prepares the changelog, but before it syncs a single file or directory the worker process dies. A new worker process then starts and goes through the same steps.


from log
 ssh%3A%2F%2Froot%4010.70.43.147%3Agluster%3A%2F%2F127.0.0.1%3Aslave1.%2Frhs%2Fbrick1%2Fm2.gluster.log
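The log file name above is a percent-encoded form of the session's slave URL. As a small sketch (not part of the original report), it can be decoded with Python's standard library to make the master brick and slave target easier to read:

```python
from urllib.parse import unquote

# Log file name copied verbatim from the report.
encoded = (
    "ssh%3A%2F%2Froot%4010.70.43.147%3Agluster%3A%2F%2F127.0.0.1"
    "%3Aslave1.%2Frhs%2Fbrick1%2Fm2.gluster.log"
)

# unquote() turns %3A into ':', %2F into '/', and %40 into '@'.
decoded = unquote(encoded)
print(decoded)
```

The decoded name reads ssh://root@10.70.43.147:gluster://127.0.0.1:slave1./rhs/brick1/m2.gluster.log.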

....

[2013-07-05 21:23:09.237905] I [dht-layout.c:722:dht_layout_dir_mismatch] 0-master1-dht: /d1 - disk layout missing
[2013-07-05 21:23:09.237951] I [dht-common.c:650:dht_revalidate_cbk] 0-master1-dht: mismatching layouts for /d1
[2013-07-05 21:23:17.241236] I [dht-layout.c:722:dht_layout_dir_mismatch] 0-master1-dht: /d1 - disk layout missing
[2013-07-05 21:23:17.241288] I [dht-common.c:650:dht_revalidate_cbk] 0-master1-dht: mismatching layouts for /d1
[2013-07-05 21:23:21.647976] I [dht-layout.c:722:dht_layout_dir_mismatch] 0-master1-dht: /d1 - disk layout missing
[2013-07-05 21:23:21.648215] I [dht-common.c:650:dht_revalidate_cbk] 0-master1-dht: mismatching layouts for /d1
[2013-07-06 00:20:01.902038] I [fuse-bridge.c:5468:fuse_thread_proc] 0-fuse: unmounting /tmp/gsyncd-aux-mount-eo3siJ
[2013-07-06 00:20:01.968151] W [glusterfsd.c:1030:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x3d9d0e890d] (-->/lib64/libpthread.so.0() [0x3d9d807851] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd) [0x40528d]))) 0-: received signum (15), shutting down
[2013-07-06 00:20:01.969834] I [fuse-bridge.c:6144:fini] 0-fuse: Unmounting '/tmp/gsyncd-aux-mount-eo3siJ'.
[2013-07-06 00:20:14.982494] I [glusterfsd.c:1938:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.4.0.12rhs.beta2 (/usr/sbin/glusterfs --aux-gfid-mount --log-file=/var/log/glusterfs/geo-replication/master1/ssh%3A%2F%2Froot%4010.70.43.147%3Agluster%3A%2F%2F127.0.0.1%3Aslave1.%2Frhs%2Fbrick1%2Fm2.gluster.log --volfile-server=localhost --volfile-id=master1 --client-pid=-1 /tmp/gsyncd-aux-mount-T7qe89)

....


Each time the sequence is:
prepare xsync changelog
'mismatching layouts for /d1'
unmounting
worker dies
new worker starts
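To confirm that the same cycle repeats across the whole log, the client log can be summarized with a short script. This is only a sketch: the regular expressions are inferred from the log lines quoted in this report, and the function name `summarize` is made up for illustration. It counts 'disk layout missing' messages per directory, aux-mount unmounts, and client restarts.

```python
import re
from collections import Counter

# Line shape inferred from the excerpt: [timestamp] LEVEL [file:line:func] message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+\[(?P<src>[^\]]+)\]\s+(?P<rest>.*)$"
)
MISSING_RE = re.compile(r":\s+(?P<path>/\S*) - disk layout missing$")


def summarize(lines):
    """Return (missing-layout counts per path, unmount count, restart count)."""
    missing = Counter()
    unmounts = 0
    restarts = 0
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip '....' separators and anything not in log format
        rest = m.group("rest")
        hit = MISSING_RE.search(rest)
        if hit:
            missing[hit.group("path")] += 1
        elif "fuse_thread_proc" in m.group("src") and "unmounting" in rest:
            unmounts += 1
        elif "Started running" in rest:
            restarts += 1
    return missing, unmounts, restarts


# Three lines taken from the excerpt in this report.
SAMPLE = [
    "[2013-07-05 21:23:09.237905] I [dht-layout.c:722:dht_layout_dir_mismatch] "
    "0-master1-dht: /d1 - disk layout missing",
    "[2013-07-05 21:23:17.241236] I [dht-layout.c:722:dht_layout_dir_mismatch] "
    "0-master1-dht: /d1 - disk layout missing",
    "[2013-07-06 00:20:01.902038] I [fuse-bridge.c:5468:fuse_thread_proc] "
    "0-fuse: unmounting /tmp/gsyncd-aux-mount-eo3siJ",
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

In practice the function would be fed the lines of the full .gluster.log file; roughly equal unmount and restart counts, with the same directory dominating the missing-layout counter, would match the cycle described here.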

Actual results:


Expected results:


Additional info:

Comment 3 Amar Tumballi 2013-07-06 17:32:39 UTC
Shishir, this looks similar to https://code.engineering.redhat.com/gerrit/9801 

Please have a look and confirm whether that's the case.

Comment 4 Vijaykumar Koppad 2013-07-12 09:17:22 UTC
I was able to reproduce the issue in the build glusterfs-3.4.0.12rhs.beta3-1.el6rhs.x86_64. 

The master volume was a dist-rep volume. 
Steps followed:
1. On the master, kill one brick of a replica pair.
2. Start creating and deleting files (around 10000) in a loop.
3. After a few iterations, bring the brick back. 
4. Remove all the files from the mount point. A few directories are not removed. 


These were the logs I got in the client log file. 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[2013-07-12 07:32:40.255481] I [dht-layout.c:722:dht_layout_dir_mismatch] 0-master-dht: /level05/level15/level25/level35/level45/level55/level65/level75/level85 - disk layout missing
[2013-07-12 07:32:40.255902] I [dht-common.c:657:dht_revalidate_cbk] 0-master-dht: mismatching layouts for /level05/level15/level25/level35/level45/level55/level65/level75/level85
[2013-07-12 08:28:07.026409] I [dht-layout.c:722:dht_layout_dir_mismatch] 0-master-dht: /level05/level15/level25/level35/level45/level55/level65/level75/level85 - disk layout missing
[2013-07-12 08:28:07.026495] I [dht-common.c:657:dht_revalidate_cbk] 0-master-dht: mismatching layouts for /level05/level15/level25/level35/level45/level55/level65/level75/level85
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
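One way to narrow this down is to check which bricks hold the affected directory without an on-disk layout. The sketch below rests on an assumption not stated in this report: that 'disk layout missing' corresponds to the directory lacking the `trusted.glusterfs.dht` extended attribute on a brick. The helper names and brick paths are hypothetical. Reading `trusted.*` attributes normally requires root on the brick host.

```python
import errno
import os

# Assumption: DHT keeps a directory's on-disk layout in this xattr.
DHT_XATTR = "trusted.glusterfs.dht"


def has_dht_layout(brick_dir, getxattr=None):
    """Return True if brick_dir carries the DHT layout xattr."""
    if getxattr is None:
        getxattr = os.getxattr  # available on Linux only
    try:
        getxattr(brick_dir, DHT_XATTR)
        return True
    except OSError as exc:
        if exc.errno == errno.ENODATA:
            return False  # attribute absent: no layout stored on this brick
        raise  # e.g. directory missing or permission denied


def bricks_missing_layout(brick_roots, rel_dir, getxattr=None):
    """List the brick roots whose copy of rel_dir has no layout xattr."""
    rel = rel_dir.lstrip("/")
    return [
        root
        for root in brick_roots
        if not has_dht_layout(os.path.join(root, rel), getxattr)
    ]
```

For example, `bricks_missing_layout(["/rhs/brick1/m1", "/rhs/brick1/m2"], "/d1")` would return the bricks on the local host where /d1 has no layout; those two brick paths are placeholders, not taken from the report.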

Comment 6 Amar Tumballi 2013-07-23 10:15:59 UTC
https://code.engineering.redhat.com/gerrit/10518

Comment 7 Rachana Patel 2013-08-22 10:23:47 UTC
Verified with build 3.4.0.20rhs-2.el6rhs.x86_64.

Not able to reproduce the issue, hence moving to verified.

Comment 8 Scott Haines 2013-09-23 22:29:51 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html

