Bug 981837 - Dist-geo-rep: fail to sync files, log says- 'disk layout missing' so it is umounting aux_mount, worker dies and with new worker again same problem
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: geo-replication
Version: 2.1
Hardware: x86_64 Linux
Priority: high
Severity: high
Assigned To: Venky Shankar
QA Contact: amainkar
Keywords: TestBlocker
Reported: 2013-07-06 03:26 EDT by Rachana Patel
Modified: 2015-04-20 07:56 EDT

Fixed In Version: glusterfs-3.4.0.12rhs.beta6-1
Doc Type: Bug Fix
Last Closed: 2013-09-23 18:29:51 EDT
Type: Bug


Description Rachana Patel 2013-07-06 03:26:09 EDT
Description of problem:
Dist-geo-rep fails to sync files. The log reports 'disk layout missing', after which the aux mount is unmounted and the worker dies. Each new worker then hits the same problem.

Version-Release number of selected component (if applicable):
3.4.0.12rhs.beta2-1.el6rhs.x86_64

How reproducible:


Steps to Reproduce:
1. The master was a DHT volume (1x6); the slave was a DHT volume (1x3).
2. Created more than 80GB of data on the master volume.
3. Created a geo-rep session between master and slave.
Kept checking for more than 12 hours. The session is in stable state and xsync prepares the changelog, but before it syncs a single file or directory the worker process dies. A new worker process then starts and goes through the same steps.


From the log:
 ssh%3A%2F%2Froot%4010.70.43.147%3Agluster%3A%2F%2F127.0.0.1%3Aslave1.%2Frhs%2Fbrick1%2Fm2.gluster.log

....

[2013-07-05 21:23:09.237905] I [dht-layout.c:722:dht_layout_dir_mismatch] 0-master1-dht: /d1 - disk layout missing
[2013-07-05 21:23:09.237951] I [dht-common.c:650:dht_revalidate_cbk] 0-master1-dht: mismatching layouts for /d1
[2013-07-05 21:23:17.241236] I [dht-layout.c:722:dht_layout_dir_mismatch] 0-master1-dht: /d1 - disk layout missing
[2013-07-05 21:23:17.241288] I [dht-common.c:650:dht_revalidate_cbk] 0-master1-dht: mismatching layouts for /d1
[2013-07-05 21:23:21.647976] I [dht-layout.c:722:dht_layout_dir_mismatch] 0-master1-dht: /d1 - disk layout missing
[2013-07-05 21:23:21.648215] I [dht-common.c:650:dht_revalidate_cbk] 0-master1-dht: mismatching layouts for /d1
[2013-07-06 00:20:01.902038] I [fuse-bridge.c:5468:fuse_thread_proc] 0-fuse: unmounting /tmp/gsyncd-aux-mount-eo3siJ
[2013-07-06 00:20:01.968151] W [glusterfsd.c:1030:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x3d9d0e890d] (-->/lib64/libpthread.so.0() [0x3d9d807851] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd) [0x40528d]))) 0-: received signum (15), shutting down
[2013-07-06 00:20:01.969834] I [fuse-bridge.c:6144:fini] 0-fuse: Unmounting '/tmp/gsyncd-aux-mount-eo3siJ'.
[2013-07-06 00:20:14.982494] I [glusterfsd.c:1938:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.4.0.12rhs.beta2 (/usr/sbin/glusterfs --aux-gfid-mount --log-file=/var/log/glusterfs/geo-replication/master1/ssh%3A%2F%2Froot%4010.70.43.147%3Agluster%3A%2F%2F127.0.0.1%3Aslave1.%2Frhs%2Fbrick1%2Fm2.gluster.log --volfile-server=localhost --volfile-id=master1 --client-pid=-1 /tmp/gsyncd-aux-mount-T7qe89)

....
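
The log file name is percent-encoded. A minimal sketch, assuming plain URL encoding and nothing gsyncd-specific, that decodes it with Python's `urllib.parse.unquote` to show which slave URL the log belongs to:

```python
from urllib.parse import unquote

# Log file name exactly as it appears in this report.
encoded_name = (
    "ssh%3A%2F%2Froot%4010.70.43.147%3Agluster%3A%2F%2F127.0.0.1"
    "%3Aslave1.%2Frhs%2Fbrick1%2Fm2.gluster.log"
)

# %3A -> ':', %2F -> '/', %40 -> '@'
decoded_name = unquote(encoded_name)
print(decoded_name)
```

The decoded name is the slave URL followed by a path-like suffix and `.gluster.log`.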


The same sequence repeats each time:
1. xsync prepares the changelog
2. the log reports 'mismatching layouts for /d1'
3. the aux mount is unmounted
4. the worker dies
5. a new worker starts
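
The repeating cycle can be picked out of the client log by counting 'disk layout missing' messages per path and listing the aux-mount unmount events. The following is a rough triage sketch, not part of gsyncd: the regular expressions only assume the line shape seen in the excerpts in this report, and `summarize` is a hypothetical helper name.

```python
import re
from collections import Counter

# Line shape assumed from the excerpts in this report:
# [timestamp] LEVEL [file:line:function] domain: message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<src>[^\]]+)\] (?P<domain>\S+): (?P<msg>.*)$"
)
MISSING_RE = re.compile(r"^(?P<path>/.*) - disk layout missing$")
UNMOUNT_RE = re.compile(r"^unmounting (?P<mount>\S+)$")


def summarize(lines):
    """Count 'disk layout missing' messages per path and collect unmount events."""
    missing = Counter()
    unmounts = []
    for line in lines:
        parsed = LINE_RE.match(line.strip())
        if not parsed:
            continue  # skip lines that do not follow the assumed shape
        msg = parsed.group("msg")
        hit = MISSING_RE.match(msg)
        if hit:
            missing[hit.group("path")] += 1
            continue
        hit = UNMOUNT_RE.match(msg)
        if hit:
            unmounts.append((parsed.group("ts"), hit.group("mount")))
    return missing, unmounts


# Sample lines copied from the log excerpt in this report.
sample = [
    "[2013-07-05 21:23:09.237905] I [dht-layout.c:722:dht_layout_dir_mismatch] 0-master1-dht: /d1 - disk layout missing",
    "[2013-07-05 21:23:09.237951] I [dht-common.c:650:dht_revalidate_cbk] 0-master1-dht: mismatching layouts for /d1",
    "[2013-07-05 21:23:17.241236] I [dht-layout.c:722:dht_layout_dir_mismatch] 0-master1-dht: /d1 - disk layout missing",
    "[2013-07-06 00:20:01.902038] I [fuse-bridge.c:5468:fuse_thread_proc] 0-fuse: unmounting /tmp/gsyncd-aux-mount-eo3siJ",
]

missing, unmounts = summarize(sample)
for path, count in missing.items():
    print(f"{path}: {count} 'disk layout missing' messages")
for ts, mount in unmounts:
    print(f"{ts}: unmounted {mount}")
```

To run it against a real log, pass an open file object to `summarize` instead of the sample list.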

Actual results:


Expected results:


Additional info:
Comment 3 Amar Tumballi 2013-07-06 13:32:39 EDT
Shishir, this looks similar to https://code.engineering.redhat.com/gerrit/9801 

Please take a look and confirm whether that's true.
Comment 4 Vijaykumar Koppad 2013-07-12 05:17:22 EDT
I was able to reproduce the issue with the build glusterfs-3.4.0.12rhs.beta3-1.el6rhs.x86_64. 

The master volume was a dist-rep volume. 
Steps followed:
1. On the master, kill one brick of a replica pair.
2. Start creating and deleting files (around 10000) in a loop.
3. After a few iterations, bring the brick back. 
4. Remove all the files from the mount point. A few directories are not removed. 
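
Step 2 of the steps above can be approximated with a small script. This is only a sketch of the create/delete loop, not the tool actually used for this comment: `churn_files`, the file naming, and the flat directory layout are all assumptions, and killing and restoring the brick (steps 1 and 3) still has to be done by hand. It is shown against a temporary directory with small counts; on a real setup the master volume's mount point would be passed instead.

```python
import os
import tempfile


def churn_files(mount_point, count=10000, iterations=3):
    """Create `count` small files under mount_point, then delete them, repeatedly.

    Returns the entries still present under mount_point afterwards.
    """
    for i in range(iterations):
        workdir = os.path.join(mount_point, f"iter{i:02d}")
        os.makedirs(workdir, exist_ok=True)
        paths = []
        for n in range(count):
            path = os.path.join(workdir, f"file{n:05d}")
            with open(path, "w") as fh:
                fh.write("x")
            paths.append(path)
        for path in paths:
            os.remove(path)
        try:
            os.rmdir(workdir)
        except OSError:
            pass  # directory could not be removed; it shows up in the return value
    return os.listdir(mount_point)


# Demonstration on a scratch directory with small counts.
with tempfile.TemporaryDirectory() as scratch:
    leftovers = churn_files(scratch, count=100, iterations=2)
    print("leftover entries:", leftovers)
```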


These were the logs I got in the client log file. 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[2013-07-12 07:32:40.255481] I [dht-layout.c:722:dht_layout_dir_mismatch] 0-master-dht: /level05/level15/level25/level35/level45/level55/level65/level75/level85 - disk layout missing
[2013-07-12 07:32:40.255902] I [dht-common.c:657:dht_revalidate_cbk] 0-master-dht: mismatching layouts for /level05/level15/level25/level35/level45/level55/level65/level75/level85
[2013-07-12 08:28:07.026409] I [dht-layout.c:722:dht_layout_dir_mismatch] 0-master-dht: /level05/level15/level25/level35/level45/level55/level65/level75/level85 - disk layout missing
[2013-07-12 08:28:07.026495] I [dht-common.c:657:dht_revalidate_cbk] 0-master-dht: mismatching layouts for /level05/level15/level25/level35/level45/level55/level65/level75/level85
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Comment 6 Amar Tumballi 2013-07-23 06:15:59 EDT
https://code.engineering.redhat.com/gerrit/10518
Comment 7 Rachana Patel 2013-08-22 06:23:47 EDT
Verified with -3.4.0.20rhs-2.el6rhs.x86_64.

Not able to reproduce, hence moving to verified.
Comment 8 Scott Haines 2013-09-23 18:29:51 EDT
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html
