Description of problem:
Dist-geo-rep: fails to sync files. The log says 'disk layout missing', so it unmounts the aux mount and the worker dies; a new worker is spawned and hits the same problem again.

Version-Release number of selected component (if applicable):
3.4.0.12rhs.beta2-1.el6rhs.x86_64

How reproducible:

Steps to Reproduce:
1. Master was a DHT volume (1x6); slave was a DHT volume (1x3).
2. Created more than 80GB of data on the master volume.
3. Created a geo-rep session between master and slave, then kept checking for more than 12 hours.

The session stays in a stable state: xsync prepares the changelog, but before it syncs a single file or directory the worker process dies; a new worker starts and the same sequence repeats.

From the log ssh%3A%2F%2Froot%4010.70.43.147%3Agluster%3A%2F%2F127.0.0.1%3Aslave1.%2Frhs%2Fbrick1%2Fm2.gluster.log:

....
[2013-07-05 21:23:09.237905] I [dht-layout.c:722:dht_layout_dir_mismatch] 0-master1-dht: /d1 - disk layout missing
[2013-07-05 21:23:09.237951] I [dht-common.c:650:dht_revalidate_cbk] 0-master1-dht: mismatching layouts for /d1
[2013-07-05 21:23:17.241236] I [dht-layout.c:722:dht_layout_dir_mismatch] 0-master1-dht: /d1 - disk layout missing
[2013-07-05 21:23:17.241288] I [dht-common.c:650:dht_revalidate_cbk] 0-master1-dht: mismatching layouts for /d1
[2013-07-05 21:23:21.647976] I [dht-layout.c:722:dht_layout_dir_mismatch] 0-master1-dht: /d1 - disk layout missing
[2013-07-05 21:23:21.648215] I [dht-common.c:650:dht_revalidate_cbk] 0-master1-dht: mismatching layouts for /d1
[2013-07-06 00:20:01.902038] I [fuse-bridge.c:5468:fuse_thread_proc] 0-fuse: unmounting /tmp/gsyncd-aux-mount-eo3siJ
[2013-07-06 00:20:01.968151] W [glusterfsd.c:1030:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x3d9d0e890d] (-->/lib64/libpthread.so.0() [0x3d9d807851] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd) [0x40528d]))) 0-: received signum (15), shutting down
[2013-07-06 00:20:01.969834] I [fuse-bridge.c:6144:fini] 0-fuse: Unmounting '/tmp/gsyncd-aux-mount-eo3siJ'.
[2013-07-06 00:20:14.982494] I [glusterfsd.c:1938:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.4.0.12rhs.beta2 (/usr/sbin/glusterfs --aux-gfid-mount --log-file=/var/log/glusterfs/geo-replication/master1/ssh%3A%2F%2Froot%4010.70.43.147%3Agluster%3A%2F%2F127.0.0.1%3Aslave1.%2Frhs%2Fbrick1%2Fm2.gluster.log --volfile-server=localhost --volfile-id=master1 --client-pid=-1 /tmp/gsyncd-aux-mount-T7qe89)
....

Each cycle repeats the same pattern: prepare the xsync changelog, log 'mismatching layouts for /d1', unmount, worker dies, new worker starts.

Actual results:

Expected results:

Additional info:
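The setup in the steps above could be sketched roughly as follows. This is a hedged outline, not taken from the report: hostnames (node1..node6, snode1..snode3, slavehost), brick paths, and the slave user are assumptions; only the volume names master1/slave1 appear in the logs.

```shell
# Sketch only: hostnames and brick paths below are assumptions.

# Step 1: master is a 1x6 distribute (DHT) volume
gluster volume create master1 node{1..6}:/rhs/brick1/m1
gluster volume start master1

# Step 1 (on the slave cluster): slave is a 1x3 distribute volume
gluster volume create slave1 snode{1..3}:/rhs/brick1/s1
gluster volume start slave1

# Step 3: create and start the distributed geo-replication session,
# then poll its status while the data syncs
gluster volume geo-replication master1 slavehost::slave1 create push-pem
gluster volume geo-replication master1 slavehost::slave1 start
gluster volume geo-replication master1 slavehost::slave1 status
```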
Shishir, this looks similar to https://code.engineering.redhat.com/gerrit/9801. Please have a look and confirm whether that's true.
I was able to reproduce the issue with the build glusterfs-3.4.0.12rhs.beta3-1.el6rhs.x86_64. The master volume was a dist-rep volume.

Steps followed:
1. On the master, kill one brick of a replica pair.
2. Start creating and deleting files (around 10000) in a loop.
3. After a few iterations, bring the brick back.
4. Remove all the files from the mount point. A few directories are not removed.

These were the logs I got in the client log file:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[2013-07-12 07:32:40.255481] I [dht-layout.c:722:dht_layout_dir_mismatch] 0-master-dht: /level05/level15/level25/level35/level45/level55/level65/level75/level85 - disk layout missing
[2013-07-12 07:32:40.255902] I [dht-common.c:657:dht_revalidate_cbk] 0-master-dht: mismatching layouts for /level05/level15/level25/level35/level45/level55/level65/level75/level85
[2013-07-12 08:28:07.026409] I [dht-layout.c:722:dht_layout_dir_mismatch] 0-master-dht: /level05/level15/level25/level35/level45/level55/level65/level75/level85 - disk layout missing
[2013-07-12 08:28:07.026495] I [dht-common.c:657:dht_revalidate_cbk] 0-master-dht: mismatching layouts for /level05/level15/level25/level35/level45/level55/level65/level75/level85
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
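The churn in steps 2 and 4 can be approximated with a small script. This is only a workload sketch: the mount point is an assumption (point MOUNT at the master volume's fuse mount in a real reproduction), the directory names mirror the paths in the logs, and the loop count is reduced from ~10000 for brevity. On a plain local filesystem it completes cleanly; the bug manifests only when it races with self-heal on the gluster mount.

```shell
#!/bin/sh
# Workload sketch for steps 2 and 4 above.
# MOUNT is an assumption: set it to the master volume's fuse mount.
MOUNT="${MOUNT:-/tmp/repro-mnt}"
DIR="$MOUNT/level05/level15"
mkdir -p "$DIR"

# Step 2: create and delete files in a loop (the bug used ~10000; 100 here).
i=0
while [ "$i" -lt 100 ]; do
    touch "$DIR/file$i"
    rm -f "$DIR/file$i"
    i=$((i + 1))
done

# Step 4: remove everything from the mount point. On an affected volume,
# some directories survive this and later trigger the
# 'disk layout missing' / 'mismatching layouts' messages.
rm -rf "$MOUNT/level05"
```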
https://code.engineering.redhat.com/gerrit/10518
Verified with 3.4.0.20rhs-2.el6rhs.x86_64. Not able to reproduce the issue, hence moving to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html