Description of problem: geo-rep fails to do the first-time history crawl after the volume is restored from a snapshot.

Log snippet from change.log in the working dir:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2014-06-12 07:08:33.017140] I [gf-changelog.c:186:gf_changelog_notification_init] 0-glusterfs: connecting to changelog socket: /var/run/gluster/changelog-764717cae0147062b6b692f6b5d13a72.sock (brick: /var/run/gluster/snaps/fa41455574c54d53b708e3394d3bae2c/brick3/master_b3)
[2014-06-12 07:08:33.017203] W [gf-changelog.c:196:gf_changelog_notification_init] 0-glusterfs: connection attempt 1/5...
[2014-06-12 07:08:33.017277] I [gf-changelog.c:216:gf_changelog_notification_init] 0-glusterfs: connection successful
[2014-06-12 07:08:44.070462] D [gf-changelog-process.c:597:gf_changelog_ext_change] 0-glusterfs: processing changelog: /var/run/gluster/snaps/fa41455574c54d53b708e3394d3bae2c/brick3/master_b3/.glusterfs/changelogs/CHANGELOG.1402556924
[2014-06-12 07:08:59.098946] D [gf-changelog-process.c:597:gf_changelog_ext_change] 0-glusterfs: processing changelog: /var/run/gluster/snaps/fa41455574c54d53b708e3394d3bae2c/brick3/master_b3/.glusterfs/changelogs/CHANGELOG.1402556939
[2014-06-12 07:09:14.152214] D [gf-changelog-process.c:597:gf_changelog_ext_change] 0-glusterfs: processing changelog: /var/run/gluster/snaps/fa41455574c54d53b708e3394d3bae2c/brick3/master_b3/.glusterfs/changelogs/CHANGELOG.1402556954
[2014-06-12 07:09:29.169742] D [gf-changelog-process.c:597:gf_changelog_ext_change] 0-glusterfs: processing changelog: /var/run/gluster/snaps/fa41455574c54d53b708e3394d3bae2c/brick3/master_b3/.glusterfs/changelogs/CHANGELOG.1402556969
[2014-06-12 07:09:44.177457] D [gf-changelog-process.c:597:gf_changelog_ext_change] 0-glusterfs: processing changelog: /var/run/gluster/snaps/fa41455574c54d53b708e3394d3bae2c/brick3/master_b3/.glusterfs/changelogs/CHANGELOG.1402556984
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
It should actually look like this (history crawl being used):
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2014-06-12 08:58:37.878532] I [gf-changelog.c:186:gf_changelog_notification_init] 0-glusterfs: connecting to changelog socket: /var/run/gluster/changelog-764717cae0147062b6b692f6b5d13a72.sock (brick: /var/run/gluster/snaps/fa41455574c54d53b708e3394d3bae2c/brick3/master_b3)
[2014-06-12 08:58:37.878598] W [gf-changelog.c:196:gf_changelog_notification_init] 0-glusterfs: connection attempt 1/5...
[2014-06-12 08:58:37.878709] I [gf-changelog.c:216:gf_changelog_notification_init] 0-glusterfs: connection successful
[2014-06-12 08:58:38.098239] I [gf-history-changelog.c:722:gf_changelog_extract_min_max] 0-glusterfs: MIN: 1402553166, MAX: 1402554141, TOTAL CHANGELOGS: 64
[2014-06-12 08:58:38.098341] I [gf-history-changelog.c:722:gf_changelog_extract_min_max] 0-glusterfs: MIN: 1402554151, MAX: 1402556735, TOTAL CHANGELOGS: 173
[2014-06-12 08:58:38.098381] I [gf-history-changelog.c:722:gf_changelog_extract_min_max] 0-glusterfs: MIN: 1402556756, MAX: 1402563505, TOTAL CHANGELOGS: 450
[2014-06-12 08:58:38.098476] I [gf-history-changelog.c:864:gf_history_changelog] 0-glusterfs: FINAL: from: 1402563460, to: 1402563505, changes: 4
[2014-06-12 08:58:38.099505] D [gf-history-changelog.c:286:gf_history_changelog_scan] 0-glusterfs: hist_done 1, is_last_scan: 0
[2014-06-12 08:58:39.099858] D [gf-history-changelog.c:286:gf_history_changelog_scan] 0-glusterfs: hist_done 0, is_last_scan: 1
[2014-06-12 08:58:40.475710] D [gf-changelog-process.c:597:gf_changelog_ext_change] 0-glusterfs: processing changelog: /var/run/gluster/snaps/fa41455574c54d53b708e3394d3bae2c/brick3/master_b3/.glusterfs/changelogs/CHANGELOG.1402563520
[2014-06-12 08:58:55.765644] D [gf-changelog-process.c:597:gf_changelog_ext_change] 0-glusterfs: processing changelog: /var/run/gluster/snaps/fa41455574c54d53b708e3394d3bae2c/brick3/master_b3/.glusterfs/changelogs/CHANGELOG.1402563535
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Version-Release number of selected component (if applicable): glusterfs-3.6.0.15-1.el6rhs

How reproducible: Didn't try to reproduce.

Steps to Reproduce:
1. Create and start a geo-rep session between master and slave.
2. Create 10K files on the master: "crefi -T 10 -n 10 --multi -b 10 -d 10 --random --min=1K --max=10K /mnt/master/"
3. Let the file creation complete, and pause geo-rep before it has synced all the data.
4. Take a snapshot on the slave first, then on the master, and resume geo-rep.
5. Create some more data on the master and let it sync to the slave.
6. Stop geo-rep and also the master and slave volumes.
7. Restore the snapped volumes.
8. Start the master and slave volumes, and start geo-rep.
9. Check whether geo-rep used the history crawl to sync data.

Actual results: geo-rep failed to do the first history crawl after the snapshots were restored.

Expected results: geo-rep should do the first history crawl after the snapshots are restored.

Additional info:
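The steps above can be sketched as a shell dry run. All volume, host, and snapshot names here are hypothetical, and the `run` helper only echoes each command, so the sketch is safe to execute anywhere; change the helper to `"$@"` to run it against a real cluster.

```shell
#!/bin/sh
# Hypothetical names for illustration only.
MASTER=mastervol
SLAVE=slavehost::slavevol

run() { echo "+ $*"; }   # dry-run helper; change body to "$@" to really execute

# Steps 1-3: start geo-rep, create data, pause before the sync finishes.
run gluster volume geo-replication "$MASTER" "$SLAVE" start
run crefi -T 10 -n 10 --multi -b 10 -d 10 --random --min=1K --max=10K /mnt/master/
run gluster volume geo-replication "$MASTER" "$SLAVE" pause

# Step 4: snapshot the slave first, then the master, then resume geo-rep.
run gluster snapshot create slave_snap slavevol
run gluster snapshot create master_snap "$MASTER"
run gluster volume geo-replication "$MASTER" "$SLAVE" resume

# Steps 6-7: stop geo-rep and both volumes, then restore the snapshots.
run gluster volume geo-replication "$MASTER" "$SLAVE" stop
run gluster volume stop "$MASTER"
run gluster volume stop slavevol
run gluster snapshot restore slave_snap
run gluster snapshot restore master_snap

# Step 8: restart the volumes and geo-rep. Step 9 is then checking the
# changelog log in the geo-rep working dir for history-crawl entries.
run gluster volume start "$MASTER"
run gluster volume start slavevol
run gluster volume geo-replication "$MASTER" "$SLAVE" start
```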
Please post the xattr dump of the brick roots (backend) from before the snapshot and after the restore.
This is happening even in build glusterfs-3.6.0.24-1.el6rhs.

xattrs of the snapped bricks before restoration:
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
[root@redlake geo-automation]# getfattr -d -m . -e hex /var/run/gluster/snaps/7f96ee01b5f7483aa9e6c3feefce95c8/brick1/master_b1/
getfattr: Removing leading '/' from absolute path names
# file: var/run/gluster/snaps/7f96ee01b5f7483aa9e6c3feefce95c8/brick1/master_b1/
trusted.afr.master-client-0=0x000000000000000000000000
trusted.afr.master-client-1=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000555555547ffffffd
trusted.glusterfs.e7906916-9db4-4aad-bf52-b744cfd52794.d52240de-e128-4e1f-9ef3-1cab1d657a61.stime=0x53bd30eb00000000
trusted.glusterfs.e7906916-9db4-4aad-bf52-b744cfd52794.xtime=0x53bd36780006bfa4
trusted.glusterfs.volume-id=0x7f96ee01b5f7483aa9e6c3feefce95c8

[root@redlake geo-automation]# getfattr -d -m . -e hex /var/run/gluster/snaps/7f96ee01b5f7483aa9e6c3feefce95c8/brick5/master_b5/
getfattr: Removing leading '/' from absolute path names
# file: var/run/gluster/snaps/7f96ee01b5f7483aa9e6c3feefce95c8/brick5/master_b5/
trusted.afr.master-client-4=0x000000000000000000000000
trusted.afr.master-client-5=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000aaaaaaa8d5555551
trusted.glusterfs.e7906916-9db4-4aad-bf52-b744cfd52794.d52240de-e128-4e1f-9ef3-1cab1d657a61.stime=0x53bd30ed00000000
trusted.glusterfs.e7906916-9db4-4aad-bf52-b744cfd52794.xtime=0x53bd367800058eb5
trusted.glusterfs.volume-id=0x7f96ee01b5f7483aa9e6c3feefce95c8

[root@redlake geo-automation]# getfattr -d -m . -e hex /var/run/gluster/snaps/7f96ee01b5f7483aa9e6c3feefce95c8/brick9/master_b9/
getfattr: Removing leading '/' from absolute path names
# file: var/run/gluster/snaps/7f96ee01b5f7483aa9e6c3feefce95c8/brick9/master_b9/
trusted.afr.master-client-8=0x000000000000000000000000
trusted.afr.master-client-9=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000000000002aaaaaa9
trusted.glusterfs.e7906916-9db4-4aad-bf52-b744cfd52794.d52240de-e128-4e1f-9ef3-1cab1d657a61.stime=0x53bd30eb00000000
trusted.glusterfs.e7906916-9db4-4aad-bf52-b744cfd52794.xtime=0x53bd367800053ebf
trusted.glusterfs.volume-id=0x7f96ee01b5f7483aa9e6c3feefce95c8
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

There is a noticeably large gap between stime and xtime.
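The stime/xtime values above can be decoded to quantify that gap. A minimal sketch, assuming each xattr value is a pair of network-byte-order 32-bit integers (seconds since the epoch plus a sub-second field; the exact unit of the second field is an assumption here):

```python
import struct

def decode_marker_time(hex_value):
    """Decode a trusted.glusterfs.*.stime/xtime xattr value.

    Assumed layout: two big-endian 32-bit integers -- seconds
    since the epoch, then a sub-second part.
    """
    sec, subsec = struct.unpack(">II", bytes.fromhex(hex_value))
    return sec, subsec

# Values from brick1/master_b1 above (leading "0x" stripped).
stime_sec, _ = decode_marker_time("53bd30eb00000000")
xtime_sec, _ = decode_marker_time("53bd36780006bfa4")

# xtime is well ahead of stime, i.e. changes exist that have not
# been marked as synced.
print(xtime_sec - stime_sec)  # 1421 seconds (~24 minutes)
```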
As requested earlier, please provide the xattr dump taken after the restore as well.
xattr dump after restoring the same snapshot described in comment 3:

[root@redlake ~]# getfattr -d -m . -e hex /var/run/gluster/snaps/7f96ee01b5f7483aa9e6c3feefce95c8/brick1/master_b1/
getfattr: Removing leading '/' from absolute path names
# file: var/run/gluster/snaps/7f96ee01b5f7483aa9e6c3feefce95c8/brick1/master_b1/
trusted.afr.master-client-0=0x000000000000000000000000
trusted.afr.master-client-1=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000555555547ffffffd
trusted.glusterfs.e7906916-9db4-4aad-bf52-b744cfd52794.d52240de-e128-4e1f-9ef3-1cab1d657a61.stime=0x53bd30eb00000000
trusted.glusterfs.e7906916-9db4-4aad-bf52-b744cfd52794.xtime=0x53bd36780006bfa4
trusted.glusterfs.volume-id=0xe79069169db44aadbf52b744cfd52794

[root@redlake ~]# getfattr -d -m . -e hex /var/run/gluster/snaps/7f96ee01b5f7483aa9e6c3feefce95c8/brick5/master_b5/
getfattr: Removing leading '/' from absolute path names
# file: var/run/gluster/snaps/7f96ee01b5f7483aa9e6c3feefce95c8/brick5/master_b5/
trusted.afr.master-client-4=0x000000000000000000000000
trusted.afr.master-client-5=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000aaaaaaa8d5555551
trusted.glusterfs.e7906916-9db4-4aad-bf52-b744cfd52794.d52240de-e128-4e1f-9ef3-1cab1d657a61.stime=0x53bd30ed00000000
trusted.glusterfs.e7906916-9db4-4aad-bf52-b744cfd52794.xtime=0x53bd367800058eb5
trusted.glusterfs.volume-id=0xe79069169db44aadbf52b744cfd52794

[root@redlake ~]# getfattr -d -m . -e hex /var/run/gluster/snaps/7f96ee01b5f7483aa9e6c3feefce95c8/brick9/master_b9/
getfattr: Removing leading '/' from absolute path names
# file: var/run/gluster/snaps/7f96ee01b5f7483aa9e6c3feefce95c8/brick9/master_b9/
trusted.afr.master-client-8=0x000000000000000000000000
trusted.afr.master-client-9=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000000000002aaaaaa9
trusted.glusterfs.e7906916-9db4-4aad-bf52-b744cfd52794.d52240de-e128-4e1f-9ef3-1cab1d657a61.stime=0x53bd30eb00000000
trusted.glusterfs.e7906916-9db4-4aad-bf52-b744cfd52794.xtime=0x53bd367800053ebf
trusted.glusterfs.volume-id=0xe79069169db44aadbf52b744cfd52794
After much analysis and debugging, I figured out the following.

Cause: On geo-rep start, the stime request returns -1 because stime turns out to be older than the recorded "first geo-rep start time" (the check against (stime < first geo-rep start time) fails the request). Hence the history crawl is skipped and a complete xsync crawl is done instead. The first geo-rep start time is recorded the very first time a geo-rep session is started for that master with any of its slaves, and is fetched with a getxattr of 'trusted.glusterfs.volume-mark' on the master's gluster root.

Analysis: Apart from the 'trusted.glusterfs.volume-mark.<uuid>' xattr, a 'trusted.glusterfs.volume-mark' xattr is also maintained virtually. Virtual in the sense that the xattr is never actually set on the gluster root; marker synthesizes it when requested. Its value packs the node UUID, a return value, the geo-rep start time, and other fields. On geo-rep start, marker creates the 'marker.tstamp' file if indexing is turned on. On a getxattr for the key 'trusted.glusterfs.volume-mark' on the gluster root, marker stats 'marker.tstamp' and returns its modification time along with the other fields. On snapshot creation, the brick volfiles are freshly created, and hence the marker.tstamp file as well. So marker.tstamp records the snapshot creation time instead of the first geo-rep start time. On geo-rep start after the master volume is restored, the restored stime therefore compares as older than this bogus "start time", and -1 is returned.
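The failing check can be modelled with a small sketch. The function and field names here are hypothetical; the real logic lives in the marker translator and the changelog history API, this only illustrates why a recreated marker.tstamp breaks the history crawl:

```python
def history_crawl_possible(stime, volume_mark_start_time):
    """Model of the check described above: if stime predates the
    recorded first geo-rep start time, -1 is returned and geo-rep
    falls back to a full xsync crawl; otherwise history proceeds."""
    if stime < volume_mark_start_time:
        return -1   # forces full xsync crawl
    return 0        # history crawl proceeds

# Before the restore: marker.tstamp holds the first geo-rep start
# time, which precedes any recorded stime -> history crawl works.
first_start = 1402553000   # illustrative epoch values
stime = 1404907755
assert history_crawl_possible(stime, first_start) == 0

# After the restore: marker.tstamp was recreated at snapshot
# creation time, which is *later* than the restored stime -> fails.
snap_creation = 1404909999
assert history_crawl_possible(stime, snap_creation) == -1
```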
Upstream patch: http://review.gluster.org/#/c/8401/
Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/30264/
Verified on build glusterfs-3.6.0.27-1.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html