Description of problem: snapshot with geo-rep, while creating files on master, resulted in failure to capture few files entry in changelogs. In one of the case, active replica doesn't have entry for the missing file but passive replica has. file in question and its gfid >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> #getfattr -n glusterfs.gfid.string /mnt/master/thread0/level05/level15/539ab42c%%MFQLJBDHI3 getfattr: Removing leading '/' from absolute path names # file: mnt/master/thread0/level05/level15/539ab42c%%MFQLJBDHI3 glusterfs.gfid.string="cc9ccc81-9af6-4ca4-8e21-4226f08543cf" >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on active replica >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> # find /bricks/ | grep "539ab42c%%MFQLJBDHI3" /bricks/brick2/master_b7/thread0/level05/level15/539ab42c%%MFQLJBDHI3 [root@redcell ~]# grep "MFQLJBDHI3" /bricks/brick brick0/ brick1/ brick2/ brick3/ [root@redcell ~]# grep "MFQLJBDHI3" /bricks/brick2/master_b7/.glusterfs/changelogs/* [root@redcell ~]# >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on passive replica >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> # find /bricks/ | grep "539ab42c%%MFQLJBDHI3" /bricks/brick2/master_b8/thread0/level05/level15/539ab42c%%MFQLJBDHI3 [root@redeye ~]# grep "MFQLJBDHI3" /bricks/brick2/master_b8/.glusterfs/changelogs/* Binary file /bricks/brick2/master_b8/.glusterfs/changelogs/CHANGELOG.1402647607 matches >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Explanation of the debugging explained in the additional info. Version-Release number of selected component (if applicable): glusterfs-3.6.0.16-1.el6rhs How reproducible: Doesn't happen everytime. Steps to Reproduce: 1. create and start a geo-rep relationship between master and slave. 2. start creating data on master using the command "crefi -T 10 -n 10 --multi -d 10 -b 10 --random --max=10K --min=1K /mnt/master" 3. while creating data pause geo-rep 4. create snap-shot of slave, 5. create snap-shot of master 6. resume geo-rep Actual results: Few of the file are failed to get captured in changelog. Expected results: None of the files should be missed in changelog. Additional info: With bit of debugging, we found out some disconnection logs in brick log-file of active brick, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [2014-06-13 08:16:35.831297] I [server-handshake.c:578:server_setvolume] 0-master-server: accepted client from redlake.blr.redhat.com-5110-2014/06/13-08:16:33:694562-master-client-6-0-0 (version: 3.6.0.16) [2014-06-13 08:16:35.834937] I [client_t.c:184:gf_client_get] 0-master-server: client_uid=redlake.blr.redhat.com-5110-2014/06/13-08:16:33:694562-master-client-6-0-0 [2014-06-13 08:16:36.122688] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed [2014-06-13 08:16:37.296077] I [server.c:519:server_rpc_notify] 0-master-server: disconnecting connectionfrom redeye.blr.redhat.com-6963-2014/06/13-07:33:48:600208-master-client-6-0-0 [2014-06-13 08:16:37.296128] I [client_t.c:294:gf_client_put] 0-master-server: Shutting down connection redeye.blr.redhat.com-6963-2014/06/13-07:33:48:600208-master-client-6-0-0 [2014-06-13 08:16:37.619666] I [server.c:519:server_rpc_notify] 0-master-server: disconnecting connectionfrom redcloak.blr.redhat.com-25396-2014/06/13-07:33:48:839150-master-client-6-0-0 [2014-06-13 08:16:37.619717] I [client_t.c:294:gf_client_put] 0-master-server: Shutting down connection redcloak.blr.redhat.com-25396-2014/06/13-07:33:48:839150-master-client-6-0-0 [2014-06-13 08:16:37.669722] I [graph.c:269:gf_add_cmdline_options] 0-master-server: adding option 'listen-port' for volume 'master-server' with value '49156' [2014-06-13 08:16:37.669785] I [graph.c:269:gf_add_cmdline_options] 0-master-posix: adding option 'glusterd-uuid' for volume 'master-posix' with value '86f23681-6c25-4b40-9b60-392266d23c8c' [2014-06-13 08:16:37.671746] E [changelog-helpers.c:169:htime_update] 0-master-changelog: Htime xattr updation failed [2014-06-13 08:16:37.678072] I [server.c:519:server_rpc_notify] 0-master-server: disconnecting connectionfrom redcell.blr.redhat.com-1723-2014/06/13-07:33:48:176326-master-client-6-0-0 [2014-06-13 08:16:37.678116] I [client_t.c:294:gf_client_put] 0-master-server: Shutting down connection redcell.blr.redhat.com-1723-2014/06/13-07:33:48:176326-master-client-6-0-0 [2014-06-13 08:16:38.301501] I [server.c:519:server_rpc_notify] 0-master-server: disconnecting connectionfrom redeye.blr.redhat.com-6965-2014/06/13-07:33:48:619927-master-client-6-0-0 [2014-06-13 08:16:38.301601] I [client_t.c:294:gf_client_put] 0-master-server: Shutting down connection redeye.blr.redhat.com-6965-2014/06/13-07:33:48:619927-master-client-6-0-0 [2014-06-13 08:16:38.639297] I [server.c:519:server_rpc_notify] 0-master-server: disconnecting connectionfrom redcloak.blr.redhat.com-25398-2014/06/13-07:33:48:865533-master-client-6-0-0 [2014-06-13 08:16:38.639376] I [client_t.c:294:gf_client_put] 0-master-server: Shutting down connection redcloak.blr.redhat.com-25398-2014/06/13-07:33:48:865533-master-client-6-0-0 [2014-06-13 08:16:38.689713] I [server.c:519:server_rpc_notify] 0-master-server: disconnecting connectionfrom redcell.blr.redhat.com-1725-2014/06/13-07:33:48:189276-master-client-6-0-0 [2014-06-13 08:16:38.689763] I [client_t.c:294:gf_client_put] 0-master-server: Shutting down connection redcell.blr.redhat.com-1725-2014/06/13-07:33:48:189276-master-client-6-0-0 [2014-06-13 08:16:40.010852] I [client_t.c:184:gf_client_get] 0-master-server: client_uid=redlake.blr.redhat.com-5239-2014/06/13-08:16:38:912527-master-client-6-0-0 [2014-06-13 08:16:40.010968] I [server-handshake.c:578:server_setvolume] 0-master-server: accepted client from redlake.blr.redhat.com-5239-2014/06/13-08:16:38:912527-master-client-6-0-0 (version: 3.6.0.16) [2014-06-13 08:16:40.011506] I [client_t.c:184:gf_client_get] 0-master-server: client_uid=redeye.blr.redhat.com-7394-2014/06/13-08:16:38:251715-master-client-6-0-0 [2014-06-13 08:16:40.011591] I [server-handshake.c:578:server_setvolume] 0-master-server: accepted client from redeye.blr.redhat.com-7394-2014/06/13-08:16:38:251715-master-client-6-0-0 (version: 3.6.0.16) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the theory is, because of these disconnections, few of the files were not written to the active brick, but it got those files from self of passive brick, which was not captured in changelog. This is just a theory, and fix for capturing the self-heal in changelog is below. --- xlators/features/changelog/src/changelog-helpers.h | 7 ++++--- xlators/features/changelog/src/changelog.c | 6 +++--- 2 files changed, 7 insertions(+), 6 deletions(-) diff --git a/xlators/features/changelog/src/changelog-helpers.h b/xlators/features/changelog/src/changelog-helpers.h index 16d60b9..78a2db0 100644 --- a/xlators/features/changelog/src/changelog-helpers.h +++ b/xlators/features/changelog/src/changelog-helpers.h @@ -392,9 +392,10 @@ changelog_forget (xlator_t *this, inode_t *inode); } while (0) /* ignore internal fops */ -#define CHANGELOG_IF_INTERNAL_FOP_THEN_GOTO(dict, label) do { \ - if (dict && dict_get (dict, GLUSTERFS_INTERNAL_FOP_KEY)) \ - goto label; \ +#define CHANGELOG_IF_INTERNAL_FOP_THEN_GOTO(frame, dict, label) do { \ + if ((frame->root->pid > 0) && \ + dict && dict_get (dict, GLUSTERFS_INTERNAL_FOP_KEY)) \ + goto label; \ } while (0) #define CHANGELOG_COND_GOTO(priv, cond, label) do { \ diff --git a/xlators/features/changelog/src/changelog.c b/xlators/features/changelog/src/changelog.c index 4d83d73..360f511 100644 --- a/xlators/features/changelog/src/changelog.c +++ b/xlators/features/changelog/src/changelog.c @@ -147,7 +147,7 @@ changelog_unlink (call_frame_t *frame, xlator_t *this, priv = this->private; CHANGELOG_NOT_ACTIVE_THEN_GOTO (frame, priv, wind); - CHANGELOG_IF_INTERNAL_FOP_THEN_GOTO (xdata, wind); + CHANGELOG_IF_INTERNAL_FOP_THEN_GOTO (frame, xdata, wind); CHANGELOG_INIT_NOCHECK (this, frame->local, NULL, loc->inode->gfid, 2); @@ -273,7 +273,7 @@ changelog_link (call_frame_t *frame, priv = this->private; CHANGELOG_NOT_ACTIVE_THEN_GOTO (frame, priv, wind); - CHANGELOG_IF_INTERNAL_FOP_THEN_GOTO (xdata, wind); + CHANGELOG_IF_INTERNAL_FOP_THEN_GOTO (frame, xdata, wind); CHANGELOG_INIT_NOCHECK (this, frame->local, NULL, oldloc->gfid, 2); @@ -481,7 +481,7 @@ changelog_mknod (call_frame_t *frame, priv = this->private; CHANGELOG_NOT_ACTIVE_THEN_GOTO (frame, priv, wind); - CHANGELOG_IF_INTERNAL_FOP_THEN_GOTO (xdata, wind); + CHANGELOG_IF_INTERNAL_FOP_THEN_GOTO (frame, xdata, wind); ret = dict_get_ptr (xdata, "gfid-req", &uuid_req); if (ret) { -- 1.9.3
This has happened again while creating hardlinks on master and snapshot in parallel. file in question, is >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> # getfattr -n glusterfs.gfid.string /mnt/master/thread0/level05/level15/level25/level35/level45/level55/level65/level75/level85/hardlink_to_files/539c2000%%1D5NU2HMHG getfattr: Removing leading '/' from absolute path names # file: mnt/master/thread0/level05/level15/level25/level35/level45/level55/level65/level75/level85/hardlink_to_files/539c2000%%1D5NU2HMHG glusterfs.gfid.string="308fbe2e-86de-4d41-b517-2ffc834e5f66" >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Changelog entries in active node, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> # grep -R "308fbe2e-86de-4d41-b517-2ffc834e5f66" * c236684c114c1c9f2bdbc3dabb727d2b/.processed/CHANGELOG.1402739186:E 308fbe2e-86de-4d41-b517-2ffc834e5f66 CREATE 33188 0 0 5e108012-2524-4ec1-9aa4-ee1fdea76220%2F539c19e2%25%25DBU8KA628F c236684c114c1c9f2bdbc3dabb727d2b/.processed/CHANGELOG.1402739186:M 308fbe2e-86de-4d41-b517-2ffc834e5f66 NULL c236684c114c1c9f2bdbc3dabb727d2b/.processed/CHANGELOG.1402739186:D 308fbe2e-86de-4d41-b517-2ffc834e5f66 c236684c114c1c9f2bdbc3dabb727d2b/.processed/CHANGELOG.1402739965:D 308fbe2e-86de-4d41-b517-2ffc834e5f66 c236684c114c1c9f2bdbc3dabb727d2b/.processed/CHANGELOG.1402739965:M 308fbe2e-86de-4d41-b517-2ffc834e5f66 NULL >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> There is not link entry in the changelogs, Changelog entries in passive node, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> # grep -R "308fbe2e-86de-4d41-b517-2ffc834e5f66" * 622efcce6b6514e8a6ea680d725140de/.processing/CHANGELOG.1402739965:D 308fbe2e-86de-4d41-b517-2ffc834e5f66 622efcce6b6514e8a6ea680d725140de/.processing/CHANGELOG.1402739965:M 308fbe2e-86de-4d41-b517-2ffc834e5f66 NULL 622efcce6b6514e8a6ea680d725140de/.processing/CHANGELOG.1402739172:E 308fbe2e-86de-4d41-b517-2ffc834e5f66 CREATE 33188 0 0 5e108012-2524-4ec1-9aa4-ee1fdea76220%2F539c19e2%25%25DBU8KA628F 622efcce6b6514e8a6ea680d725140de/.processing/CHANGELOG.1402739172:M 308fbe2e-86de-4d41-b517-2ffc834e5f66 NULL 622efcce6b6514e8a6ea680d725140de/.processing/CHANGELOG.1402739172:D 308fbe2e-86de-4d41-b517-2ffc834e5f66 622efcce6b6514e8a6ea680d725140de/.processing/CHANGELOG.1402740741:E 308fbe2e-86de-4d41-b517-2ffc834e5f66 LINK 33a9f3a5-b987-467c-9518-d261b628366e%2F539c2000%25%251D5NU2HMHG 622efcce6b6514e8a6ea680d725140de/.processing/CHANGELOG.1402740741:M 308fbe2e-86de-4d41-b517-2ffc834e5f66 NULL >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It has the link entry. This is similar to what happened with creates also. Since this is happening frequently, I would argue against disconnections of the bricks. Even if it is because of the disconnections, then that is the major problem.
Tried in the build glusterfs-3.6.0.22-1.el6rhs, and still able to reproduce.
Since file entry is missing in changelog even with 'self-heal log capturing' patch, looks like there is some other issue. This needs further investigation.
The issue is not being reproduced during heavy IOs with only barrier enable/disable in loop. I have instrumented the code for the logs and I am trying to reproduce with snapshot.
The issue is being reproduced with snapshot taken in loop and dht spreading across two nodes with instrumented logs. The logs suggests, all the missing gfid entries have gone through 'changelog_write'.and it should have hit changelog file. Instrumented logs iwith gfids look like as follows: [2014-06-26 12:16:49.704031] I [changelog.c:778:changelog_create] 0-master-changelog: hrko-gfid:call: 02f8dc1b-8b61-40cd-81af-f3e13cd87099 [2014-06-26 12:16:49.704445] I [changelog.c:715:changelog_create_cbk] 0-master-changelog: hrko-gfid:cbk: 02f8dc1b-8b61-40cd-81af-f3e13cd87099 [2014-06-26 12:16:49.704493] I [changelog-encoders.c:157:changelog_encode_ascii] 0-master-changelog: hrko:ascii-write:gfid: 02f8dc1b-8b61-40cd-81af-f3e13cd87099: len:96 [2014-06-26 12:16:49.704667] I [changelog-helpers.c:453:changelog_handle_change] 0-master-changelog: hrko-gfid:write: 02f8dc1b-8b61-40cd-81af-f3e13cd87099: ret: 0 So missing those gfids even then leaves us with two areas to check. 1. fsync 2. Race in changelog fd (priv->changelog_fd) fsync is done in 'fysnc interval thread' every 5 secs and as well as just before every rollover of changelog file. So, other area is there could be some race with 'priv->changelog_fd' that might be causing the issue. With having a few more logs, in 'changelog_write', the issue is not being reproduced. We are yet to figure out the root cause. I will update the bug with further findings if any.
Finally found the root cause. Problem: A few changelog files are missing at the backend Cause: Race between actual rollover and explicit rollover. Changelog rollover can happen either due to actual or the explict rollover due to snapshot. Actual rollover is controlled by tuneable called rollover-time. The minimum granularity for rollover-time is 1 second Explicit rollover is asynchronous in nature and happens during snapshot. Basically, rollover renames the current CHANGELOG file to CHANGELOG.TIMESTAMP after rollover-time. Let's assume, at time 't1', actual and explicit rollover raced against each other and actual rollover won the race renaming the CHANGELOG file to CHANGELOG.t1 and opens a new CHANGELOG file. An immediate explicit rollover at time 't1' happened with in the same second to rename CHANGELOG file to CHANGELOG.t1 resulting in purging the earlier CHANGELOG.t1 file created by actual rollover Solution: Adding a virtual delay of 1 sec to timestamp guaranties unique CHANGELOG.TIMESTAMP during explicit rollover. We could have choosed to sleep 1 second to increase the timestamp. But it is unnecessary as adding 1 second to current timestamp achieves the same. ---------------------------------------------------------------------------- From the logs below after CHANGELOG.1403809695, we can see four changelog files being opened with last one rolled over to CHANGELOG.1403809749. So we sould have found 3 changelogs between CHANGELOG.1403809695 and CHANGELOG.1403809749. But at the backend, we can seen only two changelog files (i.e., CHANGELOG.1403809711 , CHANGELOG.1403809734) Supporting logs instrumented logs: ---------------------------------- 211733:[2014-06-26 19:08:16.846279] I [changelog-helpers.c:256:changelog_rollover_changelog] 0-master-changelog: Changelog published:/bricks/brick0/b0/.glusterfs/changelogs/CHANGELOG.1403809695 and signalled bnotify 211734:[2014-06-26 19:08:16.846356] I [changelog-helpers.c:348:changelog_open] 0-master-changelog: hrko-chlog-open: changelog_fd: 17 219286:[2014-06-26 19:08:37.988290] I [changelog-helpers.c:348:changelog_open] 0-master-changelog: hrko-chlog-open: changelog_fd: 17 227277:[2014-06-26 19:08:54.339615] I [changelog-helpers.c:348:changelog_open] 0-master-changelog: hrko-chlog-open: changelog_fd: 17 229160:[2014-06-26 19:08:57.775482] I [changelog.c:1487:notify] 0-master-changelog: Barrier off notification 235340:[2014-06-26 19:09:09.559600] I [changelog.c:1536:notify] 0-master-changelog: Barrier on notification 235343:[2014-06-26 19:09:09.861700] I [changelog-helpers.c:348:changelog_open] 0-master-changelog: hrko-chlog-open: changelog_fd: 17 235365:[2014-06-26 19:09:10.289575] I [changelog-helpers.c:256:changelog_rollover_changelog] 0-master-changelog: Changelog published:/bricks/brick0/b0/.glusterfs/changelogs/CHANGELOG.1403809749 and signalled bnotify Backend changelog files: ------------------------- ..... ..... -rw-r--r-- 1 root root 63717 Jun 27 00:38 CHANGELOG.1403809695 -rw-r--r-- 1 root root 74622 Jun 27 00:38 CHANGELOG.1403809711 -rw-r--r-- 1 root root 78972 Jun 27 00:38 CHANGELOG.1403809734 <-----one file missing -rw-r--r-- 1 root root 226 Jun 27 00:39 CHANGELOG.1403809749 .... ....
Sent patch upstream for review: http://review.gluster.org/#/c/8196/ I will send it downstream, once the patch is merged upstream.
Downstream Patch: https://code.engineering.redhat.com/gerrit/#/c/28128/
Tried 4 to 5 times on the build glusterfs-3.6.0.24-1.el6rhs, and was unable reproduce the issue. Looks like issue is fixed. Since it was a race condition, moving the bug to verified for now, will reopen the bug if the issue is observed again.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2014-1278.html