1109149 – Dist-geo-rep : snapshot with geo-rep, while creating files on master, resulted in failure to capture few files entry in changelogs.

Bug 1109149 - Dist-geo-rep : snapshot with geo-rep, while creating files on master, resulted in failure to capture few files entry in changelogs.

Summary: Dist-geo-rep : snapshot with geo-rep, while creating files on master, resulte...

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	geo-replication
Sub Component:
Version:	rhgs-3.0
Hardware:	x86_64
OS:	Linux
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Target Release:	RHGS 3.0.0
Assignee:	Kotresh HR
QA Contact:	Bhaskar Bandari
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1109692
TreeView+	depends on / blocked

Reported:	2014-06-13 10:47 UTC by Vijaykumar Koppad
Modified:	2015-05-13 16:54 UTC (History)
CC List:	12 users (show)
Fixed In Version:	3.6.0.19
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	1109692 (view as bug list)
Environment:
Last Closed:	2014-09-22 19:41:27 UTC
Embargoed:
Dependent Products:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHEA-2014:1278	0	normal	SHIPPED_LIVE	Red Hat Storage Server 3.0 bug fix and enhancement update	2014-09-22 23:26:55 UTC

Description Vijaykumar Koppad 2014-06-13 10:47:26 UTC

Description of problem:  snapshot with geo-rep, while creating files on master, resulted  in failure to capture few files entry in changelogs. In one of the case, active replica doesn't have entry for the missing file but passive replica has.
file in question and its gfid
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
#getfattr -n glusterfs.gfid.string  /mnt/master/thread0/level05/level15/539ab42c%%MFQLJBDHI3
getfattr: Removing leading '/' from absolute path names
# file: mnt/master/thread0/level05/level15/539ab42c%%MFQLJBDHI3
glusterfs.gfid.string="cc9ccc81-9af6-4ca4-8e21-4226f08543cf"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

on active replica
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
# find /bricks/ | grep "539ab42c%%MFQLJBDHI3"
/bricks/brick2/master_b7/thread0/level05/level15/539ab42c%%MFQLJBDHI3
[root@redcell ~]# grep "MFQLJBDHI3" /bricks/brick
brick0/ brick1/ brick2/ brick3/ 
[root@redcell ~]# grep "MFQLJBDHI3" /bricks/brick2/master_b7/.glusterfs/changelogs/*
[root@redcell ~]# 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
on passive replica 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
# find /bricks/ | grep "539ab42c%%MFQLJBDHI3"
/bricks/brick2/master_b8/thread0/level05/level15/539ab42c%%MFQLJBDHI3
[root@redeye ~]# grep "MFQLJBDHI3" /bricks/brick2/master_b8/.glusterfs/changelogs/*
Binary file /bricks/brick2/master_b8/.glusterfs/changelogs/CHANGELOG.1402647607 matches
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Explanation of the debugging explained in the additional info.

Version-Release number of selected component (if applicable): glusterfs-3.6.0.16-1.el6rhs


How reproducible: Doesn't happen everytime. 


Steps to Reproduce:
1. create and start a geo-rep relationship between master and slave. 
2. start creating data on master using the command "crefi -T 10 -n 10 --multi -d 10 -b 10 --random --max=10K --min=1K /mnt/master"
3. while creating data pause geo-rep
4. create snap-shot of slave,
5. create snap-shot of master
6. resume geo-rep 

Actual results: Few of the file are failed to get captured in changelog.


Expected results: None of the files should be missed in changelog. 


Additional info:

With bit of debugging, we found out some disconnection logs in brick log-file of active brick,

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2014-06-13 08:16:35.831297] I [server-handshake.c:578:server_setvolume] 0-master-server: accepted client from redlake.blr.redhat.com-5110-2014/06/13-08:16:33:694562-master-client-6-0-0 (version: 3.6.0.16)
[2014-06-13 08:16:35.834937] I [client_t.c:184:gf_client_get] 0-master-server: client_uid=redlake.blr.redhat.com-5110-2014/06/13-08:16:33:694562-master-client-6-0-0
[2014-06-13 08:16:36.122688] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2014-06-13 08:16:37.296077] I [server.c:519:server_rpc_notify] 0-master-server: disconnecting connectionfrom redeye.blr.redhat.com-6963-2014/06/13-07:33:48:600208-master-client-6-0-0
[2014-06-13 08:16:37.296128] I [client_t.c:294:gf_client_put] 0-master-server: Shutting down connection redeye.blr.redhat.com-6963-2014/06/13-07:33:48:600208-master-client-6-0-0
[2014-06-13 08:16:37.619666] I [server.c:519:server_rpc_notify] 0-master-server: disconnecting connectionfrom redcloak.blr.redhat.com-25396-2014/06/13-07:33:48:839150-master-client-6-0-0
[2014-06-13 08:16:37.619717] I [client_t.c:294:gf_client_put] 0-master-server: Shutting down connection redcloak.blr.redhat.com-25396-2014/06/13-07:33:48:839150-master-client-6-0-0
[2014-06-13 08:16:37.669722] I [graph.c:269:gf_add_cmdline_options] 0-master-server: adding option 'listen-port' for volume 'master-server' with value '49156'
[2014-06-13 08:16:37.669785] I [graph.c:269:gf_add_cmdline_options] 0-master-posix: adding option 'glusterd-uuid' for volume 'master-posix' with value '86f23681-6c25-4b40-9b60-392266d23c8c'
[2014-06-13 08:16:37.671746] E [changelog-helpers.c:169:htime_update] 0-master-changelog: Htime xattr updation failed
[2014-06-13 08:16:37.678072] I [server.c:519:server_rpc_notify] 0-master-server: disconnecting connectionfrom redcell.blr.redhat.com-1723-2014/06/13-07:33:48:176326-master-client-6-0-0
[2014-06-13 08:16:37.678116] I [client_t.c:294:gf_client_put] 0-master-server: Shutting down connection redcell.blr.redhat.com-1723-2014/06/13-07:33:48:176326-master-client-6-0-0
[2014-06-13 08:16:38.301501] I [server.c:519:server_rpc_notify] 0-master-server: disconnecting connectionfrom redeye.blr.redhat.com-6965-2014/06/13-07:33:48:619927-master-client-6-0-0
[2014-06-13 08:16:38.301601] I [client_t.c:294:gf_client_put] 0-master-server: Shutting down connection redeye.blr.redhat.com-6965-2014/06/13-07:33:48:619927-master-client-6-0-0
[2014-06-13 08:16:38.639297] I [server.c:519:server_rpc_notify] 0-master-server: disconnecting connectionfrom redcloak.blr.redhat.com-25398-2014/06/13-07:33:48:865533-master-client-6-0-0
[2014-06-13 08:16:38.639376] I [client_t.c:294:gf_client_put] 0-master-server: Shutting down connection redcloak.blr.redhat.com-25398-2014/06/13-07:33:48:865533-master-client-6-0-0
[2014-06-13 08:16:38.689713] I [server.c:519:server_rpc_notify] 0-master-server: disconnecting connectionfrom redcell.blr.redhat.com-1725-2014/06/13-07:33:48:189276-master-client-6-0-0
[2014-06-13 08:16:38.689763] I [client_t.c:294:gf_client_put] 0-master-server: Shutting down connection redcell.blr.redhat.com-1725-2014/06/13-07:33:48:189276-master-client-6-0-0
[2014-06-13 08:16:40.010852] I [client_t.c:184:gf_client_get] 0-master-server: client_uid=redlake.blr.redhat.com-5239-2014/06/13-08:16:38:912527-master-client-6-0-0
[2014-06-13 08:16:40.010968] I [server-handshake.c:578:server_setvolume] 0-master-server: accepted client from redlake.blr.redhat.com-5239-2014/06/13-08:16:38:912527-master-client-6-0-0 (version: 3.6.0.16)
[2014-06-13 08:16:40.011506] I [client_t.c:184:gf_client_get] 0-master-server: client_uid=redeye.blr.redhat.com-7394-2014/06/13-08:16:38:251715-master-client-6-0-0
[2014-06-13 08:16:40.011591] I [server-handshake.c:578:server_setvolume] 0-master-server: accepted client from redeye.blr.redhat.com-7394-2014/06/13-08:16:38:251715-master-client-6-0-0 (version: 3.6.0.16)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

the theory is, because of these disconnections, few of the files were not written to the active brick, but it got those files from self of passive brick, which was not captured in changelog.

This is just a theory, and fix for capturing the self-heal in changelog is below. 

---
 xlators/features/changelog/src/changelog-helpers.h | 7 ++++---
 xlators/features/changelog/src/changelog.c         | 6 +++---
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/xlators/features/changelog/src/changelog-helpers.h b/xlators/features/changelog/src/changelog-helpers.h
index 16d60b9..78a2db0 100644
--- a/xlators/features/changelog/src/changelog-helpers.h
+++ b/xlators/features/changelog/src/changelog-helpers.h
@@ -392,9 +392,10 @@ changelog_forget (xlator_t *this, inode_t *inode);
         } while (0)
 
 /* ignore internal fops */
-#define CHANGELOG_IF_INTERNAL_FOP_THEN_GOTO(dict, label) do {           \
-                if (dict && dict_get (dict, GLUSTERFS_INTERNAL_FOP_KEY)) \
-                        goto label;                                     \
+#define CHANGELOG_IF_INTERNAL_FOP_THEN_GOTO(frame, dict, label) do {     \
+                if ((frame->root->pid > 0) &&                            \
+                    dict && dict_get (dict, GLUSTERFS_INTERNAL_FOP_KEY)) \
+                        goto label;                                      \
         } while (0)
 
 #define CHANGELOG_COND_GOTO(priv, cond, label) do {                    \
diff --git a/xlators/features/changelog/src/changelog.c b/xlators/features/changelog/src/changelog.c
index 4d83d73..360f511 100644
--- a/xlators/features/changelog/src/changelog.c
+++ b/xlators/features/changelog/src/changelog.c
@@ -147,7 +147,7 @@ changelog_unlink (call_frame_t *frame, xlator_t *this,
 
         priv = this->private;
         CHANGELOG_NOT_ACTIVE_THEN_GOTO (frame, priv, wind);
-        CHANGELOG_IF_INTERNAL_FOP_THEN_GOTO (xdata, wind);
+        CHANGELOG_IF_INTERNAL_FOP_THEN_GOTO (frame, xdata, wind);
 
         CHANGELOG_INIT_NOCHECK (this, frame->local, NULL, loc->inode->gfid, 2);
 
@@ -273,7 +273,7 @@ changelog_link (call_frame_t *frame,
         priv = this->private;
 
         CHANGELOG_NOT_ACTIVE_THEN_GOTO (frame, priv, wind);
-        CHANGELOG_IF_INTERNAL_FOP_THEN_GOTO (xdata, wind);
+        CHANGELOG_IF_INTERNAL_FOP_THEN_GOTO (frame, xdata, wind);
 
         CHANGELOG_INIT_NOCHECK (this, frame->local, NULL, oldloc->gfid, 2);
 
@@ -481,7 +481,7 @@ changelog_mknod (call_frame_t *frame,
 
         priv = this->private;
         CHANGELOG_NOT_ACTIVE_THEN_GOTO (frame, priv, wind);
-        CHANGELOG_IF_INTERNAL_FOP_THEN_GOTO (xdata, wind);
+        CHANGELOG_IF_INTERNAL_FOP_THEN_GOTO (frame, xdata, wind);
 
         ret = dict_get_ptr (xdata, "gfid-req", &uuid_req);
         if (ret) {
-- 1.9.3

Comment 2 Vijaykumar Koppad 2014-06-14 12:06:09 UTC

This has happened again while creating hardlinks on master and snapshot in parallel. 

file in question, is 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
# getfattr -n glusterfs.gfid.string  /mnt/master/thread0/level05/level15/level25/level35/level45/level55/level65/level75/level85/hardlink_to_files/539c2000%%1D5NU2HMHG
getfattr: Removing leading '/' from absolute path names
# file: mnt/master/thread0/level05/level15/level25/level35/level45/level55/level65/level75/level85/hardlink_to_files/539c2000%%1D5NU2HMHG
glusterfs.gfid.string="308fbe2e-86de-4d41-b517-2ffc834e5f66"

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Changelog entries in active node, 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
# grep -R "308fbe2e-86de-4d41-b517-2ffc834e5f66" *
c236684c114c1c9f2bdbc3dabb727d2b/.processed/CHANGELOG.1402739186:E 308fbe2e-86de-4d41-b517-2ffc834e5f66 CREATE 33188 0 0 5e108012-2524-4ec1-9aa4-ee1fdea76220%2F539c19e2%25%25DBU8KA628F
c236684c114c1c9f2bdbc3dabb727d2b/.processed/CHANGELOG.1402739186:M 308fbe2e-86de-4d41-b517-2ffc834e5f66 NULL
c236684c114c1c9f2bdbc3dabb727d2b/.processed/CHANGELOG.1402739186:D 308fbe2e-86de-4d41-b517-2ffc834e5f66
c236684c114c1c9f2bdbc3dabb727d2b/.processed/CHANGELOG.1402739965:D 308fbe2e-86de-4d41-b517-2ffc834e5f66
c236684c114c1c9f2bdbc3dabb727d2b/.processed/CHANGELOG.1402739965:M 308fbe2e-86de-4d41-b517-2ffc834e5f66 NULL
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
There is not link entry in the changelogs,

Changelog entries in passive node,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
# grep -R "308fbe2e-86de-4d41-b517-2ffc834e5f66" *
622efcce6b6514e8a6ea680d725140de/.processing/CHANGELOG.1402739965:D 308fbe2e-86de-4d41-b517-2ffc834e5f66
622efcce6b6514e8a6ea680d725140de/.processing/CHANGELOG.1402739965:M 308fbe2e-86de-4d41-b517-2ffc834e5f66 NULL
622efcce6b6514e8a6ea680d725140de/.processing/CHANGELOG.1402739172:E 308fbe2e-86de-4d41-b517-2ffc834e5f66 CREATE 33188 0 0 5e108012-2524-4ec1-9aa4-ee1fdea76220%2F539c19e2%25%25DBU8KA628F
622efcce6b6514e8a6ea680d725140de/.processing/CHANGELOG.1402739172:M 308fbe2e-86de-4d41-b517-2ffc834e5f66 NULL
622efcce6b6514e8a6ea680d725140de/.processing/CHANGELOG.1402739172:D 308fbe2e-86de-4d41-b517-2ffc834e5f66
622efcce6b6514e8a6ea680d725140de/.processing/CHANGELOG.1402740741:E 308fbe2e-86de-4d41-b517-2ffc834e5f66 LINK 33a9f3a5-b987-467c-9518-d261b628366e%2F539c2000%25%251D5NU2HMHG
622efcce6b6514e8a6ea680d725140de/.processing/CHANGELOG.1402740741:M 308fbe2e-86de-4d41-b517-2ffc834e5f66 NULL
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
It has the link entry.

This is similar to what happened with creates also. 
Since this is happening frequently, I would argue against disconnections of the bricks. Even if it is because of the disconnections, then that is the major problem.

Comment 6 Vijaykumar Koppad 2014-06-24 11:06:36 UTC

Tried in the build glusterfs-3.6.0.22-1.el6rhs, and still able to reproduce.

Comment 7 Kotresh HR 2014-06-25 04:33:47 UTC

Since file entry is missing in changelog even with 'self-heal log capturing' patch, looks like there is  some other issue. This needs further investigation.

Comment 8 Kotresh HR 2014-06-25 11:53:56 UTC

The issue is not being reproduced during heavy IOs with only barrier
enable/disable in loop. I have instrumented the code for the logs and 
I am trying to reproduce with snapshot.

Comment 9 Kotresh HR 2014-06-26 14:17:16 UTC

The issue is being reproduced with snapshot taken in loop and dht spreading across two nodes
with instrumented logs. The logs suggests, all the missing gfid entries have gone through
'changelog_write'.and it should have hit changelog file.

Instrumented logs iwith gfids look like as follows:

[2014-06-26 12:16:49.704031] I [changelog.c:778:changelog_create] 0-master-changelog: hrko-gfid:call: 02f8dc1b-8b61-40cd-81af-f3e13cd87099
[2014-06-26 12:16:49.704445] I [changelog.c:715:changelog_create_cbk] 0-master-changelog: hrko-gfid:cbk: 02f8dc1b-8b61-40cd-81af-f3e13cd87099
[2014-06-26 12:16:49.704493] I [changelog-encoders.c:157:changelog_encode_ascii] 0-master-changelog: hrko:ascii-write:gfid: 02f8dc1b-8b61-40cd-81af-f3e13cd87099: len:96
[2014-06-26 12:16:49.704667] I [changelog-helpers.c:453:changelog_handle_change] 0-master-changelog: hrko-gfid:write: 02f8dc1b-8b61-40cd-81af-f3e13cd87099: ret: 0


So missing those gfids even then leaves us with two areas to check.

1. fsync
2. Race in changelog fd (priv->changelog_fd)

fsync is done in 'fysnc interval thread' every 5 secs and as well as just before every 
rollover of changelog file. So, other area is there could be some race with 
'priv->changelog_fd' that might be causing the issue. With having a few more logs,
in 'changelog_write', the issue is not being reproduced. We are yet to figure out the
root cause. I will update the bug with further findings if any.

Comment 10 Kotresh HR 2014-06-28 07:28:36 UTC

Finally found the root cause.

Problem: 
       A few changelog files are missing at the backend

Cause: 
       Race between actual rollover and explicit rollover.

       Changelog rollover can happen either due to actual
       or the explict rollover due to snapshot. Actual 
       rollover is controlled by tuneable called rollover-time.
       The minimum granularity for rollover-time is 1 second
       Explicit rollover is asynchronous in nature and happens
       during snapshot.

       Basically, rollover renames the current CHANGELOG file
       to CHANGELOG.TIMESTAMP after rollover-time. Let's assume,
       at time 't1', actual and explicit rollover raced against
       each other and actual rollover won the race renaming the
       CHANGELOG file to CHANGELOG.t1 and opens a new 
       CHANGELOG file. An immediate explicit rollover at time
       't1' happened with in the same second to rename
       CHANGELOG file to CHANGELOG.t1 resulting in purging the
       earlier CHANGELOG.t1 file created by actual rollover

Solution:
       Adding a virtual delay of 1 sec to timestamp guaranties
       unique CHANGELOG.TIMESTAMP during explicit rollover.

       We could have choosed to sleep 1 second to increase the
       timestamp. But it is unnecessary as adding 1 second to
       current timestamp achieves the same.

----------------------------------------------------------------------------
From the logs below after CHANGELOG.1403809695, we can see four changelog files being opened with last one rolled over to CHANGELOG.1403809749. So we sould have
found 3 changelogs between CHANGELOG.1403809695 and CHANGELOG.1403809749. But
at the backend, we can seen only two changelog files (i.e., CHANGELOG.1403809711 , CHANGELOG.1403809734)

Supporting logs instrumented logs:
----------------------------------
211733:[2014-06-26 19:08:16.846279] I [changelog-helpers.c:256:changelog_rollover_changelog] 0-master-changelog: Changelog published:/bricks/brick0/b0/.glusterfs/changelogs/CHANGELOG.1403809695 and signalled bnotify
211734:[2014-06-26 19:08:16.846356] I [changelog-helpers.c:348:changelog_open] 0-master-changelog: hrko-chlog-open: changelog_fd: 17
219286:[2014-06-26 19:08:37.988290] I [changelog-helpers.c:348:changelog_open] 0-master-changelog: hrko-chlog-open: changelog_fd: 17
227277:[2014-06-26 19:08:54.339615] I [changelog-helpers.c:348:changelog_open] 0-master-changelog: hrko-chlog-open: changelog_fd: 17
229160:[2014-06-26 19:08:57.775482] I [changelog.c:1487:notify] 0-master-changelog: Barrier off notification
235340:[2014-06-26 19:09:09.559600] I [changelog.c:1536:notify] 0-master-changelog: Barrier on notification
235343:[2014-06-26 19:09:09.861700] I [changelog-helpers.c:348:changelog_open] 0-master-changelog: hrko-chlog-open: changelog_fd: 17
235365:[2014-06-26 19:09:10.289575] I [changelog-helpers.c:256:changelog_rollover_changelog] 0-master-changelog: Changelog published:/bricks/brick0/b0/.glusterfs/changelogs/CHANGELOG.1403809749 and signalled bnotify

Backend changelog files:
-------------------------
.....
.....
-rw-r--r-- 1 root root  63717 Jun 27 00:38 CHANGELOG.1403809695
-rw-r--r-- 1 root root  74622 Jun 27 00:38 CHANGELOG.1403809711
-rw-r--r-- 1 root root  78972 Jun 27 00:38 CHANGELOG.1403809734
                                                         <-----one file missing
-rw-r--r-- 1 root root    226 Jun 27 00:39 CHANGELOG.1403809749
....
....

Comment 11 Kotresh HR 2014-06-28 07:30:01 UTC

Sent patch upstream for review:
http://review.gluster.org/#/c/8196/

I will send it downstream, once the patch is merged upstream.

Comment 12 Kotresh HR 2014-06-30 11:46:42 UTC

Downstream Patch:
https://code.engineering.redhat.com/gerrit/#/c/28128/

Comment 14 Vijaykumar Koppad 2014-07-11 13:18:27 UTC

Tried 4 to 5 times on the build glusterfs-3.6.0.24-1.el6rhs, and was unable reproduce the issue. Looks like issue is fixed. Since it was a race condition, moving the bug to verified for now, will reopen the bug if the issue is observed again.

Comment 18 errata-xmlrpc 2014-09-22 19:41:27 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html

Note You need to log in before you can comment on or make changes to this bug.