Bug 1059092

Summary: gsyncd.conf goes corrupt - loses state_file entry - leads to "defunct" geo-rep status
Product: [Community] GlusterFS
Reporter: Avra Sengupta <asengupt>
Component: geo-replication
Assignee: Avra Sengupta <asengupt>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Version: mainline
CC: aavati, asengupt, bugs, csaba, david.macdonald, fharshav, gluster-bugs, jcastillo, nlevinki, nsathyan, vshankar
Hardware: x86_64
OS: Linux
Fixed In Version: glusterfs-3.6.0beta1
Doc Type: Bug Fix
Clone Of: 1058999
Last Closed: 2014-11-11 08:27:24 UTC
Type: Bug
Bug Depends On: 1058999, 1162142

Comment 2 Anand Avati 2014-01-29 14:30:28 UTC
REVIEW: http://review.gluster.org/6856 (gluserd/geo-rep: Looks for state_file and pid-file in gsyncd_template.conf) posted (#1) for review on master by Avra Sengupta (asengupt)

Comment 3 Avra Sengupta 2014-01-29 14:41:05 UTC
In the config file we have observed several missing entries, including state_file and pid_file, which are crucial for starting and stopping gsyncd processes. While the status and pid files themselves may still be present, the entries that point to the location of these files are missing. We are investigating the circumstances that could have led to the deletion of these entries: none of the gsyncd/glusterd operations remove entries from the config file, and it does not look like corruption either, as the rest of the entries in the config file are fine.

Meanwhile, we have sent a patch (http://review.gluster.org/6856) that fixes the failure of stop force. With this patch, if entries like state_file or pid-file are missing in gsyncd.conf, or if gsyncd.conf itself is missing, glusterd looks for the missing configs in gsyncd_template.conf.
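The fallback lookup can be sketched in Python as follows. This is a hypothetical illustration, not glusterd's actual C implementation: it assumes both files are simple `key = value` text files (section headers such as `[...]` and `#` comments are skipped), that the entries are spelled `state_file` and `pid_file` in both files, and the helper names `parse_conf_text`, `load_conf` and `lookup_with_fallback` are invented for this example.

```python
import os
from typing import Dict, Optional


def parse_conf_text(text: str) -> Dict[str, str]:
    """Parse a simplified `key = value` config (assumed format)."""
    entries: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and section headers like "[peers ...]".
        if not line or line.startswith(("#", "[")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            entries[key.strip()] = value.strip()
    return entries


def load_conf(path: str) -> Dict[str, str]:
    """Load a config file; a missing file yields an empty dict.

    This models the case where gsyncd.conf itself is missing.
    """
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return parse_conf_text(f.read())


def lookup_with_fallback(key: str, conf: Dict[str, str],
                         template: Dict[str, str]) -> Optional[str]:
    """Return `key` from the session config, else from the template, else None."""
    if key in conf:
        return conf[key]
    return template.get(key)
```

Because a missing gsyncd.conf is treated as an empty config, the "entry missing" and "file missing" cases collapse into one code path, which is the behaviour the patch description asks for.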

Stop force will successfully stop an already running session even if the state-file entries are missing in both the config file and the template, as long as either of them has a pid-file entry. If the pid-file entry is missing in an already started session, stop force will fetch it from the config template and stop the session. However, if the pid-file entry is missing in both the config and the template, stop force will fail with an appropriate error stating that the pid-file entry is missing.
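The stop force rules above reduce to a small resolution step for the pid-file path. The following Python sketch is hypothetical (the function and exception names are invented, and configs are modelled as plain dicts with an assumed `pid_file` key); it only illustrates the decision order described in the comment: session config first, then template, otherwise an error, with the state-file entry playing no part.

```python
from typing import Dict


class ConfigEntryMissing(Exception):
    """Raised when a required entry is absent from both config and template."""


def resolve_pid_file_for_stop_force(conf: Dict[str, str],
                                    template: Dict[str, str]) -> str:
    """Pick the pid-file path stop force would use (hypothetical sketch).

    The state-file entry is deliberately not consulted: stop force only
    needs a pid-file entry in either the session config or the template.
    """
    if "pid_file" in conf:
        return conf["pid_file"]
    if "pid_file" in template:
        # Entry lost from gsyncd.conf: fall back to gsyncd_template.conf.
        return template["pid_file"]
    raise ConfigEntryMissing(
        "pid-file entry is missing in both gsyncd.conf and gsyncd_template.conf"
    )
```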

This patch is currently under review and has been thoroughly unit-tested. However, as it involves major changes in a critical code path, it would be preferable to have a proper QE regression run on it as well.

Comment 4 Anand Avati 2014-02-03 10:21:15 UTC
REVIEW: http://review.gluster.org/6856 (gluserd/geo-rep: Looks for state_file and pid-file in gsyncd_template.conf) posted (#2) for review on master by Avra Sengupta (asengupt)

Comment 5 Anand Avati 2014-02-04 14:00:42 UTC
REVIEW: http://review.gluster.org/6856 (gluserd/geo-rep: Looks for state_file and pid-file in gsyncd_template.conf) posted (#3) for review on master by Avra Sengupta (asengupt)

Comment 6 Anand Avati 2014-02-07 14:38:29 UTC
REVIEW: http://review.gluster.org/6856 (gluserd/geo-rep: Looks for state_file and pid-file in gsyncd_template.conf) posted (#4) for review on master by Avra Sengupta (asengupt)

Comment 7 Anand Avati 2014-02-10 07:45:31 UTC
REVIEW: http://review.gluster.org/6856 (glusterd/geo-rep: Looks for state_file and pid-file in gsyncd_template.conf) posted (#5) for review on master by Avra Sengupta (asengupt)

Comment 8 Anand Avati 2014-02-14 10:42:03 UTC
REVIEW: http://review.gluster.org/6856 (glusterd/geo-rep: Looks for state_file and pid-file in gsyncd_template.conf) posted (#6) for review on master by Avra Sengupta (asengupt)

Comment 9 Anand Avati 2014-03-20 07:56:20 UTC
REVIEW: http://review.gluster.org/6856 (glusterd/geo-rep: Looks for state_file and pid-file in gsyncd_template.conf) posted (#7) for review on master by Avra Sengupta (asengupt)

Comment 10 Anand Avati 2014-04-30 09:33:54 UTC
REVIEW: http://review.gluster.org/6856 (glusterd/geo-rep: Looks for state_file and pid-file in gsyncd_template.conf) posted (#8) for review on master by Avra Sengupta (asengupt)

Comment 11 Anand Avati 2014-05-02 03:21:37 UTC
COMMIT: http://review.gluster.org/6856 committed in master by Vijay Bellur (vbellur) 
------
commit 3d4a31d304064f88d2d1e414346c790f099743b5
Author: Avra Sengupta <asengupt>
Date:   Wed Jan 29 03:06:19 2014 +0000

    glusterd/geo-rep: Looks for state_file and pid-file in gsyncd_template.conf
    
    If entries like state_file or pid-file are missing in the gsyncd.conf
    or if the gsyncd.conf is also missing, glusterd looks for the missing
    configs in the gsyncd_template.conf
    
    status will display "Config Corrupted" as long as the entry is missing in
    the config file.  Missing state-file entry in both config and template
    will not allow starting a geo-rep session.
    
    However stop force will successfully stop an already running session,
    if the state-file entries are missing in both the config file and
    the template, as long as either of them have a pid-file entry.
    
    if the pid-file entry is missing in the gsyncd.conf file, starting a
    geo-rep session will not be allowed.
    
    if the pid-file entry is missing in an already started session, then
    stop force will fetch it from the config template and stop the session.
    
    if the pid-file entry is missing in both the config and the template,
    stop force will fail with appropriate error stating pid-file entry is missing.
    
    Change-Id: I81d7cbc4af085d82895bbef46ca732555aa5365d
    BUG: 1059092
    Signed-off-by: Avra Sengupta <asengupt>
    Reviewed-on: http://review.gluster.org/6856
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>
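The start and status rules in the commit message can be summarised in a short Python sketch. It is a hypothetical model, not glusterd code: configs are plain dicts, the `state_file`/`pid_file` key spellings are assumed, the function names are invented, and "the entry" in the commit message is read as either of the two entries. Note the asymmetry it encodes: a missing state-file entry can be covered by the template when starting, but a missing pid-file entry in gsyncd.conf blocks start regardless of the template.

```python
from typing import Dict, Optional

# Assumed spellings of the two entries in gsyncd.conf / gsyncd_template.conf.
STATE_KEY = "state_file"
PID_KEY = "pid_file"


def status_label(conf: Dict[str, str], normal_status: str) -> str:
    """Status shown for a session (hypothetical sketch).

    "Config Corrupted" is reported as long as an entry is missing from the
    session's own config file, even if the template could supply it.
    """
    if STATE_KEY not in conf or PID_KEY not in conf:
        return "Config Corrupted"
    return normal_status


def start_refusal_reason(conf: Dict[str, str],
                         template: Dict[str, str]) -> Optional[str]:
    """Return why start would be refused, or None if it is allowed."""
    # The state-file entry may come from the template; only missing in both blocks start.
    if STATE_KEY not in conf and STATE_KEY not in template:
        return "state-file entry missing in both gsyncd.conf and gsyncd_template.conf"
    # The pid-file entry has no template fallback for start: it must be in gsyncd.conf.
    if PID_KEY not in conf:
        return "pid-file entry missing in gsyncd.conf"
    return None
```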

Comment 12 Niels de Vos 2014-09-22 12:35:26 UTC
A beta release of GlusterFS 3.6.0 is now available [1]. Please verify whether this release solves this bug report for you. If the glusterfs-3.6.0beta1 release does not resolve this issue, leave a comment in this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED.

Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure (possibly an "updates-testing" repository) for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-September/018836.html
[2] http://supercolony.gluster.org/pipermail/gluster-users/

Comment 13 Niels de Vos 2014-11-11 08:27:24 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.6.1, please reopen this bug report.

glusterfs-3.6.1 has been announced [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019410.html
[2] http://supercolony.gluster.org/mailman/listinfo/gluster-users