Description of problem:

On a RHEL6 base OS, after upgrading RHGS from 2.x/3.x to 3.1.3, the volfiles are not recreated. During the upgrade, glusterd is brought up with the "--xlator-option *.upgrade=on -N" options so that the volfiles are regenerated against the latest bits, but this fails because glusterd init () fails in glusterd_check_gsync_present () with an error log saying "0-glusterd: geo-replication module not working as desired".

Please note, this behaviour is not seen on a RHEL7 base OS.

Version-Release number of selected component (if applicable):
glusterfs-3.7.9-10 (rhgs-3.1.3)

How reproducible:
Always

Additional info:

As a workaround, the following needs to be done after 'yum update' (a minimal sketch automating these steps follows this report):

1. grep -irns "geo-replication module not working as desired" /var/log/glusterfs/etc-glusterfs-glusterd.vol.log | wc -l

   If the output is non-zero, go to step 2.

2. Check whether a glusterd instance is running with 'ps aux | grep glusterd'; if it is, stop the glusterd service.

3. glusterd --xlator-option *.upgrade=on -N

Then proceed with the rest of the steps as per the upgrade section of the installation guide.
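A minimal shell sketch of the workaround above, assuming the default glusterd log path and the RHEL6 'service' name for glusterd; it substitutes pgrep for the 'ps aux | grep glusterd' check and grep -c for the grep | wc -l pipeline, but is otherwise just the manual steps in script form, not part of the fix.

#!/bin/bash
# Sketch of the manual workaround: detect the gsync error after 'yum update'
# and re-run glusterd in upgrade mode so the volfiles are regenerated.
LOG=/var/log/glusterfs/etc-glusterfs-glusterd.vol.log

# Step 1: did glusterd log the geo-replication error during the upgrade run?
hits=$(grep -c "geo-replication module not working as desired" "$LOG")

if [ "$hits" -gt 0 ]; then
    # Step 2: stop any running glusterd instance before re-running in upgrade mode.
    if pgrep -x glusterd >/dev/null; then
        service glusterd stop
    fi

    # Step 3: regenerate the volfiles against the latest bits.
    glusterd --xlator-option '*.upgrade=on' -N

    # Then continue with the remaining steps from the installation guide's upgrade section.
fi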
Since this issue is w.r.t gsyncd, assigning it to Kotresh.
(In reply to Atin Mukherjee from comment #0)
> Description of problem:
>
> On a RHEL6 base OS, after upgrading RHGS from 2.x/3.x to 3.1.3, the volfiles
> are not recreated. During the upgrade, glusterd is brought up with the
> "--xlator-option *.upgrade=on -N" options so that the volfiles are
> regenerated against the latest bits, but this fails because glusterd init ()
> fails in glusterd_check_gsync_present () with an error log saying
> "0-glusterd: geo-replication module not working as desired".

A correction here: we hit this problem only when the bits are upgraded from rhgs-3.0.4 to rhgs-3.1.3.
Analysis:

From the analysis it is found that 'glusterd --xlator-option *.upgrade=on -N', which is executed during 'yum update', fails while executing "gsyncd --version" via the runner interface, and hence the new volfiles are not generated. The child process in the runner interface, which is responsible for invoking "gsyncd" via execvp, fails before execvp is even called. This happens only when glusterd is invoked by yum during the upgrade. From QE testing it is also found that it happens on RHEL6 and not on RHEL7. We need to do further analysis to find out whether this is an issue with the SELinux settings on RHEL6. I will update with further findings.

Thanks,
Kotresh
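As a quick sanity check during this analysis, the same probe that glusterd_check_gsync_present () performs can be run by hand. The path below is the usual gsyncd install location on RHGS, so verify it on your node before relying on it; and note that a clean interactive run does not rule the problem out, since the failure shows up only when glusterd is spawned from yum.

# Run the same version probe glusterd performs, but interactively.
# /usr/libexec/glusterfs/gsyncd is the usual location; confirm with
# 'rpm -ql glusterfs-geo-replication | grep gsyncd' if unsure.
/usr/libexec/glusterfs/gsyncd --version
echo "exit status: $?"   # non-zero here would point at gsyncd itself rather than the runner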
@Karthick - Could you please also update us on the status of the offline upgrade test?
As per #c10, upgrading from any rhgs version equal to or older than 3.0.4 to the latest should hit the same issue. So it would be more generic to call this an upgrade issue from 3.0.x to 3.1.3.
Sweta/Karthick, Can we also test the same behaviour with 3.0.x to 3.1.3 where x < 4?
Kotresh, is it really required to check for gsyncd when glusterd is brought up in upgrade mode? If not, we can call that function conditionally to avoid this issue.
(In reply to Atin Mukherjee from comment #12)
> Sweta/Karthick,
>
> Can we also test the same behaviour with 3.0.x to 3.1.3 where x < 4?

Done. Updated from 3.0.3 to 3.1.3 (in-service); the result remains the same: the vol files are not regenerated and continuous "dict is NULL" warning messages are logged when I/O happens. The heal info command worked successfully.
Atin,

I verified that the geo-replication configuration can be ignored when glusterd runs in upgrade/downgrade mode (i.e., glusterd --xlator-option *.upgrade=on -N). It will be configured during the regular glusterd start after the upgrade/downgrade.
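A rough way to confirm this on a test node: run glusterd once in upgrade mode, check that the volfile timestamps change, and then start the service normally, which is where the geo-replication configuration would get applied. This assumes the default /var/lib/glusterd working directory and uses a placeholder volume name.

# Default /var/lib/glusterd working directory assumed; the volume name is a placeholder.
VOLNAME=testvol

ls -l --time-style=full-iso /var/lib/glusterd/vols/$VOLNAME/*.vol   # note the current timestamps

glusterd --xlator-option '*.upgrade=on' -N                          # upgrade-mode run; exits once the volfiles are regenerated

ls -l --time-style=full-iso /var/lib/glusterd/vols/$VOLNAME/*.vol   # timestamps should now be newer

service glusterd start                                              # normal start; geo-rep configuration happens here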
(In reply to Kotresh HR from comment #15)
> Atin,
>
> I verified that the geo-replication configuration can be ignored when
> glusterd runs in upgrade/downgrade mode (i.e., glusterd --xlator-option
> *.upgrade=on -N). It will be configured during the regular glusterd start
> after the upgrade/downgrade.

Excellent, so waiting for the patch now :)
http://review.gluster.org/#/c/14898/ posted for review.
Hi Anuradha,

Please post here the AFR-related admin steps to test a setup that has already been updated from 3.0.x to 3.1.3.

Thanks
Tested the workaround steps mentioned above and found the following results:
=============================================================================
1) Regeneration of the vol files happens successfully.
2) Healing of files happens successfully.
3) No "dict is NULL" warning messages appear in the brick logs when I/O happens.
4) Files that are in split-brain are listed only after applying the workaround. Without the workaround, the heal info / heal info split-brain commands do not show the files/gfids that are in split-brain.

One more input on the script usage:
====================================
Currently its usage is:

./generate-index-files.sh <path-to-brick> <volname> <replicate/disperse>

There is no EC support in 3.0.x, so "disperse" has to be removed from the <replicate/disperse> part of the usage (a usage example is shown below).
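For illustration, an invocation of the script on a replicate volume might look like the following; the brick path and volume name are hypothetical placeholders, and the script itself is the generate-index-files.sh referenced above rather than anything shipped with glusterfs.

# Hypothetical example: run on each node hosting a brick of the replicate volume.
# /rhgs/brick1 and testvol are placeholders for the real brick path and volume name.
./generate-index-files.sh /rhgs/brick1 testvol replicate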
Downstream patch https://code.engineering.redhat.com/gerrit/#/c/79963 posted for review.
Downstream patch is merged now.
Testing of 3.7.9-11 is done on the RHEL6 platform and everything worked well. With this fix, we are not seeing any gsyncd error message in the glusterd log after the update, the vol files are regenerated, and there are no "dict is NULL" warning messages in the brick logs. glusterd testing with respect to the code change went well, and sanity testing of all the other components was also good. With all this info, moving to verified state. Please comment here for any other info. The checks used are sketched below for reference.
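A minimal sketch of the post-update checks described above, assuming the default log and working-directory locations (/var/log/glusterfs/etc-glusterfs-glusterd.vol.log, /var/log/glusterfs/bricks/, /var/lib/glusterd); adjust the paths if your deployment differs.

# No gsyncd error in the glusterd log after the update (expect 0):
grep -c "geo-replication module not working as desired" \
    /var/log/glusterfs/etc-glusterfs-glusterd.vol.log

# No "dict is NULL" warnings in the brick logs while I/O runs (expect 0 per file):
grep -ric "dict is NULL" /var/log/glusterfs/bricks/

# Volfiles regenerated (timestamps should be newer than the update time):
ls -l /var/lib/glusterd/vols/*/*.vol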
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-1576.html