Bug 1353470 - upgrade from 3.0.4 to 3.1.3 doesn't regenerate the volfiles
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: 3.1
Hardware: x86_64 Linux
Priority: unspecified    Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.1.3 Async
Assigned To: Kotresh HR
QA Contact: Byreddy
Keywords: ZStream
Depends On:
Blocks: 1355628 1356426 1356439
 
Reported: 2016-07-07 04:29 EDT by Atin Mukherjee
Modified: 2016-09-17 12:47 EDT
CC List: 12 users

See Also:
Fixed In Version: glusterfs-3.7.9-11
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1355628
Environment:
Last Closed: 2016-08-08 05:34:50 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers
Tracker ID: Red Hat Product Errata RHBA-2016:1576
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Gluster Storage 3.1 glusterfs Update
Last Updated: 2016-08-08 09:34:36 EDT

Description Atin Mukherjee 2016-07-07 04:29:56 EDT
Description of problem:

On a RHEL 6 base OS, after upgrading RHGS from 2.x/3.x to 3.1.3, the volfiles are not recreated. During the upgrade, glusterd is brought up with the "--xlator-option *.upgrade=on -N" parameters to ensure the volfiles are regenerated with respect to the latest bits, but that fails because glusterd's init() fails in glusterd_check_gsync_present() with an error log saying "0-glusterd: geo-replication module not working as desired".

Please note that this behaviour is not seen on RHEL 7.
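
A quick way to confirm whether the volfiles were regenerated (the path below is the standard glusterd working directory; <VOLNAME> is a placeholder):

  ls -l /var/lib/glusterd/vols/<VOLNAME>/   # volfile timestamps predating the upgrade indicate they were not regenerated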

Version-Release number of selected component (if applicable):
glusterfs-3.7.9-10 (rhgs-3.1.3)

How reproducible:
Always

Additional info:

As a workaround, the following needs to be done after 'yum update' (a consolidated sketch of these steps is given below).
1. grep -irns "geo-replication module not working as desired" /var/log/glusterfs/etc-glusterfs-glusterd.vol.log | wc -l

If the output is non-zero, then go to step 2

2. Check whether a glusterd instance is running with 'ps aux | grep glusterd'; if it is, stop the glusterd service.

3. glusterd --xlator-option *.upgrade=on -N

and then proceed with the rest of the steps as per the upgrade section of the installation guide.
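
A consolidated sketch of the workaround above, to be run after 'yum update' (the log path is the one from step 1; the service commands assume stock RHEL 6 init tooling):

  if grep -qs "geo-replication module not working as desired" \
         /var/log/glusterfs/etc-glusterfs-glusterd.vol.log; then
      service glusterd stop 2>/dev/null           # step 2: make sure glusterd is not running
      glusterd --xlator-option '*.upgrade=on' -N  # step 3: regenerate the volfiles
  fi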
Comment 2 Atin Mukherjee 2016-07-07 04:34:21 EDT
Since this issue is w.r.t gsyncd, assigning it to Kotresh.
Comment 5 Atin Mukherjee 2016-07-07 10:57:28 EDT
(In reply to Atin Mukherjee from comment #0)
> Description of problem:
> 
> On RHEL6 base OS, post upgrade of rhgs from 2.x/3.x to 3.1.3, volfiles are
> not recreated. During upgrade, glusterd is brought up with "--xlator-option
> *.upgrade=on -N" parameters to ensure the volfiles are regenerated w.r.t
> latest bits but that fails as glusterd init () fails from
> glusterd_check_gsync_present () with a error log saying "0-glusterd:
> geo-replication module not working as desired".

A correction here, we hit this problem only when bits are upgraded from rhgs-3.0.4 to rhgs-3.1.3.

Comment 6 Kotresh HR 2016-07-07 11:57:30 EDT
Analysis:

From the analysis, it is found that 'glusterd --xlator-option *.upgrade=on -N', which is executed during yum update, fails while executing "gsyncd --version" via the runner interface. Hence, new volfiles are not generated.

It is also found that the child process in the runner interface, which is responsible for invoking "gsyncd" via execvp, fails before calling execvp. This happens only when glusterd is invoked via yum during the upgrade. QE testing also shows that it happens on RHEL 6 and not on RHEL 7. Further analysis is needed to determine whether this is an issue with the SELinux settings on RHEL 6.
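
For reference, the check that fails can be tried by hand; a minimal sketch (the gsyncd path below is the usual install location and is an assumption, not taken from this report):

  /usr/libexec/glusterfs/gsyncd --version   # glusterd runs the equivalent of this via the runner interface
  echo $?                                   # non-zero here corresponds to the "module not working as desired" error

Note that a manual run may well succeed, since the failure is only seen when glusterd is spawned from yum during the upgrade.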

I will update with further findings.

Thanks,
Kotresh
Comment 8 Atin Mukherjee 2016-07-08 00:28:59 EDT
@Karthick - Could you please also update us on the status of the offline upgrade test?
Comment 11 Atin Mukherjee 2016-07-08 07:28:32 EDT
As per #c10, upgrading from any RHGS version equal to or older than 3.0.4 to the latest should hit the same issue. So it would be more accurate to call this an upgrade issue from 3.0.x to 3.1.3.
Comment 12 Atin Mukherjee 2016-07-08 07:32:18 EDT
Sweta/Karthick,

Can we also test the same behaviour with 3.0.x to 3.1.3 where x < 4?
Comment 13 Atin Mukherjee 2016-07-08 10:27:35 EDT
Kotresh, is it really required to check for gsyncd when glusterd is brought up in upgrade mode? If not, we can have that function conditionally called to avoid this issue.
Comment 14 Byreddy 2016-07-11 03:13:12 EDT
(In reply to Atin Mukherjee from comment #12)
> Sweta/Karthick,
> 
> Can we also test the same behaviour with 3.0.x to 3.1.3 where x < 4?

Did an in-service update from 3.0.3 to 3.1.3; the result remains the same: the volfiles are not regenerated and continuous "dict is NULL" warning messages appear when I/O happens.

The heal info command worked successfully.
Comment 15 Kotresh HR 2016-07-12 02:19:11 EDT
Atin,

I verified that the geo-replication configuration can be ignored during upgrade/downgrade (i.e., glusterd --xlator-option *.upgrade=on -N). It will be configured during glusterd start after the upgrade/downgrade.
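
In other words, with the proposed change the upgrade sequence would look like this (a minimal sketch; the service command assumes stock RHEL 6 init tooling):

  glusterd --xlator-option '*.upgrade=on' -N   # regenerate the volfiles; geo-replication configuration is skipped
  service glusterd start                       # geo-replication is configured on this normal start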
Comment 16 Atin Mukherjee 2016-07-12 02:28:21 EDT
(In reply to Kotresh HR from comment #15)
> Atin,
> 
> I verified that geo-replication configuration can be ignored during
> upgrade/downgrade (i.e., glusterd --xlator-option *.upgrade=on -N). It will
> be configured during gusterd start after upgrade/downgrade

Excellent, so waiting for the patch now :)
Comment 17 Atin Mukherjee 2016-07-12 02:33:18 EDT
http://review.gluster.org/#/c/14898/ posted for review.
Comment 18 Byreddy 2016-07-15 03:25:14 EDT
Hi Anuradha,

Please post here the admin steps related to AFR for testing a setup that has already been updated from 3.0.x to 3.1.3.

Thanks
Comment 22 Byreddy 2016-07-18 01:31:22 EDT
Tested the workaround steps mentioned above; the results are below.
=============================================================================

1) Regeneration of the volfiles happens successfully.
2) Healing of files happens successfully.
3) No "dict is NULL" warning messages appear in the brick logs when I/O happens.
4) Files that are in split-brain condition are listed only after applying the workaround.

Without the workaround, the heal info / heal info split-brain commands do not show the files/gfids that are in split-brain condition.


One more input on the script usage:
====================================
Currently its usage is: ./generate-index-files.sh <path-to-brick> <volname> <replicate/disperse>

There is no EC support in 3.0.x, so "disperse" has to be removed from the <replicate/disperse> usage.
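
For example, a hypothetical invocation with the disperse option dropped (the brick path and volume name below are placeholders):

  ./generate-index-files.sh /rhgs/brick1/testvol-brick testvol replicate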
Comment 25 Atin Mukherjee 2016-07-27 01:04:20 EDT
downstream patch https://code.engineering.redhat.com/gerrit/#/c/79963 posted for review.
Comment 28 Atin Mukherjee 2016-07-27 06:27:45 EDT
Downstream patch is merged now.
Comment 32 Byreddy 2016-08-01 08:34:31 EDT
Testing of 3.7.9-11 is done on the RHEL 6 platform; everything worked well.

Moving to verified state.


With this fix, we are not seeing any gsyncd error messages in the glusterd log after the update, the volfiles are regenerated, and there are no "dict is NULL" warning messages in the brick logs.

glusterd testing with respect to the code change went well, and sanity testing of all other components was also good.

With all this info, moving to verified state.

Please comment here for any other info.
Comment 34 errata-xmlrpc 2016-08-08 05:34:50 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1576.html
