Bug 1353470 - upgrade from 3.0.4 to 3.1.3 doesn't regenerate the volfiles
Summary: upgrade from 3.0.4 to 3.1.3 doesn't regenerate the volfiles
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.1.3 Async
Assignee: Kotresh HR
QA Contact: Byreddy
URL:
Whiteboard:
Depends On:
Blocks: 1355628 1356426 1356439
 
Reported: 2016-07-07 08:29 UTC by Atin Mukherjee
Modified: 2016-09-17 16:47 UTC
CC List: 12 users

Fixed In Version: glusterfs-3.7.9-11
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Cloned to: 1355628
Environment:
Last Closed: 2016-08-08 09:34:50 UTC
Embargoed:




Links
System: Red Hat Product Errata
ID: RHBA-2016:1576
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Gluster Storage 3.1 glusterfs Update
Last Updated: 2016-08-08 13:34:36 UTC

Description Atin Mukherjee 2016-07-07 08:29:56 UTC
Description of problem:

On a RHEL6 base OS, after upgrading RHGS from 2.x/3.x to 3.1.3, the volfiles are not recreated. During the upgrade, glusterd is brought up with the "--xlator-option *.upgrade=on -N" parameters to ensure the volfiles are regenerated with respect to the latest bits, but this fails because glusterd's init() fails in glusterd_check_gsync_present() with an error log saying "0-glusterd: geo-replication module not working as desired".

Please note that this behaviour is not seen on RHEL7.
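
A quick way to confirm whether the volfiles were regenerated is to compare their timestamps with the time of the 'yum update' (a hedged sketch; it assumes the default /var/lib/glusterd working directory, and "testvol" is an illustrative volume name):

# Volfiles for a volume live under /var/lib/glusterd/vols/<volname>/;
# their modification times should be newer than the upgrade if regeneration succeeded.
ls -l /var/lib/glusterd/vols/testvol/*.vol

# The glusterd log should also contain the failure described above:
grep "geo-replication module not working as desired" /var/log/glusterfs/etc-glusterfs-glusterd.vol.log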

Version-Release number of selected component (if applicable):
glusterfs-3.7.9-10 (rhgs-3.1.3)

How reproducible:
Always

Additional info:

As a workaround, the following needs to be done after 'yum update' (a consolidated shell sketch of these steps follows below).
1. grep -irns "geo-replication module not working as desired" /var/log/glusterfs/etc-glusterfs-glusterd.vol.log | wc -l

If the output is non-zero, go to step 2.

2. Check whether a glusterd instance is running ('ps aux | grep glusterd'); if it is, stop the glusterd service.

3. glusterd --xlator-option *.upgrade=on -N

Then proceed with the rest of the steps as per the upgrade section of the installation guide.
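
Putting the workaround together as a single shell sketch (it assumes the default glusterd log path mentioned in step 1 and RHEL6's SysV init service name "glusterd"):

# 1. Count gsyncd failures logged by glusterd during 'yum update'
count=$(grep -c "geo-replication module not working as desired" \
        /var/log/glusterfs/etc-glusterfs-glusterd.vol.log)

if [ "$count" -gt 0 ]; then
    # 2. Stop glusterd if an instance is running
    if ps aux | grep -v grep | grep -q glusterd; then
        service glusterd stop
    fi
    # 3. Re-run glusterd in upgrade mode to regenerate the volfiles
    #    (the option is quoted so the shell does not glob-expand '*')
    glusterd --xlator-option '*.upgrade=on' -N
fi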

Comment 2 Atin Mukherjee 2016-07-07 08:34:21 UTC
Since this issue is w.r.t gsyncd, assigning it to Kotresh.

Comment 5 Atin Mukherjee 2016-07-07 14:57:28 UTC
(In reply to Atin Mukherjee from comment #0)
> Description of problem:
> 
> On a RHEL6 base OS, after upgrading RHGS from 2.x/3.x to 3.1.3, the
> volfiles are not recreated. During the upgrade, glusterd is brought up with
> the "--xlator-option *.upgrade=on -N" parameters to ensure the volfiles are
> regenerated with respect to the latest bits, but this fails because
> glusterd's init() fails in glusterd_check_gsync_present() with an error log
> saying "0-glusterd: geo-replication module not working as desired".

A correction here: we hit this problem only when the bits are upgraded from rhgs-3.0.4 to rhgs-3.1.3.


Comment 6 Kotresh HR 2016-07-07 15:57:30 UTC
Analysis:

From the analysis, it was found that the 'glusterd --xlator-option *.upgrade=on -N' invocation, which is executed during 'yum update', fails while running "gsyncd --version" via the runner interface. Hence the new volfiles are not generated.

The child process in the runner interface, which is responsible for calling "gsyncd" via execvp, fails before execvp is even called. This happens only when glusterd is invoked by yum during the upgrade. QE testing also found that it happens on RHEL6 and not on RHEL7. Further analysis is needed to determine whether this is an issue with the selinux settings on RHEL6.
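
To rule out gsyncd itself, the probe glusterd runs can be executed manually outside of the yum transaction (a hedged sketch; the path below is the usual /usr/libexec/glusterfs location of gsyncd on RHGS and may differ on a given install):

# Run the same check glusterd performs via the runner interface and inspect the exit status
/usr/libexec/glusterfs/gsyncd --version
echo "gsyncd exit status: $?"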

I will update with further findings.

Thanks,
Kotresh

Comment 8 Atin Mukherjee 2016-07-08 04:28:59 UTC
@Karthick - Could you please also update us on the status of the offline upgrade test?

Comment 11 Atin Mukherjee 2016-07-08 11:28:32 UTC
As per #c10, upgrading any rhgs version equal to or older than 3.0.4 to the latest should hit the same issue, so it would be more accurate to call this an upgrade issue from 3.0.x to 3.1.3.

Comment 12 Atin Mukherjee 2016-07-08 11:32:18 UTC
Sweta/Karthick,

Can we also test the same behaviour with 3.0.x to 3.1.3 where x < 4?

Comment 13 Atin Mukherjee 2016-07-08 14:27:35 UTC
Kotresh, is it really required to check for gsyncd when glusterd is brought up in upgrade mode? If not, we can call that function conditionally to avoid this issue.

Comment 14 Byreddy 2016-07-11 07:13:12 UTC
(In reply to Atin Mukherjee from comment #12)
> Sweta/Karthick,
> 
> Can we also test the same behaviour with 3.0.x to 3.1.3 where x < 4?

Did an in-service update from 3.0.3 to 3.1.3; the result remains the same: the volfiles are not regenerated and continuous "dict is NULL" warning messages appear when I/O happens.

The heal info command worked successfully.
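
A quick check for those warnings after driving I/O (a hedged sketch; it assumes the default brick log directory):

# Count "dict is NULL" warnings per brick log file
grep -rc "dict is NULL" /var/log/glusterfs/bricks/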

Comment 15 Kotresh HR 2016-07-12 06:19:11 UTC
Atin,

I verified that the geo-replication configuration can be ignored during upgrade/downgrade (i.e., glusterd --xlator-option *.upgrade=on -N). It will be configured during glusterd start after the upgrade/downgrade.

Comment 16 Atin Mukherjee 2016-07-12 06:28:21 UTC
(In reply to Kotresh HR from comment #15)
> Atin,
> 
> I verified that the geo-replication configuration can be ignored during
> upgrade/downgrade (i.e., glusterd --xlator-option *.upgrade=on -N). It will
> be configured during glusterd start after the upgrade/downgrade.

Excellent, so waiting for the patch now :)

Comment 17 Atin Mukherjee 2016-07-12 06:33:18 UTC
http://review.gluster.org/#/c/14898/ posted for review.

Comment 18 Byreddy 2016-07-15 07:25:14 UTC
Hi Anuradha,

Please put the admin steps related to AFR here, to test a setup that has already been updated from 3.0.x to 3.1.3.

Thanks

Comment 22 Byreddy 2016-07-18 05:31:22 UTC
Tested the workaround steps mentioned above and found the following results:
=============================================================================

1) Regeneration of the volfiles happens successfully.
2) Healing of files happens successfully.
3) No "dict is NULL" warning messages appear in the brick logs when I/O happens.
4) Files that are in a split-brain condition are listed only after applying the workaround.

Without the workaround, the 'heal info' and 'heal info split-brain' commands do not show the files/gfids that are in split-brain.


One more input on the script usage:
====================================
Currently its usage is: ./generate-index-files.sh <path-to-brick> <volname> <replicate/disperse>

There is no EC support in 3.0.x, so "disperse" in the <replicate/disperse> usage has to be removed (an illustrative invocation follows below).
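
For illustration, an invocation with the corrected usage would look like the following (the script and its arguments are as quoted above; the brick path and volume name are made up for the example):

./generate-index-files.sh /bricks/brick1/b1 testvol replicate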

Comment 25 Atin Mukherjee 2016-07-27 05:04:20 UTC
Downstream patch https://code.engineering.redhat.com/gerrit/#/c/79963 posted for review.

Comment 28 Atin Mukherjee 2016-07-27 10:27:45 UTC
Downstream patch is merged now.

Comment 32 Byreddy 2016-08-01 12:34:31 UTC
Testing of 3.7.9-11 is done on the RHEL6 platform; everything worked well.


With this fix, we are not seeing any gsyncd error message in the glusterd log after the update, the volfiles are regenerated, and we are not getting any "dict is NULL" warning messages in the brick logs.
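
A sketch of checks along these lines (the log and volfile paths are the defaults referenced earlier in this bug; the exact verification steps may have differed):

# No gsyncd error should be logged by glusterd after the update
grep -c "geo-replication module not working as desired" /var/log/glusterfs/etc-glusterfs-glusterd.vol.log

# No "dict is NULL" warnings in the brick logs
grep -rc "dict is NULL" /var/log/glusterfs/bricks/

# Volfiles should carry post-update timestamps
ls -l /var/lib/glusterd/vols/*/*.vol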

glusterd testing w.r.t. the code change went well, and sanity testing of all the other components was also good.

With all this info, moving to verified state.

Please comment here for any other info.

Comment 34 errata-xmlrpc 2016-08-08 09:34:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1576.html

