Bug 1035042 - Despite glusterd init script now starting before netfs, netfs fails to mount localhost glusterfs shares in RHS 2.1
Summary: Despite glusterd init script now starting before netfs, netfs fails to mount localhost glusterfs shares in RHS 2.1
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.0.0
Assignee: Raghavendra Talur
QA Contact: SATHEESARAN
URL:
Whiteboard:
Duplicates: 1074316
Depends On:
Blocks: 1061468 1073815 1075182
 
Reported: 2013-11-26 23:04 UTC by James Hartsock
Modified: 2019-02-15 13:35 UTC
CC List: 19 users

Fixed In Version: glusterfs-3.6.0.9-1.el6rhs
Doc Type: Bug Fix
Doc Text:
Previously, the /etc/fstab entries for glusterfs mounts did not have the _netdev option, which led to some systems becoming unresponsive. With this fix, the hook scripts add the '_netdev' option to glusterFS mounts in /etc/fstab and the mount operation succeeds.
Clone Of:
Clones: 1075182 1180137
Environment:
Last Closed: 2014-09-22 19:29:47 UTC


Links:
Red Hat Knowledge Base (Solution) 747673
Red Hat Product Errata RHEA-2014:1278 (SHIPPED_LIVE): Red Hat Storage Server 3.0 bug fix and enhancement update, 2014-09-22 23:26:55 UTC

Description James Hartsock 2013-11-26 23:04:02 UTC
Description of problem:
The netfs init script fails to mount localhost glusterfs shares even though glusterd starts before netfs in the init order. However, if the customer adds a 'sleep 60' to the netfs init script, the mounts succeed. It would therefore seem that the glusterd init script returns to init before glusterd is fully running and ready to service mount requests.

Version-Release number of selected component (if applicable):
Red Hat Storage 2.1
Red Hat Enterprise Linux 6.4

How reproducible:
Very reproducible for the customer.


Steps to Reproduce:
1. Install and configure gluster
2. Add a localhost glusterfs mount to /etc/fstab and enable netfs & glusterd to start on boot:
localhost:data /data glusterfs _netdev 0 0
3. Reboot

Actual results:
Mount of /data fails

Expected results:
After reboot /data should be mounted

Additional info:
1) If the customer adds 'sleep 60' to the netfs init script, boot works as expected.

2) https://access.redhat.com/site/solutions/101223 indicates glusterd was moved to start before netfs; however, the customer hits the issue with both upstream glusterfs and Red Hat Storage 2.1.
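
A less brittle stopgap than a fixed 'sleep 60' would be to poll glusterd until it answers CLI requests. A minimal sketch, assuming 'gluster volume list' as the readiness probe and a 60-second cap (both illustrative, not part of the shipped init script):

# Sketch: wait up to ~60s for glusterd to answer CLI requests before
# netfs attempts the localhost glusterfs mounts
for i in $(seq 1 60); do
    gluster volume list >/dev/null 2>&1 && break
    sleep 1
done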

Comment 2 Harold Miller 2014-01-30 19:56:55 UTC
From a customer: The file /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh is also wrong, as it doesn't add the _netdev option when generating the /etc/fstab entry.

and then

None of them do, actually:

[root@lp-rhs-02 1]# grep -R defaults *
start/post/K29CTDBsetup.sh.rpmsave.rpmsave:        mntent="`hostname`:/$volname $mntpt glusterfs defaults,transport=tcp 0 0"
start/post/K29CTDBsetup.sh.rpmsave:        mntent="`hostname`:/$volname $mntpt glusterfs defaults,transport=tcp 0 0"
start/post/S29CTDBsetup.sh:        mntent="`hostname`:/$volname $mntpt glusterfs defaults,transport=tcp 0 0"
stop/pre/K30samba-stop.sh.rpmsave.rpmsave:	mntent="`hostname`:/$volname $mntpt glusterfs defaults,transport=tcp 0 0"
stop/pre/K29CTDB-teardown.sh.rpmsave:	mntent="`hostname`:/$volname $mntpt glusterfs defaults,transport=tcp 0 0"
stop/pre/S29CTDB-teardown.sh:	mntent="`hostname`:/$volname $mntpt glusterfs defaults,transport=tcp 0 0"
stop/pre/K29CTDB-teardown.sh.rpmsave.rpmsave:	mntent="`hostname`:/$volname $mntpt glusterfs defaults,transport=tcp 0 0"
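
A one-shot way to patch every script found above is a sed pass like the following (a sketch only; note it also rewrites the .rpmsave copies shown in the grep output, so narrow the file list if that matters):

cd /var/lib/glusterd/hooks/1
grep -Rl 'glusterfs defaults,transport=tcp' . | xargs sed -i \
    's/glusterfs defaults,transport=tcp/glusterfs _netdev,defaults,transport=tcp/'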

Comment 3 Harold Miller 2014-03-06 19:19:30 UTC
From SFDC 01050444:

Looks like when the CTDB lock volume is started, the S29CTDBsetup.sh script adds that line to /etc/fstab if it is not already there. So when you finally get the server up and check /etc/fstab, there are two entries (one modified and one original).

I modified that part of the script and it appears to be working.

Here is the modified function from the script:

function add_fstab_entry () {
        volname=$1
        mntpt=$2
        # _netdev makes the mount wait until networking is up at boot
        mntent="`hostname`:/$volname $mntpt glusterfs _netdev,defaults,transport=tcp 0 0"
        # Append the entry only if an identical line is not already present
        exists=`grep "^$mntent" /etc/fstab`
        if [ "$exists" == "" ]
        then
            echo "$mntent" >> /etc/fstab
        fi
}

Before, the _netdev was missing.
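
The teardown script needs the matching change so that the entry written above is the one removed. A sketch of how remove_fstab_entry would look with _netdev (a reconstruction for illustration, not the shipped script):

function remove_fstab_entry () {
        volname=$1
        mntpt=$2
        # Must match the line written by add_fstab_entry, including _netdev
        mntent="`hostname`:/$volname $mntpt glusterfs _netdev,defaults,transport=tcp 0 0"
        # Escape slashes so the whole line can be used as a sed address
        esc_mntent=$(echo "$mntent" | sed 's/\//\\\//g')
        sed -i "/^$esc_mntent\$/d" /etc/fstab
}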

Comment 4 Aron Gunn 2014-03-06 21:25:00 UTC
For what it's worth, I've tested Harold's suggestion against an RHS 2.1.2 (glusterfs 3.4.0.59rhs) environment with a CTDB lock volume. In my lab, adding the '_netdev' option to the 'add_fstab_entry' and 'remove_fstab_entry' functions in both scripts (S29CTDBsetup.sh and S29CTDB-teardown.sh, respectively) allows you to locally mount a glusterfs filesystem. This persisted through a full reboot of a 4-node Gluster cluster.

Comment 6 Keith Schincke 2014-03-20 15:09:14 UTC
*** Bug 1074316 has been marked as a duplicate of this bug. ***

Comment 7 Nagaprasad Sathyanarayana 2014-05-19 10:56:17 UTC
Setting flags required to add BZs to RHS 3.0 Errata

Comment 9 Niels de Vos 2014-05-28 09:42:43 UTC
The change that has been committed is related to the _netdev option. This should resolve this issue for most customers.

In case the network is initialized a little slowly, it may be necessary to add a LINKDELAY parameter to the /etc/sysconfig/network-scripts/ifcfg-* file(s), as explained here:
- http://mjanja.co.ke/2014/04/glusterfs-mounts-fail-at-boot-on-centos/
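
For reference, a sketch of such an ifcfg file; the interface name and the 10-second value are illustrative:

# /etc/sysconfig/network-scripts/ifcfg-eth0 (illustrative)
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=dhcp
# Wait up to 10 seconds for the link before configuring the interface
LINKDELAY=10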

Comment 10 Niels de Vos 2014-05-28 09:45:50 UTC
Oh, it can also be the case that glusterd is not starting the glusterfsd (brick) processes quickly enough. glusterd starts these processes in the background, after the service script has exited.

It may be required to start the brick processes first and have glusterd wait before daemonizing. That would likely be a change that needs some more work.
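
To check whether the bricks are actually up once glusterd has started, the standard CLI can be used (the volume name 'data' is illustrative):

# Show whether the brick (glusterfsd) processes for a volume are online
gluster volume status data
# Or look for the brick processes directly
pgrep -l glusterfsd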

Comment 11 Raghavendra Talur 2014-05-28 13:09:57 UTC
Niels,

W.r.t. comment 10, I don't think the glusterfsd processes need to be started for the mount to succeed. As long as the glusterfs client can talk to glusterd, the mount will return success.

You are right about the network initialization in comment 9. Would it be enough to create a knowledge base article for that, since it is not a code change and not every user will face the issue? If yes, I will create a doc bug for it and let this bug be verified.

Let me know what you think.

Comment 13 SATHEESARAN 2014-06-25 08:43:17 UTC
The patch URL provided in comment 5 says "Review in Progress".
Has this patch been merged?

Comment 14 SATHEESARAN 2014-06-25 09:46:48 UTC
It would be more helpful to add comment 9 as a KB article, as suggested by Raghavendra Talur in comment 11.

I am not sure how to do this.

Niels, what is the procedure to add a KBase article?

Comment 15 Raghavendra Talur 2014-06-25 09:54:52 UTC
Ignore comment 5; we had posted that patch downstream for the rhs-3.0 branch before it was decided that we would follow the upstream strategy.

The corresponding upstream patch http://review.gluster.org/#/c/7221/ got merged before we pulled upstream code for rhs-3.0. The patch exists downstream.

Comment 16 Niels de Vos 2014-06-25 10:18:02 UTC
(In reply to SATHEESARAN from comment #14)
> It would be more helpful to add comment 9 as a KB article, as suggested by
> Raghavendra Talur in comment 11.
> 
> I am not sure how to do this.
> 
> Niels, what is the procedure to add a KBase article?

In the 'external trackers' for the bug, we already have a knowledge base 
solution linked:
- https://access.redhat.com/site/solutions/747673

You should have a login for the Red Hat Customer Portal; the login would
look something like rhn-qa-sasundar (mine is rhn-support-ndevos).

At the moment, the LINKDELAY option is not mentioned in the article. Do  
you want to add that, or shall I do that?

Comment 17 SATHEESARAN 2014-06-25 10:35:13 UTC
(In reply to Niels de Vos from comment #16)
> (In reply to SATHEESARAN from comment #14)
> > It would be more helpful to add comment 9 as a KB article, as suggested by
> > Raghavendra Talur in comment 11.
> > 
> > I am not sure how to do this.
> > 
> > Niels, what is the procedure to add a KBase article?
> 
> In the 'external trackers' for the bug, we already have a knowledge base
> solution linked:
> - https://access.redhat.com/site/solutions/747673
> 
> You should have a login for the Red Hat Customer Portal; the login would
> look something like rhn-qa-sasundar (mine is rhn-support-ndevos).
> 
> At the moment, the LINKDELAY option is not mentioned in the article. Do
> you want to add that, or shall I do that?

It would be good if you could take it up, as I am very new to writing KBase articles.

Comment 18 SATHEESARAN 2014-06-25 10:35:37 UTC
(In reply to Raghavendra Talur from comment #15)
> Ignore comment 5; we had posted that patch downstream for the rhs-3.0 branch
> before it was decided that we would follow the upstream strategy.
> 
> The corresponding upstream patch http://review.gluster.org/#/c/7221/ got
> merged before we pulled upstream code for rhs-3.0. The patch exists
> downstream.

Thanks for the reply.

Tested with glusterfs-3.6.0.22-1.el6rhs

Followed the steps below:
0. Set up a 2-node cluster
1. Created a replica volume (replica count 2)
2. Edited "/var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh" and "/var/lib/glusterd/hooks/1/stop/pre/S29CTDB-teardown.sh" on both nodes, replacing META with the volume name created in step 1
3. Started the volume
4. Checked /etc/fstab for the gluster mount entry.
Observation: the glusterfs mount entry had the _netdev option.

5. Stopped the volume
Observation: the glusterfs mount entry was removed from /etc/fstab.

Apart from the CTDB test, a simple fstab entry for a glusterfs mount on RHEL 6.5 with the _netdev option also works well. Condensed, the verification looks like the sketch below.
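
A condensed sketch of the verification (volume, node, and brick names are illustrative, and the exact form of the META assignment in the hook scripts may differ across versions):

gluster volume create ctdbvol replica 2 node1:/bricks/b1 node2:/bricks/b1
# On both nodes, point the CTDB hook scripts at the volume
sed -i 's/^META=.*/META="ctdbvol"/' \
    /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh \
    /var/lib/glusterd/hooks/1/stop/pre/S29CTDB-teardown.sh
gluster volume start ctdbvol
grep ctdbvol /etc/fstab    # entry should be present and include _netdev
gluster volume stop ctdbvol
grep ctdbvol /etc/fstab    # entry should be gone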

Marking this bug as VERIFIED

Comment 19 Pavithra 2014-08-07 06:11:57 UTC
Hi Raghavendra,

Please review the edited doc text for technical accuracy and sign off.

Comment 20 Raghavendra Talur 2014-08-07 06:15:59 UTC
Verified the doc text for technical accuracy.

Comment 22 errata-xmlrpc 2014-09-22 19:29:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html

Comment 26 Raghavendra Talur 2015-01-08 13:27:52 UTC
This will be fixed in 2.1.6.
The bug tracking that fix is https://bugzilla.redhat.com/show_bug.cgi?id=1180137

