Description of problem:
The netfs init script fails to mount glusterfs mounts on localhost, despite glusterd starting before netfs in the init sequence. However, if the customer adds a 'sleep 60' to the netfs init script, everything works as expected on mount. It therefore appears that the glusterd init script returns to init before glusterd is fully running and ready to service mount requests.

Version-Release number of selected component (if applicable):
Red Hat Storage 2.1
Red Hat Enterprise Linux 6.4

How reproducible:
Very, for the customer

Steps to Reproduce:
1. Install and configure gluster
2. Add a localhost glusterfs entry to /etc/fstab and enable netfs & glusterd to start on boot:
   localhost:data /data glusterfs _netdev 0 0
3. Reboot

Actual results:
The mount of /data fails.

Expected results:
After reboot, /data should be mounted.

Additional info:
1) If the customer adds 'sleep 60' to the netfs init script, boot works as expected.
2) https://access.redhat.com/site/solutions/101223 indicates glusterd was moved to start before netfs; however, the customer sees the issue with both upstream and Red Hat Storage 2.1.
From a customer:
The file /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh is also wrong, as it does not add the _netdev option when generating the entry for /etc/fstab. In fact, none of the hook scripts do:

[root@lp-rhs-02 1]# grep -R defaults *
start/post/K29CTDBsetup.sh.rpmsave.rpmsave: mntent="`hostname`:/$volname $mntpt glusterfs defaults,transport=tcp 0 0"
start/post/K29CTDBsetup.sh.rpmsave: mntent="`hostname`:/$volname $mntpt glusterfs defaults,transport=tcp 0 0"
start/post/S29CTDBsetup.sh: mntent="`hostname`:/$volname $mntpt glusterfs defaults,transport=tcp 0 0"
stop/pre/K30samba-stop.sh.rpmsave.rpmsave: mntent="`hostname`:/$volname $mntpt glusterfs defaults,transport=tcp 0 0"
stop/pre/K29CTDB-teardown.sh.rpmsave: mntent="`hostname`:/$volname $mntpt glusterfs defaults,transport=tcp 0 0"
stop/pre/S29CTDB-teardown.sh: mntent="`hostname`:/$volname $mntpt glusterfs defaults,transport=tcp 0 0"
stop/pre/K29CTDB-teardown.sh.rpmsave.rpmsave: mntent="`hostname`:/$volname $mntpt glusterfs defaults,transport=tcp 0 0"
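The listing above can be reproduced as a quick audit. A minimal sketch, assuming the default hook directory layout (the helper name is mine, not part of the shipped scripts):

```shell
#!/bin/sh
# Hedged sketch: list hook-script fstab entries that still lack _netdev.
# The helper name and the example path are assumptions for illustration.
find_missing_netdev() {
    # Print any generated fstab line using "glusterfs defaults" (no _netdev).
    grep -R 'glusterfs defaults' "$1" 2>/dev/null | grep -v '_netdev'
}
# Example (path is the default hook directory on RHS):
# find_missing_netdev /var/lib/glusterd/hooks
```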
From SFDC 01050444:
It looks like when the ctdblock volume is started, the S29CTDBsetup.sh script adds that line to /etc/fstab if it is not already there. So when the server finally comes up and you check /etc/fstab, there are two entries (one modified and one original). I modified that part of the script and it appears to be working. Here is the modified function from the script:

function add_fstab_entry ()
{
        volname=$1
        mntpt=$2
        mntent="`hostname`:/$volname $mntpt glusterfs _netdev,defaults,transport=tcp 0 0"
        exists=`grep "^$mntent" /etc/fstab`
        if [ "$exists" == "" ]
        then
                echo "$mntent" >> /etc/fstab
        fi
}

Before, the _netdev option was missing.
For what it's worth, I've tested Harold's suggestion against a RHS 2.1.2 (glusterfs 3.4.0.59rhs) environment with a CTDB lock volume. In my lab, adding the '_netdev' option to the 'add_fstab_entry' and 'remove_fstab_entry' functions in both scripts (S29CTDBsetup.sh and S29CTDB-teardown.sh respectively) allows you to locally mount a "glusterfs" filesystem. This was persistent through an entire reboot of a 4-node Gluster cluster.
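For reference, a sketch of what the matching 'remove_fstab_entry' change might look like. The exact body shipped in S29CTDB-teardown.sh may differ; the key point is that the mntent string must include _netdev so it matches the entry written at volume start. The FSTAB override is added here only so the sketch can be exercised outside /etc/fstab:

```shell
#!/bin/bash
# Hedged sketch of a remove_fstab_entry counterpart to the modified
# add_fstab_entry above. FSTAB is overridable for testing; the real
# hook script operates on /etc/fstab directly.
FSTAB="${FSTAB:-/etc/fstab}"

function remove_fstab_entry () {
    volname=$1
    mntpt=$2
    # Must carry _netdev, otherwise it will not match the entry
    # that add_fstab_entry wrote and the stale line stays behind.
    mntent="`hostname`:/$volname $mntpt glusterfs _netdev,defaults,transport=tcp 0 0"
    esc_mntent=$(echo "$mntent" | sed 's/\//\\\//g')
    exists=`grep "^$mntent" "$FSTAB"`
    if [ "$exists" != "" ]
    then
        sed -i "/^$esc_mntent/d" "$FSTAB"
    fi
}
```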
*** Bug 1074316 has been marked as a duplicate of this bug. ***
Setting flags required to add BZs to RHS 3.0 Errata
The change that has been committed is related to the _netdev option. This should resolve the issue for most customers. In case the network initializes a little slowly, it may also be necessary to add a LINKDELAY parameter in the /etc/sysconfig/network-scripts/ifcfg-* file(s), as explained here:
- http://mjanja.co.ke/2014/04/glusterfs-mounts-fail-at-boot-on-centos/
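A minimal sketch of the LINKDELAY workaround; the helper name, the 10-second delay, and the eth0 path are examples, not values from this bug:

```shell
#!/bin/sh
# Hedged sketch: append LINKDELAY to an ifcfg file if not already set,
# so the network init scripts wait a few extra seconds for the link to
# come up before _netdev mounts are attempted.
add_linkdelay() {
    ifcfg=$1
    delay=${2:-10}    # 10 seconds is an example value only
    if ! grep -q '^LINKDELAY=' "$ifcfg"; then
        echo "LINKDELAY=$delay" >> "$ifcfg"
    fi
}
# Example: add_linkdelay /etc/sysconfig/network-scripts/ifcfg-eth0 10
```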
Oh, it can also be the case that glusterd does not start the glusterfsd (brick) processes quickly enough. glusterd starts these processes in the background, after the service script has exited. It may be necessary to start the brick processes first and have glusterd wait before becoming a daemon. That would likely be a change that needs some more work.
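As a sketch of the "wait for glusterd" idea, one could poll the management port (24007 by default) before running the netfs mounts. This helper is my own illustration, not the committed change; it relies on bash's /dev/tcp redirection:

```shell
#!/bin/bash
# Hedged sketch: block until glusterd accepts TCP connections on its
# management port before attempting glusterfs mounts. Defaults and the
# function name are illustrative assumptions.
wait_for_glusterd() {
    local host=${1:-localhost} port=${2:-24007} tries=${3:-30}
    local i=0
    while [ "$i" -lt "$tries" ]; do
        # Opening /dev/tcp succeeds only once something listens on the port.
        if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1    # timed out; glusterd never became reachable
}
```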
Niels,

W.r.t. comment 10, I think it is not necessary for the glusterfsd processes to be started for the mount to succeed. As long as glusterfs can talk to glusterd, the mount will return success.

You are right about network initialization in comment 9. Would it be acceptable to create a knowledge base article for that, since it is not a code change and not every user will face the issue? If yes, I will create a doc bug for it and let this bug be verified. Let me know what you think.
The patch URL provided in comment 5 says "Review in Progress". Has this patch been merged?
It would be more helpful to add comment 9 as a KB article, as suggested by Raghavendra Talur in comment 11.

I am not sure how to do this.

Niels, what is the procedure to add a KBase article?
Ignore comment 5; we had posted that downstream for the rhs-3.0 branch before it was decided that we would follow the upstream strategy.

The corresponding upstream patch http://review.gluster.org/#/c/7221/ was merged before we pulled upstream code for rhs-3.0. The patch exists downstream.
(In reply to SATHEESARAN from comment #14)
> It would be more helpful to add comment9 as a KB article as suggested by
> Ragavendra Talur in comment 11.
>
> I am not sure how to do this.
>
> Neils , what is the procedure add to KBase article ?

In the 'external trackers' for the bug, we already have a knowledge base solution linked:
- https://access.redhat.com/site/solutions/747673

You should have a login for the Red Hat Customer Portal; the login would look something like rhn-qa-sasundar (mine is rhn-support-ndevos).

At the moment, the LINKDELAY option is not mentioned in the article. Do you want to add that, or shall I do that?
(In reply to Niels de Vos from comment #16)
> In the 'external trackers' for the bug, we already have a knowledge base
> solution linked:
> - https://access.redhat.com/site/solutions/747673
>
> You should have a login for the Red Hat Customer Portal, the login would
> look something like rhn-qa-sasundar (mine is rhn-support-ndevos).
>
> At the moment, the LINKDELAY option is not mentioned in the article. Do
> you want to add that, or shall I do that?

It would be good if you could take this up, as I am very new to writing KBase articles.
(In reply to Raghavendra Talur from comment #15)
> Ignore comment 5, we had posted that downstream for rhs-3.0 branch before
> it was decided that we will follow upstream strategy.
>
> The corresponding upstream patch http://review.gluster.org/#/c/7221/ got
> merged before we fulled upstream code for rhs-3.0. The patch exists
> downstream.

Thanks for the reply.

Tested with glusterfs-3.6.0.22-1.el6rhs. Followed the steps below:

0. Set up a 2-node cluster
1. Created a replica volume (replica count 2)
2. On both nodes, edited "/var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh" and "/var/lib/glusterd/hooks/1/stop/pre/S29CTDB-teardown.sh", replacing META with the volume name created in step 1
3. Started the volume
4. Checked /etc/fstab for the gluster mount entry
   Observation: the glusterfs mount entry had the _netdev option
5. Stopped the volume
   Observation: the glusterfs mount entry was removed from /etc/fstab

Apart from the CTDB test, a simple fstab entry for a glusterfs mount on RHEL 6.5 with the _netdev option also works well.

Marking this bug as VERIFIED.
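Step 2 of the verification can be scripted. A minimal sketch, assuming the hook scripts carry a META= line as in the shipped versions; the helper name and the volume name are examples:

```shell
#!/bin/sh
# Hedged sketch: point the CTDB hook scripts at the lock volume by
# replacing the META placeholder with the actual volume name.
# The function name and example paths/volume are assumptions.
set_meta_volume() {
    volname=$1; shift
    for script in "$@"; do
        sed -i "s/^META=.*/META=\"$volname\"/" "$script"
    done
}
# Example:
# set_meta_volume ctdb \
#     /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh \
#     /var/lib/glusterd/hooks/1/stop/pre/S29CTDB-teardown.sh
```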
Hi Raghavendra,

Please review the edited doc text for technical accuracy and sign off.
Verified the doc text for technical accuracy.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2014-1278.html
This will be fixed in 2.1.6. The bug tracking that release: https://bugzilla.redhat.com/show_bug.cgi?id=1180137