Bug 1113543 - Spec %post server does not wait for the old glusterd to exit
Summary: Spec %post server does not wait for the old glusterd to exit
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: build
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Assignee: Kaleb KEITHLEY
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 960752 1113959 1145000
 
Reported: 2014-06-26 11:40 UTC by Patrick Uiterwijk
Modified: 2015-12-01 16:45 UTC
CC: 10 users

Fixed In Version: glusterfs-3.6.0beta1
Clone Of:
Cloned To: 1113959 1145000
Environment:
Last Closed: 2014-11-11 08:36:02 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
glusterd log (8.01 KB, text/plain), 2014-09-25 14:36 UTC, Lalatendu Mohanty

Description Patrick Uiterwijk 2014-06-26 11:40:13 UTC
Description of problem:
The %post server of gluster.spec says:
killall glusterd &> /dev/null
glusterd --xlator-option *.upgrade=on -N
This doesn't wait for the old glusterd to actually exit. The new glusterd then finds it cannot bind to its socket and quits, after which the original one also exits, leaving no glusterd running at all.
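
A minimal sketch of a scriptlet that actually waits, reusing the pidof/killall invocations from the spec (the polling loop is illustrative, not the shipped fix):

    killall glusterd &> /dev/null
    # poll until the old glusterd is really gone and has released its ports;
    # a production scriptlet would also want a timeout so it cannot hang forever
    while pidof -c -o %PPID -x glusterd &> /dev/null; do
        sleep 1
    done
    glusterd --xlator-option *.upgrade=on -N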


Version-Release number of selected component (if applicable):
glusterfs-3.5.1


How reproducible:
Every time


Steps to Reproduce:
1. Run glusterd
2. Upgrade from 3.5.0 to 3.5.1

Actual results:
No glusterd running anymore


Expected results:
An upgraded glusterd running


Additional info:

Comment 1 Niels de Vos 2014-06-26 13:35:37 UTC
The post-installation script for the glusterfs-server package handles the restarting of glusterd incorrectly. This caused an outage when the glusterfs-server package was automatically updated.

After checking the logs together with Patrick, we concluded that the running glusterd did receive the signal and would have exited eventually. However, the script does not wait for the running glusterd to exit, and starts a new glusterd process immediately after sending SIGTERM. If the first glusterd process has not exited yet, the new glusterd process cannot listen on port 24007 and exits. The first glusterd eventually exits too, leaving the service unavailable.

Snippet from the .spec:

%post server
...
pidof -c -o %PPID -x glusterd &> /dev/null
if [ $? -eq 0 ]; then
    ...
    killall glusterd &> /dev/null
    glusterd --xlator-option *.upgrade=on -N
else
    glusterd --xlator-option *.upgrade=on -N
fi
...


I am not sure what the best way is to start glusterd with these specific options just once. Maybe they should be listed in /etc/sysconfig/glusterd so that the standard init script or systemd job handles it?
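
For illustration, one hedged shape of that idea, where GLUSTERD_OPTIONS is a hypothetical variable name rather than an existing gluster convention:

    # /etc/sysconfig/glusterd (hypothetical)
    GLUSTERD_OPTIONS="--xlator-option *.upgrade=on -N"

    # glusterd.service, relevant lines only
    [Service]
    EnvironmentFile=-/etc/sysconfig/glusterd
    ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid $GLUSTERD_OPTIONS

The scriptlet would then have to clear the options again after the one-shot upgrade run, and -N (stay in the foreground) does not mix well with a forking-style unit, so this is a direction, not a drop-in fix.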

Comment 2 Kaleb KEITHLEY 2014-06-26 15:31:35 UTC
Which is the primary concern, that the new glusterd was started too soon? That we need a cleaner solution for starting glusterd with the *.upgrade=on option? Or both?

Comment 3 Anand Avati 2014-06-26 21:18:17 UTC
REVIEW: http://review.gluster.org/8185 (build/glusterfs.spec.in: %post server doesn't wait for old glusterd) posted (#1) for review on master by Kaleb KEITHLEY (kkeithle)

Comment 4 Anand Avati 2014-06-27 10:36:21 UTC
REVIEW: http://review.gluster.org/8185 (build/glusterfs.spec.in: %post server doesn't wait for old glusterd) posted (#2) for review on master by Kaleb KEITHLEY (kkeithle)

Comment 5 Anand Avati 2014-06-27 11:08:14 UTC
REVIEW: http://review.gluster.org/8185 (build/glusterfs.spec.in: %post server doesn't wait for old glusterd) posted (#3) for review on master by Kaleb KEITHLEY (kkeithle)

Comment 6 Anand Avati 2014-06-30 15:15:08 UTC
REVIEW: http://review.gluster.org/8185 (build/glusterfs.spec.in: %post server doesn't wait for old glusterd) posted (#4) for review on master by Kaleb KEITHLEY (kkeithle)

Comment 7 Anand Avati 2014-07-02 08:39:39 UTC
COMMIT: http://review.gluster.org/8185 committed in master by Vijay Bellur (vbellur) 
------
commit 858b570a0c62d31416f0aee8c385b3118a1fad43
Author: Kaleb S. KEITHLEY <kkeithle>
Date:   Thu Jun 26 17:14:39 2014 -0400

    build/glusterfs.spec.in: %post server doesn't wait for old glusterd
    
    'killall glusterd' needs to wait for the old glusterd to exit
    before starting the updated one, otherwise the new process can't
    bind to its socket ports
    
    Change-Id: Ib43c76f232e0ea6f7f8469fb12be7f2b907fb7c8
    BUG: 1113543
    Signed-off-by: Kaleb S. KEITHLEY <kkeithle>
    Reviewed-on: http://review.gluster.org/8185
    Reviewed-by: Niels de Vos <ndevos>
    Reviewed-by: Lalatendu Mohanty <lmohanty>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Humble Devassy Chirammal <humble.devassy>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 8 gene 2014-08-06 16:54:22 UTC
I posted the below message to the gluster-users list and was asked to also post it here:

When updates applied a couple of nights ago all my Gluster nodes went down and "service glusterd status" reported it dead on all 3 nodes in my replicated setup. This seems very similar to a bug that was recently fixed (https://bugzilla.redhat.com/show_bug.cgi?id=1113543)  Any ideas what's up with this?

[root@eapps-gluster01 ~]# rpm -qa |grep gluster
glusterfs-libs-3.5.2-1.el6.x86_64
glusterfs-cli-3.5.2-1.el6.x86_64
glusterfs-geo-replication-3.5.2-1.el6.x86_64
glusterfs-3.5.2-1.el6.x86_64
glusterfs-fuse-3.5.2-1.el6.x86_64
glusterfs-server-3.5.2-1.el6.x86_64
glusterfs-api-3.5.2-1.el6.x86_64

Comment 9 Lalatendu Mohanty 2014-08-07 06:53:34 UTC
Looks like the fix is not working as expected, hence moving the bug back to the ASSIGNED state.

Comment 10 Niels de Vos 2014-08-07 08:47:03 UTC
I guess it could fail when not all packages have been updated yet; there could be library mismatches of some kind.

Instead of doing the kill+restart in %post, it may be safer to do it in %posttrans (or whatever the name is)?
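
%posttrans is indeed the name; it runs once at the end of the whole transaction, when every package is already in place. A rough sketch of moving the logic there (illustrative, not the shipped change):

    %posttrans server
    # all packages in the transaction are installed by now, so the new
    # glusterd is not started against half-updated libraries
    if pidof -c -o %PPID -x glusterd &> /dev/null; then
        killall glusterd &> /dev/null
        while pidof -c -o %PPID -x glusterd &> /dev/null; do sleep 1; done
    fi
    glusterd --xlator-option *.upgrade=on -N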

Comment 11 Fabian Arrotin 2014-09-22 08:04:12 UTC
Just to add that I've had the same issue: updating four nodes from 3.5.0 to 3.5.2, and they all suffered from the same problem.
As I had already hit this problem in the past, I just did one node at a time and restarted glusterd myself after each node update.

Comment 12 Niels de Vos 2014-09-22 12:43:53 UTC
A beta release for GlusterFS 3.6.0 has been announced [1]. Please verify whether this release solves this bug report for you. If the glusterfs-3.6.0beta1 release does not resolve this issue, leave a comment in this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED.

Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and on the update infrastructure (possibly an "updates-testing" repository) for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-September/018836.html
[2] http://supercolony.gluster.org/pipermail/gluster-users/

Comment 13 Lalatendu Mohanty 2014-09-25 12:05:19 UTC
Looks like the fix is working in 3.6.0. I performed the following steps and found that glusterd did not get killed:

1. Installed 3.6.0-0.2.beta1
2. Started glusterd
3. Updated to 3.6.0-0.3.beta2
4. glusterd was still running.

[root@dhcp159-233 yum.repos.d]# yum install glusterfs-server
xxxxxxxxxxxxxxxxxxxxxxxxxx                                                                                                                                                      6/6 
Installed:
  glusterfs-server.x86_64 0:3.6.0-0.2.beta1.fc20                                                                                                                                                                   

Dependency Installed:
  glusterfs.x86_64 0:3.6.0-0.2.beta1.fc20              glusterfs-api.x86_64 0:3.6.0-0.2.beta1.fc20         glusterfs-cli.x86_64 0:3.6.0-0.2.beta1.fc20         glusterfs-fuse.x86_64 0:3.6.0-0.2.beta1.fc20        
  glusterfs-libs.x86_64 0:3.6.0-0.2.beta1.fc20        

Complete!
[root@dhcp159-233 yum.repos.d]# systemctl start glusterd
[root@dhcp159-233 yum.repos.d]# systemctl status glusterd
glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled)
   Active: active (running) since Thu 2014-09-25 07:39:24 EDT; 6s ago
  Process: 29489 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid (code=exited, status=0/SUCCESS)
 Main PID: 29490 (glusterd)
   CGroup: /system.slice/glusterd.service
           └─29490 /usr/sbin/glusterd -p /var/run/glusterd.pid

Sep 25 07:39:24 dhcp159-233.sbu.lab.eng.bos.redhat.com systemd[1]: Started GlusterFS, a clustered file-system server.
Sep 25 07:39:26 dhcp159-233.sbu.lab.eng.bos.redhat.com python[29497]: SELinux is preventing /usr/sbin/glusterfsd from write access on the sock_file .
                                                                      
                                                                      *****  Plugin catchall (100. confidence) suggests   **************************...
Hint: Some lines were ellipsized, use -l to show in full.
[root@dhcp159-233 yum.repos.d]# vi glusterfs-360beta2-fedora.repo 

[root@dhcp159-233 yum.repos.d]# yum update
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Updated:
  glusterfs.x86_64 0:3.6.0-0.3.beta2.fc20             glusterfs-api.x86_64 0:3.6.0-0.3.beta2.fc20           glusterfs-cli.x86_64 0:3.6.0-0.3.beta2.fc20        glusterfs-fuse.x86_64 0:3.6.0-0.3.beta2.fc20       
  glusterfs-libs.x86_64 0:3.6.0-0.3.beta2.fc20        glusterfs-server.x86_64 0:3.6.0-0.3.beta2.fc20       

Complete!


[root@dhcp159-233 yum.repos.d]# systemctl status glusterd
glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled)
   Active: active (running) since Thu 2014-09-25 07:40:16 EDT; 4s ago
  Process: 29562 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid (code=exited, status=0/SUCCESS)
 Main PID: 29563 (glusterd)
   CGroup: /system.slice/glusterd.service
           └─29563 /usr/sbin/glusterd -p /var/run/glusterd.pid

Sep 25 07:40:16 dhcp159-233.sbu.lab.eng.bos.redhat.com systemd[1]: Started GlusterFS, a clustered file-system server.
Sep 25 07:40:17 dhcp159-233.sbu.lab.eng.bos.redhat.com python[29569]: SELinux is preventing /usr/sbin/glusterfsd from write access on the sock_file .
                                                                      
                                                                      *****  Plugin catchall (100. confidence) suggests   **************************...
Hint: Some lines were ellipsized, use -l to show in full.

Comment 14 Lalatendu Mohanty 2014-09-25 12:34:58 UTC
After I installed psmisc ("yum install psmisc"), I can reproduce the issue again, i.e. glusterd is killed after the update.
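
That fits: killall is shipped in psmisc on Fedora/EL, so on a minimal install the kill step fails silently and the old glusterd simply survives the update, which is why the earlier test looked like a pass. Sketched:

    # without psmisc, the kill is a silent no-op:
    killall glusterd &> /dev/null    # exits 127; "command not found" is swallowed
    # the old glusterd keeps running, so no outage is visible.
    # a defensive scriptlet could guard the call (illustrative):
    if command -v killall &> /dev/null; then
        killall glusterd &> /dev/null
    fi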

Comment 15 Anand Avati 2014-09-25 13:36:52 UTC
REVIEW: http://review.gluster.org/8857 (glusterfs.spec.in: add psmisc to -server subpackage) posted (#1) for review on release-3.6 by Kaleb KEITHLEY (kkeithle)

Comment 16 Lalatendu Mohanty 2014-09-25 14:36:11 UTC
Created attachment 941092 [details]
glusterd log

Comment 17 Lalatendu Mohanty 2014-09-25 14:43:25 UTC
Comment on attachment 941092 [details]
glusterd log

Log is from an update where glusterd was running, and after the update glusterd was not running any more.

Comment 18 Niels de Vos 2014-09-25 15:24:39 UTC
Figured out what the issue is:

1. glusterd is happily running
2. yum update

   3. glusterd gets killed by the update
   4. "glusterd --xlator-option *.upgrade=on -N" is run to update the config
   5. glusterd exits when the config update is done
   6. a cond-restart or try-restart is done, but glusterd is not running, so
      nothing gets (re)started

7. "yum update" finishes


The obvious solution is to start glusterd again after it has updated its config.
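
In scriptlet terms, something like this (a sketch; whether the shipped fix uses a plain start, a condrestart, or something smarter is not visible in this report):

    killall glusterd &> /dev/null
    glusterd --xlator-option *.upgrade=on -N    # one-shot config update, exits when done
    # the upgrade run has exited by now, so bring the daemon back explicitly
    systemctl start glusterd.service &> /dev/null || \
        service glusterd start &> /dev/null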

Comment 19 Anand Avati 2014-09-25 15:25:59 UTC
REVIEW: http://review.gluster.org/8857 (glusterfs.spec.in: add psmisc to -server subpackage) posted (#2) for review on release-3.6 by Kaleb KEITHLEY (kkeithle)

Comment 20 Anand Avati 2014-09-25 16:01:28 UTC
REVIEW: http://review.gluster.org/8857 (glusterfs.spec.in: add psmisc to -server subpackage) posted (#3) for review on release-3.6 by Kaleb KEITHLEY (kkeithle)

Comment 21 Anand Avati 2014-09-25 16:19:10 UTC
REVIEW: http://review.gluster.org/8857 (glusterfs.spec.in: add psmisc to -server subpackage) posted (#4) for review on release-3.6 by Kaleb KEITHLEY (kkeithle)

Comment 22 Anand Avati 2014-09-26 12:45:08 UTC
REVIEW: http://review.gluster.org/8871 (glusterfs.spec.in: add psmisc to -server subpackage) posted (#1) for review on release-3.6 by Kaleb KEITHLEY (kkeithle)

Comment 23 Anand Avati 2014-09-26 16:57:19 UTC
COMMIT: http://review.gluster.org/8871 committed in release-3.6 by Vijay Bellur (vbellur) 
------
commit dce9e79a7a7ccd5b998ca562ee026e4cfd5519c2
Author: Kaleb S. KEITHLEY <kkeithle>
Date:   Fri Sep 26 08:44:10 2014 -0400

    glusterfs.spec.in: add psmisc to -server subpackage
    
    apparently some minimalist installs omit psmisc
    psmisc is needed for the killall in various %pre and %post scriptlets
    
    smarter logic for restarting glusterd in %post server
    
    Change-Id: I041f22576f25e200ff6a4e2f194e539ba7ed1d42
    BUG: 1113543
    Signed-off-by: Kaleb S. KEITHLEY <kkeithle>
    Reviewed-on: http://review.gluster.org/8871
    Reviewed-by: Niels de Vos <ndevos>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Lalatendu Mohanty <lmohanty>
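
The dependency half of that commit presumably reduces to a single line in the spec (a sketch, standard RPM syntax):

    %package server
    ...
    Requires: psmisc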

Comment 24 Niels de Vos 2014-11-11 08:36:02 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.6.1, please reopen this bug report.

glusterfs-3.6.1 has been announced [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019410.html
[2] http://supercolony.gluster.org/mailman/listinfo/gluster-users

