Bug 997546

Summary: when /tmp is full services fail over
Product: Red Hat Enterprise Linux 5 Reporter: Jesse Triplett <jtriplet>
Component: rgmanager Assignee: Ryan McCabe <rmccabe>
Status: CLOSED ERRATA QA Contact: Cluster QE <mspqa-list>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 5.10 CC: chenders, cluster-maint, djansa, dvossel, fdinitto, hartsjc, jharriga, mjuricek, rmccabe, sbradley
Target Milestone: rc Keywords: Reopened, ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: rgmanager-2.0.52-48.el5 Doc Type: Bug Fix
Doc Text:
Previously, cluster services failed over from one node to another if the /tmp/ directory filled up. A patch has been provided to fix this bug, and cluster services no longer fail over in this situation.
Story Points: ---
Clone Of:
Clones: 998012 Environment:
Last Closed: 2014-09-16 00:28:49 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 998012, 1009244, 1009245, 1009246    

Description Jesse Triplett 2013-08-15 14:42:40 UTC
Description of problem:
Cluster services fail over if /tmp gets full

Version-Release number of selected component (if applicable):
rgmanager-2.0.52-37.el5.x86_64

How reproducible:
Chris Henderson was able to reproduce this. I will copy his update on that into the first BZ comment.

Actual results:
Services failed over from one node to another when /tmp was full.

Expected results:
Service should not fail over.

Additional info:

Comment 1 Jesse Triplett 2013-08-15 14:44:33 UTC
I reproduced this on a 2 node cluster I just built. 
password=redhat

chenders~:{1026}% rssh 10.10.178.33                                                           [255]
root@10.10.178.33's password: 
Last login: Wed Aug 14 13:13:20 2013 from 10.3.113.132
[root@vm33 ~]# clustat
Cluster Status for amdocstest @ Wed Aug 14 13:16:05 2013
Member Status: Quorate

 Member Name                                       ID   Status
 ------ ----                                       ---- ------
 vm33.gsslab.rdu2.redhat.com                           1 Online, Local, rgmanager
 dhcp95.gsslab.rdu2.redhat.com                         2 Online, rgmanager

 Service Name                             Owner (Last)                             State         
 ------- ----                             ----- ------                             -----         
 service:testservice                      dhcp95.gsslab.rdu2.redhat.com            started       

[root@vm33 ~]# clusvcadm -r testservice -m vm33.gsslab.rdu2.redhat.com
Trying to relocate service:testservice to vm33.gsslab.rdu2.redhat.com...Success
service:testservice is now running on vm33.gsslab.rdu2.redhat.com
[root@vm33 ~]# clustat
Cluster Status for amdocstest @ Wed Aug 14 13:16:56 2013
Member Status: Quorate

 Member Name                                       ID   Status
 ------ ----                                       ---- ------
 vm33.gsslab.rdu2.redhat.com                           1 Online, Local, rgmanager
 dhcp95.gsslab.rdu2.redhat.com                         2 Online, rgmanager

 Service Name                             Owner (Last)                             State         
 ------- ----                             ----- ------                             -----         
 service:testservice                      vm33.gsslab.rdu2.redhat.com              started       


   
[root@vm33 ~]# df /tmp
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/testvg-templv
                         99150      5669     88361   7% /tmp

[root@vm33 ~]# dd if=/dev/zero of=/tmp/bigfile
dd: writing to `/tmp/bigfile': No space left on device
186217+0 records in
186216+0 records out
95342592 bytes (95 MB) copied, 1.73068 seconds, 55.1 MB/s


[root@vm33 ~]# df /tmp
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/testvg-templv
                         99150     99145         0 100% /tmp

[root@vm33 ~]# tail /var/log/messages

Aug 14 13:19:25 vm33 clurgmgrd: [2871]: <err> fs:test: /dev/mapper/testvg-testlv is not mounted on /test 
Aug 14 13:19:25 vm33 clurgmgrd[2871]: <notice> status on fs "test" returned 1 (generic error) 
Aug 14 13:19:25 vm33 clurgmgrd[2871]: <notice> Stopping service service:testservice 
Aug 14 13:19:26 vm33 clurgmgrd[2871]: <notice> Service service:testservice is recovering 
Aug 14 13:19:26 vm33 clurgmgrd[2871]: <notice> Service service:testservice is now running on member 2 

[root@vm33 ~]# clustat
Cluster Status for amdocstest @ Wed Aug 14 13:20:35 2013
Member Status: Quorate

 Member Name                                       ID   Status
 ------ ----                                       ---- ------
 vm33.gsslab.rdu2.redhat.com                           1 Online, Local, rgmanager
 dhcp95.gsslab.rdu2.redhat.com                         2 Online, rgmanager

 Service Name                             Owner (Last)                             State         
 ------- ----                             ----- ------                             -----         
 service:testservice                      dhcp95.gsslab.rdu2.redhat.com            started

Comment 4 Chris Henderson 2013-08-16 18:23:15 UTC
Based on a quick look at /usr/share/cluster/utils/fs-lib.sh, I believe RHEL6 has this problem as well.

        typeset proc_mounts=$(mktemp /tmp/fs.proc.mounts.XXXXXX)
        cat /proc/mounts > $proc_mounts

        while read -r tmp_dev tmp_mp junka junkb junkc junkd; do
                # XXX fork/clone warning XXX
                if [ "${tmp_dev:0:1}" != "-" ]; then
                        tmp_dev="$(printf "$tmp_dev")"
                fi

                if [ -n "$tmp_dev" -a "$tmp_dev" = "$dev" ]; then
                  case $OCF_RESKEY_fstype in
                    cifs|nfs|nfs4)
                      ;;
                    *)
                      return $YES
                      ;;
                  esac
                fi

                # Mountpoint from /proc/mounts containing spaces will
                # have spaces represented in octal.  printf takes care
                # of this for us.
                tmp_mp="$(printf "$tmp_mp")"

                if [ -n "$tmp_mp" -a "$tmp_mp" = "$mp" ]; then
                        return $YES
                fi
        done < $proc_mounts
        rm -f $proc_mounts
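
A note on how this snippet can misbehave when /tmp is full: if mktemp cannot create the file, proc_mounts ends up empty, both redirections fail, and the loop never runs, which would be consistent with the "is not mounted" message in comment 1. The following is a minimal sketch of that failure mode, not the actual fs-lib.sh code. The function name is_mounted_tmpcopy, its tmp_dir and mounts_src parameters, the YES/NO values, and the ext3 entry in the sample data are assumptions made for illustration; a nonexistent directory stands in for a full /tmp.

```shell
#!/bin/bash
# Simplified sketch of the temp-file pattern; NOT the real fs-lib.sh.
# is_mounted_tmpcopy, tmp_dir and mounts_src are hypothetical names.
YES=0
NO=1

is_mounted_tmpcopy() {
    local dev=$1 mp=$2 tmp_dir=$3 mounts_src=$4
    local proc_mounts tmp_dev tmp_mp junk result=$NO

    # If tmp_dir is full or unusable, mktemp fails and proc_mounts is empty.
    proc_mounts=$(mktemp "$tmp_dir/fs.proc.mounts.XXXXXX" 2>/dev/null)

    {
        # Left unquoted on purpose to mirror the pattern under discussion:
        # with an empty target both redirections fail, so nothing is
        # copied and the loop body never runs.
        cat "$mounts_src" > $proc_mounts
        while read -r tmp_dev tmp_mp junk; do
            if [ "$tmp_dev" = "$dev" ] || [ "$tmp_mp" = "$mp" ]; then
                result=$YES
                break
            fi
        done < $proc_mounts
    } 2>/dev/null   # hide the shell's redirect errors in this demo

    [ -n "$proc_mounts" ] && rm -f "$proc_mounts"
    return $result
}

# Demo with made-up sample data in a scratch directory.
work=$(mktemp -d)
printf '%s\n' '/dev/mapper/testvg-testlv /test ext3 rw 0 0' > "$work/mounts"

is_mounted_tmpcopy /dev/mapper/testvg-testlv /test "$work" "$work/mounts"
echo "usable temp dir:   rc=$?"

is_mounted_tmpcopy /dev/mapper/testvg-testlv /test "$work/missing" "$work/mounts"
echo "unusable temp dir: rc=$?"

rm -rf "$work"
```

The first call reports rc=0 (mounted) and the second rc=1, even though the sample mount table is identical in both cases. Only the availability of the temporary directory changed.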

Comment 5 Ryan McCabe 2013-08-19 13:20:27 UTC
(In reply to Chris Henderson from comment #4)
> Based on a quick look at /usr/share/cluster/utils/fs-lib.sh, I believe RHEL6
> has this problem as well.

Yeah. It's the same code in both places. I cloned the bug against RHEL6 at https://bugzilla.redhat.com/show_bug.cgi?id=998012

Comment 6 David Vossel 2013-08-19 14:33:55 UTC
This practice of copying /proc/mounts to the /tmp directory has already been removed upstream, and it is already fixed in the latest 6.5 build. Below is the commit that introduced the fix upstream.

https://github.com/ClusterLabs/resource-agents/commit/4d57e9cf453cbb34761d8b2e546dc4a71ba91c3c#L0R389
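
For readers who do not want to open the commit, the general idea of dropping the temporary copy can be sketched as below. This is an illustration only and is not taken from the upstream commit. The function name is_mounted_direct, the optional mounts_src parameter (added so the sketch can be exercised against sample data), the YES/NO values, and the sample entries are all assumptions.

```shell
#!/bin/bash
# Sketch: check the mount table without writing anything to /tmp.
# is_mounted_direct and mounts_src are hypothetical names.
YES=0
NO=1

is_mounted_direct() {
    local dev=$1 mp=$2 mounts_src=${3:-/proc/mounts}
    local tmp_dev tmp_mp junk

    while read -r tmp_dev tmp_mp junk; do
        # /proc/mounts stores spaces as octal escapes such as \040;
        # printf '%b' decodes them without treating the value as a format.
        tmp_dev=$(printf '%b' "$tmp_dev")
        tmp_mp=$(printf '%b' "$tmp_mp")

        if [ "$tmp_dev" = "$dev" ] || [ "$tmp_mp" = "$mp" ]; then
            return $YES
        fi
    done < "$mounts_src"

    return $NO
}

# Demo against made-up sample data instead of the live mount table.
sample=$(mktemp "${TMPDIR:-/tmp}/mounts.XXXXXX")
printf '%s\n' '/dev/mapper/testvg-testlv /test ext3 rw 0 0' \
              '/dev/sdb1 /mnt/my\040dir ext3 rw 0 0' > "$sample"

is_mounted_direct /dev/mapper/testvg-testlv /test "$sample"
echo "device match:       rc=$?"
is_mounted_direct /dev/none '/mnt/my dir' "$sample"
echo "mount point match:  rc=$?"
is_mounted_direct /dev/none /nowhere "$sample"
echo "no match:           rc=$?"

rm -f "$sample"
```

Because the loop reads its input directly, the status check itself no longer depends on free space in /tmp. When called without a third argument the function reads /proc/mounts.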

Comment 22 errata-xmlrpc 2014-09-16 00:28:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1207.html