997546 – when /tmp is full services fail over

Summary: when /tmp is full services fail over

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	rgmanager
Sub Component:
Version:	5.10
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	rc
Target Release:	---
Assignee:	Ryan McCabe
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	998012 1009244 1009245 1009246

Reported:	2013-08-15 14:42 UTC by Jesse Triplett
Modified:	2018-12-03 19:38 UTC
CC List:	10 users
Fixed In Version:	rgmanager-2.0.52-48.el5
Doc Type:	Bug Fix
Doc Text:	Previously, if the /tmp/ directory filled up, the status check on a cluster service's file system resource failed, and the service failed over from one node to another. A patch has been provided to fix this bug, and cluster services no longer fail over when /tmp/ is full.
Clone Of:
Clones:	998012
Environment:
Last Closed:	2014-09-16 00:28:49 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Knowledge Base (Solution)	454393	0	None	None	None	Never
Red Hat Product Errata	RHBA-2014:1207	0	normal	SHIPPED_LIVE	rgmanager bug fix and enhancement update	2014-09-16 04:16:39 UTC

Description Jesse Triplett 2013-08-15 14:42:40 UTC

Description of problem:
Cluster services fail over if /tmp gets full

Version-Release number of selected component (if applicable):
rgmanager-2.0.52-37.el5.x86_64

How reproducible:
Chris Henderson was able to reproduce this. I will copy his reproduction notes into the first BZ comment.

Actual results:
Services failed over from one node to another when /tmp was full.

Expected results:
Service should not fail over.

Additional info:

Comment 1 Jesse Triplett 2013-08-15 14:44:33 UTC

I reproduced this on a 2-node cluster I just built. 
password=redhat

chenders~:{1026}% rssh 10.10.178.33                                                           [255]
root@10.10.178.33's password: 
Last login: Wed Aug 14 13:13:20 2013 from 10.3.113.132
[root@vm33 ~]# clustat
Cluster Status for amdocstest @ Wed Aug 14 13:16:05 2013
Member Status: Quorate

 Member Name                                       ID   Status
 ------ ----                                       ---- ------
 vm33.gsslab.rdu2.redhat.com                           1 Online, Local, rgmanager
 dhcp95.gsslab.rdu2.redhat.com                         2 Online, rgmanager

 Service Name                             Owner (Last)                             State         
 ------- ----                             ----- ------                             -----         
 service:testservice                      dhcp95.gsslab.rdu2.redhat.com            started       

[root@vm33 ~]# clusvcadm -r testservice -m vm33.gsslab.rdu2.redhat.com
Trying to relocate service:testservice to vm33.gsslab.rdu2.redhat.com...Success
service:testservice is now running on vm33.gsslab.rdu2.redhat.com
[root@vm33 ~]# clustat
Cluster Status for amdocstest @ Wed Aug 14 13:16:56 2013
Member Status: Quorate

 Member Name                                       ID   Status
 ------ ----                                       ---- ------
 vm33.gsslab.rdu2.redhat.com                           1 Online, Local, rgmanager
 dhcp95.gsslab.rdu2.redhat.com                         2 Online, rgmanager

 Service Name                             Owner (Last)                             State         
 ------- ----                             ----- ------                             -----         
 service:testservice                      vm33.gsslab.rdu2.redhat.com              started       


   
[root@vm33 ~]# df /tmp
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/testvg-templv
                         99150      5669     88361   7% /tmp

[root@vm33 ~]# dd if=/dev/zero of=/tmp/bigfile
dd: writing to `/tmp/bigfile': No space left on device
186217+0 records in
186216+0 records out
95342592 bytes (95 MB) copied, 1.73068 seconds, 55.1 MB/s


[root@vm33 ~]# df /tmp
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/testvg-templv
                         99150     99145         0 100% /tmp

[root@vm33 ~]# tail /var/log/messages

Aug 14 13:19:25 vm33 clurgmgrd: [2871]: <err> fs:test: /dev/mapper/testvg-testlv is not mounted on /test 
Aug 14 13:19:25 vm33 clurgmgrd[2871]: <notice> status on fs "test" returned 1 (generic error) 
Aug 14 13:19:25 vm33 clurgmgrd[2871]: <notice> Stopping service service:testservice 
Aug 14 13:19:26 vm33 clurgmgrd[2871]: <notice> Service service:testservice is recovering 
Aug 14 13:19:26 vm33 clurgmgrd[2871]: <notice> Service service:testservice is now running on member 2 

[root@vm33 ~]# clustat
Cluster Status for amdocstest @ Wed Aug 14 13:20:35 2013
Member Status: Quorate

 Member Name                                       ID   Status
 ------ ----                                       ---- ------
 vm33.gsslab.rdu2.redhat.com                           1 Online, Local, rgmanager
 dhcp95.gsslab.rdu2.redhat.com                         2 Online, rgmanager

 Service Name                             Owner (Last)                             State         
 ------- ----                             ----- ------                             -----         
 service:testservice                      dhcp95.gsslab.rdu2.redhat.com            started

Comment 4 Chris Henderson 2013-08-16 18:23:15 UTC

Based on a quick look at /usr/share/cluster/utils/fs-lib.sh, I believe RHEL6 has this problem as well.

        typeset proc_mounts=$(mktemp /tmp/fs.proc.mounts.XXXXXX)
        cat /proc/mounts > $proc_mounts

        while read -r tmp_dev tmp_mp junka junkb junkc junkd; do
                # XXX fork/clone warning XXX
                if [ "${tmp_dev:0:1}" != "-" ]; then
                        tmp_dev="$(printf "$tmp_dev")"
                fi

                if [ -n "$tmp_dev" -a "$tmp_dev" = "$dev" ]; then
                  case $OCF_RESKEY_fstype in
                    cifs|nfs|nfs4)
                      ;;
                    *)
                      return $YES
                      ;;
                  esac
                fi

                # Mountpoint from /proc/mounts containing spaces will
                # have spaces represented in octal.  printf takes care
                # of this for us.
                tmp_mp="$(printf "$tmp_mp")"

                if [ -n "$tmp_mp" -a "$tmp_mp" = "$mp" ]; then
                        return $YES
                fi
        done < $proc_mounts
        rm -f $proc_mounts
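
Read against the 13:19:25 log line, a plausible failure path is this: on a full /tmp, mktemp can usually still create the empty snapshot file (that needs an inode, not data blocks), but `cat /proc/mounts > $proc_mounts` fails with "No space left on device". The loop then reads an empty or truncated copy, never matches, and falls through, so the status check reports the file system as not mounted. The sketch below is a hypothetical, simplified stand-in for the excerpt: `lookup_mountpoint` is an illustrative name, it compares mount points only, and the fixture line (including the ext3 fstype) is assumed.

```shell
#!/bin/bash
# Hypothetical, simplified stand-in for the excerpt's loop: succeed (0) if
# mount point $1 appears in the mount-table snapshot file $2, else fail (1).
# The excerpt's device and fstype checks are left out for brevity.
lookup_mountpoint() {
    local mp="$1" snapshot="$2"
    local tmp_dev tmp_mp rest
    while read -r tmp_dev tmp_mp rest; do
        [ "$tmp_mp" = "$mp" ] && return 0
    done < "$snapshot"
    return 1
}

complete=$(mktemp)
empty=$(mktemp)
# Assumed /proc/mounts line for the test service's file system (fstype is a guess).
printf '%s\n' '/dev/mapper/testvg-testlv /test ext3 rw 0 0' > "$complete"
# Stand-in for what cat can leave behind when /tmp has no free blocks.
: > "$empty"

lookup_mountpoint /test "$complete"; echo "complete copy: $?"
lookup_mountpoint /test "$empty";    echo "empty copy:    $?"

rm -f "$complete" "$empty"
```

The first call prints 0 and the second prints 1: with nothing to read, the loop never runs, which is consistent with the "is not mounted on /test" error that started the recovery.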

Comment 5 Ryan McCabe 2013-08-19 13:20:27 UTC

(In reply to Chris Henderson from comment #4)
> Based on a quick look at /usr/share/cluster/utils/fs-lib.sh, I believe RHEL6
> has this problem as well.

Yeah. It's the same code in both places. I cloned the bug against RHEL6 at https://bugzilla.redhat.com/show_bug.cgi?id=998012

Comment 6 David Vossel 2013-08-19 14:33:55 UTC

Copying /proc/mounts to a file in /tmp has already been removed upstream, and the fix is already in the latest 6.5 build. Below is the commit that introduced the fix upstream.

https://github.com/ClusterLabs/resource-agents/commit/4d57e9cf453cbb34761d8b2e546dc4a71ba91c3c#L0R389
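
The upstream commit is not reproduced here. As a hypothetical illustration of the same idea (no copy of /proc/mounts in /tmp at all), the loop can read the mount table in place. `is_mounted_direct`, its optional third argument, and the fixture entries are assumptions made for this sketch; it matches on device or mount point only and omits the excerpt's cifs/nfs/nfs4 handling.

```shell
#!/bin/bash
# Hypothetical sketch (not the upstream patch): succeed (0) if device $1 or
# mount point $2 is listed in the mount table, else fail (1). The table is
# read in place, so nothing is written to /tmp. $3 optionally names another
# mount-table file (default /proc/mounts), which makes the function testable.
is_mounted_direct() {
    local dev="$1" mp="$2" table="${3:-/proc/mounts}"
    local tmp_dev tmp_mp rest
    while read -r tmp_dev tmp_mp rest; do
        # /proc/mounts writes spaces in paths as octal escapes (\040).
        # printf '%b' decodes them without using the field as the format
        # string, so a literal % in a path cannot confuse it.
        tmp_dev=$(printf '%b' "$tmp_dev")
        tmp_mp=$(printf '%b' "$tmp_mp")
        if [ "$tmp_dev" = "$dev" ] || [ "$tmp_mp" = "$mp" ]; then
            return 0
        fi
    done < "$table"
    return 1
}

# Demo against a small fixture; /dev/sdb1 and "/mnt/my disk" are made-up entries.
table=$(mktemp)
printf '%s\n' '/dev/mapper/testvg-testlv /test ext3 rw 0 0' \
              '/dev/sdb1 /mnt/my\040disk ext3 rw 0 0' > "$table"
is_mounted_direct /dev/none '/mnt/my disk' "$table" && echo "mounted"
rm -f "$table"
```

Passing the field as an argument to `printf '%b'` rather than as the format string also makes the excerpt's leading-dash guard unnecessary, since a device name such as `-` is never parsed as an option.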

Comment 22 errata-xmlrpc 2014-09-16 00:28:49 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1207.html
