Bug 997546
Summary: | when /tmp is full services fail over | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Jesse Triplett <jtriplet> | |
Component: | rgmanager | Assignee: | Ryan McCabe <rmccabe> | |
Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> | |
Severity: | urgent | Docs Contact: | ||
Priority: | urgent | |||
Version: | 5.10 | CC: | chenders, cluster-maint, djansa, dvossel, fdinitto, hartsjc, jharriga, mjuricek, rmccabe, sbradley | |
Target Milestone: | rc | Keywords: | Reopened, ZStream | |
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | rgmanager-2.0.52-48.el5 | Doc Type: | Bug Fix | |
Doc Text: |
Previously, the cluster services file system failed over from one node to another if the /tmp/ directory filled up. A patch has been provided to fix this bug and cluster services no longer fail over.
|
Story Points: | --- | |
Clone Of: | ||||
: | 998012 (view as bug list) | Environment: | ||
Last Closed: | 2014-09-16 00:28:49 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 998012, 1009244, 1009245, 1009246 |
Description
Jesse Triplett
2013-08-15 14:42:40 UTC
I reproduced this on a 2 node cluster I just built.

password=redhat

chenders~:{1026}% rssh 10.10.178.33
[255] root.178.33's password:
Last login: Wed Aug 14 13:13:20 2013 from 10.3.113.132

[root@vm33 ~]# clustat
Cluster Status for amdocstest @ Wed Aug 14 13:16:05 2013
Member Status: Quorate

 Member Name                        ID   Status
 ------ ----                        ---- ------
 vm33.gsslab.rdu2.redhat.com           1 Online, Local, rgmanager
 dhcp95.gsslab.rdu2.redhat.com         2 Online, rgmanager

 Service Name           Owner (Last)                    State
 ------- ----           ----- ------                    -----
 service:testservice    dhcp95.gsslab.rdu2.redhat.com   started

[root@vm33 ~]# clusvcadm -r testservice -m vm33.gsslab.rdu2.redhat.com
Trying to relocate service:testservice to vm33.gsslab.rdu2.redhat.com...Success
service:testservice is now running on vm33.gsslab.rdu2.redhat.com

[root@vm33 ~]# clustat
Cluster Status for amdocstest @ Wed Aug 14 13:16:56 2013
Member Status: Quorate

 Member Name                        ID   Status
 ------ ----                        ---- ------
 vm33.gsslab.rdu2.redhat.com           1 Online, Local, rgmanager
 dhcp95.gsslab.rdu2.redhat.com         2 Online, rgmanager

 Service Name           Owner (Last)                    State
 ------- ----           ----- ------                    -----
 service:testservice    vm33.gsslab.rdu2.redhat.com     started

[root@vm33 ~]# df /tmp
Filesystem                 1K-blocks  Used Available Use% Mounted on
/dev/mapper/testvg-templv      99150  5669     88361   7% /tmp

[root@vm33 ~]# dd if=/dev/zero of=/tmp/bigfile
dd: writing to `/tmp/bigfile': No space left on device
186217+0 records in
186216+0 records out
95342592 bytes (95 MB) copied, 1.73068 seconds, 55.1 MB/s

[root@vm33 ~]# df /tmp
Filesystem                 1K-blocks  Used Available Use% Mounted on
/dev/mapper/testvg-templv      99150 99145         0 100% /tmp

[root@vm33 ~]# tail /var/log/messages
Aug 14 13:19:25 vm33 clurgmgrd: [2871]: <err> fs:test: /dev/mapper/testvg-testlv is not mounted on /test
Aug 14 13:19:25 vm33 clurgmgrd[2871]: <notice> status on fs "test" returned 1 (generic error)
Aug 14 13:19:25 vm33 clurgmgrd[2871]: <notice> Stopping service service:testservice
Aug 14 13:19:26 vm33 clurgmgrd[2871]: <notice> Service service:testservice is recovering
Aug 14 13:19:26 vm33 clurgmgrd[2871]: <notice> Service service:testservice is now running on member 2

[root@vm33 ~]# clustat
Cluster Status for amdocstest @ Wed Aug 14 13:20:35 2013
Member Status: Quorate

 Member Name                        ID   Status
 ------ ----                        ---- ------
 vm33.gsslab.rdu2.redhat.com           1 Online, Local, rgmanager
 dhcp95.gsslab.rdu2.redhat.com         2 Online, rgmanager

 Service Name           Owner (Last)                    State
 ------- ----           ----- ------                    -----
 service:testservice    dhcp95.gsslab.rdu2.redhat.com   started

Based on a quick look at /usr/share/cluster/utils/fs-lib.sh, I believe RHEL6 has this problem as well.

	typeset proc_mounts=$(mktemp /tmp/fs.proc.mounts.XXXXXX)
	cat /proc/mounts > $proc_mounts

	while read -r tmp_dev tmp_mp junka junkb junkc junkd; do
		# XXX fork/clone warning XXX
		if [ "${tmp_dev:0:1}" != "-" ]; then
			tmp_dev="$(printf "$tmp_dev")"
		fi

		if [ -n "$tmp_dev" -a "$tmp_dev" = "$dev" ]; then
			case $OCF_RESKEY_fstype in
			cifs|nfs|nfs4)
				;;
			*)
				return $YES
				;;
			esac
		fi

		# Mountpoint from /proc/mounts containing spaces will
		# have spaces represented in octal. printf takes care
		# of this for us.
		tmp_mp="$(printf "$tmp_mp")"

		if [ -n "$tmp_mp" -a "$tmp_mp" = "$mp" ]; then
			return $YES
		fi
	done < $proc_mounts
	rm -f $proc_mounts

(In reply to Chris Henderson from comment #4)
> Based on a quick look at /usr/share/cluster/utils/fs-lib.sh, I believe RHEL6
> has this problem as well.

Yeah. It's the same code in both places. I cloned the bug against RHEL6 at https://bugzilla.redhat.com/show_bug.cgi?id=998012

Copying /proc/mounts to a file under /tmp has already been removed upstream, and this is already fixed in the latest 6.5 build. Below is the commit that introduced the fix upstream.

https://github.com/ClusterLabs/resource-agents/commit/4d57e9cf453cbb34761d8b2e546dc4a71ba91c3c#L0R389

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1207.html
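
For anyone reading this later: the fs-lib.sh excerpt quoted in this report copies /proc/mounts into a temporary file under /tmp before scanning it. With /tmp full, that copy presumably ends up empty or missing, so the loop finds no match and the status check reports the file system as not mounted. Below is a minimal sketch of the same check reading the mounts table directly, with no temporary file. It is not the upstream patch or the shipped errata code. The function name is_mounted_sketch and its optional third argument are hypothetical, added only so the example can be run against a sample file, and the cifs/nfs/nfs4 special case from the original is left out for brevity.

```shell
#!/bin/bash
# Hypothetical helper for illustration only; not the shipped fs-lib.sh code.
# Returns 0 if the given device or mount point appears in the mounts table,
# 1 otherwise. Reads the table directly, so nothing is written to /tmp.
is_mounted_sketch() {
	local dev="$1" mp="$2" mounts="${3:-/proc/mounts}"
	local tmp_dev tmp_mp rest

	while read -r tmp_dev tmp_mp rest; do
		# /proc/mounts represents spaces as octal escapes (\040).
		# printf '%b' decodes them and does not treat '%' in the
		# value as a format directive.
		tmp_dev="$(printf '%b' "$tmp_dev")"
		tmp_mp="$(printf '%b' "$tmp_mp")"

		if [ -n "$tmp_dev" ] && [ "$tmp_dev" = "$dev" ]; then
			return 0
		fi
		if [ -n "$tmp_mp" ] && [ "$tmp_mp" = "$mp" ]; then
			return 0
		fi
	done < "$mounts"

	return 1
}
```

With the device and mount point from the reproducer, a call would look like: is_mounted_sketch /dev/mapper/testvg-testlv /test && echo mounted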