Bug 997546
| Summary: | when /tmp is full services fail over | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Jesse Triplett <jtriplet> | |
| Component: | rgmanager | Assignee: | Ryan McCabe <rmccabe> | |
| Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> | |
| Severity: | urgent | Docs Contact: | ||
| Priority: | urgent | |||
| Version: | 5.10 | CC: | chenders, cluster-maint, djansa, dvossel, fdinitto, hartsjc, jharriga, mjuricek, rmccabe, sbradley | |
| Target Milestone: | rc | Keywords: | Reopened, ZStream | |
| Target Release: | --- | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | rgmanager-2.0.52-48.el5 | Doc Type: | Bug Fix | |
| Doc Text: | Previously, if the /tmp/ directory filled up, the status check of a file system resource could fail, and cluster services then failed over from one node to another. A patch has been provided to fix this bug, and cluster services no longer fail over in this situation. | |||
| Story Points: | --- | |||
| Clone Of: | ||||
| Clones: | 998012 | Environment: | ||
| Last Closed: | 2014-09-16 00:28:49 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 998012, 1009244, 1009245, 1009246 | |||
Description
Jesse Triplett
2013-08-15 14:42:40 UTC
I reproduced this on a 2-node cluster I just built.
password=redhat
```
chenders~:{1026}% rssh 10.10.178.33 [255]
root@10.10.178.33's password:
Last login: Wed Aug 14 13:13:20 2013 from 10.3.113.132
[root@vm33 ~]# clustat
Cluster Status for amdocstest @ Wed Aug 14 13:16:05 2013
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
vm33.gsslab.rdu2.redhat.com 1 Online, Local, rgmanager
dhcp95.gsslab.rdu2.redhat.com 2 Online, rgmanager
Service Name Owner (Last) State
------- ---- ----- ------ -----
service:testservice dhcp95.gsslab.rdu2.redhat.com started
[root@vm33 ~]# clusvcadm -r testservice -m vm33.gsslab.rdu2.redhat.com
Trying to relocate service:testservice to vm33.gsslab.rdu2.redhat.com...Success
service:testservice is now running on vm33.gsslab.rdu2.redhat.com
[root@vm33 ~]# clustat
Cluster Status for amdocstest @ Wed Aug 14 13:16:56 2013
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
vm33.gsslab.rdu2.redhat.com 1 Online, Local, rgmanager
dhcp95.gsslab.rdu2.redhat.com 2 Online, rgmanager
Service Name Owner (Last) State
------- ---- ----- ------ -----
service:testservice vm33.gsslab.rdu2.redhat.com started
[root@vm33 ~]# df /tmp
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/testvg-templv
99150 5669 88361 7% /tmp
[root@vm33 ~]# dd if=/dev/zero of=/tmp/bigfile
dd: writing to `/tmp/bigfile': No space left on device
186217+0 records in
186216+0 records out
95342592 bytes (95 MB) copied, 1.73068 seconds, 55.1 MB/s
[root@vm33 ~]# df /tmp
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/testvg-templv
99150 99145 0 100% /tmp
[root@vm33 ~]# tail /var/log/messages
Aug 14 13:19:25 vm33 clurgmgrd: [2871]: <err> fs:test: /dev/mapper/testvg-testlv is not mounted on /test
Aug 14 13:19:25 vm33 clurgmgrd[2871]: <notice> status on fs "test" returned 1 (generic error)
Aug 14 13:19:25 vm33 clurgmgrd[2871]: <notice> Stopping service service:testservice
Aug 14 13:19:26 vm33 clurgmgrd[2871]: <notice> Service service:testservice is recovering
Aug 14 13:19:26 vm33 clurgmgrd[2871]: <notice> Service service:testservice is now running on member 2
[root@vm33 ~]# clustat
Cluster Status for amdocstest @ Wed Aug 14 13:20:35 2013
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
vm33.gsslab.rdu2.redhat.com 1 Online, Local, rgmanager
dhcp95.gsslab.rdu2.redhat.com 2 Online, rgmanager
Service Name Owner (Last) State
------- ---- ----- ------ -----
service:testservice dhcp95.gsslab.rdu2.redhat.com started
```
Based on a quick look at /usr/share/cluster/utils/fs-lib.sh, I believe RHEL6 has this problem as well.
```bash
typeset proc_mounts=$(mktemp /tmp/fs.proc.mounts.XXXXXX)
cat /proc/mounts > $proc_mounts

while read -r tmp_dev tmp_mp junka junkb junkc junkd; do
	# XXX fork/clone warning XXX
	if [ "${tmp_dev:0:1}" != "-" ]; then
		tmp_dev="$(printf "$tmp_dev")"
	fi

	if [ -n "$tmp_dev" -a "$tmp_dev" = "$dev" ]; then
		case $OCF_RESKEY_fstype in
		cifs|nfs|nfs4)
			;;
		*)
			return $YES
			;;
		esac
	fi

	# Mountpoint from /proc/mounts containing spaces will
	# have spaces represented in octal. printf takes care
	# of this for us.
	tmp_mp="$(printf "$tmp_mp")"

	if [ -n "$tmp_mp" -a "$tmp_mp" = "$mp" ]; then
		return $YES
	fi
done < $proc_mounts

rm -f $proc_mounts
```
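The quoted loop cannot tell "the mount point is absent" apart from "the snapshot of /proc/mounts could not be written". A minimal sketch of that failure mode follows. The function name `scan_via_tmp_copy` and its three parameters are hypothetical; only the copy-then-scan pattern comes from fs-lib.sh. On a genuinely full /tmp either mktemp fails or the copy ends up empty or truncated; the sketch uses a missing temp directory instead only because it makes mktemp fail deterministically. In both cases the loop reads nothing and the function reports "not mounted", which is consistent with the `is not mounted on /test` error in the log above. Note also that, as quoted (the rest of the function is not shown), `return $YES` exits before `rm -f $proc_mounts`, so a successful check appears to leave its temp file behind; the sketch removes the copy on both paths.

```shell
#!/bin/bash
# Sketch of the failure mode in the quoted fs-lib.sh loop. Names and
# parameters are hypothetical; only the copy-then-scan pattern is from the bug.
YES=0
NO=1

# Usage: scan_via_tmp_copy MOUNTPOINT MOUNTS_SOURCE TEMP_DIR
scan_via_tmp_copy() {
	local mp="$1" src="$2" tmpdir="$3"
	local copy dev mnt rest

	# Snapshot the mount table into a temp file, as the quoted code does.
	# If the temp directory is unusable, mktemp prints nothing and the copy
	# path is empty; on a full filesystem the copy is empty or truncated.
	copy=$(mktemp "$tmpdir/fs.proc.mounts.XXXXXX")
	cat "$src" > "$copy"

	while read -r dev mnt rest; do
		if [ "$mnt" = "$mp" ]; then
			rm -f "$copy"
			return $YES
		fi
	done < "$copy"

	# Reached both when the mount point is genuinely absent and when the
	# snapshot could not be written: the caller cannot tell them apart.
	rm -f "$copy"
	return $NO
}
```

With a usable temp directory the check succeeds; pointing it at an unusable one makes the same, still-mounted `/test` look unmounted.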
(In reply to Chris Henderson from comment #4)
> Based on a quick look at /usr/share/cluster/utils/fs-lib.sh, I believe RHEL6
> has this problem as well.

Yeah. It's the same code in both places. I cloned the bug against RHEL6 at https://bugzilla.redhat.com/show_bug.cgi?id=998012

This practice of copying /proc/mounts to a file in /tmp has already been removed upstream, and it is already fixed in the latest 6.5 build. Below is the commit that introduced the fix upstream.

https://github.com/ClusterLabs/resource-agents/commit/4d57e9cf453cbb34761d8b2e546dc4a71ba91c3c#L0R389

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1207.html
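The upstream change removed the copy of /proc/mounts into /tmp. The sketch below shows one way to perform the same check without any temporary file, by reading /proc/mounts directly. It is not taken from the referenced commit; `is_mounted_direct` and its optional third argument (a mounts file, present only so the function can be tested) are hypothetical. It also replaces `printf "$tmp_mp"` with bash's `printf -v VAR '%b'`, which decodes octal escapes such as `\040` (a space) without using the field as a format string and without a command-substitution subshell. The cifs/nfs/nfs4 special case from the quoted code is left out for brevity.

```shell
#!/bin/bash
# Hypothetical temp-file-free mount check: reads the mount table directly,
# so nothing has to be written to /tmp. Not taken from the upstream commit.
YES=0
NO=1

# Usage: is_mounted_direct DEVICE MOUNTPOINT [MOUNTS_FILE]
# MOUNTS_FILE defaults to /proc/mounts; it is a parameter only for testing.
is_mounted_direct() {
	local dev="$1" mp="$2" mounts="${3:-/proc/mounts}"
	local tmp_dev tmp_mp rest

	while read -r tmp_dev tmp_mp rest; do
		# /proc/mounts encodes spaces and similar characters as octal
		# escapes (e.g. \040). '%b' decodes them without treating the
		# field as a format string, and -v avoids a subshell.
		printf -v tmp_dev '%b' "$tmp_dev"
		printf -v tmp_mp '%b' "$tmp_mp"

		# The quoted code skips the device match for cifs/nfs/nfs4;
		# that fstype handling is omitted here for brevity.
		if [ -n "$tmp_dev" ] && [ "$tmp_dev" = "$dev" ]; then
			return $YES
		fi
		if [ -n "$tmp_mp" ] && [ "$tmp_mp" = "$mp" ]; then
			return $YES
		fi
	done < "$mounts"

	return $NO
}
```

Because this variant only reads, its result does not depend on whether /tmp has free space.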