Bug 555901
Summary: | fs.sh can kill processes that are not on the mount point which is being unmounted | | |
---|---|---|---
Product: | [Retired] Red Hat Cluster Suite | Reporter: | Shane Bradley <sbradley>
Component: | rgmanager | Assignee: | Lon Hohberger <lhh>
Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list>
Severity: | urgent | Docs Contact: |
Priority: | urgent | |
Version: | 4 | CC: | bmr, cluster-maint, djansa, fnadge, iannis, jkortus, jwest, rbinkhor, rrajaram, tao, tdunnon
Target Milestone: | rc | Keywords: | ZStream
Target Release: | --- | |
Hardware: | All | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | rgmanager-1.9.87-1.4.el4 | Doc Type: | Bug Fix
Doc Text: |
Previously, the file system agent could kill a process when an application used a mount point with a similar name to a mount point managed by rgmanager using force_unmount. With this update, the file system agent kills only the processes that access the mount point managed by rgmanager.
|
Story Points: | --- | |
Clone Of: | | |
: | 582754 | Environment: |
Last Closed: | 2011-02-16 15:08:24 UTC | Type: | ---
Bug Blocks: | 485811, 572246, 572248, 582754 | |
Description
Shane Bradley
2010-01-15 21:12:56 UTC
Created attachment 384713
Patch to fix killing of incorrect processes
Created attachment 384714
fs.sh patch applied for RHEL4
This patched fs.sh was tested only on RHEL4. I have not checked RHEL5, but it should be nearly the same.
I tested the reproducer outlined in the BZ summary on RHEL5 with the patched fs.sh that Lon gave me for RHEL5, and it PASSED. I tested the same reproducer on RHEL4 with the patched fs.sh that Lon gave me for RHEL4, and it PASSED. --sbradley

This patch has an additional side effect: it does not kill processes running directly on the service mount point. Try this:

    <rm>
      <resources>
        <clusterfs device="/dev/vedder/vedder0" force_unmount="1" self_fence="0" fstype="gfs" mountpoint="/mnt/vedder0" name="vedderfs" options=""/>
      </resources>
      <service autostart="1" name="jkservice">
        <clusterfs ref="vedderfs"/>
      </service>
    </rm>

Then start bash on /mnt/vedder0 (that is, with /mnt/vedder0 as its current directory) and make it ignore the signal:

    trap "" SIGTERM

Now the service migration will fail:

    Apr 13 12:29:26 z2 clurgmgrd: [27562]: <notice> Forcefully unmounting /mnt/vedder0
    Apr 13 12:29:30 z2 clurgmgrd: [27562]: <notice> Forcefully unmounting /mnt/vedder0
    Apr 13 12:29:31 z2 clurgmgrd: [27562]: <err> 'umount /mnt/vedder0' failed, error=0
    Apr 13 12:29:31 z2 clurgmgrd[27562]: <notice> stop on clusterfs "vedderfs" returned 2 (invalid argument(s))
    Apr 13 12:29:31 z2 clurgmgrd[27562]: <crit> #12: service:jkservice failed to stop; intervention required
    Apr 13 12:29:31 z2 clurgmgrd[27562]: <notice> Service jkservice is failed

The root cause is probably the regex at line 733, which expects every process reported by lsof to have the mount point listed with a trailing "/". That is not the case for processes running directly on the mount point, such as a shell whose current directory is /mnt/vedder0. Tested on the 4.8.z version (rgmanager-1.9.87-1.el4_8.3), but other versions are probably affected as well.

I think a better approach to this whole thing may be to drop lsof support and just use 'fuser -kvm'.
Created attachment 406649
Patch to use fuser instead.
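The root-cause comment above contrasts ways of deciding whether a path reported by lsof belongs to the mount point being unmounted. The bash sketch below is illustrative only: `prefix_match`, `slash_match`, `on_mountpoint` and `yn` are hypothetical names, and it assumes (as the Doc Text's "similar name" wording suggests, though the report does not show the regex) that the original code behaved like a plain prefix match. A plain prefix test treats /mnt/vedder01 as part of /mnt/vedder0; requiring a trailing "/" misses a process whose working directory is exactly /mnt/vedder0; accepting either an exact match or the mount point followed by "/" avoids both problems.

```shell
#!/bin/bash
# Illustrative sketch only; none of these functions come from fs.sh.

# Assumed behaviour of the original match: a plain string prefix.
# Problem: "/mnt/vedder01" also "starts with" "/mnt/vedder0".
prefix_match() {
    [[ "$1" == "$2"* ]]
}

# Behaviour described for the first patch: mount point followed by "/".
# Problem: a process whose cwd is exactly "/mnt/vedder0" is missed.
slash_match() {
    [[ "$1" == "$2"/* ]]
}

# Boundary-aware test: the path is the mount point itself or lies below it.
on_mountpoint() {
    local path="$1" mp="${2%/}"   # drop one trailing "/" from the mount point
    case "$path" in
        "$mp" | "$mp"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print "yes" or "no" depending on the exit status of the given command.
yn() { if "$@"; then echo yes; else echo no; fi; }

# Compare the three rules on a few sample paths.
for p in /mnt/vedder0 /mnt/vedder0/data /mnt/vedder01; do
    printf '%-18s prefix=%-3s slash=%-3s fixed=%s\n' "$p" \
        "$(yn prefix_match "$p" /mnt/vedder0)" \
        "$(yn slash_match "$p" /mnt/vedder0)" \
        "$(yn on_mountpoint "$p" /mnt/vedder0)"
done
```

For /mnt/vedder01 the prefix rule answers yes while `on_mountpoint` answers no, and for /mnt/vedder0 itself the trailing-slash rule answers no while `on_mountpoint` answers yes. Quoting `"$mp"` inside the `case` pattern keeps any glob characters in a mount point name from being interpreted as wildcards.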
Created attachment 406650
Patched fs.sh
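The fuser-based approach proposed above can be sketched in bash as follows. This is not the attached patch: `is_mounted` and `kill_mount_users` are hypothetical names, and the sketch assumes Linux `/proc/mounts` and the psmisc `fuser`. The guard matters because `fuser -m` given a directory that is not currently a mount point acts on whatever filesystem contains that directory (often `/`), so an unguarded `fuser -kvm` could kill far more than the service's processes. Newer psmisc releases also offer `fuser -M` (`--ismountpoint`) for the same check; the sketch does not rely on it.

```shell
#!/bin/bash
# Illustrative sketch only; not the patch attached to this bug.

# Succeed only if MP is listed as a mount point (2nd field) in MOUNTS_FILE,
# which defaults to /proc/mounts.
# Limitation: /proc/mounts escapes spaces as "\040"; this simple check
# does not decode those escapes.
is_mounted() {
    local mp="${1%/}" mounts="${2:-/proc/mounts}"
    [ -n "$mp" ] || mp=/
    awk -v mp="$mp" '$2 == mp { found = 1 } END { exit !found }' "$mounts"
}

# Kill every process using the filesystem mounted at MP.
kill_mount_users() {
    local mp="$1"
    if ! is_mounted "$mp"; then
        # fuser -m on a plain directory would target the enclosing filesystem.
        echo "refusing to run fuser: $mp is not a mount point" >&2
        return 1
    fi
    # -k: kill the processes (SIGKILL unless another signal is given)
    # -v: verbose listing; -m: all processes using the filesystem on $mp
    fuser -kvm "$mp"
}
```

`fuser` exits non-zero when it finds no processes using the filesystem, so a caller should not treat that status alone as a failure of the force-unmount step.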
Created attachment 406652
Automatic test case.
This test case requires:
- gcc
- fs.sh

Copy it into /usr/share/cluster, then run:

    cd /usr/share/cluster
    ./555901-test.sh
Updated build addresses clusterfs/netfs.sh force_unmount holes.

Tested on clusterfs and fs. All processes accessing the mount points were killed, and none of the others (including those in the reporter's example) were killed.

Reverting the state back: it's working, but there is no erratum yet.

Wrong bz# :/.

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Previously, the file system agent could kill a process when an application used a mount point with a similar name to a mount point managed by rgmanager using force_unmount. With this update, the file system agent kills only the processes that access the mount point managed by rgmanager.

An advisory has been issued which should resolve the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0264.html