Bug 768146
| Summary: | Make rgmanager ignore OCF_RA_NOT_INSTALLED on script resources during stop-on-stopped | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Adam Drew <adrew> | ||||
| Component: | rgmanager | Assignee: | Lon Hohberger <lhh> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | medium | ||||||
| Version: | 5.6 | CC: | ahecox, cluster-maint, cmarthal, djansa, jruemker | ||||
| Target Milestone: | rc | ||||||
| Target Release: | 5.9 | ||||||
| Hardware: | All | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | rgmanager-2.0.52-25.el5 | Doc Type: | Bug Fix | ||||
| Doc Text: |
Cause: rgmanager would call 'stop' on a script located on a file system that had not been mounted, so the call failed.
Consequence: A resource that fails to stop causes the whole service to enter the 'failed' state, preventing the cluster from performing further recovery.
Fix: rgmanager now treats a 'missing' script during the 'stop' phase as non-fatal when it knows there was no previous 'start'.
Result: The service no longer enters the failed state as a result of the script being missing.
|
Story Points: | --- | ||||
| Clone Of: | |||||||
| : | 853251 | Environment: | |||||
| Last Closed: | 2012-02-21 06:19:51 UTC | Type: | --- | ||||
| Bug Blocks: | 772956, 853251 | ||||||
Description
Adam Drew
2011-12-15 21:20:29 UTC
Created attachment 547484
Upstream patch that resolves the issue
Note: The patch is not actually upstream yet. Verified with the following service configuration and logs:
<service name="script-test">
  <ip address="192.168.122.233" monitor_link="1">
    <fs name="fs" type="ext3" device="/dev/mapper/mpath1p1" mountpoint="/mnt/ext3" force_unmount="1">
      <script name="script" file="/mnt/ext3/test.sh"/>
    </fs>
  </ip>
</service>
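The configuration above is the pattern that triggers the problem: the script file lives on a file system that the same service mounts and unmounts. As a rough illustration (not part of rgmanager), the following Python sketch walks a service definition and reports script resources whose `file` path sits under the `mountpoint` of an enclosing `fs` resource. The function name `scripts_on_managed_fs` is hypothetical; only the `fs`, `script`, `mountpoint`, and `file` names come from the example above.

```python
import posixpath
import xml.etree.ElementTree as ET

# Service definition copied from the report above.
SERVICE_XML = """
<service name="script-test">
  <ip address="192.168.122.233" monitor_link="1">
    <fs name="fs" type="ext3" device="/dev/mapper/mpath1p1" mountpoint="/mnt/ext3" force_unmount="1">
      <script name="script" file="/mnt/ext3/test.sh"/>
    </fs>
  </ip>
</service>
"""


def scripts_on_managed_fs(service_xml):
    """Return (script name, file, mountpoint) for scripts stored on an enclosing fs.

    Hypothetical helper for illustration; it is not an rgmanager API.
    """
    root = ET.fromstring(service_xml)
    found = []

    def walk(node, mountpoints):
        # Collect the mountpoints of every <fs> ancestor (including this node).
        if node.tag == "fs" and node.get("mountpoint"):
            mountpoints = mountpoints + [posixpath.normpath(node.get("mountpoint"))]
        if node.tag == "script" and node.get("file"):
            path = posixpath.normpath(node.get("file"))
            for mp in mountpoints:
                # The script is unreachable whenever this fs is not mounted.
                if path.startswith(mp.rstrip("/") + "/"):
                    found.append((node.get("name"), path, mp))
                    break
        for child in node:
            walk(child, mountpoints)

    walk(root, [])
    return found


if __name__ == "__main__":
    for name, path, mp in scripts_on_managed_fs(SERVICE_XML):
        print(f"script {name!r}: {path} is on cluster-managed mount {mp}")
```

A script flagged this way cannot be executed while its file system is unmounted, which is the situation shown in the logs that follow.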
rgmanager-2.0.52-21.el5:
Dec 16 10:39:47 node1 clurgmgrd[29921]: <notice> status on ip "192.168.122.233" returned 1 (generic error)
Dec 16 10:39:47 node1 clurgmgrd[29921]: <notice> Stopping service service:script-test
Dec 16 10:39:47 node1 clurgmgrd: [29921]: <info> Executing /mnt/ext3/test.sh stop
Dec 16 10:39:47 node1 logger: ended
Dec 16 10:39:47 node1 clurgmgrd: [29921]: <info> unmounting /mnt/ext3
Dec 16 10:39:47 node1 multipathd: dm-4: umount map (uevent)
Dec 16 10:39:48 node1 clurgmgrd[29921]: <notice> Service service:script-test is recovering
Dec 16 10:39:48 node1 clurgmgrd[29921]: <notice> Recovering failed service service:script-test
Dec 16 10:39:48 node1 clurgmgrd[29921]: <notice> start on ip "192.168.122.233" returned 1 (generic error)
Dec 16 10:39:48 node1 clurgmgrd[29921]: <warning> #68: Failed to start service:script-test; return value: 1
Dec 16 10:39:48 node1 clurgmgrd[29921]: <notice> Stopping service service:script-test
Dec 16 10:39:48 node1 clurgmgrd[29921]: <notice> stop on script "script" returned 5 (program not installed)
Dec 16 10:39:48 node1 clurgmgrd: [29921]: <info> /dev/mapper/mpath1p1 is not mounted
Dec 16 10:39:48 node1 clurgmgrd[29921]: <crit> #12: RG service:script-test failed to stop; intervention required
Dec 16 10:39:48 node1 clurgmgrd[29921]: <notice> Service service:script-test is failed
Dec 16 10:39:48 node1 clurgmgrd[29921]: <crit> #13: Service service:script-test failed to stop cleanly
With test package:
Dec 16 10:45:07 node1 clurgmgrd[2502]: <notice> status on ip "192.168.122.233" returned 1 (generic error)
Dec 16 10:45:07 node1 clurgmgrd[2502]: <notice> Stopping service service:script-test
Dec 16 10:45:07 node1 clurgmgrd: [2502]: <info> Executing /mnt/ext3/test.sh stop
Dec 16 10:45:07 node1 logger: ended
Dec 16 10:45:08 node1 clurgmgrd: [2502]: <info> unmounting /mnt/ext3
Dec 16 10:45:08 node1 multipathd: dm-4: umount map (uevent)
Dec 16 10:45:08 node1 clurgmgrd[2502]: <notice> Service service:script-test is recovering
Dec 16 10:45:08 node1 clurgmgrd[2502]: <notice> Recovering failed service service:script-test
Dec 16 10:45:08 node1 clurgmgrd[2502]: <notice> start on ip "192.168.122.233" returned 1 (generic error)
Dec 16 10:45:08 node1 clurgmgrd[2502]: <warning> #68: Failed to start service:script-test; return value: 1
Dec 16 10:45:08 node1 clurgmgrd[2502]: <notice> Stopping service service:script-test
Dec 16 10:45:09 node1 clurgmgrd: [2502]: <info> /dev/mapper/mpath1p1 is not mounted
Dec 16 10:45:09 node1 clurgmgrd[2502]: <notice> Service service:script-test is recovering
Dec 16 10:45:09 node1 clurgmgrd[2502]: <warning> #71: Relocating failed service service:script-test
Dec 16 10:45:13 node1 clurgmgrd[2502]: <notice> Service service:script-test is now running on member 2
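The difference between the two logs comes down to how a return value of 5 (program not installed) from `stop` on a script resource is handled. The following Python sketch models only that decision. It is not rgmanager's code (rgmanager is written in C), and the function name `stop_result_is_fatal` and its `was_started` flag are hypothetical stand-ins for however rgmanager tracks whether a resource was previously started. Apart from `OCF_RA_NOT_INSTALLED`, which appears in the bug summary, the constant names are chosen for illustration.

```python
# Return values as they appear in the logs above.
OCF_RA_SUCCESS = 0        # assumed name for a successful operation
OCF_RA_ERROR = 1          # logged as "generic error"
OCF_RA_NOT_INSTALLED = 5  # logged as "program not installed"


def stop_result_is_fatal(resource_type, return_code, was_started):
    """Decide whether a 'stop' result should put the service in the failed state.

    Hypothetical model of the behavior described in this report:
    a missing script is tolerated only when stopping a script resource
    that was never started (stop-on-stopped).
    """
    if return_code == OCF_RA_SUCCESS:
        return False
    if (
        resource_type == "script"
        and return_code == OCF_RA_NOT_INSTALLED
        and not was_started
    ):
        # The script lives on a file system that is not mounted; since it
        # was never started, there is nothing to stop.
        return False
    return True


if __name__ == "__main__":
    # Second stop in the logs: the fs is already unmounted, script never restarted.
    print(stop_result_is_fatal("script", OCF_RA_NOT_INSTALLED, was_started=False))
    # A script that was running but has since gone missing is still a failure.
    print(stop_result_is_fatal("script", OCF_RA_NOT_INSTALLED, was_started=True))
```

Under this model, the second stop attempt in the test-package log is not treated as a failure, so the service can move on to recovery and relocation instead of entering the failed state.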
Customer verifies that the test package resolves their issue. The IP resource fails as a result of link failure, the service stops successfully, goes into recovery and fails again, stops successfully again, and relocates to another node. I've also verified this behavior with rgmanager-2.0.52-25.el5 in my test environment.
-John
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHBA-2012-0163.html