Bug 728086
Summary: fs-lib.sh doesn't handle mount error other than $?=1
Product: Red Hat Enterprise Linux 6
Component: resource-agents
Version: 6.1
Status: CLOSED ERRATA
Severity: high
Priority: medium
Target Milestone: rc
Hardware: Unspecified
OS: Linux
Reporter: Etsuji Nakai <enakai>
Assignee: Fabio Massimo Di Nitto <fdinitto>
QA Contact: Cluster QE <mspqa-list>
CC: agk, cfeist, cluster-maint, cmarthal, fdinitto, lhh, mjuricek
Fixed In Version: resource-agents-3.9.2-11.el6
Doc Type: Bug Fix
Doc Text:
Cause: The fs-lib.sh resource agent library ignored mount return codes other than 1.
Consequence: When mount returned an error code other than 1 (for example, on an iSCSI mount), fs-lib.sh treated the mount as successful.
Fix: fs-lib.sh now recognizes the other error codes.
Result: fs-lib.sh recognizes all mount errors and fails properly.
Last Closed: 2012-06-20 14:38:40 UTC
Bug Blocks: 756082
Nonzero return codes should be treated as errors, according to the mount man page. Also, it appears that your patch would work.

Basically, if mount fails, the resource agent should return a failure -- this is for all values of failure, not just '1'. In this case, the device is missing, and mount returned the generic '32' error code for a failed mount, which was not handled. This should be simple to fix.

https://github.com/ClusterLabs/resource-agents/commit/ba09b94555d7c3b899e989b456cdbe1ee1b267ac

Available in the rhel6-fixes branch upstream. As for testing, I don't have a setup to trigger an error != 1 at the moment, but the patch is simple and the same change has been tested in the netfs.sh code.

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: fs-lib.sh resource agent library was ignoring errors other than '1'
Consequence: When a mount returned an error other than 1 (such as an iSCSI mount) fs-lib.sh thought it worked properly
Fix: make fs-lib.sh recognize other errors
Result: fs-lib.sh now recognizes all errors and fails properly.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0947.html
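For reference, the mount man page documents the exit status as a set of bits that can be ORed together, which is why a check for exactly 1 misses a value such as 32. The following is a minimal sketch for decoding a status value; the function name `describe_mount_status` is hypothetical and is not part of fs-lib.sh.

```shell
#!/bin/sh
# Hypothetical helper (not part of fs-lib.sh): decode a mount(8) exit status.
# The bit meanings are taken from the "return codes" section of mount(8).
describe_mount_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "success"
        return 0
    fi
    reasons=""
    for entry in \
        "1:incorrect invocation or permissions" \
        "2:system error" \
        "4:internal mount bug" \
        "8:user interrupt" \
        "16:problems writing or locking /etc/mtab" \
        "32:mount failure" \
        "64:some mount succeeded"
    do
        bit=${entry%%:*}     # number before the first colon
        text=${entry#*:}     # description after the first colon
        if [ $((status & bit)) -ne 0 ]; then
            # Append, separating multiple reasons with a comma.
            reasons="${reasons:+$reasons, }$text"
        fi
    done
    # Fall back to a generic label if no documented bit is set.
    echo "${reasons:-unknown status $status}"
}
```

With this sketch, `describe_mount_status 32` prints `mount failure`, the value reported in this bug.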
Created attachment 516605
Suggested patch for /usr/share/cluster/utils/fs-lib.sh

Description of problem:
The customer uses a High Availability Add-On cluster with an iSCSI shared disk. When iSCSI disk access fails, the active node stops and restarts the service in an endless loop. The filesystem resource on the shared disk is defined as follows:

<fs device="/dev/sdb" fstype="ext4" mountpoint="/data01" name="data_fs"/>

The root cause is that when rgmanager tries to restart the service, mounting the filesystem fails with return code 32, but /usr/share/cluster/utils/fs-lib.sh does not recognize this as an error.

Version-Release number of selected component (if applicable):
resource-agents-3.0.12-22.el6.x86_64
rgmanager-3.0.12-11.el6_1.1.x86_64

How reproducible:

Steps to Reproduce:
1. Configure a cluster with an iSCSI shared disk and create a filesystem resource on it. Do not use qdisk.
2. Emulate a disk path error by blocking iSCSI access with iptables on the active cluster node.
# iptables -A INPUT -m tcp -p tcp --sport 3260 -j REJECT

Actual results:
rgmanager stops and restarts the service in an endless loop.

Expected results:
The service is relocated to the other node.

Additional info:
See the attachment for the suggested patch to /usr/share/cluster/utils/fs-lib.sh. It treats all non-zero return codes as errors when mounting the filesystem. In my lab cluster, the patched agent successfully relocated the service. However, I'm not sure whether it is appropriate to handle ALL non-zero return codes as errors.
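The idea behind the suggested patch, treating every non-zero mount status as a failure, can be sketched roughly as follows. This is an illustration only, not the attached patch or the upstream commit: the function name `mount_or_fail` and the `RC_OK`/`RC_FAIL` variables are assumptions, and the mount command is passed in as arguments so that the logic can be tried without root privileges.

```shell
#!/bin/sh
# Hypothetical sketch (not the actual fs-lib.sh code).
RC_OK=0      # assumed "success" value returned to the caller
RC_FAIL=1    # assumed generic "failure" value returned to the caller

# Run the given command (normally a full mount command line) and
# report failure for ANY non-zero exit status, not only for 1.
mount_or_fail() {
    status=0
    "$@" || status=$?
    if [ "$status" -ne 0 ]; then
        echo "mount command '$*' failed with exit status $status" >&2
        return "$RC_FAIL"
    fi
    return "$RC_OK"
}
```

In a real agent the call might look like `mount_or_fail mount -t ext4 /dev/sdb /data01`, using the device, type, and mount point from the resource definition above. Mapping every failure to a single generic code keeps the caller's logic simple, at the cost of discarding the specific mount status, which is why the sketch logs it before returning.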