Bug 728086

Summary: fs-lib.sh doesn't handle mount error other than $?=1
Product: Red Hat Enterprise Linux 6 Reporter: Etsuji Nakai <enakai>
Component: resource-agents Assignee: Fabio Massimo Di Nitto <fdinitto>
Status: CLOSED ERRATA QA Contact: Cluster QE <mspqa-list>
Severity: high Docs Contact:
Priority: medium    
Version: 6.1 CC: agk, cfeist, cluster-maint, cmarthal, fdinitto, lhh, mjuricek
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: resource-agents-3.9.2-11.el6 Doc Type: Bug Fix
Doc Text:
Cause: The fs-lib.sh resource agent library was ignoring mount errors other than '1'.
Consequence: When a mount returned an error other than 1 (such as a failed mount on an iSCSI disk), fs-lib.sh treated it as a success.
Fix: Make fs-lib.sh recognize other errors.
Result: fs-lib.sh now recognizes all errors and fails properly.
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-06-20 14:38:40 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 756082    
Attachments:
Description Flags
Suggested patch for /usr/share/cluster/utils/fs-lib.sh none

Description Etsuji Nakai 2011-08-04 03:21:57 UTC
Created attachment 516605
Suggested patch for /usr/share/cluster/utils/fs-lib.sh

Description of problem:
The customer uses a High Availability Add-On cluster with an iSCSI shared disk. When iSCSI disk access fails, the active node stops and starts the service repeatedly, forever.

The resource definition of the filesystem on the shared disk is as below:
<fs device="/dev/sdb" fstype="ext4" mountpoint="/data01" name="data_fs"/>

The root cause is that when rgmanager tries to restart the service, mounting the filesystem fails with return code 32, but /usr/share/cluster/utils/fs-lib.sh doesn't recognize that as an error.
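To illustrate the failure mode, here is a minimal sketch, not the actual fs-lib.sh code. It contrasts a check that treats only exit status 1 as a failure with one that treats any non-zero status as a failure. The names fake_mount, start_checking_only_one and start_checking_nonzero are hypothetical, and fake_mount merely stands in for a mount call that fails with 32.

```shell
#!/bin/bash
# Hypothetical stand-in for mount(8) failing on a missing device.
fake_mount() { return 32; }

# Sketch of the faulty pattern: only exit status 1 counts as a failure,
# so a status of 32 falls through and is reported as success.
start_checking_only_one() {
    "$@"
    if [ $? -eq 1 ]; then
        return 1
    fi
    return 0
}

# Sketch of the corrected pattern: any non-zero status counts as a failure.
start_checking_nonzero() {
    "$@"
    local rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "mount failed with status $rc" >&2
        return 1
    fi
    return 0
}

start_checking_only_one fake_mount
echo "only-1 check returned: $?"      # prints 0, the failure is missed

start_checking_nonzero fake_mount 2>/dev/null
echo "non-zero check returned: $?"    # prints 1, the failure is reported
```

With the first pattern the caller sees a successful start even though nothing was mounted, which is consistent with the endless stop/start loop described above.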


Version-Release number of selected component (if applicable):
resource-agents-3.0.12-22.el6.x86_64
rgmanager-3.0.12-11.el6_1.1.x86_64


How reproducible:
Steps to Reproduce:
1. Configure a cluster with an iSCSI shared disk and create a filesystem resource on it. Do not use qdisk.

2. Emulate a disk path error by blocking iSCSI access with iptables on the active cluster node.
# iptables -A INPUT -m tcp -p tcp --sport 3260 -j REJECT

  
Actual results:
rgmanager stops and starts the service repeatedly, forever.

Expected results:
The service is relocated to the other node.

Additional info:
See the attachment for the suggested patch to /usr/share/cluster/utils/fs-lib.sh. It treats all non-zero return codes as errors when mounting the filesystem. In my lab cluster, it successfully relocated the service. However, I'm not sure whether it's good to handle ALL non-zero return codes as errors.

Comment 2 Lon Hohberger 2011-08-10 16:04:39 UTC
Nonzero return codes should be treated as errors, according to the mount man page.

Also, it appears that your patch would work.

Comment 3 Lon Hohberger 2011-08-10 16:05:38 UTC
Basically, if mount fails, the resource agent should return a failure -- this is for all values of failure, not just '1'.  In this case, the device is missing, and mount returned the generic '32' error code for a failed mount, which was not handled.

This should be simple to fix.
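
For reference, mount(8) documents its exit status as a set of bits that can be ORed together, with 32 meaning "mount failure". The sketch below decodes a status into those documented meanings. describe_mount_status is a hypothetical helper written for illustration and is not part of fs-lib.sh.

```shell
#!/bin/bash
# Decode a mount(8) exit status using the bit meanings from the man page.
# describe_mount_status is a hypothetical helper, not part of fs-lib.sh.
describe_mount_status() {
    local rc=$1
    local out=""
    if [ "$rc" -eq 0 ]; then
        echo "success"
        return 0
    fi
    [ $((rc & 1)) -ne 0 ]  && out="$out; incorrect invocation or permissions"
    [ $((rc & 2)) -ne 0 ]  && out="$out; system error"
    [ $((rc & 4)) -ne 0 ]  && out="$out; internal mount bug"
    [ $((rc & 8)) -ne 0 ]  && out="$out; user interrupt"
    [ $((rc & 16)) -ne 0 ] && out="$out; problems writing or locking /etc/mtab"
    [ $((rc & 32)) -ne 0 ] && out="$out; mount failure"
    [ $((rc & 64)) -ne 0 ] && out="$out; some mount succeeded"
    # Strip the leading separator before printing.
    echo "${out#; }"
    return 1
}

describe_mount_status 32    # prints "mount failure"
```

Every non-zero value makes the helper return 1, which matches the point made above: a caller only needs to test for non-zero, and the individual bits matter only for logging.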

Comment 8 Fabio Massimo Di Nitto 2012-02-27 12:33:38 UTC
https://github.com/ClusterLabs/resource-agents/commit/ba09b94555d7c3b899e989b456cdbe1ee1b267ac

Available in rhel6-fixes branch upstream.

Comment 10 Fabio Massimo Di Nitto 2012-02-27 12:58:20 UTC
As for testing, I don't have a setup to trigger an error != 1 at the moment but the patch is easy enough and tested in netfs.sh code.

Comment 14 Chris Feist 2012-04-30 21:47:50 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: fs-lib.sh resource agent library was ignoring errors other than '1'

Consequence: When a mount returned an error other than 1 (such as a failed mount on an iSCSI disk), fs-lib.sh thought it worked properly

Fix: make fs-lib.sh recognize other errors

Result: fs-lib.sh now recognizes all errors and fails properly.

Comment 16 errata-xmlrpc 2012-06-20 14:38:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0947.html