Bug 728086

Summary:

fs-lib.sh doesn't handle mount errors other than $?=1

Product:

Red Hat Enterprise Linux 6

Reporter:

Etsuji Nakai <enakai>

Component:

resource-agents

Assignee:

Fabio Massimo Di Nitto <fdinitto>

Status:

CLOSED ERRATA

QA Contact:

Cluster QE <mspqa-list>

Severity:

high

Docs Contact:

Priority:

medium

Version:

6.1

CC:

agk, cfeist, cluster-maint, cmarthal, fdinitto, lhh, mjuricek

Target Milestone:

Target Release:

---

Hardware:

Unspecified

OS:

Linux

Whiteboard:

Fixed In Version:

resource-agents-3.9.2-11.el6

Doc Type:

Bug Fix

Doc Text:

Cause: The fs-lib.sh resource agent library ignored mount errors other than '1'.
Consequence: When a mount returned an error other than 1 (such as a failed mount on an iSCSI device), fs-lib.sh treated it as successful.
Fix: fs-lib.sh now recognizes other errors.
Result: fs-lib.sh recognizes all mount errors and fails properly.

Last Closed:

2012-06-20 14:38:40 UTC

Bug Depends On:

Bug Blocks:

756082

Attachments:

Description	Flags
Suggested patch for /usr/share/cluster/utils/fs-lib.sh	none

Description Etsuji Nakai 2011-08-04 03:21:57 UTC

Created attachment 516605
Suggested patch for /usr/share/cluster/utils/fs-lib.sh

Description of problem:
The customer uses a High Availability Add-On cluster with an iSCSI shared disk. When iSCSI disk access fails, the active node stops and restarts the service in an endless loop.

The resource definition of the filesystem on the shared disk is as below:
<fs device="/dev/sdb" fstype="ext4" mountpoint="/data01" name="data_fs"/>

The root cause of the problem is that when rgmanager tries to restart the service, even though mounting the filesystem fails with the return code 32, /usr/share/cluster/utils/fs-lib.sh doesn't recognize it as an error.
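
The difference between checking only for exit status 1 and checking for any non-zero status can be sketched as follows. This is a minimal illustration, not the actual fs-lib.sh code: the function names mount_check_buggy, mount_check_fixed and fake_mount are hypothetical, and fake_mount merely stands in for a mount call that fails with status 32.

```shell
# Hypothetical helpers for illustration; not taken from fs-lib.sh.

# Buggy pattern: only an exit status of exactly 1 is reported as a failure.
mount_check_buggy() {
    "$@"
    if [ $? -eq 1 ]; then
        return 1    # failure is reported
    fi
    return 0        # any other status, including 32, looks like success
}

# Fixed pattern: every non-zero exit status is reported as a failure.
mount_check_fixed() {
    "$@"
    ret=$?
    if [ $ret -ne 0 ]; then
        echo "mount failed with exit status $ret" >&2
        return 1
    fi
    return 0
}

# Stand-in for a failing mount call, so the sketch runs without a real device.
fake_mount() {
    return 32
}
```

With these definitions, "mount_check_buggy fake_mount" returns 0 even though the simulated mount failed, while "mount_check_fixed fake_mount" returns 1.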


Version-Release number of selected component (if applicable):
resource-agents-3.0.12-22.el6.x86_64
rgmanager-3.0.12-11.el6_1.1.x86_64


How reproducible:
Steps to Reproduce:
1. Configure a cluster with an iSCSI shared disk and create a filesystem resource on it. Do not use qdisk.

2. Emulate a disk path error by blocking iSCSI access with iptables on the active cluster node.
# iptables -A INPUT -m tcp -p tcp --sport 3260 -j REJECT

  
Actual results:
rgmanager repeats stopping and starting the service forever.

Expected results:
The service is relocated to the other node.

Additional info:
See the attachment for the suggested patch to /usr/share/cluster/utils/fs-lib.sh. It treats all non-zero return codes as errors when mounting the filesystem. In my lab cluster, it successfully relocated the service. However, I'm not sure whether it is appropriate to handle ALL non-zero return codes as errors.

Comment 2 Lon Hohberger 2011-08-10 16:04:39 UTC

Nonzero return codes should be treated as errors, according to the mount man page.

Also, it appears that your patch would work.

Comment 3 Lon Hohberger 2011-08-10 16:05:38 UTC

Basically, if mount fails, the resource agent should return a failure -- this is for all values of failure, not just '1'.  In this case, the device is missing, and mount returned the generic '32' error code for a failed mount, which was not handled.

This should be simple to fix.
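
For reference, the mount(8) man page documents the exit status as a set of bits that can be ORed together. A small sketch that decodes a status into those documented meanings is shown below. The function name describe_mount_status is hypothetical and is not part of fs-lib.sh.

```shell
# Hypothetical helper: print the meaning of each bit set in a mount(8)
# exit status. Bit meanings follow the mount(8) man page.
describe_mount_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "success"
        return 0
    fi
    for bit in 1 2 4 8 16 32 64; do
        if [ $((status & bit)) -ne 0 ]; then
            case $bit in
                1)  echo "incorrect invocation or permissions" ;;
                2)  echo "system error" ;;
                4)  echo "internal mount bug" ;;
                8)  echo "user interrupt" ;;
                16) echo "problems writing or locking /etc/mtab" ;;
                32) echo "mount failure" ;;
                64) echo "some mount succeeded" ;;
            esac
        fi
    done
    return 1    # any non-zero status is a failure
}
```

Calling "describe_mount_status 32" prints "mount failure" and returns non-zero, which matches the case in this report.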

Comment 8 Fabio Massimo Di Nitto 2012-02-27 12:33:38 UTC

https://github.com/ClusterLabs/resource-agents/commit/ba09b94555d7c3b899e989b456cdbe1ee1b267ac

Available in rhel6-fixes branch upstream.

Comment 10 Fabio Massimo Di Nitto 2012-02-27 12:58:20 UTC

As for testing, I don't have a setup to trigger an error != 1 at the moment, but the patch is simple enough and has been tested in the netfs.sh code.

Comment 14 Chris Feist 2012-04-30 21:47:50 UTC

    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: fs-lib.sh resource agent library was ignoring errors other than '1'

Consequence: When a mount returned an error other than 1 (such as an iSCSI mount), fs-lib.sh thought it worked properly

Fix: make fs-lib.sh recognize other errors

Result: fs-lib.sh now recognizes all errors and fails properly.

Comment 16 errata-xmlrpc 2012-06-20 14:38:40 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0947.html