Bug 1323547 - [RFE] resource agent request to monitor FC HBA or multipathd
Summary: [RFE] resource agent request to monitor FC HBA or multipathd
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: resource-agents
Version: 7.2
Hardware: All
OS: All
Priority: low
Severity: low
Target Milestone: rc
Assignee: Oyvind Albrigtsen
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-04-04 02:46 UTC by jajeon
Modified: 2016-05-25 15:00 UTC
CC: 6 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-05-25 15:00:48 UTC
Target Upstream Version:


Attachments

Description jajeon 2016-04-04 02:46:33 UTC
Currently there seems to be no resource agent available that can monitor FC HBA status (such as link failure, HBA card failure, etc.).
Similar to "ethmonitor", a resource agent that can monitor the FC HBA would be useful on large systems.

There are already LVM2 or Filesystem resources that provide similar functionality, but they do not monitor the FC HBA itself.

Hence, requesting a resource agent that can monitor the FC HBA.
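(For illustration only, a hypothetical sketch of what such an agent's monitor action could check, not code from any shipped agent: the kernel exposes FC HBA link state under /sys/class/fc_host/hostN/port_state, and a monitor analogous to "ethmonitor" could treat any state other than "Online" as a failure. The function and path names here are assumptions for the sketch.)

```python
import glob

def read_fc_states(pattern="/sys/class/fc_host/host*/port_state"):
    """Read each FC host's port_state from sysfs, e.g. {'host3': 'Online'}."""
    states = {}
    for path in glob.glob(pattern):
        host = path.split("/")[-2]            # e.g. "host3"
        with open(path) as f:
            states[host] = f.read().strip()   # e.g. "Online", "Linkdown"
    return states

def hba_healthy(states):
    """Monitor decision: healthy only if at least one HBA exists and
    every HBA reports 'Online'."""
    return bool(states) and all(s == "Online" for s in states.values())
```

A real agent would map the False case to a monitor failure so Pacemaker could react, the same way ethmonitor updates a node attribute on NIC link loss.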

Comment 3 John Ruemker 2016-04-05 15:23:06 UTC
>> There are already LVM2 or Filesystem resources that provide similar functionality, but they do not monitor the FC HBA itself.

Those resources would be indirectly monitoring the status of the FC links.  If your link to the storage device is disrupted in some way, then any I/O issued to the device should return an error -- possibly after some amount of waiting for a timeout, or possibly immediately if there is a clear error condition -- but in either case you should get an error.  If you have multiple links aggregated under some sort of multipath device, then multipathd will be monitoring the success/failure of I/O issued over the individual links and will take action to reroute failed I/Os when needed, thereby lessening the chance that any such I/O will fail entirely.  If you get to a point where all paths to a device have failed, then I/O to that device should either receive an error in response, or be queued (block) until a path returns and the I/O succeeds.

The point is: if something goes wrong with your storage links, the LVM and Filesystem resource agents, as well as other application agents, should all be able to detect it through their regular monitoring that issues I/O to these devices, and this usually makes any direct FC-HBA monitoring unnecessary.  If you have an application running on top of the LV or filesystem and that app is managed by the cluster, then I/O errors that make their way back up to that application and cause it to fail may also produce an error when the resource agent for that app performs a monitor.
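To make the indirect-monitoring point concrete, here is a minimal sketch (an illustration, not the actual implementation of any agent) of the kind of I/O probe a monitor operation amounts to: attempt a small read against the device and treat any OSError (EIO, ENXIO, a vanished node) as a failed monitor.

```python
import os

def probe_device(path, length=512):
    """Issue a small read against a block device or file.
    Returns True if the I/O completes, False on any OS-level error,
    which is how storage-path failure surfaces to a monitor."""
    try:
        fd = os.open(path, os.O_RDONLY)
        try:
            os.read(fd, length)
        finally:
            os.close(fd)
        return True
    except OSError:
        return False
```

Note that with multipath in queue_if_no_path mode the read would block rather than fail, which is why agents wrap such probes in a timeout.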

So, while there may be a use case out there where direct FC-HBA monitoring is useful, it's not immediately obvious what that use case is.  Citing "LVM or Filesystem can do similar" as justification for needing another agent to "fail over resources" on an FC-link failure ignores the fact that LVM and Filesystem already achieve this through their own monitoring.  As such, it would be great if we could get more detail about what you or the customer feel is not covered by the current offerings, or what they would like to achieve that they cannot already.

Also, it would be great if we could discuss this in a support case with the customer directly, so we are clearly understanding their needs and can communicate recommendations or status back to them.

Thanks,
John

Comment 4 John Ruemker 2016-04-14 14:10:32 UTC
From email, the target use case is for environments directly using a LUN as a raw/direct-access device without any file system or LVM volumes on it.

~~~
Simply put, the use case is to monitor the FC HBA itself, not to fail over a Filesystem or LVM resource that lives on top of the SAN environment.

Refer to "Use case 4" at the URL below, which describes the customer's exact scenario:
"
http://www.novell.com/docrep/2012/01/sap_on_sle_simple_stack.PDF
2.5 Use Case 4 “Enqueue Replication High Availability External Database”
"
For this case, using Filesystem or LVM requires extra resources, such as configuring GFS, or requires an additional LUN.
~~~

