Bug 1026513 - File systems that are offline or not available are still being reported as UP
Status: CLOSED CURRENTRELEASE
Product: JBoss Operations Network
Classification: JBoss
Component: Plugin -- Other
Version: JON 3.1.2, JON 3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: DR01
Target Release: JON 3.2.1
Assigned To: Thomas Segismont
QA Contact: Mike Foley
Depends On:
Blocks:
Reported: 2013-11-04 15:52 EST by Larry O'Leary
Modified: 2014-05-08 13:43 EDT
CC List: 3 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-05-08 13:43:54 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Proposed patch that fixes this issue (1.05 KB, patch)
2013-11-04 15:52 EST, Larry O'Leary


External Trackers
Tracker: Red Hat Knowledge Base (Solution) | ID: 522363 | Priority: None | Status: None | Summary: None | Last Updated: Never

Description Larry O'Leary 2013-11-04 15:52:53 EST
Created attachment 819347
Proposed patch that fixes this issue

Description of problem:
If a file system is unmounted or removed, the agent continues to report it as UP. As a result, unavailable file systems are still reported as available, and availability alerts are not triggered as expected.

Version-Release number of selected component (if applicable):
3.2.0.ER3 (build c0742ed:90dd474)

How reproducible:
Always

Steps to Reproduce:
1.  Mount existing file system in a test location:

        sudo mkdir -p /mnt/testBoot
        sudo mount /dev/sda1 /mnt/testBoot

2.  Install JBoss ON system.
3.  Import platform into inventory.
4.  Set the availability schedule for the platform's file system /mnt/testBoot to 1 minute.
5.  Verify /mnt/testBoot resource is reported as UP.
6.  Unmount the file system:

        sudo umount /mnt/testBoot
        
7.  Wait a couple of minutes for the availability scan to be triggered.
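To confirm at the OS level that step 6 actually removed the mount, you can check the kernel's mount table directly, independent of the agent. The Java program below is a minimal sketch and not part of JBoss ON. It assumes a Linux host where /proc/mounts lists one mount per line with the mount point in the second field. The class and method names are hypothetical, and escaped characters such as \040 (space) are not decoded.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class ProcMountsCheck {

    // Hypothetical helper: returns true if any /proc/mounts-style line lists
    // `path` as its mount point (the second whitespace-separated field).
    // Escaped characters such as "\040" (space) are NOT decoded here.
    public static boolean isMountPoint(List<String> mountLines, String path) {
        for (String line : mountLines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length >= 2 && fields[1].equals(path)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "/mnt/testBoot";
        // Linux only: /proc/mounts reflects the kernel's current mount table.
        List<String> lines = Files.readAllLines(Paths.get("/proc/mounts"));
        System.out.println(path + (isMountPoint(lines, path) ? " is mounted" : " is NOT mounted"));
    }
}
```

Run it before and after `sudo umount /mnt/testBoot`. After the unmount it should report the path as not mounted, while the agent (before the fix) still reported the resource as UP.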

Actual results:
/mnt/testBoot is still reported as UP even though it is no longer mounted or available.

Expected results:
/mnt/testBoot should be reported as DOWN.

Additional info:
FileSystemInfo org.rhq.core.system.NativeSystemInfo.getFileSystem(String path)

Incorrectly assumes that sigar.getFileSystemMap().getMountPoint(path) returns the file system mounted at the specified mount point. This is not true. The method returns the mount point of the file system that contains the specified path. Because / is always mounted, any file system whose mount is no longer valid resolves to / as its mount point. For example:

If /dev/sdg1 is mounted at /mnt/data the availability check for /mnt/data will result in the following call:

NativeSystemInfo.getFileSystem("/mnt/data");

If /dev/sdg1 is still mounted and available, the mount point in the returned FileSystemInfo object is "/mnt/data". If the mount is no longer available or /dev/sdg1 has gone away, the returned mount point is "/", because the path "/mnt/data" now lies under the mount point "/". The problem is that the availability check treats any returned object as proof that the requested file system and its mount point are available, even when they are not.
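The lookup behavior described above can be modeled as a longest-prefix match over the currently mounted paths. The Java program below is a simplified, hypothetical simulation. It does not call Sigar, and findContainingMountPoint is an illustrative name, not a Sigar or RHQ API. It shows why "/mnt/data" resolves to "/" once that mount is removed.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class MountPointLookup {

    // Hypothetical model of the lookup: return the longest mount point that
    // contains `path` (path-component aware, so mount "/mnt/data" does not
    // contain "/mnt/database"). Returns null if nothing matches.
    public static String findContainingMountPoint(Set<String> mountPoints, String path) {
        String best = null;
        for (String mp : mountPoints) {
            boolean contains = path.equals(mp)
                    || (mp.equals("/") && path.startsWith("/"))
                    || path.startsWith(mp + "/");
            if (contains && (best == null || mp.length() > best.length())) {
                best = mp;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Set<String> mounted = new HashSet<>(Arrays.asList("/", "/boot", "/mnt/data"));
        System.out.println(findContainingMountPoint(mounted, "/mnt/data")); // /mnt/data

        mounted.remove("/mnt/data"); // simulate: sudo umount /mnt/data
        System.out.println(findContainingMountPoint(mounted, "/mnt/data")); // /
    }
}
```

The second lookup still returns a non-null result. A simple "was anything returned?" check therefore cannot detect the missing mount.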

Based on the JavaDoc for SystemInfo, getFileSystem behaves as documented, so the error appears to be in the availability check.

The proposed patch returns AvailabilityType.UP only if the directory name of the FileSystem matches the resource key, which is consistent with the existing logic. Note, however, that tying the resource key to the mount point seems like a very bad idea, but that is a separate issue altogether.
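The check the patch describes is sketched below. It assumes the lookup result exposes the resolved mount point (the "directory name") and reports UP only when that value equals the resource key. AvailabilityType, FileSystemInfo, and getAvailability are simplified, hypothetical stand-ins, not the RHQ or Sigar classes with the same or similar names, and this is not the committed fix.

```java
public class FileSystemAvailabilitySketch {

    // Hypothetical stand-in for an availability result.
    public enum AvailabilityType { UP, DOWN }

    // Hypothetical stand-in for the lookup result; only the resolved mount
    // point (the "directory name") matters for this check.
    public static final class FileSystemInfo {
        final String mountPoint;

        public FileSystemInfo(String mountPoint) {
            this.mountPoint = mountPoint;
        }
    }

    // UP only when the lookup resolved to exactly the mount point named by the
    // resource key; a fallback to a parent mount such as "/" yields DOWN.
    public static AvailabilityType getAvailability(String resourceKey, FileSystemInfo info) {
        if (info == null || info.mountPoint == null) {
            return AvailabilityType.DOWN;
        }
        return info.mountPoint.equals(resourceKey) ? AvailabilityType.UP : AvailabilityType.DOWN;
    }

    public static void main(String[] args) {
        System.out.println(getAvailability("/mnt/testBoot", new FileSystemInfo("/mnt/testBoot"))); // UP
        System.out.println(getAvailability("/mnt/testBoot", new FileSystemInfo("/")));             // DOWN
    }
}
```

Comparing against the resource key rather than checking for a non-null result is what turns the fallback to "/" into DOWN.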
Comment 1 Thomas Segismont 2014-01-30 08:38:48 EST
Fixed in master

commit c9ea7f80a610d0ce22376f76723f8445c6d9cfd7
Author: Thomas Segismont <tsegismo@redhat.com>
Date:   Thu Jan 30 14:37:33 2014 +0100
Comment 2 Stefan Negrea 2014-02-05 10:45:08 EST
Reviewed and tested the code and the change is good; the patch is now in the release branch. 


release/jon3.2.x commit:
https://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?h=release/jon3.2.x&id=91f4b730214fad369ddd4c63e0de71f072b6a8b6
Comment 3 Simeon Pinder 2014-02-18 10:08:24 EST
Moving to ON_QA as available for testing in the following brew build:
https://brewweb.devel.redhat.com//buildinfo?buildID=336752

Note: the installed version is still JON 3.2.0.GA by design. This build is part of the payload for JON 3.2.1, also known as cumulative patch 1 for 3.2.0.GA. How it will be delivered to customers is still being discussed.
Comment 4 Sunil Kondkar 2014-02-24 04:09:08 EST
Verified on JON 3.2.1 DR01 build (Build Number: c758688:4c03150)

Followed the steps and verified that after umount, /mnt/testBoot is reported as DOWN. Alerts are triggered correctly for the "goes up" and "goes down" availability conditions.
Comment 5 Mike Foley 2014-05-08 13:43:54 EDT
JON 3.2.1 released week of 5/5/2014
