Red Hat Bugzilla – Bug 1026513
File systems that are offline or not available are still being reported as UP
Last modified: 2014-05-08 13:43:54 EDT
Created attachment 819347
Proposed patch that fixes this issue
Description of problem:
If a file system is unmounted or removed, the agent continues to report it as UP. As a result, unavailable file systems are still reported as available and the expected monitoring alerts are not triggered.
Version-Release number of selected component (if applicable):
3.2.0.ER3 (build c0742ed:90dd474)
Steps to Reproduce:
1. Mount existing file system in a test location:
sudo mkdir -p /mnt/testBoot
sudo mount /dev/sda1 /mnt/testBoot
2. Install JBoss ON system.
3. Import platform into inventory.
4. Set the availability schedule for the platform's file system /mnt/testBoot to 1 minute.
5. Verify /mnt/testBoot resource is reported as UP.
6. Unmount the file system:
sudo umount /mnt/testBoot
7. Wait a couple of minutes for the availability scan to be triggered.
Actual results:
/mnt/testBoot is still reported as UP even though it is no longer mounted and is therefore unavailable.
Expected results:
/mnt/testBoot should be reported as DOWN.
FileSystemInfo org.rhq.core.system.NativeSystemInfo.getFileSystem(String path)
Incorrectly assumes that sigar.getFileSystemMap().getMountPoint(path) returns the file system mounted at the specified mount point. This is not the case. The method returns the file system whose mount point contains the specified path. Because / is always mounted, any path whose own mount is no longer valid resolves to the file system mounted at /. For example:
If /dev/sdg1 is mounted at /mnt/data the availability check for /mnt/data will result in the following call:
If /dev/sdg1 is still mounted and available, the mount point returned in the FileSystemInfo object will be "/mnt/data". However, if this mount is no longer available or /dev/sdg1 has gone away, the object's mount point will be returned as "/", because the path "/mnt/data" is now located under the mount point "/". The problem is that the availability check treats any returned object as proof that the requested file system and its mount point are available, even when they are not.
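The lookup behaviour described above can be reproduced without Sigar. The following is only a sketch: the class ContainingMountPointSketch and the method containingMountPoint are hypothetical names that imitate the "mount point that contains this path" semantics for absolute paths. They are not the Sigar or RHQ implementation.

```java
import java.util.Arrays;
import java.util.List;

public class ContainingMountPointSketch {

    // Hypothetical stand-in for the lookup described above: returns the longest
    // mounted directory that contains the absolute path, which is not
    // necessarily the path itself. Returns null if nothing contains it.
    static String containingMountPoint(List<String> mountedDirs, String path) {
        String best = null;
        for (String dir : mountedDirs) {
            boolean contains = dir.equals("/")
                    || path.equals(dir)
                    || path.startsWith(dir + "/");
            if (contains && (best == null || dir.length() > best.length())) {
                best = dir;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String path = "/mnt/data";
        // While /dev/sdg1 is mounted, the path is its own mount point.
        System.out.println(containingMountPoint(Arrays.asList("/", "/mnt/data"), path)); // prints /mnt/data
        // After the unmount, the same lookup falls back to the root file system.
        System.out.println(containingMountPoint(Arrays.asList("/"), path)); // prints /
    }
}
```

In both cases the lookup returns a non-null result, which is why a check that only tests "was something returned?" cannot tell a mounted file system from an unmounted one.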
Based on the JavaDoc for SystemInfo, it sounds like the error here is in the availability check.
The proposed patch returns AvailabilityType.UP only if the directory name of the FileSystem is the same as the resource key. This follows the existing logic. Note, however, that tying the resource key to the mount point seems like a very bad idea, but that is a separate issue altogether.
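The comparison the patch relies on can be sketched as below. This is not the attached patch: FileSystemAvailabilitySketch, its local Availability enum (a stand-in for RHQ's AvailabilityType) and the check method are hypothetical, and reportedMountPoint represents whatever directory name the lookup returned for the resource key.

```java
public class FileSystemAvailabilitySketch {

    // Local stand-in for RHQ's AvailabilityType so the sketch compiles on its own.
    enum Availability { UP, DOWN }

    // Hypothetical helper: report UP only when the mount point that was
    // actually returned is the one the resource key refers to.
    static Availability check(String resourceKey, String reportedMountPoint) {
        if (reportedMountPoint != null && reportedMountPoint.equals(resourceKey)) {
            return Availability.UP;
        }
        return Availability.DOWN;
    }

    public static void main(String[] args) {
        // Still mounted: the lookup returns the resource's own mount point.
        System.out.println(check("/mnt/testBoot", "/mnt/testBoot")); // prints UP
        // Unmounted: the lookup falls back to "/", which does not match the key.
        System.out.println(check("/mnt/testBoot", "/"));             // prints DOWN
    }
}
```

The sketch inherits the coupling noted above: it only works as long as the resource key is the mount point.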
Fixed in master
Author: Thomas Segismont <firstname.lastname@example.org>
Date: Thu Jan 30 14:37:33 2014 +0100
Reviewed and tested the code and the change is good; the patch is now in the release branch.
Moving to ON_QA as available for testing in the following brew build:
Note: the installed version is still JON 3.2.0.GA by design and this represents part of the payload for JON 3.2.1 also known as cumulative patch 1 for 3.2.0.GA. How this will be delivered to customers is still being discussed.
Verified on JON 3.2.1 DR01 build (Build Number :c758688:4c03150)
Followed the steps and verified that, after the umount, /mnt/testBoot is reported as DOWN. Alerts are triggered correctly for both the "goes up" and "goes down" availability conditions.
JON 3.2.1 released week of 5/5/2014