Created attachment 819347
Proposed patch that fixes this issue

Description of problem:
If a file system is unmounted or removed, the agent continues to report it as UP. As a result, unavailable file systems are still reported as available and monitoring alerts are not triggered as expected.

Version-Release number of selected component (if applicable):
3.2.0.ER3 (build c0742ed:90dd474)

How reproducible:
Always

Steps to Reproduce:
1. Mount an existing file system in a test location:
       sudo mkdir -p /mnt/testBoot
       sudo mount /dev/sda1 /mnt/testBoot
2. Install JBoss ON system.
3. Import platform into inventory.
4. Set the availability schedule for the platform's file system /mnt/testBoot to 1 minute.
5. Verify /mnt/testBoot resource is reported as UP.
6. Unmount the file system:
       sudo umount /mnt/testBoot
7. Wait a couple of minutes for the availability scan to be triggered.

Actual results:
/mnt/testBoot is still reported as UP even though it is no longer mounted and is therefore unavailable.

Expected results:
/mnt/testBoot should be reported as DOWN.

Additional info:
The method

    FileSystemInfo org.rhq.core.system.NativeSystemInfo.getFileSystem(String path)

incorrectly assumes that sigar.getFileSystemMap().getMountPoint(path) returns the file system that maps to the specified mount point. That is not the case. The call simply returns the file system whose mount point contains the specified path. This means that as long as / is mounted (which it always is), every file system whose mount is no longer valid will return / as its mount point.

For example, if /dev/sdg1 is mounted at /mnt/data, the availability check for /mnt/data results in the following call:

    NativeSystemInfo.getFileSystem("/mnt/data");

If /dev/sdg1 is still mounted and available, the mount point returned in the FileSystemInfo object will be "/mnt/data". However, if this mount is no longer available or /dev/sdg1 has gone away, the object's mount point will be returned as "/", because the path "/mnt/data" is now located under the mount point "/".

The problem is that the availability check treats any returned object as proof that the requested file system and its mount point are available, even when they are not.

Based on the JavaDoc for SystemInfo, it sounds like the error is in the availability check rather than in getFileSystem. The proposed patch returns AvailabilityType.UP only if the directory name of the FileSystem is the same as the resource key. This follows the existing logic.

Please note, however, that tying the resource key to the mount point seems like a very bad idea, but that is a separate issue altogether.
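The difference between the two checks can be illustrated with a small standalone sketch. It does not use SIGAR or the RHQ classes: the class MountAvailabilitySketch, the helper containingMountPoint, and the methods naiveCheck and strictCheck are all hypothetical names made up for this illustration, and the mount table is just a list of strings. It only assumes the lookup behaviour described above, namely that the lookup returns the mount point containing the path.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: none of these names come from RHQ or SIGAR.
final class MountAvailabilitySketch {

    enum AvailabilityType { UP, DOWN }

    // Hypothetical stand-in for the mount lookup: returns the longest
    // mount point that contains the given absolute path, or null if none.
    static String containingMountPoint(List<String> mountPoints, String path) {
        String best = null;
        for (String mp : mountPoints) {
            boolean contains = mp.equals("/") || path.equals(mp) || path.startsWith(mp + "/");
            if (contains && (best == null || mp.length() > best.length())) {
                best = mp;
            }
        }
        return best;
    }

    // Behaviour described in this report: any non-null result counts as UP.
    static AvailabilityType naiveCheck(List<String> mountPoints, String resourceKey) {
        return containingMountPoint(mountPoints, resourceKey) != null
                ? AvailabilityType.UP : AvailabilityType.DOWN;
    }

    // Idea behind the proposed patch: UP only if the returned mount point
    // is the same as the resource key.
    static AvailabilityType strictCheck(List<String> mountPoints, String resourceKey) {
        String mountPoint = containingMountPoint(mountPoints, resourceKey);
        return resourceKey.equals(mountPoint) ? AvailabilityType.UP : AvailabilityType.DOWN;
    }

    public static void main(String[] args) {
        List<String> mounted = Arrays.asList("/", "/mnt/data");
        List<String> unmounted = Arrays.asList("/"); // /mnt/data has gone away

        System.out.println("mounted,   naive:  " + naiveCheck(mounted, "/mnt/data"));
        System.out.println("mounted,   strict: " + strictCheck(mounted, "/mnt/data"));
        System.out.println("unmounted, naive:  " + naiveCheck(unmounted, "/mnt/data"));
        System.out.println("unmounted, strict: " + strictCheck(unmounted, "/mnt/data"));
    }
}
```

In this sketch, once /mnt/data is removed from the mount list, the lookup falls back to "/", so the naive check still reports UP while the strict comparison reports DOWN.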
Fixed in master

commit c9ea7f80a610d0ce22376f76723f8445c6d9cfd7
Author: Thomas Segismont <tsegismo>
Date:   Thu Jan 30 14:37:33 2014 +0100
Reviewed and tested the code and the change is good; the patch is now in the release branch. release/jon3.2.x commit: https://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?h=release/jon3.2.x&id=91f4b730214fad369ddd4c63e0de71f072b6a8b6
Moving to ON_QA as available for testing in the following brew build:
https://brewweb.devel.redhat.com//buildinfo?buildID=336752

Note: the installed version is still JON 3.2.0.GA by design. This build represents part of the payload for JON 3.2.1, also known as cumulative patch 1 for 3.2.0.GA. How this will be delivered to customers is still being discussed.
Verified on JON 3.2.1 DR01 build (Build Number: c758688:4c03150).

Followed the steps to reproduce and verified that, after umount, /mnt/testBoot is reported as DOWN. Alerts are triggered correctly for both the "goes up" and "goes down" availability conditions.
JON 3.2.1 released week of 5/5/2014