Description of problem:

4/24.1

Probe(s) assigned to system have an UNKNOWN status
RHN Satellite: Disk Space Filesystem: Cannot find match for "/dev/hda2"

Suite Probe(s) assigned to system have an UNKNOWN status
RHN Satellite: Disk Space Virtual Filesystem: Cannot find match for "/dev/xvda2"

recreate:
1. setup monitoring w/ server and clients in permissive selinux mode
2. register another satellite as a client
3. create an RHN Satellite : Disk Space probe, and push the scout config

tried on two clients

Device      Boot  Start  End   Blocks     Id  System
/dev/xvda1  *     1      13    104391     83  Linux
/dev/xvda2        14     5221  41833260   8e  Linux LVM
[root@riverraid .ssh]#

Units = cylinders of 16065 * 512 = 8225280 bytes

Device      Boot  Start  End   Blocks     Id  System
/dev/hda1   *     1      13    104391     83  Linux
/dev/hda2         14     7296  58500697+  8e  Linux LVM
[root@rlx-0-10 .ssh]#
Not sure if this is user error or something wrong with the probe. I thought the disk check ran '/bin/df -k'. Maybe the user is entering /dev/hda2, so it's trying '/bin/df -k /dev/hda2' and failing. So we need to figure out whether the probe works with defaults, or whether this is user error.

To test: on a Satellite with Monitoring enabled, go to any registered system, give it a monitoring entitlement, and create an RHN Satellite :: Disk Space probe. Schedule/perform a Scout Config Push and wait for it to complete. The Troubleshooting section of the Reference Guide for Monitoring lists how to run probes manually.

If you get stuck, poke someone to help, including myself if needed. Hopefully Milan or Mirek can help locally if needed :)

Cliff.
This bug pertains to a satellite w/ selinux in permissive. When the server is in enforcing, you get the errors described here: https://bugzilla.redhat.com/show_bug.cgi?id=498458
This did indeed seem to function with a 5.1 Satellite: https://rlx-1-16.rhndev.redhat.com/rhn/systems/details/probes/ProbeDetails.do?probe_id=24&sid=1000010073 Cliff
I'd say it's a misunderstanding of the "RHN Satellite" probe group. These probes get executed locally on the satellite itself, not on registered clients. So no remote connections, no rhnmd, no clients necessary: the probe commands are just executed locally. Try running "df -k" directly on the satellite to check whether it lists any "/dev/hda2" or "/dev/xvda2" device. I agree the RHN Satellite probes are confusing, because they are "assigned" to registered clients even though they have nothing in common with them. I created BZ#501918 for that.
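To make the check suggested above concrete, here is a minimal Python sketch of looking up a filesystem in `df -k` output. It is not the NOCpulse probe code: `find_filesystem`, `check`, and the rule of matching on either the device name or the mount point are assumptions made for illustration. The sample text is the client's `df -k` output from the verification comment further down. Note that /dev/xvda2 and /dev/hda2 are LVM partitions (Id 8e in the fdisk listings), so they are not mounted directly and would not normally show up in `df` at all.

```python
# Sample `df -k` output copied from the verification comment in this report.
SAMPLE_DF = """\
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00 2380564 958776 1298908 43% /
/dev/xvda1 101086 13140 82727 14% /boot
tmpfs 262232 0 262232 0% /dev/shm
"""


def find_filesystem(df_output, wanted):
    """Return the df row whose device or mount point equals `wanted`, else None.

    Matching on both device and mount point is an assumption for this sketch.
    """
    for line in df_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore wrapped or malformed rows
        device, mount = fields[0], fields[5]
        if wanted in (device, mount):
            return {
                "device": device,
                "used_kb": int(fields[2]),
                "avail_kb": int(fields[3]),
                "pct_used": int(fields[4].rstrip("%")),
                "mount": mount,
            }
    return None


def check(df_output, wanted):
    """Hypothetical status line, modelled on the message quoted in this report."""
    row = find_filesystem(df_output, wanted)
    if row is None:
        return 'UNKNOWN: Cannot find match for "%s"' % wanted
    return "OK: Filesystem %s (%s): pct used %d%%" % (
        row["device"], row["mount"], row["pct_used"])


if __name__ == "__main__":
    print(check(SAMPLE_DF, "/dev/xvda2"))  # LVM partition, not listed by df
    print(check(SAMPLE_DF, "/dev/xvda1"))
```

With this kind of lookup, a device that `df -k` does not list can only end up as "Cannot find match", whichever host the command runs on.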
the comment in #4 is not correct. For the probe to get executed locally the satellite itself would have to be registered to itself. Registering a satellite to itself as far as I know is not supported, and not practical.
I definitely do not mean registering a satellite to itself. Try running this as the nocpulse user on the satellite:

# rhn-runprobe --probe <your RHN Satellite: Disk Space id> --log all=3

and compare the output from the probe command (NOCpulse::Probe::Shell::Unix::read_result stdout) with

# df -k

You should get the same disk space usage ...
ok.. this is working

2009-05-28 10:31:37 ============================================================
CRITICAL: Filesystem /dev/mapper/VolGroup00-LogVol00 (/): Space used 18,348 MB (above critical threshold of 85 MB); Filesystem pct used 56%; Space available 14,958 MB
============================================================
-bash-3.2$

What is confusing here is that all the probes need to have a client attached to them, including RHN Satellite probes; however, the probe is actually executed on the satellite... bad form for us, and probably not documented. But it passes.
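As a side note on the CRITICAL line above: the message says space used (18,348 MB) is above a critical threshold of 85 MB, while pct used is only 56%. A minimal sketch of a maximum-threshold check, assuming a simple "value greater than threshold" rule (the function name `status_for` and its parameters are hypothetical, not the NOCpulse API), shows how the same number 85 gives different results depending on which metric it is applied to:

```python
def status_for(value, warn_max=None, crit_max=None):
    """Classify a metric against optional maximum thresholds.

    Assumption for this sketch: a threshold is crossed when the value is
    strictly greater than it, and critical takes precedence over warning.
    """
    if crit_max is not None and value > crit_max:
        return "CRITICAL"
    if warn_max is not None and value > warn_max:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # Numbers taken from the probe message in this report.
    print(status_for(18348, crit_max=85))  # space used in MB against 85
    print(status_for(56, crit_max=85))     # pct used against 85
```

If the 85 was meant as a percentage, it would belong on the pct used metric rather than on space used; that is a guess from the message text, not something confirmed in this report.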
heh.. I take that back.. it *is* documented:

Click create new probe and select the Satellite Probe Command Group. Next, complete the remaining fields as you would for any other probe. Refer to Section 7.5.1, “Managing Probes” for instructions.

Although the RHN Server appears to be monitored by the client system, the probe is actually run from the server on itself. Thresholds and notifications work normally.
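One way to confirm where a given probe runs is the connect line that `rhn-runprobe --log all=3` prints. In the debug logs later in this report, the "Linux: Disk Usage" probe opens an ssh session to the client, while the "RHN Satellite: Disk Space" probe just executes '/bin/sh -s'. The sketch below classifies a log on that basis; `probe_target` and the "ssh means remote" rule are assumptions for illustration, and the two sample lines are copied from those logs.

```python
import os
import re

# Matches the command quoted in the Shell::Unix::connect "Execute" log line.
CONNECT_RE = re.compile(r"connect Execute '([^']*)'")

REMOTE_LINE = (
    "2009-08-04 13:07:20 NOCpulse::Probe::Shell::Unix::connect Execute "
    "'/usr/bin/ssh -l nocpulse -p 4545 -i /var/lib/nocpulse/.ssh/nocpulse-identity "
    "-o StrictHostKeyChecking=no -o BatchMode=yes 10.10.77.100 /bin/sh -s'"
)
LOCAL_LINE = (
    "2009-08-04 13:12:17 NOCpulse::Probe::Shell::Unix::connect Execute '/bin/sh -s'"
)


def probe_target(log_text):
    """Return 'remote' if the probe shell is started through ssh, 'local'
    otherwise, or None when the log contains no connect line."""
    match = CONNECT_RE.search(log_text)
    if match is None:
        return None
    words = match.group(1).split()
    if words and os.path.basename(words[0]) == "ssh":
        return "remote"
    return "local"


if __name__ == "__main__":
    print(probe_target(REMOTE_LINE))
    print(probe_target(LOCAL_LINE))
```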
Moving to RELEASE_PENDING, steps below.

Client: dhcp77-100
Satellite: Stage ISO 7/24 rhndev1

Created a Xen guest 77-100, gave it Monitoring entitlement, selinux is set to Permissive

[root@dhcp77-100 ~]# hostname
dhcp77-100.rhndev.redhat.com
[root@dhcp77-100 ~]# getenforce
Permissive

Create a probe for disk usage on /dev/xvda1

[root@dhcp77-100 ~]# df -k
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00 2380564 958776 1298908 43% /
/dev/xvda1 101086 13140 82727 14% /boot
tmpfs 262232 0 262232 0% /dev/shm

On Satellite running Probe

[nocpulse@rhndev1 ~]$ rhn-catalog
22 ServiceProbe on dhcp77-100.rhndev.redhat.com (10.10.77.100 ): Linux: Disk Usage
[nocpulse@rhndev1 ~]$ rhn-runprobe --probe 22 --log all=3
2009-08-04 13:07:20 NOCpulse::Probe::Shell::Unix::connect Execute '/usr/bin/ssh -l nocpulse -p 4545 -i /var/lib/nocpulse/.ssh/nocpulse-identity -o StrictHostKeyChecking=no -o BatchMode=yes 10.10.77.100 /bin/sh -s'
2009-08-04 13:07:20 NOCpulse::Probe::Shell::Unix::connect Opened pipe to 22190
2009-08-04 13:07:21 NOCpulse::Probe::Shell::Unix::read_result stdout: >>>Linux#2.6.18-128.el5xen#1322
<<<
2009-08-04 13:07:21 NOCpulse::Probe::Shell::Unix::read_result stderr: >>><<<
2009-08-04 13:07:21 NOCpulse::Probe::Shell::Unix::read_result status: >>>0<<<
2009-08-04 13:07:21 NOCpulse::Probe::Shell::Unix::connect OS Linux 2.6.18-128.el5xen, shell pid 1322
2009-08-04 13:07:21 NOCpulse::Probe::Shell::Unix::connect OK
2009-08-04 13:07:21 NOCpulse::Probe::Shell::AbstractShell::run /bin/df -k
2009-08-04 13:07:21 NOCpulse::Probe::Shell::Unix::read_result stdout: >>>Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00 2380564 958776 1298908 43% /
/dev/xvda1 101086 13140 82727 14% /boot
tmpfs 262232 0 262232 0% /dev/shm
<<<
2009-08-04 13:07:21 NOCpulse::Probe::Shell::Unix::read_result stderr: >>><<<
2009-08-04 13:07:21 NOCpulse::Probe::Shell::Unix::read_result status: >>>0<<<
2009-08-04 13:07:21 NOCpulse::Probe::Threshold::value_crossed pctused did not cross
2009-08-04 13:07:21 NOCpulse::Probe::Threshold::value_crossed space_avail did not cross
2009-08-04 13:07:21 NOCpulse::Probe::Threshold::value_crossed space_used did not cross
2009-08-04 13:07:21 NOCpulse::Probe::Result::_calc_overall_status Overall status OK
2009-08-04 13:07:21 NOCpulse::Probe::Result::_calc_changed_items Overall status changed: NO
2009-08-04 13:07:21 NOCpulse::Probe::ItemStatus::should_notify for 'space_avail'? NO Status OK, prior OK, due for renotify '1', worse status '' Notify flags CRITICAL => OK => UNKNOWN => WARNING =>
2009-08-04 13:07:21 NOCpulse::Probe::Result::_calc_notifying_items Item 'space_avail' WILL NOT notify: status OK, item thinks it should: NO, -1 elapsed seconds does trigger renotification
2009-08-04 13:07:21 NOCpulse::Probe::ItemStatus::should_notify for 'space_used'? NO Status OK, prior OK, due for renotify '1', worse status '' Notify flags CRITICAL => OK => UNKNOWN => WARNING =>
2009-08-04 13:07:21 NOCpulse::Probe::Result::_calc_notifying_items Item 'space_used' WILL NOT notify: status OK, item thinks it should: NO, -1 elapsed seconds does trigger renotification
2009-08-04 13:07:21 NOCpulse::Probe::ItemStatus::should_notify for 'pctused'? NO Status OK, prior OK, due for renotify '1', worse status '' Notify flags CRITICAL => OK => UNKNOWN => WARNING =>
2009-08-04 13:07:21 NOCpulse::Probe::Result::_calc_notifying_items Item 'pctused' WILL NOT notify: status OK, item thinks it should: NO, -1 elapsed seconds does trigger renotification
2009-08-04 13:07:21 NOCpulse::Probe::Result::_calc_notifying_items 0 items need notification
2009-08-04 13:07:21 NOCpulse::Probe::Result::_format_messages Message: Filesystem /dev/xvda1 (/boot): Filesystem pct used 14%; Space available 80 MB; Space used 12 MB
2009-08-04 13:07:21 No items changed
2009-08-04 13:07:21 Notification not required
2009-08-04 13:07:21 NOTE: Running in test mode; no changes saved, nothing enqueued
2009-08-04 13:07:21 ============================================================
OK: Filesystem /dev/xvda1 (/boot): Filesystem pct used 14%; Space available 80 MB; Space used 12 MB
============================================================
2009-08-04 13:07:21 NOCpulse::Probe::Shell::Unix::disconnect Disconnecting
2009-08-04 13:07:22 NOCpulse::Probe::Shell::Unix::_kill_child Child exit code 0

Also verified, the "RHN Satellite" probe is running correctly on the Satellite itself for disk usage.
[nocpulse@rhndev1 ~]$ df -k
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00 10482240 7552200 2397560 76% /
/dev/dasda1 99168 15932 78120 17% /boot
none 2055200 0 2055200 0% /dev/shm
jwm-devel.usersys.redhat.com:/shared/export/rhndev1.z900/var/satellite 428169408 351453088 59316512 86% /var/satellite
/root/Satellite-5.3.0-RHEL4-re20090724.0-s390x.iso 517774 517774 0 100% /mnt/iso

[nocpulse@rhndev1 ~]$ rhn-catalog
5 ServiceProbe on fjs-0-03.rhndev.redhat.com (10.10.76.130 ): Linux: Interface Traffic
6 ServiceProbe on fjs-0-03.rhndev.redhat.com (10.10.76.130 ): General: Uptime SNMP
7 ServiceProbe on fjs-0-03.rhndev.redhat.com (10.10.76.130 ): Linux: Disk I/O Throughput
8 ServiceProbe on fjs-0-03.rhndev.redhat.com (10.10.76.130 ): Linux: CPU Usage
22 ServiceProbe on dhcp77-100.rhndev.redhat.com (10.10.77.100 ): Linux: Disk Usage
24 ServiceProbe on dhcp77-100.rhndev.redhat.com (10.10.77.100 ): RHN Satellite: Disk Space

[nocpulse@rhndev1 ~]$ rhn-runprobe --probe 24 --log all=3
2009-08-04 13:12:17 NOCpulse::Probe::Shell::Unix::connect Execute '/bin/sh -s'
2009-08-04 13:12:17 NOCpulse::Probe::Shell::Unix::connect Opened pipe to 23693
2009-08-04 13:12:17 NOCpulse::Probe::Shell::Unix::read_result stdout: >>>Linux#2.6.9-89.0.3.EL#23693
<<<
2009-08-04 13:12:17 NOCpulse::Probe::Shell::Unix::read_result stderr: >>><<<
2009-08-04 13:12:17 NOCpulse::Probe::Shell::Unix::read_result status: >>>0<<<
2009-08-04 13:12:17 NOCpulse::Probe::Shell::Unix::connect OS Linux 2.6.9-89.0.3.EL, shell pid 23693
2009-08-04 13:12:17 NOCpulse::Probe::Shell::Unix::connect OK
2009-08-04 13:12:17 NOCpulse::Probe::Shell::AbstractShell::run /bin/df -k
2009-08-04 13:12:18 NOCpulse::Probe::Shell::Unix::read_result stdout: >>>Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00 10482240 7552216 2397544 76% /
/dev/dasda1 99168 15932 78120 17% /boot
none 2055200 0 2055200 0% /dev/shm
jwm-devel.usersys.redhat.com:/shared/export/rhndev1.z900/var/satellite 428169408 351453088 59316512 86% /var/satellite
/root/Satellite-5.3.0-RHEL4-re20090724.0-s390x.iso 517774 517774 0 100% /mnt/iso
<<<
2009-08-04 13:12:18 NOCpulse::Probe::Shell::Unix::read_result stderr: >>><<<
2009-08-04 13:12:18 NOCpulse::Probe::Shell::Unix::read_result status: >>>0<<<
2009-08-04 13:12:18 NOCpulse::Probe::Threshold::value_crossed pctused did not cross
2009-08-04 13:12:18 NOCpulse::Probe::Threshold::value_crossed space_avail did not cross
2009-08-04 13:12:18 NOCpulse::Probe::Threshold::value_crossed space_used did not cross
2009-08-04 13:12:18 NOCpulse::Probe::Result::_calc_overall_status Overall status OK
2009-08-04 13:12:18 NOCpulse::Probe::Result::_calc_changed_items First run, marking everything as changed
2009-08-04 13:12:18 NOCpulse::Probe::Result::_calc_changed_items Overall status changed: YES
2009-08-04 13:12:18 NOCpulse::Probe::Result::_calc_notifying_items 0 items need notification
2009-08-04 13:12:18 NOCpulse::Probe::Result::_format_messages Message: Filesystem /dev/dasda1 (/boot): Filesystem pct used 17%; Space available 76 MB; Space used 15 MB
2009-08-04 13:12:18 Items changed or removed:
2009-08-04 13:12:18 space_avail '76' is OK
2009-08-04 13:12:18 space_used '15' is OK
2009-08-04 13:12:18 pctused '17' is OK
2009-08-04 13:12:18 Notification not required
2009-08-04 13:12:18 NOTE: Running in test mode; no changes saved, nothing enqueued
2009-08-04 13:12:18 ============================================================
OK: Filesystem /dev/dasda1 (/boot): Filesystem pct used 17%; Space available 76 MB; Space used 15 MB
============================================================
2009-08-04 13:12:18 NOCpulse::Probe::Shell::Unix::disconnect Disconnecting
2009-08-04 13:12:18 NOCpulse::Probe::Shell::Unix::_kill_child Child exit code 0
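For anyone cross-checking the probe messages against `df -k` as suggested earlier: the MB figures in both runs appear consistent with dividing df's 1K-block counts by 1024 and dropping the remainder. That is an inference from the four numbers in these logs, not a statement about the probe's code, and `kb_to_mb` is a hypothetical helper.

```python
def kb_to_mb(kilobytes):
    """Convert a df -k block count to whole megabytes, truncating.

    Truncation (rather than rounding) is an assumption that happens to
    reproduce the figures shown in the probe messages in this report.
    """
    return kilobytes // 1024


if __name__ == "__main__":
    # (device, Used, Available) rows copied from the two df -k outputs.
    rows = [
        ("/dev/xvda1", 13140, 82727),   # probe reported 12 MB used, 80 MB available
        ("/dev/dasda1", 15932, 78120),  # probe reported 15 MB used, 76 MB available
    ]
    for device, used_kb, avail_kb in rows:
        print("%s: used %d MB, available %d MB"
              % (device, kb_to_mb(used_kb), kb_to_mb(avail_kb)))
```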
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1434.html