Description of problem:
Monitoring probes don't work. Every monitoring probe is inactive. The error is "Monitoring command did not complete".

Version-Release number of selected component (if applicable):
sat540 = Satellite-5.4.0-RHEL5-re20100903.1
client = rhnmd-5.3.0-5.el5sat

How reproducible:
always

Steps to Reproduce:
1. set up monitoring on sat540 + client
   # iptables, selinux off; ssh via nocpulse user works
2. set up some of the probes
3. push scout config

Actual results:
State|Probe Description|Status String|Type
Probe(s) assigned to system have an UNKNOWN status|Linux: CPU Usage|Monitoring command did not complete within 15 seconds
Probe(s) assigned to system have an UNKNOWN status|Linux: Load|Monitoring command did not complete within 15 seconds
Probe(s) assigned to system have a CRITICAL status|Network Services: SSH|SSH port 22: connect: timeout

Expected results:

Additional info:
I set up ssh keys and restarted rhnmd. I can connect to the client from the satellite with this command:
[root@SATELLITE_FQDN ~]# /usr/bin/ssh -l nocpulse -p 4545 -i /var/lib/nocpulse/.ssh/nocpulse-identity -o StrictHostKeyChecking=no -o BatchMode=yes <IP.CLIENT.IP.CLIENT> "cd ~;pwd;hostname"
/var/lib/nocpulse
<HOSTNAME_OF_CLIENT>
Petr - can you provide login details for the Monitoring system you're using? That may help to quickly figure out why it is not working. Cliff
notes for myself, client is ufo-3.brq.redhat.com
notes for myself: it seems that the scout is trying to connect to the wrong IP. I created a Network:Ping probe, and with the default value the scout tries to connect to 10.34.28.111, which is not the IP of ufo-3. If I fill in the optional IP address parameter, it starts working. So far I have tracked it down to the PL/SQL function rhn_server.get_ip_address, which returns the wrong result:
SQL> select rhn_server.get_ip_address(1000010004) from dual;
RHN_SERVER.GET_IP_ADDRESS(1000010004)
--------------------------------------------------------------------------------
10.34.28.111
The correct IP is:
# host ufo-3.brq.redhat.com
ufo-3.brq.redhat.com has address 10.34.26.49
SQL> select ipaddr ip_addr from rhnServerNetwork where server_id = 1000010004 and ipaddr != '127.0.0.1';
IP_ADDR
----------------
10.34.28.111
So the question is why the table rhnServerNetwork holds the wrong IP address. I even tried running rhn-profile-sync, but the address is still unchanged.
I tested whether this is a regression against sat530, suspecting that it happens when the client changes its IP address. I had a hard time changing the IP address on a single machine, so I did the following: I registered machine A to satellite X. On machine B I changed serverURL to satellite X and copied the systemid from machine A to machine B. This should perfectly simulate a change of IP address (and even hostname).
In more detail:
1. register machine A to satellite X
2. create ping probe
3. push scout config <---- probe is ok here
4. on machine B change server URL and copy systemid from machine A
5. suspend machine A <----- probe shows 100% packet loss
6. run rhn_check on machine B <----- still 100% packet loss, but in SDC I see checkins
7. run rhn-profile-sync <----- still 100% packet loss, still trying old IP
8. push scout config <----- probe is ok here, trying new IP.
I tried these setups:
A:RHEL5.1, B:RHEL6, X: sat530
B:RHEL6, A:RHEL5.1 X: sat530
A:RHEL5.1, B:RHEL6, X: sat540
The behavior was the same on sat530 and sat540, which is surprising, because even if I run rhn-profile-sync on ufo-3 and a scout config push on the satellite, the IP still does not get updated. The only difference I see so far is that ufo-3 is RHEL5.5. Hmm
I just tested it with RHEL5.5 and it still works as I described, i.e. after rhn-profile-sync and a scout config push it works as expected (with a small glitch, which I reported as BZ 633975). So something has to be special about the ufo-3 machine.
Oh crap, I had packages from the spacewalk client repo on those RHEL5.1 and 5.5 machines. I will have to retest tomorrow with 5.5 without those spacewalk client packages.
Ok, so even with a proper RHEL 5.5 (without client tools from the spacewalk repo) it works on both sat530 and sat540. So I would say that this bug happens only on ufo-3. Why? That is unknown to me right now.
I found that it is the client (ufo-3) that sends the wrong IP address. The backend stores what it receives correctly. The problem is the incorrect network setup of ufo-3:
[root@ufo-3 ~]# python
Python 2.4.3 (#1, Jun 11 2009, 14:09:37)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-44)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from socket import gethostname
>>> from socket import gethostbyname
>>> gethostname()
'ufo-3'
>>> gethostbyname(gethostname())
'10.34.28.111'
>>>
[root@ufo-3 ~]# hostname
ufo-3
[root@ufo-3 ~]# host ufo-3
ufo-3.brq.redhat.com has address 10.34.26.49
I'm closing this as NOTABUG.
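For anyone who wants to repeat the check above on another client, here is a rough sketch of the same comparison as a script. The function name check_self_resolution and its get_name/resolve parameters are hypothetical and only for illustration; they are not part of the client tools. The expected address has to be supplied by the caller, for example from the output of host.

```python
import socket


def check_self_resolution(expected_ip,
                          get_name=socket.gethostname,
                          resolve=socket.gethostbyname):
    """Return (hostname, resolved_ip, matches_expected).

    get_name and resolve are injectable (hypothetical parameters) so the
    check can be exercised without touching the real resolver.
    """
    name = get_name()
    try:
        resolved = resolve(name)
    except socket.error:
        # The hostname could not be resolved at all.
        return (name, None, False)
    return (name, resolved, resolved == expected_ip)


if __name__ == "__main__":
    # Replay the values seen on ufo-3: the local lookup answers with the
    # old address while DNS reports 10.34.26.49.
    result = check_self_resolution(
        "10.34.26.49",
        get_name=lambda: "ufo-3",
        resolve=lambda name: "10.34.28.111",
    )
    print(result)
```

Called with the defaults, it uses the same gethostname() and gethostbyname() calls as the interactive session above.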
It is caused by:
[root@ufo-3 ~]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
10.34.28.111 ufo-3
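A minimal sketch of how such a pinned entry could be spotted: it scans hosts-file text for lines that map a given hostname to an address. The helper name addresses_for_host and the SAMPLE_HOSTS constant are made up for this illustration; the sample text is the /etc/hosts content shown above.

```python
def addresses_for_host(hosts_text, hostname):
    """Return the IP addresses that hosts_text maps to hostname."""
    matches = []
    for raw_line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        if hostname in names:
            matches.append(ip)
    return matches


# Hypothetical sample, copied from the /etc/hosts of ufo-3 above.
SAMPLE_HOSTS = """\
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
10.34.28.111 ufo-3
"""

if __name__ == "__main__":
    print(addresses_for_host(SAMPLE_HOSTS, "ufo-3"))
```

Any address returned here that differs from what DNS reports for the host is a candidate for the stale entry.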
(In reply to comment #10) That was the original IP of this machine; it seems that DHCP changed it. Filed a bug against python: Bug 634147 - gethostbyname(gethostname()) is wrong when IP is changed
This is nonsense: the client does not have to have a single port open, so there is no port we could try to connect to. And working around someone else's bug or setup is nonsense.