Bug 142587

Summary: netdump won't start up with a syslog (or netdump) server that's behind a router
Product: Red Hat Enterprise Linux 3
Reporter: John Caruso <jcaruso>
Component: netdump
Assignee: Neil Horman <nhorman>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Version: 3.0
CC: jvandenhengel, tao
Hardware: All
OS: Linux
Fixed In Version: 0.7.14-4
Doc Type: Bug Fix
Last Closed: 2006-07-07 17:36:24 UTC

Description John Caruso 2004-12-10 19:35:46 UTC
Description of problem:
netdump often can't get the information it needs from the arping 
utility in order to use a netdump/syslog server that's on a different 
network (i.e., which must be accessed through a router).

Version-Release number of selected component (if applicable):
netdump-0.6.11-2 (and note that this seems to fail more reliably on 
RHAS 2.1 than RHEL3, but it happens on both of them)

How reproducible:
Set SYSLOGADDR to a non-local destination server
in /etc/sysconfig/netdump.  This works occasionally, but usually
fails.  In our environment I'm trying to set SYSLOGADDR to a server
that's just one router hop away, and it's failing on all RHEL3 hosts.

Actual results:
netdump: cannot arp a.b.c.d
netdump: cannot find a.b.c.d in arp cache
syslog server address resolution                           [FAILED]

Expected results:
netdump starts up successfully

Additional info:
Even setting SYSLOGMACADDR explicitly to the first-hop router's MAC 
address is not enough to get around this bug.  The way I'm working 
around this for now was to set SYSLOGMACADDR, and implement the 
following change in /etc/init.d/netdump (this is for syslog, but an 
analogous change can be made to fix netdump itself as well):

---- 8< ------------------------------------------------------
@@ -189,10 +190,11 @@
     # syslogd server, if any
     if [ -n "$SYSLOGADDR" ] ; then
        eval $(print_address_info $SYSLOGADDR)
-        [ "$SERVER_ADDRESS_RESOLUTION" = "unresolved" ] &&
-            netdump_failure "syslog server address resolution"
+#        [ "$SERVER_ADDRESS_RESOLUTION" = "unresolved" ] &&
+#            netdump_failure "syslog server address resolution"
        [ -z "$SYSLOGMACADDR" ] && SYSLOGMACADDR=$MAC
-       SYSLOGIPHEX=`dquad_to_hex $IPADDR`
+#      SYSLOGIPHEX=`dquad_to_hex $IPADDR`
+       SYSLOGIPHEX=`dquad_to_hex $SYSLOGADDR`
        eval $(echo $SYSLOGMACADDR | sed "s/:/ /g" | ( read M0 M1 M2 M3 M4 M5;
               echo M0=$M0\; M1=$M1\; M2=$M2\; M3=$M3\; M4=$M4\; M5=$M5\; ))
---- 8< ------------------------------------------------------

In other words, force the netdump script to actually *believe* the 
user when they specify SYSLOGADDR and SYSLOGMACADDR.  This is not a 
suggested patch, it was just a quick hack to get the damn thing 
working.  The long-term solution would be to avoid the use of
"arping" altogether if xxxxMACADDR is set, and to use the
user-specified IP address rather than the one calculated through the
print_address_info call.
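
The long-term approach described above could look roughly like the following. This is only a sketch, not the code that shipped in any netdump release: resolve_syslog_target and lookup_mac are hypothetical names, lookup_mac is a stub that always fails (mimicking a server behind a router), and dquad_to_hex here is a stand-in whose output format is an assumption (the real helper in /etc/init.d/netdump may differ).

```shell
#!/bin/bash
# Stand-in for the init script's dquad_to_hex helper (assumed output format).
dquad_to_hex() {
    local IFS=.
    set -- $1                      # split the dotted quad into four octets
    printf '0x%02x%02x%02x%02x\n' "$1" "$2" "$3" "$4"
}

# Hypothetical resolver standing in for the arping/arp-cache lookup.
# It always fails here, as it would for a server on another network.
lookup_mac() { return 1; }

# Hypothetical replacement for the syslog branch of the init script.
resolve_syslog_target() {
    # Always use the IP address the user configured.
    SYSLOGIPHEX=$(dquad_to_hex "$SYSLOGADDR")

    # Only attempt resolution when the user did not supply a MAC address.
    if [ -z "$SYSLOGMACADDR" ]; then
        SYSLOGMACADDR=$(lookup_mac "$SYSLOGADDR") || return 1
    fi

    # Split aa:bb:cc:dd:ee:ff into M0..M5, as the original script does.
    IFS=: read -r M0 M1 M2 M3 M4 M5 <<< "$SYSLOGMACADDR"
}
```

With SYSLOGADDR and SYSLOGMACADDR both set, the function never calls the resolver; with only SYSLOGADDR set, it fails cleanly instead of passing empty values on.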

Comment 1 Jeff Moyer 2005-04-11 20:44:03 UTC
Please try the version of netdump included in the RHEL 3 Update 5 beta,
versioned 0.7.7-2, and let us know if this resolves your problem.  We've
included changes to the script surrounding the detection of the next hop for
netdump clients and servers separated by a router.
  

Comment 2 John Caruso 2005-07-01 17:26:59 UTC
Using the new 0.7.7-2 RPM, netdump is failing on a server that's setting only
SYSLOGADDR to point at a syslog server that's separated from this server by a
router.  This configuration used to work fine under the old RPM, but here's what
happens now when I try to start netdump:

# service netdump start
netdump: cannot arp automount
initializing netdump /lib/modules/2.4.21-32.4.ELsmp/kernel/drivers/net/netconsole.o: invalid argument syntax for syslog_target_eth_byte0: 'x'
/lib/modules/2.4.21-32.4.ELsmp/kernel/drivers/net/netconsole.o: insmod /lib/modules/2.4.21-32.4.ELsmp/kernel/drivers/net/netconsole.o failed
/lib/modules/2.4.21-32.4.ELsmp/kernel/drivers/net/netconsole.o: insmod netconsole failed
                                                           [FAILED]

And here's the associated syslog output:

Jul  1 10:22:00 ndclient netdump:: inserting netconsole module with arguments magic1=0x11111111 magic2=0x11111111 dev=eth0 source_port=6666 syslog_target_ip=0x00000000 syslog_target_port=514 syslog_target_eth_byte0=0x syslog_target_eth_byte1=0x syslog_target_eth_byte2=0x syslog_target_eth_byte3=0x syslog_target_eth_byte4=0x syslog_target_eth_byte5=0x
Jul  1 10:22:00 ndclient modprobe: /lib/modules/2.4.21-32.4.ELsmp/kernel/drivers/net/netconsole.o: invalid argument syntax for syslog_target_eth_byte0: 'x'
Jul  1 10:22:00 ndclient modprobe: /lib/modules/2.4.21-32.4.ELsmp/kernel/drivers/net/netconsole.o: insmod /lib/modules/2.4.21-32.4.ELsmp/kernel/drivers/net/netconsole.o failed
Jul  1 10:22:00 ndclient modprobe: /lib/modules/2.4.21-32.4.ELsmp/kernel/drivers/net/netconsole.o: insmod netconsole failed
Jul  1 10:22:00 ndclient netdump: initializing netdump failed


Comment 3 John Caruso 2005-07-01 17:31:20 UTC
BTW, setting SYSLOGMACADDR in addition to SYSLOGADDR changes the results in
various ways, but netdump still fails to start.


Comment 4 Neil Horman 2006-06-21 12:24:06 UTC
Please send in the contents of your /etc/sysconfig/netdump file. Judging by the
output in comment #2, SYSLOGMACADDR is never getting set, yet the errors above
never report that the syslog MAC address failed to resolve.
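
For illustration, the empty "0x" byte arguments seen in comment #2 could be caught before the module is loaded by validating the MAC address first. This is a sketch under assumptions: valid_mac and mac_module_args are hypothetical helpers, not functions from the netdump script; only the syslog_target_eth_byteN parameter names are taken from the log above.

```shell
#!/bin/bash
# Succeeds only for six colon-separated hex byte pairs.
valid_mac() {
    [[ $1 =~ ^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$ ]]
}

# Print the six per-byte module arguments on one line,
# or return non-zero without printing anything if the MAC is unusable.
mac_module_args() {
    local mac=$1 i=0 byte args=
    local -a bytes
    valid_mac "$mac" || return 1
    IFS=: read -r -a bytes <<< "$mac"
    for byte in "${bytes[@]}"; do
        # Join with single spaces, avoiding a trailing separator.
        args="${args:+$args }syslog_target_eth_byte${i}=0x${byte}"
        i=$((i + 1))
    done
    printf '%s\n' "$args"
}
```

A caller could then report "syslog server address resolution" failure when mac_module_args returns non-zero, rather than handing malformed arguments to insmod.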

Could you also please try this by specifying the ip address of the syslog server
in /etc/sysconfig/netdump, rather than the host name?  The script may be having
problems understanding hostnames vs. ip addresses.
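
As a rough illustration of the hostname vs. IP address distinction, a check like the one below could tell the two apart before any ARP lookup is attempted. is_dotted_quad is a hypothetical helper written for this example, not part of the shipped script.

```shell
#!/bin/bash
# Succeeds only if the argument is a dotted-quad IPv4 address
# with every octet in the range 0-255.
is_dotted_quad() {
    local IFS=. octet
    local -a parts
    [[ $1 =~ ^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$ ]] || return 1
    read -r -a parts <<< "$1"
    for octet in "${parts[@]}"; do
        # 10# forces base 10 so a leading zero is not read as octal.
        (( 10#$octet <= 255 )) || return 1
    done
}
```

Anything that fails this check (for example a bare word such as a hostname) would need to be resolved to an address first, or rejected with a clear message.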

Also, a few updates ago you mentioned you were using netdump 0.7.7-2.  We're up to
0.7.14-3 at this point, and it would be helpful to me if you would upgrade to that
version so we could work with the same code.

Comment 5 John Caruso 2006-07-03 21:02:57 UTC
I don't know what the specific contents of the /etc/sysconfig/netdump file were
in comment #2, but I do know that I was only setting SYSLOGADDR.  And I've never
used anything but an IP address and/or MAC address (no hostnames)--wouldn't even
consider doing otherwise.  So the non-comment contents would have just been
(e.g.) SYSLOGADDR=10.1.2.3.

As you requested, I've just retested this by setting only SYSLOGADDR on a server
that's separated from the syslog server in question by a single router hop,
using netdump 0.7.14-4 on a system running RHEL4 (kernel 2.6.9-22.0.1).  And it
appeared to work--netdump correctly determined the MAC address of the first-hop
router that's associated with the specified SYSLOGADDR IP address, and it started
up with no errors.  And the associated netconsole/netlog startup was logged to
the syslog server specified by SYSLOGADDR.

So it looks as though this bug may have been corrected in this netdump/kernel
combination.


Comment 6 John Caruso 2006-07-03 21:19:01 UTC
BTW, I just tried setting SYSLOGADDR on other servers where the syslog server is
further than one router hop away, and netdump failed to start.  However, when I
then set SYSLOGMACADDR to the first-hop router's MAC address, netdump accepted it
and started with no problems.  So apparently it is now trusting the user when
they specify SYSLOGMACADDR as well.  Much better.
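
For anyone needing the first-hop router's MAC address to put in SYSLOGMACADDR, one possible way to find it is to parse the output of the iproute2 tools. The helpers below are a sketch written for this example (next_hop_from_route and mac_from_neigh are hypothetical names); they read command output on stdin and assume the usual "via" and "lladdr" fields are present.

```shell
#!/bin/bash
# Extract the gateway from `ip route get <addr>` output read on stdin.
# Prints nothing if the destination is on-link (no "via" field).
next_hop_from_route() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "via") { print $(i + 1); exit } }'
}

# Extract the link-layer address from `ip neigh show <addr>` output on stdin.
mac_from_neigh() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "lladdr") { print $(i + 1); exit } }'
}

# Usage on a live system (not executed here; 10.1.2.3 is the example
# SYSLOGADDR from comment 5):
#   gw=$(ip route get 10.1.2.3 | next_hop_from_route)
#   ping -c 1 "$gw" >/dev/null      # populate the neighbour cache
#   ip neigh show "$gw" | mac_from_neigh
```

The resulting address would then go into /etc/sysconfig/netdump as SYSLOGMACADDR alongside SYSLOGADDR.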

I haven't actually tested with the netdump protocol itself (i.e. by setting
NETDUMPADDR), but based on the syslog results I'm willing to say that this bug
is resolved and open another one if I run into problems later.