Description of problem:
netdump often can't get the information it needs from the arping utility in order to use a netdump/syslog server that's on a different network (i.e., one that must be accessed through a router).

Version-Release number of selected component (if applicable):
netdump-0.6.11-2 (note that this seems to fail more reliably on RHAS 2.1 than on RHEL3, but it happens on both of them)

How reproducible:
Set SYSLOGADDR to a non-local destination server in /etc/sysconfig/netdump. Note that this does seem to work sometimes, but usually not. In our environment I'm trying to set SYSLOGADDR to a server that's just one router hop away, and it's failing on all RHEL3 hosts.

Actual results:
netdump: cannot arp a.b.c.d
netdump: cannot find a.b.c.d in arp cache
syslog server address resolution [FAILED]

Expected results:
netdump starts up successfully

Additional info:
Even setting SYSLOGMACADDR explicitly to the first-hop router's MAC address is not enough to get around this bug. The way I'm working around this for now is to set SYSLOGMACADDR and make the following change in /etc/init.d/netdump (this is for syslog, but an analogous change can be made to fix netdump itself as well):

---- 8< ------------------------------------------------------
@@ -189,10 +190,11 @@
 # syslogd server, if any
 if [ -n "$SYSLOGADDR" ] ; then
     eval $(print_address_info $SYSLOGADDR)
-    [ "$SERVER_ADDRESS_RESOLUTION" = "unresolved" ] &&
-        netdump_failure "syslog server address resolution"
+#    [ "$SERVER_ADDRESS_RESOLUTION" = "unresolved" ] &&
+#        netdump_failure "syslog server address resolution"
     [ -z "$SYSLOGMACADDR" ] && SYSLOGMACADDR=$MAC
-    SYSLOGIPHEX=`dquad_to_hex $IPADDR`
+#    SYSLOGIPHEX=`dquad_to_hex $IPADDR`
+    SYSLOGIPHEX=`dquad_to_hex $SYSLOGADDR`
     eval $(echo $SYSLOGMACADDR | sed "s/:/ /g" |
         ( read M0 M1 M2 M3 M4 M5;
           echo M0=$M0\; M1=$M1\; M2=$M2\; M3=$M3\; M4=$M4\; M5=$M5\; ))
---- 8< ------------------------------------------------------

In other words, force the netdump script to actually *believe* the user when they specify SYSLOGADDR and SYSLOGMACADDR. This is not a suggested patch; it was just a quick hack to get the damn thing working. The long-term solution would be to avoid the use of "arping" altogether if xxxxMACADDR is set, and to use the user-specified IP address rather than the one calculated through the print_address_info call.
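
For illustration, here is a rough sketch of what that long-term behaviour might look like. It is not taken from the netdump package: ip_to_hex, lookup_mac and choose_syslog_target are hypothetical names, lookup_mac is only a placeholder for whatever arping-based resolution the init script performs, and ip_to_hex is a stand-in written for this example rather than the script's own dquad_to_hex.

```shell
#!/bin/sh
# Hypothetical stand-in for the script's dquad_to_hex: convert a
# dotted-quad IPv4 address (no leading zeros) to 0x-prefixed hex.
ip_to_hex() {
    local IFS=.
    set -- $1                       # split the address on dots
    printf '0x%02x%02x%02x%02x\n' "$1" "$2" "$3" "$4"
}

# Placeholder for the ARP-based lookup; it always fails here so the
# sketch stays self-contained. A real script would call arping.
lookup_mac() {
    return 1
}

# Hypothetical: pick the syslog target, trusting user-supplied values.
#   $1 = SYSLOGADDR, $2 = SYSLOGMACADDR (may be empty)
choose_syslog_target() {
    local addr="$1" user_mac="$2" mac
    if [ -n "$user_mac" ]; then
        mac=$user_mac               # user set the MAC: skip ARP entirely
    else
        mac=$(lookup_mac "$addr") || return 1
    fi
    # Always derive the hex IP from the address the user specified.
    echo "SYSLOGIPHEX=$(ip_to_hex "$addr") SYSLOGMACADDR=$mac"
}
```

With this shape, `choose_syslog_target 10.1.2.3 00:11:22:33:44:55` never touches ARP, while an empty second argument falls back to the lookup and fails cleanly if that lookup fails.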
Please try the version of netdump included in the RHEL 3 Update 5 beta, versioned 0.7.7-2, and let us know if this resolves your problem. We've included changes to the script surrounding the detection of the next hop for netdump clients and servers separated by a router.
Using the new 0.7.7-2 RPM, netdump is failing on a server that's setting only SYSLOGADDR to point at a syslog server that's separated from this server by a router. This configuration used to work fine under the old RPM, but here's what happens now when I try to start netdump:

# service netdump start
netdump: cannot arp automount
initializing netdump /lib/modules/2.4.21-32.4.ELsmp/kernel/drivers/net/netconsole.o: invalid argument syntax for syslog_target_eth_byte0: 'x'
/lib/modules/2.4.21-32.4.ELsmp/kernel/drivers/net/netconsole.o: insmod /lib/modules/2.4.21-32.4.ELsmp/kernel/drivers/net/netconsole.o failed
/lib/modules/2.4.21-32.4.ELsmp/kernel/drivers/net/netconsole.o: insmod netconsole failed
[FAILED]

And here's the associated syslog output:

Jul 1 10:22:00 ndclient netdump:: inserting netconsole module with arguments magic1=0x11111111 magic2=0x11111111 dev=eth0 source_port=6666 syslog_target_ip=0x00000000 syslog_target_port=514 syslog_target_eth_byte0=0x syslog_target_eth_byte1=0x syslog_target_eth_byte2=0x syslog_target_eth_byte3=0x syslog_target_eth_byte4=0x syslog_target_eth_byte5=0x
Jul 1 10:22:00 ndclient modprobe: /lib/modules/2.4.21-32.4.ELsmp/kernel/drivers/net/netconsole.o: invalid argument syntax for syslog_target_eth_byte0: 'x'
Jul 1 10:22:00 ndclient modprobe: /lib/modules/2.4.21-32.4.ELsmp/kernel/drivers/net/netconsole.o: insmod /lib/modules/2.4.21-32.4.ELsmp/kernel/drivers/net/netconsole.o failed
Jul 1 10:22:00 ndclient modprobe: /lib/modules/2.4.21-32.4.ELsmp/kernel/drivers/net/netconsole.o: insmod netconsole failed
Jul 1 10:22:00 ndclient netdump: initializing netdump failed
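
The empty `syslog_target_eth_byteN=0x` arguments in the log suggest the script went on to load the module with an unresolved MAC address. As a sketch only (this is not code from the netdump package, and mac_to_module_args is a hypothetical name), a guard along these lines would reject an empty or malformed MAC before any module arguments are built. The parameter names are the ones shown in the syslog output above.

```shell
#!/bin/sh
# Hypothetical guard: validate a MAC address and turn it into the
# syslog_target_eth_byteN=0xNN arguments seen in the log output.
mac_to_module_args() {
    local mac="$1" i=0 byte args=""
    # Accept exactly six colon-separated pairs of hex digits.
    if ! printf '%s\n' "$mac" |
            grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'; then
        echo "netdump: invalid or unresolved MAC address: '$mac'" >&2
        return 1
    fi
    local IFS=:
    for byte in $mac; do            # split the MAC on colons
        args="$args syslog_target_eth_byte$i=0x$byte"
        i=$((i + 1))
    done
    echo "${args# }"                # drop the leading space
}
```

Called with an empty string, the function prints a diagnostic and returns non-zero, so the caller can stop with a clear "address resolution" failure instead of a confusing insmod syntax error.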
BTW, setting SYSLOGMACADDR in addition to SYSLOGADDR changes the results in various ways, but netdump still fails to start.
Please send in the contents of your /etc/sysconfig/netdump file. Judging by the output in comment #2, SYSLOGMACADDR is never getting set, yet the errors above never report that the syslog MAC address failed to resolve. Could you also please try this by specifying the IP address of the syslog server in /etc/sysconfig/netdump, rather than the host name? The script may be having problems distinguishing hostnames from IP addresses. Also, a few updates ago you mentioned you were using netdump-0.7.7-2. We're up to 0.7.14-3 at this point, and it would be helpful to me if you would upgrade to that version so we can work with the same code.
I don't know what the specific contents of the /etc/sysconfig/netdump file were in comment #2, but I do know that I was only setting SYSLOGADDR. And I've never used anything but an IP address and/or MAC address (no hostnames)--wouldn't even consider doing otherwise. So the non-comment contents would have just been (e.g.) SYSLOGADDR=10.1.2.3. As you requested, I've just retested this by setting only SYSLOGADDR on a server that's separated from the syslog server in question by a single router hop, using netdump 0.7.14-4 on a system running RHEL4 (kernel 2.6.9-22.0.1). And it appeared to work--netdump correctly determined the MAC address of the first-hop router that's associated with the specified SYSLOGADDR IP address, and it started up with no errors. The associated netconsole/netlog startup was also logged to the syslog server specified by SYSLOGADDR. So it looks as though this bug may have been corrected in this netdump/kernel combination.
BTW, I just tried setting SYSLOGADDR on other servers where the syslog server is more than one router hop away, and netdump failed to start. However, when I then set SYSLOGMACADDR to the first-hop router's MAC address, netdump accepted it and started with no problems. So apparently it is now trusting the user when they specify SYSLOGMACADDR as well. Much better. I haven't actually tested with the netdump protocol itself (i.e., by setting NETDUMPADDR), but based on the syslog results I'm willing to say that this bug is resolved, and I'll open another one if I run into problems later.
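
For anyone who needs to set SYSLOGMACADDR by hand, one possible way to find the first-hop router's MAC address is sketched below. This is an assumption about tooling, not something the netdump script is known to do: it relies on the iproute2 `ip` command, whose output format can vary between versions, and gateway_from_route and mac_from_neigh are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: print the gateway (the word after "via") from
# `ip route get <addr>` output supplied on stdin.
gateway_from_route() {
    awk '{ for (i = 1; i < NF; i++)
               if ($i == "via") { print $(i + 1); exit } }'
}

# Hypothetical helper: print the link-layer address (the word after
# "lladdr") from `ip neigh show <addr>` output supplied on stdin.
mac_from_neigh() {
    awk '{ for (i = 1; i < NF; i++)
               if ($i == "lladdr") { print $(i + 1); exit } }'
}

# Example use on a live system (left commented out, since it depends
# on the local network; 10.1.2.3 is the example address used above):
#   gw=$(ip route get 10.1.2.3 | gateway_from_route)
#   ping -c 1 "$gw" >/dev/null      # populate the neighbour cache
#   ip neigh show "$gw" | mac_from_neigh
```

If the route has no "via" field, the destination is on the local network and gateway_from_route prints nothing, in which case no router MAC is needed.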