Bug 90803 (IT_44337)
Summary: | /etc/init.d/netdump start script requires client to be on same subnet as server | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 3 | Reporter: | Allen Nuttle <anuttle> |
Component: | netdump | Assignee: | Jeff Moyer <jmoyer> |
Status: | CLOSED ERRATA | QA Contact: | |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 3.0 | CC: | jmoyer, juanino, nhruby, tao |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | RHBA-2005-113 | Doc Type: | Bug Fix |
Last Closed: | 2005-05-20 00:12:26 UTC | Type: | --- |
Bug Blocks: | 132991 | ||
Description
Allen Nuttle
2003-05-14 00:09:29 UTC
Created attachment 91655 [details]
/etc/init.d/netdump modifications
Created attachment 102386 [details]
netdump-mac-subnet.patch
unified diff version of the above patch, against the RHEL3 netdump.
Jeff, can you take a look at this?

> One solution would be to copy the information from the regular stack
> sometime before it starts to process a crash.

That is precisely what we are doing by getting this information either from arp or from the value supplied in /etc/sysconfig/netdump.

> The simplest workaround is to specify a MAC address in
> the "/etc/sysconfig/netdump" file, but this is also broken in this
> case!

That is not a workaround. It is a "functions as designed" thing. Further, it is not broken in the configurations that I've tested. I use this feature of netdump daily.

> This preserves NETDUMPADDR in the case where arping/arp fail and
> makes the problem solvable, by manually supplying the proper MAC
> address in "/etc/sysconfig/netdump", for NETDUMPMACADDR.

If arp fails, we exit from the script in print_address_info. I'm not sure what problem you are addressing with this patch. Perhaps you could supply more detail about the configuration that isn't working for you?

The idea of using traceroute for finding the gateway automatically sounds worth exploring. This may be worth incorporating in the next version of netdump.

Thanks.

The problem is that arp[ing] only works when the address is either in the arp cache or reachable through an ethernet-layer (layer 2) broadcast. There are some slight exceptions, such as when proxy arp is in use. However, the bottom line is that, in general, either someone manually adds an address to the arp cache for a host that is on another subnet/broadcast domain, or arp does not work.

This is normally fine, since any address that is not specified by the subnet mask is automatically contacted through the gateway. However, netdump is not normal in that it does not use the standard IP stack, so it does not pick up this automatic indirection -- it needs to have the right MAC address configured, either manually or, better, automatically. As things stand in the version we use, both of these options are broken.
I would not be surprised if at least the manual configuration option has been fixed, but this is still a pain.

If you are not seeing this problem, the most likely reason is that either you are in an environment where netdump traffic does not need to cross any subnet boundary -- or someone has configured your routers and/or proxies to transparently forward arp broadcasts and responses. This is not the way things operate in general in the wide world and netdump is broken in environments that do not do this -- and this is not just a theoretical possibility.

Here is a quick high-level overview of ARP I found just now that explains this: <http://www.mynetwatchman.com/pckidiot/chap05.htm>.

Here's what I would do to try to reproduce this:

  /sbin/ifconfig
    (find "addr" and "Mask")
  /sbin/route -n
    (find a "Gateway" entry, where "Flags" includes 'G')
  ping at least one address that is on the same subnet as "addr" (in other words, an address that differs from "addr" only in bits that are zero in "Mask") and at least one address on a different subnet
  /sbin/arp -n -a
    note that the address on the same subnet has an arp entry, while the address on a different subnet does not; also notice that the "Gateway" used to reach the second address has an entry

If you're saying that manual configuration now works and is required in this type of network environment, I'd still call this a bug and encourage you to fix it at the next opportunity.

  ping_output="$(ping -c 1 -I $DEV $host 2> /dev/null | \
      grep '^PING ' | awk '{print $3}' | sed 's#^(##' | sed 's#)$##')"
  [ $? -ne 0 ] && echo "$prog: cannot ping $host" 1>&2 && usage

I believe the return value from the ping_output line will be the result of the last command evaluated in the pipeline. So, this does not check that ping failed. This needs fixing.

+ trc_output="$(traceroute -i $DEV -n -m 1 $host_ip 2> /dev/null | \
+     grep '^ 1 ' | awk '{print $2}')"
+ [ $? -ne 0 ] && echo "$prog: cannot traceroute $host_ip" 1>&2 && usage

Same here.
  for line in $arp_output; do
      IFS=$oldIFS
      set - foo $line
      shift
-     if [ "$2" = "($host)" ] || expr "$1" : "$host" &>/dev/null; then
-         echo HOSTNAME=$1 IPADDR=$2 AT=$3 MAC=$4 \
+     if [ "$2" = "($mac_ip)" ] || expr "$1" : "$mac_ip" &>/dev/null; then
+         echo HOSTNAME=$1 IPADDR=$host_ip MAC_IPADDR=$2 AT=$3 MAC=$4 \
              TYPE=$5 ON=$6 IFACE=$7

This bit won't apply anymore.

My main concern with this patch is that we don't introduce regressions, not even in the error cases. Please create a new patch against the latest source, netdump-0.7.5, and if you could generate diffs against the source tree, that would be ideal (i.e. not against /etc/init.d/netdump and some file netdump in whatever directory).

Please also note that it has been mentioned that some switches hide the first hop. I want to ensure that the hard-coded case will still work in this case. To that end, if anyone watching this bugzilla has a network with foundry switches deployed, please let me know if you can volunteer to test.

Thanks,
Jeff

Created attachment 111129 [details]
netdump-subnets.patch
Updated patch. Untested (as I don't have the hardware to do so here).
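
Editorial note on the review point above about `$?` after a pipeline: the following is a minimal bash sketch of the behavior, not code from any of the attached patches. `false` stands in for a failing ping or traceroute, and `run_checked` is a hypothetical helper name introduced only for illustration. `PIPESTATUS` is a bash feature and is not available in plain POSIX sh.

```shell
#!/bin/bash
# Illustrative only: `false` stands in for a failing ping or traceroute.

# 1. $? after a pipeline reports the LAST stage, so the failure is masked.
false | cat
masked_status=$?                      # exit status of `cat`

# 2. bash keeps the status of every stage in the PIPESTATUS array.
false | cat
first_stage_status=${PIPESTATUS[0]}   # exit status of `false`

# 3. Alternative: capture the command's output first, test its status,
#    and only then run the text filters on the captured output.
#    run_checked is a hypothetical helper, not part of netdump.
run_checked() {
    local out
    out=$("$@" 2>/dev/null) || return 1   # status of the command itself
    printf '%s\n' "$out"
}

if ! run_checked false >/dev/null; then
    checked_failed=yes
fi
```

Either approach lets the script tell "the command failed" apart from "the filters produced no output", which the quoted `[ $? -ne 0 ]` test cannot do.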
Created attachment 111136 [details]
Slight corrections to prior version of the netdump-subnets patch
This breaks my setup. I believe our routers use proxy arp. Previously, I could specify the IP address of my netdump server (which is on the other side of the router), and this would work fine. With your patch applied, this no longer works. I'll look into this further.

-Jeff

Created attachment 111444 [details]
Find the next hop MAC address automatically
Alan, please take a look at this version of the patch. Honestly, I don't see
how the last version of the patch would have worked for anything other than
client and server on the same subnet.
Thanks.
Jeff
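
Editorial note: the reasoning earlier in this report (a server outside the client's subnet is reached through the gateway, so the gateway's MAC address is the one netdump needs) can be sketched as below. This is an assumption-laden illustration, not the attached patch: the hunk quoted in this report finds the first hop with traceroute, whereas this sketch swaps in a plain subnet-mask comparison. The function names `ip_to_int`, `same_subnet`, and `mac_lookup_ip` are hypothetical, and the addresses in the usage note are made up.

```shell
#!/bin/bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# Assumes octets are written without leading zeros.
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed if two addresses agree in every bit that is set in the mask.
same_subnet() {
    local a b m
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(ip_to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}

# Print the IP address whose MAC should be resolved with arp:
# the server itself when it is on the local subnet, else the gateway.
mac_lookup_ip() {
    local client=$1 mask=$2 gateway=$3 server=$4
    if same_subnet "$client" "$server" "$mask"; then
        echo "$server"
    else
        echo "$gateway"
    fi
}
```

For example, `mac_lookup_ip 192.168.1.10 255.255.255.0 192.168.1.1 10.0.0.5` selects the gateway, while a server at 192.168.1.20 would be resolved directly. A mask comparison like this cannot detect proxy arp or switches that hide the first hop, both of which are raised in this report.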
Ugh. The usage function doesn't return:

  [ $? -ne 0 ] && echo "$prog: cannot ping $host" 1>&2 && usage

So you really want to exit the script at this point. Not only that, the script was called with the proper arguments, but the configuration was incorrect. Thus, telling the user that they need to call the script with start|stop|status is not helpful in this case.

Created attachment 111448 [details]
Gets rid of bogus Usage calls.
Okay, this patch gets rid of the calls to usage. After looking at the code
again, it's apparent that you don't want to exit in these cases. This version
passes a number of regression tests in my environment. Any testing by others
would be greatly appreciated.
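
Editorial note: one way to report a configuration problem without calling usage and without exiting is sketched below. It is a hypothetical illustration, not the code in the attachment: `config_warning` and `lookup_server` are invented names, `false` stands in for a failed ping, and the message wording is an assumption.

```shell
#!/bin/bash
prog=netdump

# Hypothetical helper: report a configuration problem on stderr and return
# non-zero, leaving the decision to continue or stop with the caller.
config_warning() {
    echo "$prog: $*" 1>&2
    return 1
}

# Hypothetical lookup step; `false` stands in for a failed ping.
lookup_server() {
    false || config_warning "cannot ping $1; check NETDUMPADDR in /etc/sysconfig/netdump"
}

# The caller keeps running and can fall back to a manually configured MAC.
if lookup_server 10.0.0.5 2>/dev/null; then
    lookup_result=ok
else
    lookup_result=fallback
fi
```

The message names the setting to check instead of printing the start|stop|status synopsis, which matches the objection raised above.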
Sorry -- I should have taken more time with this, I just did a quick review and smoke_test of the last patch. Thanks for following up! The most recent patch looks good to me and worked in our environment -- this time I configured and also actually triggered a dump :).

OK, thanks for the testing, Alan. I'll work to get this into our next update.

A fix for this has been committed to netdump, and is on track for RHEL 3 U5 and RHEL 4 U1. Packages versioned 0.7.7-2 and later have this fix.

I'm having trouble getting this patch to apply to my netdump install. Is there any way I could get the packaged version?

  patching file netdump
  Hunk #1 FAILED at 73.
  1 out of 1 hunk FAILED -- saving rejects to file netdump.rej

You can get the latest version from anonymous cvs:

  export CVSROOT=:pserver:anonymous.com:/usr/local/CVS
  cvs -z3 login (hit enter)
  cvs -z3 co netdump

Indeed this works.

I'm not able to reproduce this with any of our testlab networks, probably due to having routers passing arp requests or something. But it appears at least one person had success with the patched packages. Will try with a less smart networking setup once I get into the lab.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-451.html