Bug 602325
Summary: kdump to network target fails over bridge device

| Field | Value | Field | Value |
|---|---|---|---|
| Product | Red Hat Enterprise Linux 6 | Reporter | Dave Maley <dmaley> |
| Component | kexec-tools | Assignee | Cong Wang <amwang> |
| Status | CLOSED CURRENTRELEASE | QA Contact | Chao Ye <cye> |
| Severity | medium | Priority | medium |
| Version | 6.0 | CC | amwang, cye, jboggs, jolsa, nhorman, phan, qcai, rkhan, tao, tgraf, vbenes |
| Target Milestone | rc | Target Release | 6.0 |
| Hardware | All | OS | Linux |
| Fixed In Version | kexec-tools-2_0_0-120_el6 | Doc Type | Bug Fix |
| Last Closed | 2010-11-15 14:29:31 UTC | Bug Blocks | 506995, 578501 |
Created attachment 428763: Proposed patch
This works, but unfortunately udhcpc ultimately fails to get a dynamic IP for br0 and eth0. I can't figure out whether this is a problem with udhcpc itself.
(In reply to comment #1)
> This works, but unfortunately udhcpc fails to get a dynamic IP for br0 and eth0
> finally. I can't figure out if this is a problem of udhcpc itself.

This problem is due to our network configuration inside Red Hat: we shouldn't set up br0 and eth0 with DHCP at the same time. So I think the patch is OK.

Hi, please attach your ifcfg-br0 and ifcfg-eth0. Do they work correctly in the first kernel with 'service network restart'?

Event posted on 07-21-2010 04:29pm JST by mfuruta (sent from IssueTracker by mfuruta, issue 959923, it_file 882273)

Hmm, this still looks like a problem with udhcpc...

This seems to be a udhcpc bug: it doesn't bind port 68, which causes an ICMP unreachable error. I don't know why this only happens on a bridge.

What makes you think that udhcpc doesn't bind to port 68? Why would it be binding to a port at all? DHCP clients use raw sockets when requesting network addresses; binding to a port in that state is meaningless.

Looking at the log, I'd say this is actually your problem:

    mapping br0 to lo

For some reason mkdumprd has gotten confused and thinks that br0 should actually be lo (the loopback interface). That's what needs fixing.

(In reply to comment #12)
> What makes you think that udhcpc doesn't bind to port 68? Why would it be
> binding to a port at all. dhcp clients use raw sockets when requesting network
> addresses, binding to a port in that state is meaningless.

Sorry, I meant "open" port 68, not "bind"; the ICMP unreachable error said this port is not open.

>
> Looking at the log, I'd say this is actually your problem:
> mapping br0 to lo
>
> For some reason mkdumprd has gotten confused and thinks that br0 should
> actually be lo (the loopback interface). Thats what needs fixing.

Where? I didn't see this on my machine. The log above says:

    mapping br0 to br0
    mapping eth0 to eth0
    udhcpc (v1.15.1) started
    Sending discover...
    Sending discover...
    Sending discover...
    No lease, failing
    br0 failed to come up

This problem can also be reproduced in the first kernel by running 'busybox udhcpc br0' manually.

Created attachment 435180: related tcpdump output
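The exchange discussed above (a discover from the client, an offer from the server, then a request from the client) follows the DHCP protocol in RFC 2131. As a minimal sketch, written in Python purely for illustration (udhcpc itself is C, and the kdump initrd in this thread runs busybox), the hypothetical helper `build_dhcp_discover` below builds the UDP payload of a DHCPDISCOVER as it would appear in a tcpdump capture. It is not udhcpc's code: it covers only the BOOTP fixed header, the magic cookie and option 53, and omits the UDP/IP/Ethernet framing a client adds when sending from a raw socket before it has an address.

```python
import struct

DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"
DHCPDISCOVER = 1  # value carried in option 53 (DHCP message type)


def build_dhcp_discover(mac: bytes, xid: int, broadcast: bool = True) -> bytes:
    """Return the UDP payload of a minimal DHCPDISCOVER (client port 68 -> server port 67)."""
    if len(mac) != 6:
        raise ValueError("expected a 6-byte Ethernet MAC address")
    flags = 0x8000 if broadcast else 0  # BROADCAST bit: ask the server to broadcast its OFFER
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1,                # op: BOOTREQUEST
        1,                # htype: Ethernet
        6,                # hlen: MAC address length
        0,                # hops
        xid,              # transaction ID, echoed back in the server's OFFER
        0,                # secs
        flags,
        bytes(4),         # ciaddr: the client has no address yet
        bytes(4),         # yiaddr
        bytes(4),         # siaddr
        bytes(4),         # giaddr
        mac + bytes(10),  # chaddr, padded to 16 bytes
        bytes(64),        # sname
        bytes(128),       # file
    )
    options = bytes([53, 1, DHCPDISCOVER, 255])  # message-type option, then END
    return header + DHCP_MAGIC_COOKIE + options


pkt = build_dhcp_discover(bytes.fromhex("001122334455"), 0x12345678)
print(len(pkt))  # 236-byte BOOTP header + 4-byte cookie + 4 option bytes = 244
```

When reading a capture like the attached tcpdump, the `xid`, `flags` and `chaddr` fields of the client's DISCOVER are the ones to match against the server's OFFER, since they determine which transaction the offer answers and how the reply is addressed.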
Ah, well, that's different. It seems the original reporter and you are seeing (at least in part) different issues. The tcpdump shows that you're getting DHCP offer replies from the server, from which we can assume that we successfully sent a DHCP discover message (although the tcpdump doesn't show that). If we're not sending a DHCP request message in response to those offers, my first assumption would be that something in the network stack is dropping those frames, perhaps iptables rules? That would be odd, however, as iptables rules wouldn't be in effect in the second kernel.

Solved: in this case we need to put eth0 into promiscuous mode.

Fixed in kexec-tools-2_0_0-138_el6.

Tested with -142.el6 on hp-ml370g4-01.rhts.eng.bos.redhat.com. The behavior is very strange: sometimes the dump succeeds and sometimes it fails. I tried more than six times and it succeeded only twice.

===============================================================================
[root@hp-ml370g4-01 ~]# rpm -q kexec-tools
kexec-tools-2.0.0-142.el6.i686
[root@hp-ml370g4-01 ~]# tail /etc/kdump.conf
#core_collector cp --sparse=always
#link_delay 60
#kdump_post /var/crash/scripts/kdump-post.sh
#extra_bins /usr/bin/lftp
#disk_timeout 30
#extra_modules gfs2
#options modulename options
#default shell
net nest.test.redhat.com:/mnt/qa
link_delay 60
[root@hp-ml370g4-01 ~]# touch /etc/kdump.conf
[root@hp-ml370g4-01 ~]# service kdump restart
Stopping kdump: [ OK ]
Detected change(s) the following file(s):
/etc/kdump.conf
Rebuilding /boot/initrd-2.6.32-66.el6.i686kdump.img
Netmask is missed!
Starting kdump: [ OK ]
[root@hp-ml370g4-01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-br0
DEVICE=br0
TYPE=Bridge
BOOTPROTO=dhcp
ONBOOT=yes
[root@hp-ml370g4-01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
BRIDGE=br0
--------------------------------------------------------------------------------
Failed console output:
Free memory/Total memory (free %): 94912 / 118644 ( 79.9973 )
Scanning logical volumes
Reading all physical volumes. This may take a while...
Found volume group "vg_hpml370g401" using metadata type lvm2
Activating logical volumes
2 logical volume(s) in volume group "vg_hpml370g401" now active
Free memory/Total memory (free %): 94468 / 118644 ( 79.6231 )
mapping br0 to br0
mapping eth0 to eth0
br0 Link Up. Waiting 60 Seconds
Continuing
device eth0 entered promiscuous mode
ADDRCONF(NETDEV_UP): eth0: link is not ready
udhcpc (v1.15.1) started
Sending discover...
Sending discover...
Sending discover...
No lease, failing
br0 failed to comd: stopping all md devices.
me up
Restarting system.
machine restart
------------------------------------------------------------------------------
Success dump output:
Free memory/Total memory (free %): 94468 / 118644 ( 79.6231 )
mapping br0 to br0
mapping eth0 to eth0
br0 Link Up. Waiting 60 Seconds
Continuing
device eth0 entered promiscuous mode
ADDRCONF(NETDEV_UP): eth0: link is not ready
udhcpc (v1.15.1) started
Sending discover...
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is off for TX and off for RX.
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
br0: port 1(eth0) entering learning state
Sending discover...
br0: port 1(eth0) entering forwarding state
Sending discover...
Sending select for 10.16.65.55...
Lease of 10.16.65.55 obtained, lease time 86400
deleting routers
adding dns 10.16.36.29
adding dns 10.16.255.2
adding dns 10.16.255.3
Saving to remote location nest.test.redhat.com:/mnt/qa
Free memory/Total memory (free %): 93840 / 118644 ( 79.0938 )
Copying data : [100 %]
Saving core complete
md: stopping all md devices.
Restarting system.
===============================================================================

I also tested on ibm-x3655-05.ovirt.rhts.eng.bos.redhat.com with RHEL6.0-20100811.2_nfs-Server-x86_64 and ibm-js22-03.rhts.eng.bos.redhat.com with RHEL6.0-20100805.0_nfs-Server-ppc64. Both work. Maybe the problem is hardware related.

(In reply to comment #21)
> mapping br0 to br0
> mapping eth0 to eth0
> br0 Link Up. Waiting 60 Seconds
> Continuing
> device eth0 entered promiscuous mode
> ADDRCONF(NETDEV_UP): eth0: link is not ready

This seems like a tg3 driver issue: the link is still not ready after 'ifconfig eth0 up' and sleeping for 60 seconds.

Red Hat Enterprise Linux 6.0 is now available and should resolve the problem described in this bug report. This report is therefore being closed with a resolution of CURRENTRELEASE. You may reopen this bug report if the solution does not work for you.
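Both console logs above print `br0 Link Up. Waiting 60 Seconds` and then `eth0: link is not ready`; in the successful run the tg3 link only comes up after udhcpc has already started sending discovers, and in the failed run it never does. A minimal sketch of an alternative to a fixed delay, assuming the standard Linux sysfs attributes `/sys/class/net/<iface>/carrier` and `/sys/class/net/<iface>/flags`, is to poll the carrier state with a timeout and to check `IFF_PROMISC` (0x100) directly. The helper names below are hypothetical and Python is used only for illustration; this is not what kexec-tools does. From a shell, `ip link set dev eth0 promisc on` is the iproute2 command for enabling promiscuous mode.

```python
import time
from pathlib import Path

SYSFS_NET = Path("/sys/class/net")
IFF_PROMISC = 0x100  # value from <linux/if.h>


def read_flags(iface: str, sysfs: Path = SYSFS_NET) -> int:
    """Parse /sys/class/net/<iface>/flags, which holds a hex string such as '0x1103'."""
    return int((sysfs / iface / "flags").read_text().strip(), 16)


def is_promiscuous(iface: str, sysfs: Path = SYSFS_NET) -> bool:
    """True if IFF_PROMISC is set on the interface."""
    return bool(read_flags(iface, sysfs) & IFF_PROMISC)


def wait_for_carrier(iface: str, timeout: float, interval: float = 1.0,
                     sysfs: Path = SYSFS_NET) -> bool:
    """Poll the carrier attribute until it reads '1' or `timeout` seconds pass.

    Returns True as soon as the link is up and False if it never comes up,
    rather than sleeping for a fixed link_delay.
    """
    carrier = sysfs / iface / "carrier"
    deadline = time.monotonic() + timeout
    while True:
        try:
            if carrier.read_text().strip() == "1":
                return True
        except OSError:
            # Reading 'carrier' fails (EINVAL) while the interface is down.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_carrier("eth0", timeout=60)` would return as soon as the driver reports link, or False after 60 seconds; a caller could then report the port failure explicitly instead of letting udhcpc time out on br0.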
Created attachment 422599: serial console log during kdump using scp

Description of problem:
kdump to a network target (scp or nfs) fails over a bridge device. The messages below are output when restarting the kdump service:

# service kdump restart
Stopping kdump: [ OK ]
Detected /etc/kdump.conf or /boot/vmlinuz-2.6.32-28.el6.x86_64 change
Rebuilding /boot/initrd-2.6.32-28.el6.x86_64kdump.img
ls: cannot access /sys/class/net/br0/device: No such file or directory
Starting kdump: [ OK ]

Version-Release number of selected component (if applicable):
- kernel-2.6.32-28.el6
- kexec-tools-2.0.0-69.el6

How reproducible:
always

Steps to Reproduce:
1. create bridge device
2. configure kdump.conf to use scp or nfs
3. restart kdump service
4. panic the server

Actual results:
vmcore is not captured on remote host

Expected results:
vmcore captured on remote host

Additional info:
- problem occurs in both i386 and x86_64
- configuring kdump.conf to dump to local host succeeds
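The report above shows mkdumprd running `ls /sys/class/net/br0/device`, which fails because a bridge is a virtual device with no `device` link in sysfs. As a sketch under that assumption, and not the contents of the proposed patch (attachment 428763), the hypothetical helper `physical_ports` below resolves an interface to the hardware ports behind it using the standard sysfs layout: a bridge has a `bridge/` subdirectory and lists its ports under `brif/`. Python is used only for illustration; bonds, VLANs and other stacked devices are deliberately not handled.

```python
from __future__ import annotations

from pathlib import Path

SYSFS_NET = Path("/sys/class/net")


def physical_ports(iface: str, sysfs: Path = SYSFS_NET) -> list[str]:
    """Resolve an interface to the physical devices that carry its traffic.

    A bridge (has 'bridge/') is expanded recursively through its 'brif/'
    entries; a device with a 'device' link is physical; anything else
    (lo, tun, and the unhandled bond/VLAN cases) yields an empty list.
    """
    node = sysfs / iface
    if (node / "bridge").is_dir():
        ports: list[str] = []
        for port in sorted(p.name for p in (node / "brif").iterdir()):
            ports.extend(physical_ports(port, sysfs))
        return ports
    if (node / "device").exists():
        return [iface]
    return []
```

With the reporter's configuration (eth0 enslaved to br0), `physical_ports("br0")` would be expected to return `["eth0"]`, which is the device whose driver and link state matter in the second kernel.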