Bug 459879
Summary: | kdump via bond device doesn't work for non-basic config. | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Etsuji Nakai <enakai0> |
Component: | kexec-tools | Assignee: | Neil Horman <nhorman> |
Status: | CLOSED ERRATA | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
Severity: | high | Priority: | medium |
Version: | 5.2 | CC: | mgahagan, qcai, syeghiay |
Target Milestone: | rc | Hardware: | All |
OS: | Linux | Doc Type: | Bug Fix |
: | 600607 | Last Closed: | 2009-01-20 20:59:52 UTC |
What exactly do you think is going wrong? bond1 should be passed into find_activate_slaves as arg 1, and as a result we should find all physical interfaces that are attached to that bond. Your patch seems like a hack to make your environment work properly (especially since you hardcode bond0 into your definition of OLD_MASTER). Would you please send me your kdump.conf and the initramfs image you produce from it? Thanks!

Created attachment 314951 [details]
kdump.conf

Created attachment 314953 [details]
initrd for the crash kernel

Created attachment 314954 [details]
Correct one: initrd for the crash kernel

Attachment 314953 is incorrect. Please ignore it. This is the correct one.
Please re-check how the "init" script in the initrd for the crash kernel (initrd-<Version>kdump.img) handles bonding devices. What I found was:

1. bondX (a bond master in the normal kernel; bond1 in my case) is forcibly converted to bond0 (which is hardcoded) by the "map_interface" script.

* Corresponding part of "init" (in initrd-<Version>kdump.img)
------------
for i in `ls /etc/ifcfg-*`
do
    NETDEV=`echo $i | cut -d"-" -f2`
    map_interface $NETDEV
done
rename_interfaces
IFACE=`cat /etc/iface_to_activate`
ifup $IFACE
------------

* "scriptfns/map_interface" called from "init".
------------
. /etc/ifcfg-$NETDEV
for j in `ifconfig -a | awk '/.*Link encap.*/ {print $1}'`
do
    case "$BUS_ID" in
    Bonding)
        REAL_DEV=bond0
        RENAMED="yes"
        ;;
...
#build the interface rename map
echo $NETDEV $REAL_DEV tmp$TMPCNT >> /etc/iface_map
TMPCNT=`echo $TMPCNT 1 + p | dc`
echo $TMPCNT > /tmp/tmpcnt
echo mapping $NETDEV to $REAL_DEV
-------------

2. As a result, bond0 (instead of the original bond master, bond1 in my case) is always activated as the master, and "bond0" is passed into find_activate_slaves as arg 1.

3. However, since ifcfg-eth1 and ifcfg-eth2 still contain "MASTER=bond1", eth1 and eth2 fail to be enslaved to the active bond0.

Note that there is another problem: find_activate_slaves tries to read ifcfg-eth0 first and fails (as ifcfg-eth0 doesn't exist in the crash kernel environment). find_activate_slaves is then aborted at that point. This is why I added

+ if [ -f /etc/ifcfg-\$j ];

in the patch. See the attachments (314951, 314954) for the kdump.conf and the initrd (for the crash kernel) that was built with the original (non-patched) mkdumprd.

The concern you raise in point 2 is supposed to be taken care of in the rename_interfaces function. Looking at it, though, it seems that function misses handling slaves on bonded interfaces. I'll write a patch...

Created attachment 315787 [details]
patch to update MASTER pointers for slaves on bonded interfaces

I've not tested it yet, but I think this will update what you're missing. Please let me know if it solves your problems. Thanks!
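The second problem described above (the slave search aborting when an interface has no ifcfg file) can be sketched as follows. This is a minimal illustration, not the code from mkdumprd: `list_slaves` is a hypothetical helper, and the directory layout is created on the fly for the demo. The guard mirrors the `[ -f /etc/ifcfg-$j ]` test quoted from the patch.

```shell
#!/bin/sh
# Hypothetical helper (not part of kexec-tools): print every candidate
# device whose ifcfg file exists in conf_dir and names bond as MASTER.
list_slaves() {
    conf_dir=$1
    bond=$2
    shift 2
    for dev in "$@"; do
        cfg="$conf_dir/ifcfg-$dev"
        # Guard: skip devices without a config file instead of sourcing
        # a missing file, which can abort a non-interactive shell.
        [ -f "$cfg" ] || continue
        # Source in a subshell so MASTER does not leak between files.
        master=$( MASTER=; . "$cfg" >/dev/null 2>&1; echo "$MASTER" )
        if [ "$master" = "$bond" ]; then
            echo "$dev"
        fi
    done
    return 0
}

# Demo: eth0 has no ifcfg file, as in the crash-kernel initrd.
demo_dir=$(mktemp -d)
printf 'DEVICE=eth1\nSLAVE=yes\nMASTER=bond0\n' > "$demo_dir/ifcfg-eth1"
printf 'DEVICE=eth2\nSLAVE=yes\nMASTER=bond0\n' > "$demo_dir/ifcfg-eth2"
list_slaves "$demo_dir" bond0 eth0 eth1 eth2   # prints eth1 and eth2
rm -rf "$demo_dir"
```

Without the `-f` test, the loop would try to source `ifcfg-eth0` first and, depending on the shell, stop before reaching eth1 and eth2.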
Created attachment 315851 [details]
fixing typo and the second problem for /sbin/mkdumprd
Thanks Neil. I found two typos in your patch:

+ IS_BOND=\`echo /etc/ifcfg-\$NEW | grep bond\`
+ if [ -n "$IS_BOND" ]    #### should be =====> if [ -n "\$IS_BOND" ]
+ then
+ for i in \`ls /etc/ifcfg-*\`
+ do
+ sed -e"s/.*MASTER=\$CURRENT.*/MASTER=\$NEW/" \$i > /tmp/ifcfg-tmp
+ mv /sbin/ifcfg-tmp \$i    #### should be =====> mv /tmp/ifcfg-tmp \$i
+ done
+ fi

And the second problem still remains: find_activate_slaves takes device names ethXX from the 'ifconfig' output and tries to read the corresponding /etc/ifcfg-ethXX, but the corresponding file doesn't always exist. Hence, when find_activate_slaves tries to read a non-existent /etc/ifcfg-ethX, it fails and find_activate_slaves is aborted there. (This may be a particular behaviour of the busybox shell.) See the attachment (id=315851) for the patch fixing the typos and this problem. It worked in my environment.

Like I said, I hadn't tested it. What is the result of your testing after fixing the typos? As for your patch, as I've noted, I still don't like it for the reasons I gave previously. However, given that you seem adamant about it, I'll consider it if you can test it and show that it works in other cases as well. Specifically, if it works in the trivial case (where there is only one bonded interface, bond0, in the entire system running under a normal kernel), and in the case where there is only one bonded interface without the normal name (say bondtest), then I'll look into taking it.

No, no. I'm not sticking to the original patch (id=314866); let's throw it away. Please look at the contents of my last patch (id=315851).
https://bugzilla.redhat.com/attachment.cgi?id=315851
It's just a slight modification of your patch (id=315787) in the following ways:
- Fixing typos.
- Adding a fix for the other problem below (other than the one fixed by your patch).
------------------
find_activate_slaves takes device names ethXX from the 'ifconfig' output and tries to read the corresponding /etc/ifcfg-ethXX, but the corresponding file doesn't always exist. Hence, when find_activate_slaves tries to read a non-existent /etc/ifcfg-ethX, it fails and find_activate_slaves is aborted there. (This may be a particular behaviour of the busybox shell.)
------------------
And my testing result is as follows: your patch (id=315787) didn't work because of the typos and the other problem. The modified version of your patch (id=315851) worked well. Does that make sense? Thanks.

Ahh, sorry, I missed that attachment. Yes, what you have there makes sense to me, I'll check it in as soon as I can. Thanks!

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2009-0105.html
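The MASTER-pointer update discussed in the patch comments above can be sketched in isolation as follows. This is only an illustration under stated assumptions: `retarget_slaves` is a hypothetical name, the config directory is passed as an argument rather than fixed to /etc, and the sed pattern is anchored to the whole line, which differs from the `.*MASTER=...*` pattern in the quoted patch.

```shell
#!/bin/sh
# Hypothetical helper (not the mkdumprd code): in every ifcfg-* file
# under conf_dir, rewrite the line MASTER=<old> to MASTER=<new>.
retarget_slaves() {
    conf_dir=$1
    old=$2
    new=$3
    for cfg in "$conf_dir"/ifcfg-*; do
        [ -f "$cfg" ] || continue
        # Temp file lives next to the configs but outside the ifcfg-* glob.
        tmp="$conf_dir/.retarget.$$"
        # Anchored match, so MASTER=bond10 is left alone when old=bond1.
        sed -e "s/^MASTER=$old\$/MASTER=$new/" "$cfg" > "$tmp" &&
            mv "$tmp" "$cfg"
    done
}

# Demo with the slave config from this report.
demo_dir=$(mktemp -d)
printf 'DEVICE=eth1\nONBOOT=yes\nSLAVE=yes\nMASTER=bond1\n' > "$demo_dir/ifcfg-eth1"
retarget_slaves "$demo_dir" bond1 bond0
cat "$demo_dir/ifcfg-eth1"
rm -rf "$demo_dir"
```

The anchored pattern is a design choice for the sketch: it avoids touching a similarly prefixed master name, at the cost of not matching quoted values such as `MASTER="bond1"`.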
Created attachment 314866 [details]
suggested patch for /sbin/mkdumprd

Description of problem:
kdump via bond0 consisting of eth0 and eth1 may work well, but it doesn't work for other (non-basic) bond configs such as "bond1 consisting of eth1 and eth2".

Version-Release number of selected component (if applicable):
kexec-tools-1.102pre-21.el5
kexec-tools-1.102pre-21.el5_2.2

How reproducible:
Follow the steps below.

Steps to Reproduce:
1. Create bond1 with eth1 and eth2 in addition to (non-bond) eth0.

# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
BOOTPROTO=dhcp
ONBOOT=yes
DHCP_HOSTNAME=rhel52

# cat /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
ONBOOT=yes
SLAVE=yes
MASTER=bond1

# cat /etc/sysconfig/network-scripts/ifcfg-eth2
DEVICE=eth2
ONBOOT=yes
SLAVE=yes
MASTER=bond1

# cat /etc/sysconfig/network-scripts/ifcfg-bond1
DEVICE=bond1
IPADDR=192.168.1.198
NETMASK=255.255.255.0
NETWORK=192.168.1.0
BROADCAST=192.168.1.255
GATEWAY=192.168.1.254
ONBOOT=yes
BOOTPROTO=static
BONDING_OPTS="mode=1 primary=eth1 miimon=100 updelay=5000"

Add "alias bond1 bonding" to /etc/modprobe.conf

2. Configure net-kdump via bond1

# cat /etc/kdump.conf
net admin.1.190
path /home/admin/crash

3. Start kdump

# echo c > /proc/sysrq-trigger

Actual results:
eth1 and eth2 fail to be enslaved, and the vmcore cannot be sent out to the remote server.

Expected results:
The vmcore is sent out to the remote server via the bond device.

Additional info:
Attached is a suggested patch for /sbin/mkdumprd included in kexec-tools-1.102pre-21.el5_2.2
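The failure mode in this report (slaves that still name bond1 while only bond0 is brought up) amounts to a MASTER entry with no matching bond config. A minimal sanity check for a directory of ifcfg files might look like the sketch below. `check_masters` is a hypothetical helper, not something shipped in kexec-tools, and it assumes unquoted `MASTER=` lines and that a bond is represented by an `ifcfg-<bond>` file in the same directory.

```shell
#!/bin/sh
# Hypothetical helper: report every ifcfg file whose MASTER has no
# matching ifcfg-<master> file in conf_dir. Returns 1 if any is found.
check_masters() {
    conf_dir=$1
    status=0
    for cfg in "$conf_dir"/ifcfg-*; do
        [ -f "$cfg" ] || continue
        # Extract the value of an unquoted MASTER= line, if present.
        master=$(sed -n 's/^MASTER=//p' "$cfg")
        [ -n "$master" ] || continue
        if [ ! -f "$conf_dir/ifcfg-$master" ]; then
            echo "${cfg##*/}: MASTER=$master has no ifcfg-$master"
            status=1
        fi
    done
    return $status
}

# Demo: the bond config was renamed to bond0, the slave still says bond1.
demo_dir=$(mktemp -d)
printf 'DEVICE=bond0\n' > "$demo_dir/ifcfg-bond0"
printf 'DEVICE=eth1\nSLAVE=yes\nMASTER=bond1\n' > "$demo_dir/ifcfg-eth1"
check_masters "$demo_dir" || echo "inconsistent bonding config"
rm -rf "$demo_dir"
```

Run against a copy of the configs from the steps above, a check like this would pass in the normal kernel and flag the mismatch only once the master has been renamed without its slaves being updated.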