Bug 1746529

Summary: kdump on vlan device fails in RHEL7.6
Product: Red Hat Enterprise Linux 7 Reporter: Curtis Taylor <cutaylor>
Component: kexec-tools Assignee: Bhupesh Sharma <bhsharma>
Status: CLOSED ERRATA QA Contact: xiaoying yan <yiyan>
Severity: high Docs Contact:
Priority: high    
Version: 7.7 CC: bhsharma, cye, kdump-bugs, ruyang
Target Milestone: rc   
Target Release: 7.9   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: kexec-tools-2.0.15-44.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-09-29 19:37:08 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1771898    

Description Curtis Taylor 2019-08-28 17:00:14 UTC
Description of problem:
Kdump fails with "network unreachable". mkdumprd produces a vlan config in which the vlan on the eth device does not get an IP address.

Version-Release number of selected component (if applicable):
kexec-tools-2.0.15-33.el7

How reproducible:
Consistently reproducible with a kvm VM with eth0 and an eth0.some-vlan device.  Set up kdump.conf for ssh using the eth0.vlan device; kdump fails.

Steps to Reproduce:
1.  In vm with single eth0 device and eth0.160:
https://src.fedoraproject.org/rpms/kexec-tools.git
VLAN="yes"
TYPE="Vlan"
PHYSDEV="eth0"
VLAN_ID="160"
REORDER_HDR="no"
GVRP="no"
VLAN_FLAGS="NO_REORDER_HDR"
MVRP="no"
PROXY_METHOD="none"
BROWSER_ONLY="no"
BOOTPROTO="none"
IPADDR="192.168.132.2"
PREFIX="24"
GATEWAY="192.168.132.1"
DNS1="192.168.132.1"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="eth0.160"
DEVICE="eth0.160"
ONBOOT="yes"

2.  Configure kdump over the eth0.160 device:
ssh kdump@192.168.132.1
sshkey /root/.ssh/id_rsa
path /var/crash/
core_collector makedumpfile -F -d 17 -c
default shell

3. Build initrd:
systemctl start kdump

Actual results:
kdump test fails.
kdump shell shows:
# ip addr show eth1
2: kdump-eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:27:c4:a5 brd ff:ff:ff:ff:ff:ff
3: eth0.160@kdump-eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:27:c4:a5 brd ff:ff:ff:ff:ff:ff


Expected results:
kdump test succeeds.
kdump eth0.160 should get ip address something like:
2: kdump-eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:27:c4:a5 brd ff:ff:ff:ff:ff:ff
3: eth0.160@kdump-eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:27:c4:a5 brd ff:ff:ff:ff:ff:ff
    inet 192.168.132.2/24 brd 192.168.132.255 scope global eth1.160
       valid_lft forever preferred_lft forever
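
The difference between the actual and expected output is only the presence of an inet line under the vlan device. A minimal sketch of a scripted check follows; the function name has_inet is hypothetical and not part of kexec-tools, and the sample text is copied from the failing output above.

```shell
#!/bin/bash
# Hypothetical helper (not part of kexec-tools): succeeds if the named
# interface has an IPv4 address in `ip addr` output read from stdin.
has_inet() {
    awk -v dev="$1" '
        /^[0-9]+: / {
            cur = $2
            sub(/:$/, "", cur)     # drop the trailing colon
            sub(/@.*$/, "", cur)   # drop the "@parent" suffix on vlan devices
        }
        $1 == "inet" && cur == dev { found = 1 }
        END { exit !found }
    '
}

# Sample copied from the failing kdump shell output in this report.
failing='3: eth0.160@kdump-eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:27:c4:a5 brd ff:ff:ff:ff:ff:ff'

if printf '%s\n' "$failing" | has_inet eth0.160; then
    echo "eth0.160 has an IPv4 address"
else
    echo "eth0.160 has no IPv4 address"
fi
```

On a live system the same function could be fed from "ip addr show" instead of the sample string.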

Additional info:
Dissection of kdump initrd reveals /etc/cmdline.d/ contains these two files and contents:

40ip.conf 
  ip=192.168.132.2::192.168.132.1:255.255.255.0::kdump-eth0.160:none
43vlan.conf: 
  vlan=eth0.160:kdump-eth0 ifname=kdump-eth0:52:54:00:27:c4:a5

In the kdump shell it can be confirmed that these build the kdump-eth0 device and the eth0.160 device.  eth0.160 does not get an IP address, because the ip= entry in 40ip.conf assigns the address to a device named kdump-eth0.160.
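
The mismatch can be shown by comparing the interface named in the ip= entry with the device created by the vlan= entry. This is a rough sketch only: the helper names ip_dev, vlan_dev and check_match are hypothetical, the two strings are copied from the files above, and the field position assumes the static IPv4 form of ip= used here.

```shell
#!/bin/bash
# Contents of 40ip.conf and 43vlan.conf as quoted in this report.
ip_conf='ip=192.168.132.2::192.168.132.1:255.255.255.0::kdump-eth0.160:none'
vlan_conf='vlan=eth0.160:kdump-eth0 ifname=kdump-eth0:52:54:00:27:c4:a5'

# Print the interface of a static IPv4 ip= entry (6th colon-separated field).
ip_dev() {
    local arg=${1#ip=}
    local -a f
    IFS=: read -r -a f <<< "$arg"
    printf '%s\n' "${f[5]}"
}

# Print the vlan device name: the part of vlan= before the first colon.
vlan_dev() {
    local word
    for word in $1; do
        case $word in
            vlan=*)
                word=${word#vlan=}
                printf '%s\n' "${word%%:*}"
                return 0
                ;;
        esac
    done
    return 1
}

# Report whether ip= and vlan= refer to the same device name.
check_match() {
    local a b
    a=$(ip_dev "$1")
    b=$(vlan_dev "$2")
    if [ "$a" = "$b" ]; then
        echo "match: $a"
    else
        echo "mismatch: ip= expects $a, vlan= creates $b"
    fi
}

check_match "$ip_conf" "$vlan_conf"
```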

The following solution has been tested by the customer and does allow this configuration to work on eth0.160:

diff -up /usr/lib/dracut/modules.d/99kdumpbase/module-setup.sh.orig.2 /usr/lib/dracut/modules.d/99kdumpbase/module-setup.sh
--- /usr/lib/dracut/modules.d/99kdumpbase/module-setup.sh.orig.2	2019-08-27 15:33:28.397000000 -0400
+++ /usr/lib/dracut/modules.d/99kdumpbase/module-setup.sh	2019-08-27 15:34:34.571000000 -0400
@@ -252,10 +252,10 @@ kdump_setup_vlan() {
         exit 1
     elif kdump_is_bond "$_phydev"; then
         kdump_setup_bond "$_phydev"
-        echo " vlan=$_netdev:$_phydev" > ${initdir}/etc/cmdline.d/43vlan.conf
+        echo " vlan=$(kdump_setup_ifname $_netdev):$_phydev" > ${initdir}/etc/cmdline.d/43vlan.conf
     else
         _kdumpdev="$(kdump_setup_ifname $_phydev)"
-        echo " vlan=$_netdev:$_kdumpdev ifname=$_kdumpdev:$_netmac" > ${initdir}/etc/cmdline.d/43vlan.conf
+        echo " vlan=$(kdump_setup_ifname $_netdev):$_kdumpdev ifname=$_kdumpdev:$_netmac" > ${initdir}/etc/cmdline.d/43vlan.conf
     fi
 }

With that patch the vlan device is named kdump-eth0.160 instead of eth0.160, so no changes to 40ip.conf are necessary.
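
The effect of the patch on the non-bond branch can be illustrated outside of dracut. In this sketch demo_ifname is a hypothetical stand-in that only prepends "kdump-", which matches the names seen in this report; the real kdump_setup_ifname in module-setup.sh may do more. The variable values are taken from the reproducer.

```shell
#!/bin/bash
# Hypothetical stand-in for kdump_setup_ifname: it only prepends "kdump-".
# This is an assumption based on the device names observed in this report.
demo_ifname() {
    printf 'kdump-%s\n' "$1"
}

# Values from the reproducer.
_netdev=eth0.160
_phydev=eth0
_netmac=52:54:00:27:c4:a5
_kdumpdev=$(demo_ifname "$_phydev")

# Line written to 43vlan.conf before the patch: vlan name left unprefixed.
old_line=" vlan=$_netdev:$_kdumpdev ifname=$_kdumpdev:$_netmac"

# Line written after the patch: vlan name passed through the renaming helper.
new_line=" vlan=$(demo_ifname "$_netdev"):$_kdumpdev ifname=$_kdumpdev:$_netmac"

printf 'before:%s\n' "$old_line"
printf 'after: %s\n' "$new_line"
```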

Comment 3 Bhupesh Sharma 2020-01-16 11:11:32 UTC
1. Thanks, I was able to reproduce the issue on a vlan setup (server-client). Thanks to Emma (QE) for helping me with the setup, as I was not able to re-create an environment for the same locally or on Beaker.

2. I made the change suggested in Comment 0 to the kexec-tools package:

diff --git a/dracut-module-setup.sh b/dracut-module-setup.sh
index c833bfe507e2..91c50f4976ef 100755
--- a/dracut-module-setup.sh
+++ b/dracut-module-setup.sh
@@ -252,10 +252,10 @@ kdump_setup_vlan() {
         exit 1
     elif kdump_is_bond "$_phydev"; then
         kdump_setup_bond "$_phydev"
-        echo " vlan=$_netdev:$_phydev" > ${initdir}/etc/cmdline.d/43vlan.conf
+       echo " vlan=$(kdump_setup_ifname $_netdev):$_phydev" > ${initdir}/etc/cmdline.d/43vlan.conf
     else
         _kdumpdev="$(kdump_setup_ifname $_phydev)"
-        echo " vlan=$_netdev:$_kdumpdev ifname=$_kdumpdev:$_netmac" > ${initdir}/etc/cmdline.d/43vlan.conf
+       echo " vlan=$(kdump_setup_ifname $_netdev):$_kdumpdev ifname=$_kdumpdev:$_netmac" > ${initdir}/etc/cmdline.d/43vlan.conf
     fi
 }

3. With that change the issue appears fixed, and I was able to save the vmcore via kdump properly (while earlier it failed). Here are some logs from the passing case:

server:
======
[root@hp-dl380pg8-06 ~]# uname -rn
hp-dl380pg8-06.rhts.eng.pek2.redhat.com 3.10.0-1062.el7.x86_64

[root@hp-dl380pg8-06 ~]# rpm -qa kexec-tools
kexec-tools-2.0.15-33.el7.x86_64

[root@hp-dl380pg8-06 ~]# ip addr

<.. snip..>

10: ens5f4.10@ens5f4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:07:43:2b:2f:60 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.145/24 brd 192.168.1.255 scope global noprefixroute ens5f4.10
       valid_lft forever preferred_lft forever
    inet6 fe80::207:43ff:fe2b:2f60/64 scope link 
       valid_lft forever preferred_lft forever

client:
========

[root@hp-dl380pg8-03 ~]# uname -rn
hp-dl380pg8-03.rhts.eng.pek2.redhat.com 3.10.0-1062.el7.x86_64

[root@hp-dl380pg8-03 ~]# ip addr show

<..snip..>

10: eth0.10@ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 68:05:ca:39:a3:f8 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.150/24 brd 192.168.1.255 scope global noprefixroute eth0.10
       valid_lft forever preferred_lft forever
    inet6 fe80::6a05:caff:fe39:a3f8/64 scope link 
       valid_lft forever preferred_lft forever

[root@hp-dl380pg8-03 ~]# kdumpctl restart

[root@hp-dl380pg8-03 ~]# echo c > /proc/sysrq-trigger 

<..snip..>

         Starting Kdump Vmcore Save Service...
kdump: saving to root@192.168.1.145:/var/crash/192.168.1.150-2020-01-16-04:27:35
kdump: saving vmcore-dmesg.txt
233+1 records in
233+1 records out
119786 bytes (120 kB) copied, 0.000785862 s, 152 MB/s
kdump: saving vmcore-dmesg.txt complete
kdump: saving vmcore
Copying data                                      : [100.0 %] |           eta: 0s
614886+2250 records in
614886+2250 records out
315193932 bytes (315 MB) copied, 5.73008 s, 55.0 MB/s
kdump: saving vmcore complete

4. I will send a kexec-tools patch soon to fix the same.

Comment 11 errata-xmlrpc 2020-09-29 19:37:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (kexec-tools bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3885