Bug 1384683 - [LLNL 7.4 Bug] kdump initrd includes IP address of DHCP booted server
Summary: [LLNL 7.4 Bug] kdump initrd includes IP address of DHCP booted server
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kexec-tools
Version: 7.3
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 7.4
Assignee: Baoquan He
QA Contact: Qiao Zhao
URL:
Whiteboard:
Depends On:
Blocks: 1381646 1384121 1394638 1473055
 
Reported: 2016-10-13 20:46 UTC by Ben Woodard
Modified: 2017-07-25 04:43 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-07-25 03:38:47 UTC
Target Upstream Version:



Description Ben Woodard 2016-10-13 20:46:37 UTC
Description of problem:
Our diskless machines PXE boot and must share an NFS root file system. We can trick the kdump init script into accepting a particular initramfs as current for a particular kernel. However, we need to generate this kdump initramfs on one machine and then put it in the diskless image to be used by all the compute nodes within a cluster.

Unfortunately, when we do this, the initramfs generator used by kdump seems to ignore the fact that the IP address was acquired via DHCP: instead of recording that the kdump kernel should obtain its IP address via DHCP, it encodes into the initramfs the IP address the machine had at the time it was booted.

This causes a couple of problems:
1) In our environment it means that we have to unpack and edit the kdump initramfs manually so that it uses DHCP and does not unintentionally create IP address conflicts when the kdump kernel boots.

This isn't just a niche problem for our unusual HPC environment. 

2) Encoding the IP address in the initramfs creates other problems even in normal situations. Suppose a machine boots up, gets a DHCP address, and creates a kdump initramfs; if the next time it boots with the same kernel it is given a different IP address and the original address is handed to a different machine, the kdump kernel will try to use its old IP address, which is now in use by another machine, rather than its current one. There is no check in the code that detects that the IP address has changed and forces a rebuild of the kdump initramfs, so until the kernel changes, that initramfs will stay around with the old IP address in it.

3) There are times when network admins re-IP a subnet and a server's IP address changes. Since a server may be up for a very long time, there is no guarantee that the IP address encoded in the kdump initramfs is still valid when the server crashes.

Version-Release number of selected component (if applicable):
2.0.7-50 (the version from 7.3 beta)

The expected result is for the kdump kernel to pick the correct Ethernet interface but configure it via DHCP rather than using a potentially out-of-date IP address.
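
For illustration only (a hedged sketch, not output captured from the affected cluster): the difference amounts to what the generator writes into etc/cmdline.d/40ip.conf inside the kdump initramfs. The addresses and interface name below are placeholders.

# what gets encoded when the boot-time address is baked in
# (dracut static syntax: client::gateway:netmask:hostname:interface:proto)
ip=192.0.2.17::192.0.2.1:255.255.255.0::ens3:none

# what a DHCP-booted node needs instead
ip=ens3:dhcp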

Comment 3 Baoquan He 2016-11-16 07:30:04 UTC
Hi Ben,

I am not a native speaker; could you explain the paragraph below in a way that is easier to understand? I am trying to debug this, and as far as I can see the code does put proto=dhcp into the initramfs, so I didn't understand the part saying that it encodes the boot-time IP address instead of recording that the kdump kernel should acquire it by DHCP.


Unfortunately, when we do this, the initramfs generator used by kdump seems to ignore the fact that the IP address was acquired via DHCP: instead of recording that the kdump kernel should obtain its IP address via DHCP, it encodes into the initramfs the IP address the machine had at the time it was booted.

Thanks
Baoquan

Comment 4 Baoquan He 2016-11-17 02:11:15 UTC
Besides, could you change /lib/dracut/modules.d/99kdumpbase/module-setup.sh as below? I suspect there is something wrong inside kdump_static_ip(). The shell debugging output may tell what's going on.


--- module-setup.sh     2016-11-16 20:58:41.697426841 -0500
+++ module-setup.sh.orig        2016-11-16 20:58:00.156210910 -0500
@@ -324,7 +324,6 @@ get_ip_route_field()
 #$1: config values of net line in kdump.conf
 #$2: srcaddr of network device
 kdump_install_net() {
-set -x
     local _server _netdev _srcaddr _route _serv_tmp
     local config_val="$1"

@@ -359,7 +358,6 @@ set -x
         echo "kdumpnic=$(kdump_setup_ifname $_netdev)" > ${initdir}/etc/cmdline.d/60kdumpnic.conf
         echo "bootdev=$(kdump_setup_ifname $_netdev)" > ${initdir}/etc/cmdline.d/70bootdev.conf
     fi
-set +x
 }
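
If it helps, here is a hedged sketch (not part of the original request) of one way to rebuild the kdump initramfs and capture the trace after adding the set -x / set +x pair; the image path is the assumed RHEL7 default.

# force kdumpctl to treat the current initramfs as stale, then rebuild;
# dracut runs module-setup.sh, so the set -x trace from kdump_install_net()
# shows up in the output / journal
touch /etc/kdump.conf
kdumpctl restart 2>&1 | tee /tmp/kdump-rebuild.log

# inspect what actually landed in the generated image
lsinitrd -f etc/cmdline.d/40ip.conf /boot/initramfs-$(uname -r)kdump.img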

Comment 5 Baoquan He 2016-11-18 02:09:45 UTC
In the kdump scripts, we use the code below to check whether an IP address is permanent, i.e. statically configured. Otherwise it is treated as a dynamically assigned IP, i.e. one obtained from DHCP.

_ipaddr=$(ip addr show dev $_netdev permanent | awk "/ $_srcaddr\/.* /{print \$2}")

However, when I checked my laptop, virtual interfaces such as the virtual bridge for kvm and the tunnel for vpn also behaved as statically configured, I mean permanent, even though I did not add configuration for them to /etc/sysconfig/network-scripts/ifcfg-xxxx to make them static. So I am wondering how it works on your system such that a DHCP-assigned IP ends up being identified as statically configured. Testing as described in comment 4 can tell whether something is wrong.

[bhe@x1 ~]$ ip a show virbr0 permanent
4: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 52:54:00:c4:53:7e brd ff:ff:ff:ff:ff:ff
    inet 192.168.124.1/24 brd 192.168.124.255 scope global virbr0
       valid_lft forever preferred_lft forever

[bhe@x1 ~]$ ip a show tun0 permanent
22: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1412 qdisc fq_codel state UNKNOWN group default qlen 500
    link/none 
    inet 10.72.6.154/32 scope global tun0
       valid_lft forever preferred_lft forever
    inet6 fe80::9403:d586:a7cf:50f9/64 scope link flags 800 
       valid_lft forever preferred_lft forever
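
Not part of the kdump scripts, just a hedged sketch of an alternative classification: DHCP-assigned addresses normally carry the "dynamic" flag in ip output, so one could test for that instead of filtering on "permanent". _netdev and _srcaddr are the same variables used above.

if ip addr show dev "$_netdev" | grep -q " $_srcaddr/.* dynamic "; then
    echo "address appears to be DHCP-assigned"
else
    echo "address appears to be statically configured"
fi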

Comment 6 Dave Young 2016-11-18 02:40:26 UTC
If one specifies an IP address as the dump target, it will definitely fail if the DHCP server hands out a different address. If one uses a hostname in kdump.conf, then it should be a kdump scripts bug. Please double-check and provide more information for debugging:

* kdump.conf
* kdump kernel console log
* how to reproduce it if possible
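
For reference, a hedged sketch of a kdump.conf that uses a hostname rather than a literal IP for an NFS dump target; the server name and paths are placeholders, not the reporter's configuration.

# /etc/kdump.conf
nfs dumphost.example.com:/export/crash
path /var/crash
core_collector makedumpfile -l --message-level 1 -d 31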

Thanks
Dave

Comment 9 Baoquan He 2016-11-28 10:24:27 UTC
It can't be reproduced on my local kvm guests. I just did as Comment 8 said.

There could be other deployment issues causing the problem they saw. Since there has been no further response and no additional information has been provided, I would like to close this bug as INSUFFICIENT_DATA.

Below is cmdline content from decompressed kdump initramfs.

[root@localhost cmdline.d]# ll
total 16
-rw-r--r--. 1 root root 45 Aug 25 23:57 40ip.conf
-rw-r--r--. 1 root root 25 Aug 25 23:57 42dns.conf
-rw-r--r--. 1 root root  0 Aug 25 23:57 45route-static.conf
-rw-r--r--. 1 root root 14 Aug 25 23:57 60kdumpnic.conf
-rw-r--r--. 1 root root 13 Aug 25 23:57 70bootdev.conf
[root@localhost cmdline.d]# cat 40ip.conf 
 ip=ens3:dhcp
 ifname=ens3:52:54:00:0a:9a:7d
[root@localhost cmdline.d]# cat 42dns.conf 
nameserver=192.168.124.1
[root@localhost cmdline.d]# cat 45route-static.conf 
[root@localhost cmdline.d]# cat 60kdumpnic.conf 
kdumpnic=ens3
[root@localhost cmdline.d]# cat 70bootdev.conf 
bootdev=ens3
[root@localhost cmdline.d]#

Comment 10 Baoquan He 2016-11-28 10:25:00 UTC
(In reply to Baoquan He from comment #9)
> It can't be reproduced on my local kvm guests. I just did as Comment 8 said.
Correction: I meant Comment 7.

Comment 11 Dave Young 2016-11-28 12:06:58 UTC
> [root@localhost cmdline.d]# cat 40ip.conf 
>  ip=ens3:dhcp
>  ifname=ens3:52:54:00:0a:9a:7d

Are the MAC addresses the same on the two guests? If not, it should fail.

Comment 12 Baoquan He 2016-11-28 13:09:37 UTC
Interesting, the MAC is different, but it didn't fail.

Dropped into the kdump shell, the interface is up. The udev rule expects an interface with the MAC address of another kvm guest.

kdump:/etc/cmdline.d# cat /etc/udev/rules.d/80-ifname.rules 
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="52:54:00:0a:9a:7d", ATTR{type}=="1", NAME="ens3"
kdump:/etc/cmdline.d# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:83:47:b4 brd ff:ff:ff:ff:ff:ff
    inet 192.168.124.216/24 brd 192.168.124.255 scope global dynamic ens3
       valid_lft 3229sec preferred_lft 3229sec
    inet6 fe80::5054:ff:fe83:47b4/64 scope link 
       valid_lft forever preferred_lft forever

Comment 13 Ben Woodard 2017-01-10 21:37:49 UTC
This results in all the dumps from all the clustered machines being named as if they came from 127.0.0.1, which of course makes it difficult to know which machine a dump came from.

The SAs at the lab came up with a workaround for the problem. I understand that this won't help you reproduce the reported problem (why you can't reproduce it mystifies me; I think you should stop trying to use virtual machines, actually netboot a real machine, and then it will be obvious), but it may help resolve the larger issue.

diff -uNr kexec-tools_a/dracut-kdump.sh kexec-tools_b/dracut-kdump.sh
--- kexec-tools_a/dracut-kdump.sh       2016-11-23 10:30:14.162146000 -0800
+++ kexec-tools_b/dracut-kdump.sh       2016-11-23 12:58:58.116060000 -0800
@@ -114,6 +114,14 @@
     fi
 }
 
+get_host_name()
+{
+    _hostname=`cat /proc/cmdline | grep hostname | sed -r 's/.*hostname=(\w+).*/\1/'`
+    [ -z "$_hostname" ] && return 1
+    HOST_IP=$_hostname
+    return 0
+}
+
 get_host_ip()
 {
     local _host
@@ -173,10 +181,14 @@
 read_kdump_conf
 fence_kdump_notify
 
-get_host_ip
+# check to see if hostname is provided on cmdline first.  If not try to get host ip.
+get_host_name
 if [ $? -ne 0 ]; then
-    echo "kdump: get_host_ip exited with non-zero status!"
-    exit 1
+    get_host_ip
+    if [ $? -ne 0 ]; then
+        echo "kdump: get_host_ip exited with non-zero status!"
+        exit 1
+    fi
 fi
 
 if [ -z "$DUMP_INSTRUCTION" ]; then
diff -uNr kexec-tools_a/kdumpctl kexec-tools_b/kdumpctl
--- kexec-tools_a/kdumpctl      2016-11-23 10:30:14.196149000 -0800
+++ kexec-tools_b/kdumpctl      2016-11-23 11:06:24.771011000 -0800
@@ -128,6 +128,11 @@
                cmdline=`append_cmdline "${cmdline}" disable_cpu_apicid ${id}`
        fi
 
+        # Append hostname to cmdline so it can be used when writing out nfs/ssh vmcores
+       _host_name=`/usr/bin/hostname`
+
+       cmdline=`append_cmdline "${cmdline}" hostname ${_host_name}`
+
        echo $cmdline
 }
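
For illustration only, a hedged sketch of the intended effect (not output captured from the lab's systems): with this workaround the second kernel's command line gains a hostname= argument, and the dump directory is then named after the hostname instead of 127.0.0.1. "compute042" and the date are placeholders, and the path assumes the usual $HOST_IP-$DATEDIR naming in dracut-kdump.sh.

# fragment of the crash kernel command line after the kdumpctl change
... hostname=compute042 ...

# resulting vmcore location on the NFS export
/var/crash/compute042-2017-01-10-13:37/vmcore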

Comment 14 Baoquan He 2017-01-11 07:46:43 UTC
(In reply to Ben Woodard from comment #13)
> This results in all the dumps from all the clustered machines being named
> as if they came from 127.0.0.1, which of course makes it difficult to know
> which machine a dump came from.

Good to know this, thanks for your update, Ben.

I haven't found the root cause yet. If possible, could you try adding the debug code as Comment 4 suggested? It would let me see what is being done on your clustered system with the network dump configuration. Since you have a workaround, I will later try a physical system, as you said, not a virtual machine, to see if it reproduces. I am working on another issue at the moment and will update if there is any progress. Please also paste the debug output if you are able to try Comment 4.

Thanks
Baoquan

Comment 15 Baoquan He 2017-07-21 08:11:51 UTC
Hi,

We need to investigate and check whether we can fix this in RHEL 7.5, but we still don't know exactly what happened.

Do you have any suggestions about a further fix? We might not be able to build a test environment to reproduce it. It would be appreciated if you could propose a fix patch with the root cause, or do further investigation; we can help review it.

Or you can add debug code to the shell script as Comment 4 suggested, and I can try to understand what happened.

Thanks
Baoquan

Comment 16 Baoquan He 2017-07-24 05:28:30 UTC
(In reply to Ben Woodard from comment #13)
> This results in all the dumps from all the clustered machines being named
> as if they came from 127.0.0.1, which of course makes it difficult to know
> which machine a dump came from.

If you see "127.0.0.1", that means it is a file-system dump, not a network dump, even though the file system may itself be a network file system. So far the question is still not clear and the root cause has not been found. If there is still no reply, I will close this bug as NOTABUG for lack of further information.

Thanks
Baoquan

Comment 17 Ben Woodard 2017-07-24 19:52:25 UTC
Having it use 127.0.0.1 for all FS dumping is the current problem. It writes to a network file system and that makes figuring out which dump is from which machine impossible. Comment #13 shows the patch that we are currently carrying to work around this issue.

Please either apply that patch or do something that achieves the same result.

Comment 18 Baoquan He 2017-07-25 00:22:26 UTC
(In reply to Ben Woodard from comment #17)
> Having it use 127.0.0.1 for all FS dumping is the current problem. It writes
> to a network file system and that makes figuring out which dump is from
> which machine impossible. Comment #13 shows the patch that we are currently
> carrying to work around this issue.

Yes, it might be a problem. Apart from NFS and ssh, we don't know whether a seemingly local file system is actually going over the network. Up to now, no one else has complained about this, including users of FCoE and iSCSI; only you have raised the issue. The point is that we would at least need to let kdump know which file systems are network file systems and should be treated specially so as to use the hostname.

> 
> Please either apply that patch or do something that achieves the same result.

Without understanding the problem and what we need to do to fix it, I can't apply the workaround patch. As it stands it is clearly not right and would disturb the other, correct file-system configurations. So the point is clear: we need to know the scenario, the problem, and the root cause, and then decide exactly what we should do.
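
For illustration only, a hedged sketch rather than an agreed design: one conceivable way to tell which dump targets are network-backed is to check the filesystem type of the dump path's mount; the path and the list of types are assumptions.

_fstype=$(findmnt -n -o FSTYPE --target /var/crash)
case "$_fstype" in
nfs|nfs4|cifs)
    echo "network filesystem: prefer the hostname over 127.0.0.1" ;;
*)
    echo "local filesystem: the 127.0.0.1 naming is harmless" ;;
esac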

Thanks
Baoquan

Comment 19 Ben Woodard 2017-07-25 00:48:38 UTC
kdump was working fine in RHEL6; we had no problems with how it worked. We have been battling with how it works in RHEL7 since we first started playing with the RHEL7 beta.
Several other issues have been dealt with already.

Here there are two problems:
1) We generate the kdump initrd on one machine which has a disk and writable file systems, and then we use it on all the compute nodes, which have no disk and no writable system file systems. For a number of reasons there cannot be writable system file systems on these machines. If, when we generate the kdump initrd, it includes the MAC address of the interface on the machine where it was generated, or the IP address of that interface, then the kdump initrd will not work on the compute nodes for which it is intended. We have had to patch around this by manually repacking the kdump initrd after it was generated (roughly as in the sketch after this list).

2) When a node crashes and the kdump kernel writes to one of our NFS filers, all the dumps appear with the address 127.0.0.1 rather than the actual machine's hostname or IP address (the kdump kernel gets its IP address from the DHCP server). This creates a problem because we can't tell which dump came from which machine. Therefore, we have had to patch the kdump script in the kdump initrd to make it use the IP of the machine that crashed rather than 127.0.0.1. The patch that we use is in comment 13.
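
For context, a hedged sketch (not the lab's exact procedure) of what manually repacking the kdump initrd to strip the baked-in address might look like; a plain gzip-compressed cpio image and the interface name are assumptions.

mkdir /tmp/kdump-ir && cd /tmp/kdump-ir
zcat /boot/initramfs-$(uname -r)kdump.img | cpio -idm

# replace the static ip= line with a DHCP one
echo "ip=ens3:dhcp" > etc/cmdline.d/40ip.conf

# repack the result for inclusion in the diskless image
find . | cpio -o -H newc | gzip -9 > /tmp/initramfs-kdump-dhcp.img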

Comment 20 Baoquan He 2017-07-25 01:15:54 UTC
Thanks, Ben. Then it's clear to us what's happened.

But I have to say that it might be a correct way to use kdump on your side. In the kdump glue shell scripts, we always assume they run on the system whose safety we are worried about and on which we want to deploy kdump to collect the vmcore when a crash is triggered. With this assumption, we need the system information of that system itself, not of another. Imagine that you have another machine with more than 100 CPUs, terabytes of memory, many PCI/PCIe devices, various iSCSI/FCoE storage disks with the root fs located on one of them, and powerful Ethernet network interface cards; now you want to dump to a remote ssh server through the NIC by default, and dump to the root fs if the ssh dump fails. Does that work with your copied initrd?
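
Purely for illustration, a hedged sketch of what that scenario might look like in kdump.conf; the server, key path, and fallback action are placeholders.

ssh kdump@dump-server.example.com
sshkey /root/.ssh/kdump_id_rsa
path /var/crash
core_collector makedumpfile -l --message-level 1 -d 31 -F
default dump_to_rootfs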

In fact, I said all of the above just to clarify that the kdump initrd needs to be created on the very system whose safety you are worried about. If you don't give kdump the network info, how can you expect it to get the correct info and report it back to you? As for your scenario, I need to discuss it with the team and see whether there is a way to make it work.

Thanks
Baoquan

Comment 21 Baoquan He 2017-07-25 01:16:37 UTC
(In reply to Baoquan He from comment #20)
> But I have to say that it might be a correct way to use kdump on your side.
Correction: I meant that it might NOT be a correct way.

Comment 22 Ben Woodard 2017-07-25 02:14:27 UTC
(In reply to Baoquan He from comment #20)
> Thanks, Ben. Then it's clear to us what's happened.
> 
> But I have to say that it might not be a correct way to use kdump on your side.

We have been through this at least two times before in other bugs.

> In the kdump glue shell scripts, we always assume they run on the system
> whose safety we are worried about and on which we want to deploy kdump to
> collect the vmcore when a crash is triggered.

I know that you really, really want that to be true, and it would make things easier for you if you could always make that assumption, but after long discussions bringing in many other parties, we finally won the argument over that.

I'm going to state it as plainly as possible:
THAT IS _NOT_ A VALID ASSUMPTION FOR AN ENTERPRISE DISTRIBUTION. End of story. Go back and talk to PM about this again if you must. That is a done and decided position. Yes, I understand that the people who initially architected the kdump scripts for RHEL7 failed to take that into account when they designed kdump. I'm sorry that this technical requirement wasn't explicitly clear when it was first implemented, and I'm sorry that you have to deal with the mess. No one expected the kdump functionality to regress so badly between RHEL6 and RHEL7.

> With this assumption, we need the system information of that system itself,
> not of another. Imagine that you have another machine with more than 100
> CPUs, terabytes of memory, many PCI/PCIe devices, various iSCSI/FCoE storage
> disks with the root fs located on one of them, and powerful Ethernet network
> interface cards; now you want to dump to a remote ssh server through the NIC
> by default, and dump to the root fs if the ssh dump fails. Does that work
> with your copied initrd?
> 

You or someone else working on kdump made the same argument last time. The argument is not convincing. You can always dream up hypothetical situations where it won't work, constructing a strawman argument. When we last went through this argument, two of the most convincing points were:
1) RHEL5 and RHEL6 had no problem with this.
2) The creation of the kdump-initrd is not for the ultra-general case where the kind of variation you are putting forth in your strawman argument might crop up. These kdump-initrds are built to be deployed on a homogeneous compute cluster, and as such the hardware and software configuration of the machine on which the kdump-initrd is generated matches that of the compute nodes as closely as possible, the exception being that the machine generating the kdump-initrd has a writable /boot.


> In fact, I said all of the above just to clarify that the kdump initrd needs
> to be created on the very system whose safety you are worried about. If you
> don't give kdump the network info, how can you expect it to get the correct
> info and report it back to you? As for your scenario, I need to discuss it
> with the team and see whether there is a way to make it work.
> 
> Thanks
> Baoquan

Comment 23 Baoquan He 2017-07-25 02:26:40 UTC
One way I can think of: find one of those diskless systems, attach a disk to it, create a kdump initrd there with the network dump settings, and then copy it to the other diskless systems. And DHCP is necessary; otherwise they will all get a fixed IP address.

Comment 24 Baoquan He 2017-07-25 03:36:36 UTC
I didn't notice comment 22 before I made comment 23. So this is not a bug, according to comments 19 and 20. And I am not in the mood to discuss the architecture of kdump now, because there are many other issues to work on. If a kdump re-architecture issue is needed, please open a new bug. No further comments on this bug. We fix bugs, any bug, no matter what it is, kernel bug or user-space bug; we try to find the root cause and try to work with people to find a better way to fix or work around it. Not by yelling or childish crying.

Comment 25 Baoquan He 2017-07-25 03:38:47 UTC
It's not a bug.

Comment 26 Dave Young 2017-07-25 04:43:00 UTC
(In reply to Ben Woodard from comment #22)
> (In reply to Baoquan He from comment #20)
> > Thanks, Ben. Then it's clear to us what's happened.
> > 
> > But I have to say that it might not be a correct way to use kdump on your side.
> 
> We have been through this at least two times before in other bugs.
> 
> > In the kdump glue shell scripts, we always assume they run on the system
> > whose safety we are worried about and on which we want to deploy kdump to
> > collect the vmcore when a crash is triggered.
> 
> I know that you really, really want that to be true, and it would make things
> easier for you if you could always make that assumption, but after long
> discussions bringing in many other parties, we finally won the argument over
> that.
> 
> I'm going to state it as plainly as possible:
> THAT IS _NOT_ A VALID ASSUMPTION FOR AN ENTERPRISE DISTRIBUTION. End of
> story. Go back and talk to PM about this again if you must. That is a done
> and decided position. Yes, I understand that the people who initially
> architected the kdump scripts for RHEL7 failed to take that into account
> when they designed kdump. I'm sorry that this technical requirement wasn't
> explicitly clear when it was first implemented, and I'm sorry that you have
> to deal with the mess. No one expected the kdump functionality to regress so
> badly between RHEL6 and RHEL7.

It may have happened to work in RHEL6, but that does not mean it is supported if QE did not test it.

The question is, arguing about this does not help solve the problem; we need to *understand* and make *clear* what the problem is and whether we can help. If it is a new feature, then we should re-evaluate whether we can do it, and filing a new feature bug makes sense.

