Bug 1958587

Summary: the kdump initramfs includes unnecessary NIC drivers for SSH/NFS dumping target
Product: Red Hat Enterprise Linux 8 Reporter: Coiby <coxu>
Component: kexec-tools Assignee: Coiby <coxu>
Status: CLOSED ERRATA QA Contact: xiaoying yan <yiyan>
Severity: medium Docs Contact:
Priority: medium    
Version: 8.4 CC: cye, ruyang
Target Milestone: beta Keywords: Triaged
Target Release: --- Flags: pm-rhel: mirror+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: kexec-tools-2.0.26-6.el8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2120186 Environment:
Last Closed: 2023-11-14 15:47:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2148318    
Bug Blocks:    

Description Coiby 2021-05-09 00:06:31 UTC
Description of problem:

When dumping a vmcore to an SSH or NFS target, the kdump initramfs includes NIC drivers for inactive interfaces. Taking an NFS target as an example, this is how the problem occurs:
1. mkdumprd passes 'mount "xx.xx.xx.xx:/path /kdumproot nfs defaults"' to dracut.
2. dracut eventually runs "/usr/lib/dracut/dracut-install -D /tmp/initramfs/ -H --kerneldir /lib/modules/5.10.22-200.fc33.x86_64/ -o -m =drivers/net/ethernet". "-H" means only host drivers are included, and "=drivers/net/ethernet" means all drivers in drivers/net/ethernet are included. So if the system has multiple NICs, all host drivers in drivers/net/ethernet are copied to the initramfs, whether or not the NIC is used to reach the dump target.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:


1. Find a system with multiple NICs, e.g. hp-z8-g4-01.khw2.lab.eng.bos.redhat.com

[root@hp-z8-g4-01 net]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp9s0f2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether a0:8c:fd:dd:ca:be brd ff:ff:ff:ff:ff:ff
3: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether a0:8c:fd:dd:ca:bc brd ff:ff:ff:ff:ff:ff
    inet 10.16.210.50/23 brd 10.16.211.255 scope global dynamic noprefixroute eno1
       valid_lft 82797sec preferred_lft 82797sec
    inet6 2620:52:0:10d2:a28c:fdff:fedd:cabc/64 scope global dynamic noprefixroute 
       valid_lft 2591952sec preferred_lft 604752sec
    inet6 fe80::a28c:fdff:fedd:cabc/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
4: ens5: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
    link/ether a0:fb:fd:dd:ca:bf brd ff:ff:ff:ff:ff:ff

[root@hp-z8-g4-01 net]# readlink /sys/class/net/{enp9s0f2,eno1,ens5}/device/driver/module
../../../../module/i40e
../../../../module/e1000e
../../../../module/e1000e


2. Configure kdump to dump to an NFS target
3. Run "kdumpctl rebuild"
[root@hp-z8-g4-01 ~]# kdumpctl rebuild
kdump: Rebuilding /boot/initramfs-4.18.0-287.el8.dt4.x86_64kdump.img

4. In the built initramfs, you will find that all NIC drivers are included. For hp-z8-g4-01.khw2.lab.eng.bos.redhat.com:

[root@hp-z8-g4-01 t]# lsinitrd --unpack /boot/initramfs-4.18.0-287.el8.dt4.x86_64kdump.img

[root@hp-z8-g4-01 t]# unsquashfs squash/root.img 
[root@hp-z8-g4-01 t]# cd squashfs-root/usr/lib/modules/4.18.0-287.el8.dt4.x86_64/kernel/drivers/net
[root@hp-z8-g4-01 net]# tree
.
├── ethernet
│   └── intel
│       ├── e1000e
│       │   └── e1000e.ko.xz
│       └── i40e
│           └── i40e.ko.xz
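
The check in step 1 can be generalized to every interface. The following is a minimal sketch, not part of kexec-tools: the function name list_nic_drivers and its optional directory argument are assumptions made for illustration (the argument exists only so the function can be pointed at a test tree instead of /sys/class/net).

```shell
#!/bin/sh
# Print "<interface> <driver> <operstate>" for every NIC backed by a device.
# The optional first argument (default /sys/class/net) is a hypothetical
# convenience for testing against a fake sysfs tree.
list_nic_drivers() {
    net_dir=${1:-/sys/class/net}
    for dev in "$net_dir"/*; do
        # Virtual interfaces such as lo have no device/driver/module link.
        [ -L "$dev/device/driver/module" ] || continue
        driver=$(basename "$(readlink "$dev/device/driver/module")")
        state=$(cat "$dev/operstate" 2>/dev/null || echo unknown)
        printf '%s %s %s\n' "$(basename "$dev")" "$driver" "$state"
    done
}
```

Calling list_nic_drivers with no argument on the reproducer shows which drivers belong to interfaces that are down and are therefore candidates for exclusion from the kdump initramfs.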


Actual results:

The kdump initramfs includes drivers for inactive NICs.

Expected results:

The initramfs should only include drivers that are needed by kdump.

Additional info:

1. This bug is reproducible on Fedora 33, lenovo-sr630-02.rhts.eng.pek2.redhat.com.
2. It should be reproducible on RHEL 8.5 and RHEL 9.
3. For dell-per740-55.rhts.eng.pek2.redhat.com, although there are two NICs, the other driver is in drivers/staging/qlge instead of in drivers/net/ethernet so it's not copied to the kdump initramfs.
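
Whether a given driver is affected depends on where its module file lives, as item 3 illustrates. A minimal sketch of that check follows; the function name is_net_ethernet_module is hypothetical, and the path pattern is an assumption based on the "=drivers/net/ethernet" argument described above.

```shell
#!/bin/sh
# Succeed if a module file path lies under kernel/drivers/net/ethernet,
# i.e. the directory selected by "=drivers/net/ethernet".
is_net_ethernet_module() {
    case "$1" in
        */kernel/drivers/net/ethernet/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a live system (modinfo -n prints the module's file name):
#   is_net_ethernet_module "$(modinfo -n e1000e)" && echo "selected"
```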

Comment 1 Kairui Song 2021-05-10 18:41:51 UTC
Hi Coiby,

I've noticed this issue before, so I opened a PR to dracut to add a `--hostonly-nic` option:
https://github.com/dracutdevs/dracut/pull/957

It's merged; however, the kexec-tools part also needs a patch. Basically, `mkdumprd` has to know which NIC is actually needed by kdump, and pass that to dracut with this `--hostonly-nic` argument.

And we also need to detect network changes that alter which NIC has access to the dump target, and rebuild the initramfs if it doesn't contain the NIC driver needed.

Can you help test whether this argument works as expected? We may put some effort into this if it helps a lot.
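
As a rough illustration of the first step (finding the NIC that reaches the dump target), here is a minimal sketch. It assumes the target is given as an IP address and that parsing the "dev" field of `ip -o route get` output is sufficient; the function names route_dev and nic_for_target are hypothetical and are not taken from kexec-tools.

```shell
#!/bin/sh
# Read the output of `ip -o route get <addr>` on stdin and print the
# interface name that follows the "dev" keyword.
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Hypothetical helper: print the NIC used to reach the dump target $1.
nic_for_target() {
    ip -o route get "$1" | route_dev
}
```

The resulting interface name is what would be handed to dracut; bonded, bridged, or VLAN interfaces would need extra handling to resolve the underlying physical NICs, which this sketch does not attempt.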

Comment 2 Coiby 2021-06-06 22:43:57 UTC
(In reply to Kairui Song from comment #1)
> Hi Coiby,
> 
> I've noticed this issue before, so I opened a PR to dracut to add a
> `--hostonly-nic` option:
> https://github.com/dracutdevs/dracut/pull/957
> 

Thanks for the work!

> It's merged; however, the kexec-tools part also needs a patch. Basically,
> `mkdumprd` has to know which NIC is actually needed by kdump, and pass that to
> dracut with this `--hostonly-nic` argument.
> 
> And we also need to detect network changes that alter which NIC has access
> to the dump target, and rebuild the initramfs if it doesn't
> contain the NIC driver needed.
>  
> Can you help test whether this argument works as expected? We may put some effort
> into this if it helps a lot.

Sure, I'll finish the kexec-tools part.

Comment 3 Coiby 2021-06-06 22:46:06 UTC
Note that when a driver manages multiple NICs, memory resources are still allocated for the inactive NICs, as shown in https://bugzilla.redhat.com/show_bug.cgi?id=1936277. If we could fix this situation, https://bugzilla.redhat.com/show_bug.cgi?id=1962421 could be resolved automatically.

Comment 25 errata-xmlrpc 2023-11-14 15:47:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (kexec-tools bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:7080