Bug 1283191 - mkdumprd fails on RHEV hosts with running VMs
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kexec-tools
Version: 6.7
Hardware: All Linux
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Xunlei Pang
QA Contact: Virtualization Bugs
Keywords: ZStream
Depends On:
Blocks: 1172231 1305481
Reported: 2015-11-18 07:32 EST by Julio Entrena Perez
Modified: 2016-08-31 15:47 EDT

See Also:
Fixed In Version: kexec-tools-2.0.0-292.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 1305481
Environment:
Last Closed: 2016-05-10 15:12:09 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 1571553 | None | None | None | Never
Red Hat Product Errata | RHBA-2016:0734 | normal | SHIPPED_LIVE | kexec-tools bug fix and enhancement update | 2016-05-10 18:28:16 EDT

Description Julio Entrena Perez 2015-11-18 07:32:47 EST
Description of problem:
mkdumprd fails on RHEV hosts with running VMs:

# service kdump restart
Stopping kdump:                                            [  OK  ]
Detected change(s) the following file(s):
  
  /etc/kdump.conf
Rebuilding /boot/initrd-2.6.32-573.7.1.el6.x86_64kdump.img
The ifcfg-vnet0 or ifcfg-xxx which contains DEVICE=vnet0 field doesn't exist.
Failed to run mkdumprd
Starting kdump:                                            [FAILED]

The vnet<x> network interfaces of the running guests seem to confuse mkdumprd:

# ifconfig | egrep -v ^\ \|^$\|^vlan\|^eth1\.\|^lo
eth0      Link encap:Ethernet  HWaddr 00:10:18:73:B8:8D  
rhevm     Link encap:Ethernet  HWaddr 00:10:18:73:B8:8D  
vnet0     Link encap:Ethernet  HWaddr FE:01:A4:AD:FE:CA  
vnet1     Link encap:Ethernet  HWaddr FE:01:A4:AD:FE:CB

Version-Release number of selected component (if applicable):
kexec-tools-2.0.0-286.el6.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Start some guests on a virtualization host (tested on a RHEV host)
2. Update /etc/kdump.conf
3. Restart kdump service

Actual results:
Rebuilding /boot/initrd-2.6.32-573.7.1.el6.x86_64kdump.img
The ifcfg-vnet0 or ifcfg-xxx which contains DEVICE=vnet0 field doesn't exist.
Failed to run mkdumprd
Starting kdump:                                            [FAILED]


Expected results:
Rebuilding /boot/initrd-2.6.32-573.7.1.el6.x86_64kdump.img
Starting kdump:                                            [  OK  ]


Additional info:
KCS article https://access.redhat.com/solutions/1571553 has a suggestion that helps mkdumprd to complete successfully in this scenario.
Comment 1 Minfei Huang 2015-11-20 01:16:31 EST
(In reply to Julio Entrena Perez from comment #0)
> Description of problem:
> mkdumprd fails on RHEV hosts with running VMs:
> 
> # service kdump restart
> Stopping kdump:                                            [  OK  ]
> Detected change(s) the following file(s):
>   
>   /etc/kdump.conf
> Rebuilding /boot/initrd-2.6.32-573.7.1.el6.x86_64kdump.img
> The ifcfg-vnet0 or ifcfg-xxx which contains DEVICE=vnet0 field doesn't exist.
> Failed to run mkdumprd
> Starting kdump:                                            [FAILED]
> 
> The vnet<x> network interfaces of the running guests seem to confuse
> mkdumprd:
> 
> # ifconfig | egrep -v ^\ \|^$\|^vlan\|^eth1\.\|^lo
> eth0      Link encap:Ethernet  HWaddr 00:10:18:73:B8:8D  
> rhevm     Link encap:Ethernet  HWaddr 00:10:18:73:B8:8D  
> vnet0     Link encap:Ethernet  HWaddr FE:01:A4:AD:FE:CA  

Would you mind setting up an ifcfg file for device vnet0? Then you can try again.

Thanks
Minfei
Comment 2 Julio Entrena Perez 2015-11-20 04:15:39 EST
(In reply to Minfei Huang from comment #1)
> 
> Would you mind setting up an ifcfg file for device vnet0? Then you can try again.

The interfaces are already UP since the VMs are running:

# ifconfig | grep -A2 vnet
vnet0     Link encap:Ethernet  HWaddr FE:01:A4:AD:FE:CA  
          inet6 addr: fe80::fc01:a4ff:fead:feca/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
--
vnet1     Link encap:Ethernet  HWaddr FE:01:A4:AD:FE:CB  
          inet6 addr: fe80::fc01:a4ff:fead:fecb/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

# touch /etc/kdump.conf

# service kdump restart
Stopping kdump:                                            [  OK  ]
Detected change(s) the following file(s):
  
  /etc/kdump.conf
Rebuilding /boot/initrd-2.6.32-573.7.1.el6.x86_64kdump.img
The ifcfg-vnet0 or ifcfg-xxx which contains DEVICE=vnet0 field doesn't exist.
Failed to run mkdumprd
Starting kdump:                                            [FAILED]
Comment 3 Minfei Huang 2015-11-20 04:27:23 EST
(In reply to Julio Entrena Perez from comment #2)
> (In reply to Minfei Huang from comment #1)
> > 
> > Would you mind setting up an ifcfg file for device vnet0? Then you can try again.
> 
> The interfaces are already UP since the VMs are running:
> 
> # ifconfig | grep -A2 vnet
> vnet0     Link encap:Ethernet  HWaddr FE:01:A4:AD:FE:CA  
>           inet6 addr: fe80::fc01:a4ff:fead:feca/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> --
> vnet1     Link encap:Ethernet  HWaddr FE:01:A4:AD:FE:CB  
>           inet6 addr: fe80::fc01:a4ff:fead:fecb/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> 

Yes, this network device is up. What I mean is to write a persistent configuration in an ifcfg-x file. Kdump uses it to gather enough information to set the device up in the second kernel.

Thanks
Minfei
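
The lookup that the error message describes (a file named ifcfg-<dev>, or any ifcfg-xxx file containing DEVICE=<dev>) can be approximated as below. This is a minimal sketch and not the actual mkdumprd code: the function name find_ifcfg is hypothetical, and the default directory is assumed to be the usual RHEL 6 location. Dots in a device name are treated as regex wildcards, so the match is only approximate.

```shell
# Hypothetical helper: print the ifcfg file that describes a device, if any.
# $1 = device name
# $2 = config directory (assumed default: /etc/sysconfig/network-scripts)
find_ifcfg() {
    local dev=$1
    local dir=${2:-/etc/sysconfig/network-scripts}
    local f

    # First choice: a file named after the device.
    if [ -f "$dir/ifcfg-$dev" ]; then
        echo "$dir/ifcfg-$dev"
        return 0
    fi

    # Otherwise: any ifcfg-* file that declares DEVICE=<dev>,
    # with or without quotes around the value.
    for f in "$dir"/ifcfg-*; do
        [ -f "$f" ] || continue
        if grep -Eq "^[[:space:]]*DEVICE=[\"']?$dev[\"']?[[:space:]]*\$" "$f"; then
            echo "$f"
            return 0
        fi
    done
    return 1
}

# Example use on a host:
#   find_ifcfg vnet0 || echo "no ifcfg file for vnet0"
```

On a host like the one in this report, the call for vnet0 would be expected to fail, since tap interfaces created for guests have no persistent configuration.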
Comment 4 Julio Entrena Perez 2015-11-20 04:29:37 EST
(In reply to Julio Entrena Perez from comment #2)
> (In reply to Minfei Huang from comment #1)
> > 
> > Would you mind setting up an ifcfg file for device vnet0? Then you can try again.
> 
Sorry, I misread your update.
vnet<x> interfaces are tap interfaces created for the vNICs of the running VMs.
Those interfaces are created and destroyed dynamically and consequently don't have an ifcfg file.
mkdumprd should ignore those interfaces.
Comment 5 Minfei Huang 2015-11-20 06:59:00 EST
(In reply to Julio Entrena Perez from comment #4)
> (In reply to Julio Entrena Perez from comment #2)
> > (In reply to Minfei Huang from comment #1)
> > > 
> > > Would you mind setting up an ifcfg file for device vnet0? Then you can try again.
> > 
> Sorry, I misread your update.
> vnet<x> interfaces are tap interfaces created for the vNICs of the running
> VMs.
> Those interfaces are created and destroyed dynamically and consequently
> don't have an ifcfg file.
> mkdumprd should ignore those interfaces.

It seems kdump uses the ifcfg file to set up the network. Could you give it a try?

Thanks
Minfei
Comment 7 Baoquan He 2015-12-18 04:30:15 EST
Hi Cui Ying,

Could you help reproduce this bug on a RHEV host?
Comment 9 Xunlei Pang 2015-12-24 21:33:06 EST
It seems the route to the destination, as reported by ip route, goes through the guest's vnet interface. I think this is wrong; it should go through one of the host's network devices.

Could you please paste the content of your /etc/kdump.conf plus the output of the following command, both from the host?
ip route get to <dest ip>

As an example, when dumping over ssh to "10.66.129.152", /etc/kdump.conf contains lines like:
ssh root@10.66.129.152
sshkey /root/.ssh/kdumprsa

And the command output is like:
# ip route get to 10.66.129.152
10.66.129.152 via 10.16.47.254 dev eth0  src 10.16.45.13
    cache  mtu 1500 advmss 1460 hoplimit 64
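
To see at a glance which interface the route uses, the device name can be pulled out of that output. A minimal sketch, assuming the output has the "dev <name>" form shown above; the function name route_dev is hypothetical.

```shell
# Hypothetical helper: read `ip route get` output on standard input and
# print the word that follows "dev", i.e. the outgoing interface.
route_dev() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "dev") { print $(i + 1); exit }
        }
    }'
}

# Example use on a host (destination taken from the example above):
#   ip route get to 10.66.129.152 | route_dev
```

If the printed name is a vnet<x> interface rather than a host device or bridge, that would match the suspicion described in this comment.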
Comment 10 Julio Entrena Perez 2015-12-29 07:37:19 EST
# cat /etc/kdump.conf | grep -v ^#

path /var/crash
core_collector makedumpfile -c --message-level 1 -d 31

Dump is to be produced locally so there's no <dest ip>.
Comment 11 Julio Entrena Perez 2015-12-29 07:44:09 EST
Apologies, wrong host.

# cat /etc/kdump.conf | grep -v ^#

path /var/crash
core_collector makedumpfile -c --message-level 1 -d 31
fence_kdump_nodes rhevm1-375.usersys.redhat.com
fence_kdump_args -p 7410 -i 5

Looks like including the "fence_kdump_nodes" line causes the problem.
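
A quick way to check whether a given kdump.conf has active lines of this kind is sketched below. It only looks for the two directives that this thread connects with network use (ssh and fence_kdump_nodes); that list is an assumption and is not meant to be complete. The function name kdump_net_directives is hypothetical.

```shell
# Hypothetical helper: print the active (uncommented) lines of a kdump.conf
# that start with a directive this thread associates with network use.
# $1 = config file (defaults to /etc/kdump.conf)
# Returns non-zero when no such line is found.
kdump_net_directives() {
    grep -E '^[[:space:]]*(ssh|fence_kdump_nodes)[[:space:]]' "${1:-/etc/kdump.conf}"
}

# Example use on a host:
#   kdump_net_directives || echo "no network-related directives found"
```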
Comment 14 Xunlei Pang 2015-12-29 22:09:37 EST
"rhevm" is a bridge, and "eth1" is one of its physical interfaces. If I understand correctly, RHEV dynamically creates a new vnet<n> for every new guest<n> and adds it to the "rhevm" bridge. kdump then simply fails when it finds no ifcfg-vnet<m> file for the corresponding vnet<m> interface.

I think for such bridges in RHEV, we can consider only "eth1" for "rhevm" and ignore all the vnet<m> ports of the bridge. The bridge still works, and so does kdump on RHEV.
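
The idea can be sketched as follows. This is an illustration only and not the patch that was eventually shipped: the function name bridge_ports_for_kdump is hypothetical, and it assumes the bridge ports are listed under the brif directory of the bridge in sysfs.

```shell
# Hypothetical helper: print the ports of a bridge, skipping vnet<N> taps.
# $1 = bridge name
# $2 = sysfs net directory (defaults to /sys/class/net)
bridge_ports_for_kdump() {
    local br=$1
    local root=${2:-/sys/class/net}
    local p name

    for p in "$root/$br/brif"/*; do
        [ -e "$p" ] || continue        # empty or missing brif directory
        name=${p##*/}
        case $name in
            vnet[0-9]*) continue ;;    # guest tap interface, no ifcfg file
        esac
        echo "$name"
    done
}

# Example use on a host:
#   bridge_ports_for_kdump rhevm
```

For a bridge whose ports are eth1, vnet0 and vnet1, this prints only eth1, which is the interface kdump would then need an ifcfg file for.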
Comment 17 Xunlei Pang 2016-01-06 22:03:41 EST
Could you please start some VMs on "10.33.20.24 (root/redhat)"? Currently there are no vnet interfaces on the host, and I want to gather some information to make a formal patch.

Thanks!
Comment 18 Julio Entrena Perez 2016-01-07 04:20:22 EST
Sorry, done.
Comment 19 Xunlei Pang 2016-01-07 05:08:20 EST
For vnet<x>, how can I identify it as a virtual interface assigned to a VM, for example through some file under /sys/class/net/vnet0/? Does anyone know?
Comment 20 Julio Entrena Perez 2016-01-07 05:37:18 EST
It should be safe to assume that any interface named vnet<x> is related to a virtual machine:

http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/conf/domain_conf.h;h=ae6d546978973539766ebb114e7be3a802c329fa;hb=HEAD#l1084
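
A name-based test along those lines could look like the sketch below. Both function names are hypothetical. The second function is an optional cross-check that relies on an assumption not confirmed in this report: that the tun/tap driver exposes a tun_flags attribute for its devices in sysfs.

```shell
# Hypothetical helper: succeed if the name is "vnet" followed only by
# digits, the form used for the guest interfaces in this report.
is_vnet_name() {
    case $1 in
        vnet|vnet*[!0-9]*) return 1 ;;   # bare prefix, or non-digit suffix
        vnet[0-9]*)        return 0 ;;
        *)                 return 1 ;;
    esac
}

# Optional cross-check (assumption: tun/tap devices have a tun_flags
# attribute under sysfs, other interfaces do not).
# $1 = interface name
# $2 = sysfs net directory (defaults to /sys/class/net)
is_tun_or_tap() {
    [ -r "${2:-/sys/class/net}/$1/tun_flags" ]
}

# Example use on a host:
#   is_vnet_name vnet0 && is_tun_or_tap vnet0 && echo "vnet0 is a guest tap"
```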
Comment 21 Xunlei Pang 2016-01-07 08:07:56 EST
This is very helpful, thank you!
Comment 23 Xunlei Pang 2016-01-12 04:36:10 EST
Hi,

On RHEV, is it also safe to rely on the bridge name "rhevm", as with vnet<n>?
Comment 24 Julio Entrena Perez 2016-01-12 04:43:44 EST
rhevm is the default management network on RHEV hosts. It is used to communicate with RHEV-M (the manager) and it will _usually_ have the main IP address of the host.
If a host has to dump a vmcore over the network, it's _likely_ that this is the interface that should be used for that purpose.
Comment 33 Guo, Zhiyi 2016-03-03 03:37:56 EST
(In reply to Xunlei Pang from comment #32)
> Hi Zhiyi,
> 
> It's a known issue, you can refer to the following link:
> https://bugzilla.redhat.com/show_bug.cgi?id=1284605
> 
> It has been fixed since release 2.0.0-294, which came after this bug, so
> please test with "kexec-tools-2.0.0-294.el6.x86_64" or a later release.
> 
> Thanks.

Thanks Xunlei,

Verified this bug on RHEL 6.8 with kexec-tools-2.0.0-294.el6.x86_64:
[root@dhcp-10-61 ~]# service kdump restart
Stopping kdump:                                            [  OK  ]
Detected change(s) the following file(s):
  
  /etc/kdump.conf
Rebuilding /boot/initrd-2.6.32-621.el6.x86_64kdump.img
Starting kdump:                                            [  OK  ]
Comment 36 errata-xmlrpc 2016-05-10 15:12:09 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0734.html
Comment 37 Billy Woods 2016-06-28 16:06:08 EDT
Requesting that the bug be re-opened:

After updating to the new kexec-tools, the customer is still unable to bring up kdump.

The customer can bring down the tap device and start kdump. However, after a "yum update kernel", kdump goes down again, and the only workaround is to bring down the tap device again. Case # 01620459
