Bug 1433854 - [kdump] guest does not reboot after dump
Summary: [kdump] guest does not reboot after dump
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kexec-tools
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Pingfan Liu
QA Contact: Qiao Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-03-20 07:51 UTC by hachen
Modified: 2017-08-02 02:39 UTC (History)

Fixed In Version: kexec-tools-2.0.14-7.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-01 09:33:50 UTC
Target Upstream Version:
Embargoed:


Attachments
serial log (41.85 KB, image/png)
2017-03-20 07:51 UTC, hachen
serial log with -serial stdio (36.39 KB, text/plain)
2017-03-21 02:09 UTC, hachen
log monitor stdio (41.25 KB, image/png)
2017-03-21 02:10 UTC, hachen
qemu 2.6.0.27 serial log (36.12 KB, text/plain)
2017-03-21 03:15 UTC, hachen
qemu-kvm-rhev-2.6.0-27.el7.x86_66 monitor stdio log (67.84 KB, image/png)
2017-03-21 03:16 UTC, hachen
/etc/kdump.conf (6.76 KB, text/plain)
2017-03-21 03:24 UTC, hachen


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:2300 0 normal SHIPPED_LIVE kexec-tools bug fix and enhancement update 2017-08-01 12:40:58 UTC

Description hachen 2017-03-20 07:51:55 UTC
Created attachment 1264708 [details]
serial log

Description of problem:
guest does not reboot after kdump

Version-Release number of selected component (if applicable):

HOST:
kernel-3.10.0-606.el7.x86_64
kernel-debuginfo-3.10.0-606.el7.x86_64
kernel-debuginfo-common-x86_64-3.10.0-606.el7.x86_64
qemu-kvm-rhev-2.8.0-5.el7.x86_64

GUEST:
kernel-3.10.0-606.el7.x86_64
kexec-tools 2.0.14 

How reproducible: 100%


Steps to Reproduce:
1. Boot up the guest:
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -machine pc  \
    -nodefaults  \
    -vga cirrus  \
    -device ich9-usb-ehci1,id=usb1,addr=1d.7,multifunction=on,bus=pci.0 \
    -device ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=1d.0,firstport=0,bus=pci.0 \
    -device ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=1d.2,firstport=2,bus=pci.0 \
    -device ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=1d.4,firstport=4,bus=pci.0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/rhel73-64-virtio.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \
    -device virtio-net-pci,mac=9a:4d:4e:4f:50:51,id=id3DveCw,vectors=4,netdev=idgW5YRp,bus=pci.0,addr=05  \
    -netdev tap,id=idgW5YRp \
    -m 2048  \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2  \
    -cpu 'SandyBridge',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -monitor stdio \
    -qmp tcp:localhost:4444,server,nowait

2. In the guest's /etc/kdump.conf, the default action is set to:
default reboot

3. In the guest's /etc/default/grub:
GRUB_CMDLINE_LINUX="rd.lvm.lv=rhel/swap crashkernel=auto rd.lvm.lv=rhel/root rhgb quiet"

# service kdump start

4. Trigger a crash in the guest:
# echo c >/proc/sysrq-trigger
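
Before triggering the crash in step 4, it can help to confirm that the guest actually reserved crash memory. The following is a minimal sketch, not part of the original report: the helper name crashkernel_value is hypothetical, and it only inspects a command-line string passed to it.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the crashkernel= parameter found in
# a kernel command line string. Returns non-zero if the parameter is absent.
crashkernel_value() {
    # $1 is deliberately unquoted so the command line splits into words.
    for word in $1; do
        case "$word" in
            crashkernel=*)
                printf '%s\n' "${word#crashkernel=}"
                return 0
                ;;
        esac
    done
    return 1
}

# Possible use inside the guest (standard Linux interfaces):
#   crashkernel_value "$(cat /proc/cmdline)"
#   cat /sys/kernel/kexec_crash_loaded   # 1 means a capture kernel is loaded
```

If the helper fails or kexec_crash_loaded reads 0, a hang after "echo c" would point at the kdump setup rather than at the reboot path discussed in this bug.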


Actual results:
The RHEL guest hangs with a black screen.


Expected results:
The guest takes a dump and reboots.


Additional info:
Due to a network issue, I will upload the serial log file later.

Comment 2 hachen 2017-03-21 02:09:43 UTC
Created attachment 1264893 [details]
serial log with -serial stdio

When I add "-serial stdio" to the qemu cmd, the guest takes a dump and reboots.
But if I don't, it hangs at the black screen.

Comment 3 hachen 2017-03-21 02:10:44 UTC
Created attachment 1264894 [details]
log monitor stdio

Comment 4 hachen 2017-03-21 03:15:26 UTC
Created attachment 1264900 [details]
qemu 2.6.0.27 serial log

I tried with qemu-kvm-rhev-2.6.0-27.el7.x86_64.
If I use "-serial stdio" in the qemu cmd, the guest takes a dump and reboots.
If I use "-monitor stdio" in the qemu cmd, the guest hangs at the black screen.

Comment 5 hachen 2017-03-21 03:16:24 UTC
Created attachment 1264901 [details]
qemu-kvm-rhev-2.6.0-27.el7.x86_66 monitor stdio log

Comment 6 hachen 2017-03-21 03:24:13 UTC
Created attachment 1264902 [details]
/etc/kdump.conf

I tested with same kernel and configurations but different qemu versions:

1. With qemu-kvm-rhev-2.8.0-6.el7, for RHEL guests:

1.1 Dump the core to /var/crash in the guest.
	I triggered a crash using # echo c >/proc/sysrq-trigger
	The RHEL guest hangs with a black screen. (It should take a dump and reboot.)

1.2 Guest kdump over ssh.
	Edit /etc/kdump.conf:
	 ssh root.73.85  <-- host ip
	 sshkey /root/.ssh/id_rsa
	 path /var/crash
	 core_collector makedumpfile -F -l --message-level 1 -d 31
	 default reboot
	# service kdump start

	Then I triggered a crash using # echo c >/proc/sysrq-trigger.
	The RHEL guest hangs with a black screen. (It should take a dump and reboot.)

2. With qemu-kvm-rhev-2.6.0-27.el7, for RHEL guests:

2.1 Dump the core to /var/crash in the guest.
	I triggered a crash using # echo c >/proc/sysrq-trigger
	The RHEL guest hangs with a black screen. (It should take a dump and reboot.)

2.2 Guest kdump over ssh.
	Edit /etc/kdump.conf:
	 ssh root.73.85  <-- host ip
	 sshkey /root/.ssh/id_rsa
	 path /var/crash
	 core_collector makedumpfile -F -l --message-level 1 -d 31
	 default reboot
	# service kdump start

	Then I triggered a crash using # echo c >/proc/sysrq-trigger.
	The RHEL guest takes a dump and reboots.

***I suspect it is a qemu bug***

Comment 7 David Hildenbrand 2017-03-23 09:24:58 UTC
After editing /etc/kdump.conf, you have to (re)start kdump.

kdump will then regenerate the initrd, packaging the updated version of /etc/kdump.conf. I assume that this was done in your case.

However, I wonder if there is a general problem. I set it to "default shell", restarted kdump, and made sure that the updated config file ended up in the initrd. There was no way of stopping kdump from rebooting the guest; the default parameter was simply ignored.
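
The check described above (which "default" action is configured, and whether the same file ended up in the kdump initrd) could be scripted roughly as follows. This is a sketch under assumptions: the helper name kdump_default_action is hypothetical, taking the last "default" line as the effective one is a choice made for this sketch, and the initramfs path in the comment assumes the usual RHEL 7 layout.

```shell
#!/bin/sh
# Hypothetical helper: print the action named by the last "default" directive
# in a kdump.conf-style file. Commented-out lines are skipped because their
# first field starts with "#". Returns non-zero if no directive is found.
kdump_default_action() {
    awk '$1 == "default" { action = $2 }
         END { if (action == "") exit 1; print action }' "$1"
}

# Possible use inside the guest:
#   kdump_default_action /etc/kdump.conf
# To view the copy packaged into the kdump initrd (path is an assumption):
#   lsinitrd /boot/initramfs-$(uname -r)kdump.img -f etc/kdump.conf
```

If the two copies differ, kdump was probably not restarted after the edit, which is the situation comment 7 warns about.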

Comment 8 Pingfan Liu 2017-03-24 05:32:06 UTC
Hi hachen.

I used the same qemu version and guest kernel/kexec-tools as you reported, but I failed to reproduce the bug with the following cmd. (The cmd is copied from yours, except for the network config.)

Since the guest hangs with a black screen, could you test with the following steps:
1. Insert "gdb --args" before your cmdline.
2. Set a breakpoint with "break pc_machine_reset".
3. Run.
When the guest boots up, gdb will hit the breakpoint; you can ignore it.
But after you "echo c > /proc/sysrq-trigger", please note whether the breakpoint is hit or not.

I will do further analysis and debugging based on the result.

Thx,
Pingfan 


--- cmd I used ---
gdb --args \
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -machine pc  \
    -nodefaults  \
    -vga cirrus  \
    -device ich9-usb-ehci1,id=usb1,addr=1d.7,multifunction=on,bus=pci.0 \
    -device ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=1d.0,firstport=0,bus=pci.0 \
    -device ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=1d.2,firstport=2,bus=pci.0 \
    -device ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=1d.4,firstport=4,bus=pci.0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=raw,file=$guest_img \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \
    -net nic,model=virtio,macaddr=$(< /sys/class/net/macvtap0/address) \
    -net tap,fd=3 3<>/dev/tap$(< /sys/class/net/macvtap0/ifindex) \
    -m 2048  \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2  \
    -cpu 'SandyBridge',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :2  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -monitor stdio



pc_machine_reset

Comment 9 Pingfan Liu 2017-03-24 09:02:56 UTC
After logging in to the buggy system, I found that the 2nd kernel does not boot up.
Also, gdb does not hit the pc_machine_reset breakpoint a second time.
This is strange; I need more time to debug.

Thx,
Pingfan

Comment 11 Qiao Zhao 2017-05-22 06:43:55 UTC
Hi hachen,

Could you help retest this problem with the fixed package kexec-tools-2.0.14-7.el7?

--
Thanks,
Qiao

Comment 12 hachen 2017-05-24 06:51:54 UTC
I tested on
host:
kernel-3.10.0-656.el7.x86_64
kernel-debuginfo-3.10.0-656.el7.x86_64
kernel-debuginfo-common-x86_64-3.10.0-656.el7.x86_64
kexec-tools-2.0.14-7.el7
qemu-kvm-rhev-2.9.0-5.el7.x86_64

guest:
kernel-3.10.0-656.el7.x86_64
kexec-tools-2.0.14-7.el7

It works: the guest reboots after the dump.

Comment 13 Qiao Zhao 2017-05-31 08:45:20 UTC
(In reply to hachen from comment #12)
> I tested on
> host:
> kernel-3.10.0-656.el7.x86_64
> kernel-debuginfo-3.10.0-656.el7.x86_64
> kernel-debuginfo-common-x86_64-3.10.0-656.el7.x86_64
> kexec-tools-2.0.14-7.el7
> qemu-kvm-rhev-2.9.0-5.el7.x86_64
> 
> guest:
> kernel-3.10.0-656.el7.x86_64
> kexec-tools-2.0.14-7.el7
> 
> It works: the guest reboots after the dump.

Thanks! I really appreciate it. 

Move to Verified.

--
Thanks,
Qiao

Comment 14 Pingfan Liu 2017-08-01 03:23:20 UTC
(In reply to hachen from comment #4)
> Created attachment 1264900 [details]
> qemu 2.6.0.27 serial log
> 
> I tried with qemu-kvm-rhev-2.6.0-27.el7.x86_64.
> If I use "-serial stdio" in the qemu cmd, the guest takes a dump and reboots.
> If I use "-monitor stdio" in the qemu cmd, the guest hangs at the black screen.

I think the description is missing part of the kernel cmdline. You used "console=tty0 console=ttyS0", so when you ran qemu without "-serial stdio" (i.e. the VM does not implement a serial device), kdump failed.

Regards,
Pingfan
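
To check for the situation described in comment #14, one could list the console= devices on the guest's kernel command line and compare them with the devices the VM actually provides. A minimal sketch follows; the helper name console_devices is hypothetical and it only parses the string it is given.

```shell
#!/bin/sh
# Hypothetical helper: print the devices named by console= parameters in a
# kernel command line string, space separated, with options such as
# ",115200" stripped. Prints an empty line if there are none.
console_devices() {
    devices=""
    # $1 is deliberately unquoted so the command line splits into words.
    for word in $1; do
        case "$word" in
            console=*)
                dev=${word#console=}
                devices="${devices:+$devices }${dev%%,*}"
                ;;
        esac
    done
    printf '%s\n' "$devices"
}

# Possible use inside the guest:
#   console_devices "$(cat /proc/cmdline)"
```

If ttyS0 appears in the output while qemu is started without "-serial stdio" (and with -nodefaults, as in the Description), that matches the configuration comment #14 identifies as failing.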

Comment 15 errata-xmlrpc 2017-08-01 09:33:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2300

Comment 16 hachen 2017-08-02 02:39:00 UTC
(In reply to Pingfan Liu from comment #14)
> (In reply to hachen from comment #4)
> > Created attachment 1264900 [details]
> > qemu 2.6.0.27 serial log
> > 
> > I tried with qemu-kvm-rhev-2.6.0-27.el7.x86_64.
> > If I use "-serial stdio" in the qemu cmd, the guest takes a dump and reboots.
> > If I use "-monitor stdio" in the qemu cmd, the guest hangs at the black screen.
> 
> I think the description is missing part of the kernel cmdline. You used
> "console=tty0 console=ttyS0", so when you ran qemu without
> "-serial stdio" (i.e. the VM does not implement a serial device), kdump
> failed.
> 
> Regards,
> Pingfan


In comment #2, I added "-serial stdio" because, as I recall, someone had asked for the serial log.

In my test cases, I normally use "-monitor stdio", as I posted in the Description.

When I first reported this bug, I booted the guest with "-monitor stdio",
then ran # service kdump start,
and then ran # echo c > /proc/sysrq-trigger to trigger the dump.
At that time, the guest hung with a black screen.

The GRUB_CMDLINE_LINUX="rd.lvm.lv=rhel/swap crashkernel=auto rd.lvm.lv=rhel/root rhgb quiet" line was included there as additional information; I did not change anything.

After the fix, when I follow the same steps, the guest reboots.

I hope this makes the bug clear.
Thanks
Haotong

