
Bug 1433854

Summary: [kdump] guest does not reboot after dump
Product: Red Hat Enterprise Linux 7
Reporter: hachen <hachen>
Component: kexec-tools
Assignee: Pingfan Liu <piliu>
Status: CLOSED ERRATA
QA Contact: Qiao Zhao <qzhao>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.4
CC: chayang, coli, david, dhildenb, dyoung, hachen, jasowang, juzhang, mdeng, michen, qzhao, ruyang, xiawu
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: kexec-tools-2.0.14-7.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-01 09:33:50 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description | Flags
serial log | none
serial log with -serial stdio | none
log monitor stdio | none
qemu 2.6.0.27 serial log | none
qemu-kvm-rhev-2.6.0-27.el7.x86_64 monitor stdio log | none
/etc/kdump.conf | none

Description hachen 2017-03-20 07:51:55 UTC
Created attachment 1264708 [details]
serial log

Description of problem:
guest does not reboot after kdump

Version-Release number of selected component (if applicable):

HOST:
kernel-3.10.0-606.el7.x86_64
kernel-debuginfo-3.10.0-606.el7.x86_64
kernel-debuginfo-common-x86_64-3.10.0-606.el7.x86_64
qemu-kvm-rhev-2.8.0-5.el7.x86_64

GUEST:
kernel-3.10.0-606.el7.x86_64
kexec-tools 2.0.14 

How reproducible: 100%


Steps to Reproduce:
1. Boot up the guest:
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -machine pc  \
    -nodefaults  \
    -vga cirrus  \
    -device ich9-usb-ehci1,id=usb1,addr=1d.7,multifunction=on,bus=pci.0 \
    -device ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=1d.0,firstport=0,bus=pci.0 \
    -device ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=1d.2,firstport=2,bus=pci.0 \
    -device ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=1d.4,firstport=4,bus=pci.0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/rhel73-64-virtio.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \
    -device virtio-net-pci,mac=9a:4d:4e:4f:50:51,id=id3DveCw,vectors=4,netdev=idgW5YRp,bus=pci.0,addr=05  \
    -netdev tap,id=idgW5YRp \
    -m 2048  \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2  \
    -cpu 'SandyBridge',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -monitor stdio \
    -qmp tcp:localhost:4444,server,nowait

2. In the guest's /etc/kdump.conf, the default action is set to:
default reboot

3. In the guest's /etc/default/grub:
GRUB_CMDLINE_LINUX="rd.lvm.lv=rhel/swap crashkernel=auto rd.lvm.lv=rhel/root rhgb quiet"

# service kdump start

4. Trigger a crash in the guest:
# echo c >/proc/sysrq-trigger
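
Before triggering the crash in step 4, it can help to confirm that a crash kernel is actually loaded, since otherwise the panic cannot hand over to kdump at all. The following is a minimal sketch, not something taken from this report: `crash_kernel_ready` is a hypothetical helper name, and it assumes the usual sysfs files `/sys/kernel/kexec_crash_size` and `/sys/kernel/kexec_crash_loaded` are available in the guest. The optional directory argument exists only so the function can be exercised against test files.

```shell
#!/bin/sh
# crash_kernel_ready: hypothetical helper, not part of kexec-tools.
# Succeeds only if memory is reserved for the crash kernel and a crash
# kernel image is loaded (normally done by the kdump service).
# $1 optionally overrides the sysfs directory; intended for testing only.
crash_kernel_ready() {
    dir="${1:-/sys/kernel}"
    size="$(cat "$dir/kexec_crash_size" 2>/dev/null || echo 0)"
    loaded="$(cat "$dir/kexec_crash_loaded" 2>/dev/null || echo 0)"
    size="${size:-0}"
    loaded="${loaded:-0}"

    if [ "$size" -eq 0 ]; then
        echo "no memory reserved (check crashkernel= on the kernel command line)"
        return 1
    fi
    if [ "$loaded" -ne 1 ]; then
        echo "crash kernel not loaded (is the kdump service running?)"
        return 1
    fi
    echo "crash kernel loaded, ${size} bytes reserved"
}
```

Run as root in the guest, `crash_kernel_ready` should succeed after `service kdump start`; if it fails, the hang is more likely a setup problem than the issue reported here.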


Actual results:
The RHEL guest hangs with a black screen.


Expected results:
The guest takes a dump and reboots.


Additional info:
Due to a network issue, I will upload the serial log file later.

Comment 2 hachen 2017-03-21 02:09:43 UTC
Created attachment 1264893 [details]
serial log with -serial stdio

When I add "-serial stdio" to the qemu cmd, the guest takes a dump and reboots.
But if I don't, it hangs at a black screen.

Comment 3 hachen 2017-03-21 02:10:44 UTC
Created attachment 1264894 [details]
log monitor stdio

Comment 4 hachen 2017-03-21 03:15:26 UTC
Created attachment 1264900 [details]
qemu 2.6.0.27 serial log

I tried with qemu-kvm-rhev-2.6.0-27.el7.x86_64.
If I use "-serial stdio" in the qemu cmd, the guest takes a dump and reboots.
If I use "-monitor stdio" in the qemu cmd, the guest hangs at a black screen.

Comment 5 hachen 2017-03-21 03:16:24 UTC
Created attachment 1264901 [details]
qemu-kvm-rhev-2.6.0-27.el7.x86_64 monitor stdio log

Comment 6 hachen 2017-03-21 03:24:13 UTC
Created attachment 1264902 [details]
/etc/kdump.conf

I tested with the same kernel and configuration but different qemu versions:

1. With qemu-kvm-rhev-2.8.0-6.el7, for RHEL guests:

1.1 Dump the core to /var/crash in the guest.
	I triggered a crash using # echo c > /proc/sysrq-trigger
	The RHEL guest hangs with a black screen. (It should take a dump and reboot.)

1.2 Guest kdump over ssh.
	Edit /etc/kdump.conf:
	 ssh root.73.85  <-- host ip
	 sshkey /root/.ssh/id_rsa
	 path /var/crash
	 core_collector makedumpfile -F -l --message-level 1 -d 31
	 default reboot
	# service kdump start

	Then I triggered a crash using # echo c > /proc/sysrq-trigger
	The RHEL guest hangs with a black screen. (It should take a dump and reboot.)

2. With qemu-kvm-rhev-2.6.0-27.el7, for RHEL guests:

2.1 Dump the core to /var/crash in the guest.
	I triggered a crash using # echo c > /proc/sysrq-trigger
	The RHEL guest hangs with a black screen. (It should take a dump and reboot.)

2.2 Guest kdump over ssh.
	Edit /etc/kdump.conf:
	 ssh root.73.85  <-- host ip
	 sshkey /root/.ssh/id_rsa
	 path /var/crash
	 core_collector makedumpfile -F -l --message-level 1 -d 31
	 default reboot
	# service kdump start

	Then I triggered a crash using # echo c > /proc/sysrq-trigger
	The RHEL guest takes a dump and reboots.

***I suspect it is a qemu bug***

Comment 7 David Hildenbrand 2017-03-23 09:24:58 UTC
After editing /etc/kdump.conf, you have to (re)start kdump.

kdump will then regenerate the initrd, packaging the updated version of /etc/kdump.conf. I assume that this was done in your case.

However, I wonder if there is a general problem. I set it to "default shell", restarted kdump, and made sure that the updated config file ended up in the initrd. There was no way of stopping kdump from rebooting the guest. The default parameter just got ignored.
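
One way to make the check described above repeatable is to compare the `default` directive on disk with the copy packed into the kdump initramfs. This is only a sketch under stated assumptions: `default_action` and `check_kdump_initrd` are hypothetical helper names, the image path `/boot/initramfs-$(uname -r)kdump.img` and the in-image path `etc/kdump.conf` are assumed to match the RHEL 7 layout, and `lsinitrd -f` (from dracut) is used to print the packed file.

```shell
#!/bin/sh
# default_action: print the value of the last "default" directive found in a
# kdump.conf-style stream on stdin. Commented-out lines are ignored because
# their first field is not exactly "default".
default_action() {
    awk '$1 == "default" { action = $2 } END { if (action != "") print action }'
}

# check_kdump_initrd: hypothetical helper comparing /etc/kdump.conf with the
# copy inside the kdump initramfs. The default image path is an assumption
# about the RHEL 7 layout; pass another path as $1 if it differs.
check_kdump_initrd() {
    img="${1:-/boot/initramfs-$(uname -r)kdump.img}"
    on_disk="$(default_action < /etc/kdump.conf)"
    in_image="$(lsinitrd -f etc/kdump.conf "$img" | default_action)"
    echo "on disk: ${on_disk:-<unset>}  in initramfs: ${in_image:-<unset>}"
    [ "$on_disk" = "$in_image" ]
}
```

If `check_kdump_initrd` reports matching values and the action is still ignored in the second kernel, the problem lies in how the directive is handled rather than in a stale initramfs.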

Comment 8 Pingfan Liu 2017-03-24 05:32:06 UTC
Hi hachen.

I used the same qemu version and guest kernel/kexec-tools that you reported, but I failed to reproduce the bug with the following cmd. (The cmd is copied from yours, except for the network config.)

Since the guest hangs with a black screen, could you test with the following steps:
1. Insert "gdb --args" before your cmdline.
2. Set a breakpoint with "break pc_machine_reset".
3. Run.
When the guest boots up, gdb will hit the breakpoint; you can ignore it.
But after you "echo c > /proc/sysrq-trigger", please note whether the breakpoint is hit or not.

I will do further analysis and debugging based on the result.

Thx,
Pingfan 


--- cmd I used ---
gdb --args \
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -machine pc  \
    -nodefaults  \
    -vga cirrus  \
    -device ich9-usb-ehci1,id=usb1,addr=1d.7,multifunction=on,bus=pci.0 \
    -device ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=1d.0,firstport=0,bus=pci.0 \
    -device ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=1d.2,firstport=2,bus=pci.0 \
    -device ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=1d.4,firstport=4,bus=pci.0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=raw,file=$guest_img \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \
    -net nic,model=virtio,macaddr=$(< /sys/class/net/macvtap0/address) \
    -net tap,fd=3 3<>/dev/tap$(< /sys/class/net/macvtap0/ifindex) \
    -m 2048  \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2  \
    -cpu 'SandyBridge',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :2  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -monitor stdio



pc_machine_reset
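
The breakpoint steps from this comment can also be scripted, so that each hit of `pc_machine_reset` is logged and the guest resumes without manual interaction. The sketch below is an illustration, not the procedure used in this report: `write_gdb_script` is a hypothetical helper, and the `handle SIGUSR1` line rests on the assumption that QEMU uses SIGUSR1 internally to kick vCPU threads, which would otherwise make gdb stop repeatedly.

```shell
#!/bin/sh
# write_gdb_script: hypothetical helper that writes a gdb command file to $1.
# The command file sets the breakpoint, prints a line on every hit, and
# continues automatically, so hits can be counted from the gdb output.
write_gdb_script() {
    cat > "$1" <<'EOF'
set pagination off
handle SIGUSR1 nostop noprint pass
break pc_machine_reset
commands
  silent
  echo pc_machine_reset hit\n
  continue
end
run
EOF
}
```

After `write_gdb_script /tmp/reset.gdb`, gdb would be started as `gdb -q -x /tmp/reset.gdb --args /usr/libexec/qemu-kvm` followed by the same options as in the command above. One "pc_machine_reset hit" line is expected at guest boot; whether a second one appears after `echo c > /proc/sysrq-trigger` is the information requested here.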

Comment 9 Pingfan Liu 2017-03-24 09:02:56 UTC
After logging in to the buggy system, I found that the 2nd kernel does not boot up.
Also, gdb does not hit the breakpoint pc_machine_reset a second time.
It is strange; I need more time to debug.

Thx,
Pingfan

Comment 11 Qiao Zhao 2017-05-22 06:43:55 UTC
Hi hachen,

Could you help retest this problem with the fixed package kexec-tools-2.0.14-7.el7?

--
Thanks,
Qiao

Comment 12 hachen 2017-05-24 06:51:54 UTC
I tested on
host:
kernel-3.10.0-656.el7.x86_64
kernel-debuginfo-3.10.0-656.el7.x86_64
kernel-debuginfo-common-x86_64-3.10.0-656.el7.x86_64
kexec-tools-2.0.14-7.el7
qemu-kvm-rhev-2.9.0-5.el7.x86_64

guest:
kernel-3.10.0-656.el7.x86_64
kexec-tools-2.0.14-7.el7

It works: the guest reboots after the dump.

Comment 13 Qiao Zhao 2017-05-31 08:45:20 UTC
(In reply to hachen from comment #12)
> I test on
> host:
> kernel-3.10.0-656.el7.x86_64
> kernel-debuginfo-3.10.0-656.el7.x86_64
> kernel-debuginfo-common-x86_64-3.10.0-656.el7.x86_64
> kexec-tools-2.0.14-7.el7
> qemu-kvm-rhev-2.9.0-5.el7.x86_64
> 
> guest:
> kernel-3.10.0-656.el7.x86_64
> kexec-tools-2.0.14-7.el7
> 
> It works as the guest reboot after dump.

Thanks! I really appreciate it. 

Move to Verified.

--
Thanks,
Qiao

Comment 14 Pingfan Liu 2017-08-01 03:23:20 UTC
(In reply to hachen from comment #4)
> Created attachment 1264900 [details]
> qemu 2.6.0.27 serial log
> 
> I tried with qemu-kvm-rhev-2.6.0-27.el7.x86_64.
> If I use "-serial stdio" in qemu cmd, the guest takes dump and reboot.
> if I use "-monitor stdio" in qemu cmd, the guest hangs at the black screen.

I think the description is missing part of the kernel cmdline. In it, you used "console=tty0 console=ttyS0", so when you ran qemu without "-serial stdio" (i.e. the VM has no serial device), kdump failed.

Regards,
Pingfan
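
If the explanation in this comment applies, the mismatch can be spotted by listing the `console=` arguments on the guest's kernel command line and comparing them with the serial devices QEMU actually provides. This is a minimal sketch: `cmdline_consoles` is a hypothetical helper that takes the command line as a string, so it works equally on `/proc/cmdline` or on a line copied from a log.

```shell
#!/bin/sh
# cmdline_consoles: print the device name of every console= argument in the
# kernel command line string given as $1, one per line. Options after a
# comma (for example ",115200") are dropped.
cmdline_consoles() {
    for arg in $1; do
        case "$arg" in
            console=*)
                dev="${arg#console=}"
                echo "${dev%%,*}"
                ;;
        esac
    done
}
```

In the guest this could be called as `cmdline_consoles "$(cat /proc/cmdline)"`. Any `ttyS` entry in the output needs a matching `-serial` option on the QEMU command line, because the command in the Description uses `-nodefaults` and therefore creates no serial port implicitly.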

Comment 15 errata-xmlrpc 2017-08-01 09:33:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2300

Comment 16 hachen 2017-08-02 02:39:00 UTC
(In reply to Pingfan Liu from comment #14)
> (In reply to hachen from comment #4)
> > Created attachment 1264900 [details]
> > qemu 2.6.0.27 serial log
> > 
> > I tried with qemu-kvm-rhev-2.6.0-27.el7.x86_64.
> > If I use "-serial stdio" in qemu cmd, the guest takes dump and reboot.
> > if I use "-monitor stdio" in qemu cmd, the guest hangs at the black screen.
> 
> I think that in description, you miss something for the kernel cmdline. In
> it, you used "console=tty0 console=ttyS0", so when you tried qemu without
> "-serial stdio"(i.e. the VM does not implement serial device), the kdump
> failed.
> 
> Regards,
> Pingfan


In comment #2 I added "-serial stdio" because, I think, someone had asked for the serial log.

In my test cases, I normally use "-monitor stdio", as I posted in the Description.

When I first reported this bug, I booted the guest using "-monitor stdio",
then ran # service kdump start,
then ran # echo c > /proc/sysrq-trigger to trigger the dump.
At that time, the guest hung with a black screen.

The "GRUB_CMDLINE_LINUX="rd.lvm.lv=rhel/swap crashkernel=auto rd.lvm.lv=rhel/root
rhgb quiet" line was provided as additional information there; I did not change anything.

After the fix, when I follow the same steps, the guest reboots.

I hope this makes the bug clear.
Thanks
Haotong