Bug 1254406

Summary: guest cannot generate a vmcore file after triggering a crash in the guest
Product: Red Hat Enterprise Linux 7
Component: qemu-kvm-rhev
Version: 7.2
Hardware: x86_64
OS: Linux
Reporter: Yanan Fu <yfu>
Assignee: Radim Krčmář <rkrcmar>
QA Contact: Virtualization Bugs <virt-bugs>
CC: chayang, hhuang, juzhang, knoel, virt-maint, xfu, yfu
Status: CLOSED WONTFIX
Severity: medium
Priority: medium
Target Milestone: rc
Target Release: ---
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-09-18 12:50:01 UTC

Attachments:
full console output when this bug reproduces

Description Yanan Fu 2015-08-18 03:11:09 UTC
Description of problem:
Boot a guest and trigger a crash in it with "echo c > /proc/sysrq-trigger", then wait a moment. After the guest reboots, no vmcore file exists.

Version-Release number of selected component (if applicable):
qemu-kvm:qemu-kvm-rhev-2.3.0-17.el7.x86_64
kernel:3.10.0-304.el7.x86_64

How reproducible:
Sometimes.

Steps to Reproduce:
1. Boot a guest with crashkernel=128M on the kernel command line and the kdump service running.
2. Trigger a crash in the guest: echo c > /proc/sysrq-trigger
3. After the guest reboots, there is no vmcore file.
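The steps above can be sketched as a short shell session. The kernel command line value below is illustrative (not copied from the reporter's guest), and the trigger itself is destructive:

```shell
# Sketch of the reproduction on a RHEL 7 guest with kdump installed.
# The cmdline value is illustrative, not taken from the reporter's setup.
cmdline='root=/dev/mapper/rhel-root ro crashkernel=128M'

# Step 1: confirm the crash-kernel memory reservation is present
# (on a live guest this would be: cat /proc/cmdline).
case "$cmdline" in
  *crashkernel=*) echo "crashkernel reserved" ;;
  *)              echo "crashkernel missing"  ;;
esac

# Step 2 (destructive -- panics the guest), with kdump active:
#   systemctl is-active kdump
#   echo c > /proc/sysrq-trigger
# Step 3: after reboot, look for the dump under the path from /etc/kdump.conf:
#   ls /var/crash/*/vmcore
```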

Actual results:
No vmcore file in the expected path (defined in /etc/kdump.conf).

Expected results:
A vmcore file exists and can be opened with the crash utility.

Additional info:
I have tested with virtio-scsi, IDE, and virtio-blk disks; virtio-blk reproduces this issue most easily.
CLI:
/usr/libexec/qemu-kvm -name test -machine pc,accel=kvm,usb=off,dump-guest-core=on -m 2G -cpu SandyBridge -smp 2,sockets=2,cores=1,threads=1 -no-user-config -nodefaults -monitor stdio -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot menu=on,strict=on -device pci-bridge,bus=pci.0,id=bridge1,chassis_nr=1,addr=0x5 -device ich9-usb-ehci1,id=usb,bus=bridge1,addr=0x2.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=bridge1,multifunction=on,addr=0x2.0x0 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=bridge1,addr=0x2.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=bridge1,addr=0x2.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=bridge1,addr=0x4 -chardev socket,path=/tmp/yfu,server,nowait,id=yfu0 -netdev tap,id=hostnet,vhost=on -device virtio-net-pci,netdev=hostnet,id=net,mac=78:1a:4a:d6:b8:98,bus=bridge1,addr=0x6,bootindex=4 -vnc :1 -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=33554432 -msg timestamp=on -monitor unix:/home/qmp,server,nowait -serial unix:/tmp/ttyS0,server,nowait -qmp tcp:0:4445,server,nowait -device usb-tablet,id=usb-tablet1 -drive file=/root/RHEL-7.2-02.qcow2,if=none,id=drive-virtio-blk-disk0,format=qcow2,cache=none -device virtio-blk-pci,bus=pci.0,addr=0x6,drive=drive-virtio-blk-disk0,id=virtio-blk-disk0,bootindex=1

Comment 2 Karen Noel 2015-08-18 13:26:21 UTC
Is this a regression from rhel 7.1?

Comment 3 Yanan Fu 2015-08-19 08:01:02 UTC
(In reply to Karen Noel from comment #2)
> Is this a regression from rhel 7.1?

With a RHEL 7.1 host, this issue still exists.
host version:
kernel:3.10.0-229.el7.x86_64
qemu:qemu-kvm-rhev-2.1.2-21.el7.x86_64

Comment 4 Karen Noel 2015-09-01 10:43:41 UTC
(In reply to Yanan Fu from comment #3)

In your experiment with the RHEL 7.1 host, was the guest still RHEL 7.2? If so, it could be a guest kernel issue. Can you try a RHEL 7.1 guest on a RHEL 7.2 host? And both host and guest on RHEL 7.1?

Thanks.

Comment 5 Yanan Fu 2015-09-02 02:20:53 UTC
(In reply to Karen Noel from comment #4)

1. Yes, in my experiment with the RHEL 7.1 host, the guest was RHEL 7.2.
2. A RHEL 7.1 guest on a RHEL 7.2 host is OK; the guest can generate the vmcore file successfully.
  host: kernel-3.10.0-309.el7.x86_64
  guest:kernel-3.10.0-229.el7.x86_64

Comment 6 Radim Krčmář 2015-09-16 19:27:14 UTC
What does "sometimes" mean in a $number_of_failures to $number_of_tries ratio?
Does it happen if you use crashkernel=auto?

Please provide full console output for the failing case.

Comment 7 Yanan Fu 2015-09-17 05:55:30 UTC
(In reply to Radim Krčmář from comment #6)

1. With my latest test, the probability is now 100%.
2. crashkernel=128M in my test.
3. Added the full console output as an attachment (beginning when "echo c > /proc/sysrq-trigger" is run); please check.
   guest kernel: 3.10.0-304.el7.x86_64
   host kernel:  3.10.0-316.el7.x86_64
        qemu-kvm: qemu-kvm-rhev-2.3.0-23.el7.x86_64

Comment 8 Yanan Fu 2015-09-17 06:02:34 UTC
Created attachment 1074276 [details]
full console output when this bug reproduces

Comment 9 Radim Krčmář 2015-09-17 12:01:49 UTC
*** Bug 1254409 has been marked as a duplicate of this bug. ***

Comment 10 Radim Krčmář 2015-09-17 12:08:54 UTC
Thanks.  The relevant part is:

[    2.375538] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input3
[  183.710106] dracut-initqueue[235]: Warning: Could not boot.
[  183.711412] dracut-initqueue[235]: Warning: /dev/disk/by-uuid/9a64b37a-fd56-48b5-a491-b5195d89a77c does not exist
[  183.713169] dracut-initqueue[235]: Warning: /dev/mapper/rhel_dhcp--66--72--83-root does not exist
[  183.714489] dracut-initqueue[235]: Warning: /dev/rhel_dhcp-66-72-83/root does not exist
[  183.716086] dracut-initqueue[235]: Warning: /dev/rhel_dhcp-66-72-83/swap does not exist

Dracut cannot find the disk, so the kdump boot fails.
I see that the other bug used the same image -- can you reproduce the bug if you
  # rm /boot/*kdump*; systemctl restart kdump
before the first `echo c > /proc/sysrq-trigger`?
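For the record, the device dracut could not find can be pulled out of such a warning line mechanically. A minimal sed sketch, using the warning text quoted from the attached console output:

```shell
# Minimal sketch: extract the by-uuid device that dracut reports missing.
# The warning text is copied from the attached console output.
warn='dracut-initqueue[235]: Warning: /dev/disk/by-uuid/9a64b37a-fd56-48b5-a491-b5195d89a77c does not exist'
uuid=$(printf '%s\n' "$warn" | sed -n 's|.*/dev/disk/by-uuid/\([0-9a-f-]*\) does not exist.*|\1|p')
echo "$uuid"   # prints 9a64b37a-fd56-48b5-a491-b5195d89a77c
# The UUID can then be compared against `blkid` output on the guest to see
# whether the disk dracut expects is present at all.
```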

Comment 11 Yanan Fu 2015-09-18 02:22:59 UTC
(In reply to Radim Krčmář from comment #10)

After running "# rm /boot/*kdump*; systemctl restart kdump" to rebuild the file /boot/initramfs-****kdump.img,

"echo c > /proc/sysrq-trigger" generates a vmcore file successfully.

In my earlier tests the same file already existed. Why does restarting the kdump service and rebuilding the file fix it?

Comment 12 Radim Krčmář 2015-09-18 12:50:01 UTC
Information about the system is built into *kdump.img, so reusing the same image for multiple machines will give unexpected results and possibly fail.
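A hypothetical sketch of the mismatch: the kdump initramfs records the root device's identity when it is built, so a cloned image carries another machine's UUID and the capture kernel cannot find its disk. Both UUID values below are illustrative:

```shell
# Illustrative values only: neither UUID is taken from a real system here.
built_for_uuid='9a64b37a-fd56-48b5-a491-b5195d89a77c'    # recorded in *kdump.img at build time
actual_root_uuid='0f0f0f0f-0000-1111-2222-333333333333'  # root UUID of the guest reusing the image

if [ "$built_for_uuid" != "$actual_root_uuid" ]; then
  # The fix from comment 10: force a per-machine rebuild of the initramfs.
  echo 'stale kdump initramfs; run: rm /boot/*kdump*; systemctl restart kdump'
fi
```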

It's a minor bug (the use case is rare) and it's dracut's problem anyway, so I'm closing it here.