Bug 1254406 - guest can not generate vmcore file after trigger a crash in the guest
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.2
Hardware: x86_64 Linux
Priority: medium  Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Radim Krčmář
QA Contact: Virtualization Bugs
Duplicates: 1254409
Depends On:
Blocks:
Reported: 2015-08-17 23:11 EDT by Yanan Fu
Modified: 2016-03-28 05:18 EDT
6 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-09-18 08:50:01 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
full console output when this bug reproduces (87.28 KB, text/plain)
2015-09-17 02:02 EDT, Yanan Fu

Description Yanan Fu 2015-08-17 23:11:09 EDT
Description of problem:
Boot a guest and trigger a crash in it with "echo c > /proc/sysrq-trigger", then wait a moment. After the guest reboots, no vmcore file exists.

Version-Release number of selected component (if applicable):
qemu-kvm:qemu-kvm-rhev-2.3.0-17.el7.x86_64
kernel:3.10.0-304.el7.x86_64

How reproducible:
Sometimes.

Steps to Reproduce:
1. Boot a guest with crashkernel=128M on the kernel command line; the kdump service is running.
2. Trigger a crash in the guest: echo c > /proc/sysrq-trigger
3. After the guest reboots, there is no vmcore file.

Actual results:
No vmcore file in the path defined in "/etc/kdump.conf".
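For reference, the dump target is the "path" directive in /etc/kdump.conf, with /var/crash as the documented default. A minimal sketch of reading the configured location, using a hypothetical sample file rather than the real guest's config:

```shell
# Sketch only: parse the kdump dump path from a sample config.
# /tmp/kdump.conf.sample is a hypothetical stand-in for /etc/kdump.conf.
cat > /tmp/kdump.conf.sample <<'EOF'
path /var/crash
core_collector makedumpfile -l --message-level 1 -d 31
EOF

# "path" sets where the vmcore is written; fall back to the default.
dump_path=$(awk '$1 == "path" {print $2}' /tmp/kdump.conf.sample)
dump_path=${dump_path:-/var/crash}
echo "$dump_path"   # → /var/crash
```

After a successful dump, the vmcore would land in a timestamped subdirectory under that path.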

Expected results:
A vmcore file exists and can be opened with crash.

Additional info:
I have tested with virtio-scsi, IDE, and virtio-blk disks; virtio-blk reproduces this issue most easily.
CLI:
/usr/libexec/qemu-kvm -name test \
    -machine pc,accel=kvm,usb=off,dump-guest-core=on -m 2G -cpu SandyBridge \
    -smp 2,sockets=2,cores=1,threads=1 -no-user-config -nodefaults -monitor stdio \
    -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet \
    -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot menu=on,strict=on \
    -device pci-bridge,bus=pci.0,id=bridge1,chassis_nr=1,addr=0x5 \
    -device ich9-usb-ehci1,id=usb,bus=bridge1,addr=0x2.0x7 \
    -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=bridge1,multifunction=on,addr=0x2.0x0 \
    -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=bridge1,addr=0x2.0x1 \
    -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=bridge1,addr=0x2.0x2 \
    -device virtio-serial-pci,id=virtio-serial0,bus=bridge1,addr=0x4 \
    -chardev socket,path=/tmp/yfu,server,nowait,id=yfu0 \
    -netdev tap,id=hostnet,vhost=on \
    -device virtio-net-pci,netdev=hostnet,id=net,mac=78:1a:4a:d6:b8:98,bus=bridge1,addr=0x6,bootindex=4 \
    -vnc :1 -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=33554432 \
    -msg timestamp=on -monitor unix:/home/qmp,server,nowait \
    -serial unix:/tmp/ttyS0,server,nowait -qmp tcp:0:4445,server,nowait \
    -device usb-tablet,id=usb-tablet1 \
    -drive file=/root/RHEL-7.2-02.qcow2,if=none,id=drive-virtio-blk-disk0,format=qcow2,cache=none \
    -device virtio-blk-pci,bus=pci.0,addr=0x6,drive=drive-virtio-blk-disk0,id=virtio-blk-disk0,bootindex=1
Comment 2 Karen Noel 2015-08-18 09:26:21 EDT
Is this a regression from rhel 7.1?
Comment 3 Yanan Fu 2015-08-19 04:01:02 EDT
(In reply to Karen Noel from comment #2)
> Is this a regression from rhel 7.1?

With a RHEL 7.1 host, this issue still exists.
host version:
kernel:3.10.0-229.el7.x86_64
qemu:qemu-kvm-rhev-2.1.2-21.el7.x86_64
Comment 4 Karen Noel 2015-09-01 06:43:41 EDT
(In reply to Yanan Fu from comment #3)
> (In reply to Karen Noel from comment #2)
> > Is this a regression from rhel 7.1?
> 
> with rhel 7.1 host, this issue still exist.
> host version:
> kernel:3.10.0-229.el7.x86_64
> qemu:qemu-kvm-rhev-2.1.2-21.el7.x86_64

In your experiment with the RHEL 7.1 host, was the guest still RHEL 7.2? If so, then it could be a guest kernel issue. Can you try a RHEL 7.1 guest on a RHEL 7.2 host? And both host and guest RHEL 7.1?

Thanks.
Comment 5 Yanan Fu 2015-09-01 22:20:53 EDT
(In reply to Karen Noel from comment #4)
> (In reply to Yanan Fu from comment #3)
> > (In reply to Karen Noel from comment #2)
> > > Is this a regression from rhel 7.1?
> > 
> > with rhel 7.1 host, this issue still exist.
> > host version:
> > kernel:3.10.0-229.el7.x86_64
> > qemu:qemu-kvm-rhev-2.1.2-21.el7.x86_64
> 
> In your experiment with rhel 7.1 host, was the guest still rhel 7.2? If so,
> then it could be a guest kernel issue. Can you try rhel 7.1 guest on rhel 7.2
> host? And, both host and guest rhel 7.1?
> 
> Thanks.

1. Yes, in my experiment with the RHEL 7.1 host, the guest was RHEL 7.2.
2. A RHEL 7.1 guest on a RHEL 7.2 host is OK; the guest generates a vmcore file successfully.
   host:  kernel-3.10.0-309.el7.x86_64
   guest: kernel-3.10.0-229.el7.x86_64
Comment 6 Radim Krčmář 2015-09-16 15:27:14 EDT
What does "sometimes" mean in a $number_of_failures to $number_of_tries ratio?
Does it happen if you use crashkernel=auto?

Please provide full console output for the failing case.
Comment 7 Yanan Fu 2015-09-17 01:55:30 EDT
(In reply to Radim Krčmář from comment #6)
> What does "sometimes" mean in a $number_of_failures to $number_of_tries
> ratio?
> Does it happen if you use crashkernel=auto?
> 
> Please provide full console output for the failing case.

1. With my latest test, the reproduction rate is now 100%.
2. crashkernel=128M in my test.
3. Added the full console output as an attachment (it begins when "echo c > /proc/sysrq-trigger" is run); please check.
   guest kernel: 3.10.0-304.el7.x86_64
   host kernel:  3.10.0-316.el7.x86_64
        qemu-kvm: qemu-kvm-rhev-2.3.0-23.el7.x86_64
Comment 8 Yanan Fu 2015-09-17 02:02:34 EDT
Created attachment 1074276 [details]
full console output when this bug reproduces
Comment 9 Radim Krčmář 2015-09-17 08:01:49 EDT
*** Bug 1254409 has been marked as a duplicate of this bug. ***
Comment 10 Radim Krčmář 2015-09-17 08:08:54 EDT
Thanks.  The relevant part is:

[    2.375538] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input3
[  183.710106] dracut-initqueue[235]: Warning: Could not boot.
[  183.711412] dracut-initqueue[235]: Warning: /dev/disk/by-uuid/9a64b37a-fd56-48b5-a491-b5195d89a77c does not exist
[  183.713169] dracut-initqueue[235]: Warning: /dev/mapper/rhel_dhcp--66--72--83-root does not exist
[  183.714489] dracut-initqueue[235]: Warning: /dev/rhel_dhcp-66-72-83/root does not exist
[  183.716086] dracut-initqueue[235]: Warning: /dev/rhel_dhcp-66-72-83/swap does not exist

Dracut cannot find the disk, so the kdump boot fails.
I see that the other bug used the same image -- can you reproduce the bug if you
  # rm /boot/*kdump*; systemctl restart kdump
before the first `echo c > /proc/sysrq-trigger`?
Comment 11 Yanan Fu 2015-09-17 22:22:59 EDT
(In reply to Radim Krčmář from comment #10)
> Thanks.  The relevant part is:
> 
> [    2.375538] input: ImExPS/2 Generic Explorer Mouse as
> /devices/platform/i8042/serio1/input/input3
> [  183.710106] dracut-initqueue[235]: Warning: Could not boot.
> [  183.711412] dracut-initqueue[235]: Warning:
> /dev/disk/by-uuid/9a64b37a-fd56-48b5-a491-b5195d89a77c does not exist
> [  183.713169] dracut-initqueue[235]: Warning:
> /dev/mapper/rhel_dhcp--66--72--83-root does not exist
> [  183.714489] dracut-initqueue[235]: Warning: /dev/rhel_dhcp-66-72-83/root
> does not exist
> [  183.716086] dracut-initqueue[235]: Warning: /dev/rhel_dhcp-66-72-83/swap
> does not exist
> 
> Dracut cannot find the disk, so the kdump boot fails.
> I see that the other bug used the same image -- can you reproduce the bug if
> you
>   # rm /boot/*kdump*; systemctl restart kdump
> before the first `echo c > /proc/sysrq-trigger`?

After "rm /boot/*kdump*; systemctl restart kdump" rebuilt the file /boot/initramfs-****kdump.img,

"echo c > /proc/sysrq-trigger" generated a vmcore file successfully.

In my tests before, the same file already existed. Restarting the kdump service rebuilds the file and then it works -- why?
Comment 12 Radim Krčmář 2015-09-18 08:50:01 EDT
Information about the system is built into *kdump.img, so re-using the same image for multiple machines will give unexpected results and can fail.
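A toy sketch of that mismatch (the first UUID is the one from the console log above; the second is a made-up UUID standing in for a different guest's root disk): an initramfs built on machine A waits for machine A's root device, which never appears on machine B.

```shell
# Hypothetical illustration: compare the root UUID baked into a reused
# kdump initramfs against the root UUID of the guest actually running it.
baked_uuid="9a64b37a-fd56-48b5-a491-b5195d89a77c"    # baked in at build time (from the log)
current_uuid="11111111-2222-3333-4444-555555555555"  # made-up UUID for this guest's root

if [ "$baked_uuid" != "$current_uuid" ]; then
  # dracut would wait for the baked-in device and eventually warn
  # "/dev/disk/by-uuid/... does not exist", as in the attached log.
  echo "stale kdump initramfs; rebuild with: rm /boot/*kdump*; systemctl restart kdump"
fi
```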

It's a minor bug (the use case is rare) and it's dracut's problem anyway, so I'm closing it here.
