Bug 1002015

Summary: using 'echo c > /proc/sysrq-trigger' to trigger a guest crash causes the guest to hang and fail to reboot
Product: Red Hat Enterprise Linux 6 Reporter: Sibiao Luo <sluo>
Component: qemu-kvm Assignee: Radim Krčmář <rkrcmar>
Status: CLOSED NOTABUG QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.5 CC: acathrow, bsarathy, chayang, flang, juzhang, michen, mkenneth, qzhang, virt-maint, xfu
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-09-20 12:25:19 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description Flags
guest kernel logs. none

Description Sibiao Luo 2013-08-28 10:43:56 UTC
Description of problem:
Using 'echo c > /proc/sysrq-trigger' to trigger a guest crash prints a call trace in the guest, but the guest then hangs and fails to reboot.

Version-Release number of selected component (if applicable):
host info:
# uname -r && rpm -q qemu-kvm
2.6.32-413.el6.x86_64
qemu-kvm-0.12.1.2-2.398.el6.x86_64
guest info:
kernel-2.6.32-413.el6.x86_64
kernel-2.6.32-413.el6.x86_64

How reproducible:
2/2

Steps to Reproduce:
1. Boot up a RHEL guest.
# /usr/libexec/qemu-kvm -M rhel6.5.0 -cpu SandyBridge -enable-kvm -m 4096 -smp 4,sockets=2,cores=2,threads=1 -no-kvm-pit-reinjection -name sluo -uuid 43425b70-86e5-4664-bf2c-3b76699b8bec -rtc base=localtime,clock=host,driftfix=slew -device virtio-serial-pci,id=virtio-serial0,max_ports=16,vectors=0,bus=pci.0,addr=0x3 -chardev socket,id=channel1,path=/tmp/helloworld1,server,nowait -device virtserialport,chardev=channel1,name=com.redhat.rhevm.vdsm.1,bus=virtio-serial0.0,id=port1,nr=1 -chardev socket,id=channel2,path=/tmp/helloworld2,server,nowait -device virtserialport,chardev=channel2,name=com.redhat.rhevm.vdsm.2,bus=virtio-serial0.0,id=port2,nr=2 -drive file=/home/RHEL-Server-6.4-64.qcow2,if=none,id=drive-system-disk,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop,serial="QEMU-DISK1" -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-system-disk,id=system-disk,bootindex=1 -device virtio-balloon-pci,id=ballooning,bus=pci.0,addr=0x5 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -netdev tap,id=hostnet0,vhost=off,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=2C:41:38:B6:40:21,bus=pci.0,addr=0x6,bootindex=2 -k en-us -boot menu=on -qmp tcp:0:4444,server,nowait -serial unix:/tmp/ttyS0,server,nowait -vnc :1 -spice port=5931,disable-ticketing -monitor stdio
2. Log in to the guest and check the crashkernel reservation.
# dmesg | grep -i crash
Command line: ro root=/dev/mapper/VolGroup-LogVol_root rd_NO_LUKS console=tty0 console=ttyS0,115200 LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 rd_LVM_LV=VolGroup/LogVol_swap crashkernel=auto rd_LVM_LV=VolGroup/LogVol_root  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM
Reserving 129MB of memory at 48MB for crashkernel (System RAM: 4608MB)
Kernel command line: ro root=/dev/mapper/VolGroup-LogVol_root rd_NO_LUKS console=tty0 console=ttyS0,115200 LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 rd_LVM_LV=VolGroup/LogVol_swap crashkernel=129M@0M rd_LVM_LV=VolGroup/LogVol_root  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM
crash memory driver: version 1.1

3. Run 'echo c > /proc/sysrq-trigger' in the guest.

Actual results:
After step 3, the guest hangs and fails to reboot. I will attach the guest kernel log later.

Expected results:
The guest should print a call trace, generate a vmcore, and reboot successfully.

Additional info:

Comment 1 Sibiao Luo 2013-08-28 10:45:29 UTC
Created attachment 791294
guest kernel logs.

Comment 2 Sibiao Luo 2013-08-28 11:16:52 UTC
(In reply to Sibiao Luo from comment #0)
> 2.login guest and check it.
# service kdump start
Kdump already running                                      [  OK  ]
# service kdump status
Kdump is operational
> 3.send 'echo c > /proc/sysrq-trigger' in guest.
> 
I added one step between steps 2 and 3 to start the kdump service in the guest and retested. The issue still occurs: the guest hangs and fails to reboot.
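The crashkernel reservation reported by dmesg in the description can also be read out in a script before triggering the crash. The following is only a sketch: `crashkernel_reserved_mb` is a hypothetical helper name, and it assumes the kernel prints the reservation in the "Reserving NMB of memory at NMB for crashkernel" form shown in this report, which may differ on other kernel versions.

```shell
#!/bin/sh
# Sketch: extract the crashkernel reservation size (in MB) from kernel log
# text read on stdin. crashkernel_reserved_mb is a hypothetical helper name.
# Assumes the "Reserving NMB of memory at NMB for crashkernel" message format
# seen in this report; prints nothing if no such line is present.
crashkernel_reserved_mb() {
    sed -n 's/^.*Reserving \([0-9][0-9]*\)MB of memory at [0-9][0-9]*MB for crashkernel.*$/\1/p' | head -n 1
}

# Demo on the line quoted in the description. On a live guest the input
# would instead come from: dmesg | crashkernel_reserved_mb
sample='Reserving 129MB of memory at 48MB for crashkernel (System RAM: 4608MB)'
printf '%s\n' "$sample" | crashkernel_reserved_mb
```

An empty result would suggest that no memory was reserved for the crash kernel, in which case kdump cannot capture a vmcore regardless of the service status.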

Comment 3 langfang 2013-08-29 00:11:56 UTC
I did not hit the problem on a rhel6.4.z guest: "echo c > /proc/sysrq-trigger" works well. However, I do hit the problem when I trigger the crash with "taskset -c 1 echo c > /proc/sysrq-trigger".

Version:
Host:
# uname -r
2.6.32-414.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.398.el6.x86_64
# rpm -q seabios
seabios-0.6.1.2-28.el6.x86_64

Guest:
 2.6.32-358.20.1.el6.x86_64 

Steps:
1.Boot a guest
 /usr/libexec/qemu-kvm -name RHEL6.4.z -M rhel6.5.0 -m 4G -realtime mlock=off -smp 2,maxcpus=8 -uuid f15b9d0c-559d-47fa-a04f-08886831f4ef -nodefconfig -nodefaults -monitor stdio -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/root/rhel6.4-z-64.qcow2,if=none,id=drive-virtio-0-0,media=disk,format=qcow2,cache=none,werror=stop,rerror=stop -device virtio-scsi-pci,id=scsi -device scsi-hd,drive=drive-virtio-0-0,id=drive-scsi,bootindex=1 -netdev tap,id=hostnet0,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:9e:3e:ea,bus=pci.0,addr=0x6 -device usb-tablet,id=input0 -spice port=8000,disable-ticketing -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -drive file=/root/RHEL6.4-20130130.0-Server-x86_64-DVD1.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,drive=drive-ide0-1-0,id=ide0-1-0 -boot menu=on -qmp tcp:0:5555,server,nowait -serial unix:/tmp/tty0,server,nowait

2. In the guest, make sure kdump is configured.

# dmesg | grep crashkernel
Command line: ro root=/dev/mapper/VolGroup-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD rd_LVM_LV=VolGroup/lv_swap SYSFONT=latarcyrheb-sun16 crashkernel=128M rd_LVM_LV=VolGroup/lv_root  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM console=tty0 console=ttyS0,115200 rhgb quiet
Reserving 128MB of memory at 48MB for crashkernel (System RAM: 4608MB)
Kernel command line: ro root=/dev/mapper/VolGroup-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD rd_LVM_LV=VolGroup/lv_swap SYSFONT=latarcyrheb-sun16 crashkernel=128M rd_LVM_LV=VolGroup/lv_root  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM console=tty0 console=ttyS0,115200 rhgb quiet

3. Trigger a crash:

   # taskset -c 1 echo c > /proc/sysrq-trigger

Results:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8133dc76>] sysrq_handle_crash+0x16/0x20
PGD 117973067 PUD 11a3dd067 PMD 0 
Oops: 0002 [#1] SMP 
last sysfs file: /sys/devices/pci0000:00/0000:00:01.1/host1/target1:0:0/1:0:0:0/evt_media_change
CPU 1 
Modules linked in: nls_utf8 fuse autofs4 sunrpc 8021q garp stp llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 microcode sg virtio_balloon virtio_net virtio_console i2c_piix4 i2c_core ext4 jbd2 mbcache sd_mod crc_t10dif virtio_scsi sr_mod cdrom virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]

Pid: 3434, comm: echo Not tainted 2.6.32-358.20.1.el6.x86_64 #1 Red Hat KVM
RIP: 0010:[<ffffffff8133dc76>]  [<ffffffff8133dc76>] sysrq_handle_crash+0x16/0x20
RSP: 0018:ffff88011793de18  EFLAGS: 00010096
RAX: 0000000000000010 RBX: 0000000000000063 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000063
RBP: ffff88011793de18 R08: ffffffff81c07800 R09: 0000000000000000
R10: 00000000ffffffff R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff81afff20 R14: 0000000000000286 R15: 0000000000000004
FS:  00007fce2e1cb700(0000) GS:ffff880028240000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000001179cf000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process echo (pid: 3434, threadinfo ffff88011793c000, task ffff8801178dd500)
Stack:
 ffff88011793de68 ffffffff8133df32 ffff8801178dd500 ffff880100000000
<d> 0000000000000022 0000000000000002 ffff88011cdb3780 00007fce2e1d9000
<d> 0000000000000002 fffffffffffffffb ffff88011793de98 ffffffff8133dfee
Call Trace:
 [<ffffffff8133df32>] __handle_sysrq+0x132/0x1a0
 [<ffffffff8133dfee>] write_sysrq_trigger+0x4e/0x50
 [<ffffffff811e98be>] proc_reg_write+0x7e/0xc0
 [<ffffffff81181368>] vfs_write+0xb8/0x1a0
 [<ffffffff81181c61>] sys_write+0x51/0x90
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Code: d0 88 81 e3 db fd 81 c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00 c7 05 5d ce 75 00 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 c9 c3 55 48 89 e5 0f 1f 44 00 00 8d 47 
RIP  [<ffffffff8133dc76>] sysrq_handle_crash+0x16/0x20
 RSP <ffff88011793de18>
CR2: 0000000000000000
---[ end trace bc0537e789f08a48 ]---
Kernel panic - not syncing: Fatal exception
Pid: 3434, comm: echo Tainted: G      D    ---------------    2.6.32-358.20.1.el6.x86_64 #1
Call Trace:
 [<ffffffff8150da2a>] ? panic+0xa7/0x16f
 [<ffffffff81511c54>] ? oops_end+0xe4/0x100
 [<ffffffff81046c1b>] ? no_context+0xfb/0x260
 [<ffffffff81046ea5>] ? __bad_area_nosemaphore+0x125/0x1e0
 [<ffffffff81282796>] ? __const_udelay+0x46/0x50
 [<ffffffff81046fce>] ? bad_area+0x4e/0x60
 [<ffffffff81047780>] ? __do_page_fault+0x3d0/0x480
 [<ffffffff8106e585>] ? __call_console_drivers+0x75/0x90
 [<ffffffff8109ca9f>] ? up+0x2f/0x50
 [<ffffffff8106e5ea>] ? _call_console_drivers+0x4a/0x80
 [<ffffffff8106ecff>] ? release_console_sem+0x1cf/0x220
 [<ffffffff81513b7e>] ? do_page_fault+0x3e/0xa0
 [<ffffffff81510f35>] ? page_fault+0x25/0x30
 [<ffffffff8133dc76>] ? sysrq_handle_crash+0x16/0x20
 [<ffffffff8133df32>] ? __handle_sysrq+0x132/0x1a0
 [<ffffffff8133dfee>] ? write_sysrq_trigger+0x4e/0x50
 [<ffffffff811e98be>] ? proc_reg_write+0x7e/0xc0
 [<ffffffff81181368>] ? vfs_write+0xb8/0x1a0
 [<ffffffff81181c61>] ? sys_write+0x51/0x90
 [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b

Comment 4 Radim Krčmář 2013-09-09 16:25:37 UTC
I could not reproduce either:
 - does `cat /sys/kernel/kexec_crash_loaded` print "1"?
 - was anything done before the write to sysrq?
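
The first question can be answered from a script right before triggering the crash. This is a minimal sketch, not part of the original test procedure: `crash_kernel_loaded` is a hypothetical helper name, and its optional path argument is an assumption added only so the check can be tried against a regular file.

```shell
#!/bin/sh
# Sketch: report whether a kdump (crash) kernel is currently loaded.
# crash_kernel_loaded is a hypothetical helper name. The optional argument
# overrides the sysfs path so the check can be exercised on a plain file.
crash_kernel_loaded() {
    flag_file="${1:-/sys/kernel/kexec_crash_loaded}"
    [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "1" ]
}

if crash_kernel_loaded; then
    echo "crash kernel loaded: a sysrq crash should kexec into the kdump kernel"
else
    echo "crash kernel NOT loaded: a sysrq crash will not produce a vmcore"
fi
```

Running the check immediately before 'echo c > /proc/sysrq-trigger' narrows down whether the hang is explained by a missing crash image.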


---
This could happen if 'kexec_mutex' was already taken when we crash; 'crash_kexec()' then behaves as if the image was not loaded, which is what we see above.

'kexec_mutex' is taken in these other cases:
 - reading/writing of 'kernel/kexec_crash_size' (in sysfs)
 - loading kexec image (kexec_load syscall)
 - rebooting with kexec (reboot syscall)
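
The effect described above can be illustrated with a userspace analogy. This is not kernel code: it assumes crash_kexec() only tries to take the lock and never waits for it, and it uses `mkdir` as an atomic try-lock. `try_crash_handoff` and the lock directory name are hypothetical.

```shell
#!/bin/sh
# Userspace analogy only; this is NOT how the kernel is implemented.
# try_crash_handoff and the lock directory are hypothetical names.
# mkdir either creates the directory or fails at once, so it acts as a
# non-blocking "trylock".
try_crash_handoff() {
    lock_dir="$1"
    if mkdir "$lock_dir" 2>/dev/null; then
        echo "lock acquired: would boot the crash kernel"
        rmdir "$lock_dir"
    else
        # Lock already held: skip the handoff, just as if no image were loaded.
        echo "lock busy: falling through to plain panic"
    fi
}

demo_dir="${TMPDIR:-/tmp}/kexec_mutex_demo.$$"
try_crash_handoff "$demo_dir"   # lock is free
mkdir "$demo_dir"               # simulate another holder, e.g. an image load
try_crash_handoff "$demo_dir"   # lock is busy
rmdir "$demo_dir"
```

The second call takes the "busy" branch without waiting, which mirrors why a crash that races with one of the listed operations would look the same as a crash with no image loaded.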

Comment 5 Radim Krčmář 2013-09-20 12:25:19 UTC
I think the kexec crash kernel was not loaded; please reopen if you still hit it.