Bug 1002015 - use 'echo c > /proc/sysrq-trigger' to trigger guest crash which cause guest hang and fail to reboot
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.5
Hardware: Unspecified Unspecified
Priority: medium, Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Radim Krčmář
QA Contact: Virtualization Bugs
Depends On:
Blocks:
Reported: 2013-08-28 06:43 EDT by Sibiao Luo
Modified: 2013-09-20 08:25 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-09-20 08:25:19 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
guest kernel logs. (19.55 KB, text/plain)
2013-08-28 06:45 EDT, Sibiao Luo

Description Sibiao Luo 2013-08-28 06:43:56 EDT
Description of problem:
Using 'echo c > /proc/sysrq-trigger' to trigger a guest crash prints a call trace in the guest, but the guest then hangs and fails to reboot.

Version-Release number of selected component (if applicable):
host info:
# uname -r && rpm -q qemu-kvm
2.6.32-413.el6.x86_64
qemu-kvm-0.12.1.2-2.398.el6.x86_64
guest info:
kernel-2.6.32-413.el6.x86_64
kernel-2.6.32-413.el6.x86_64

How reproducible:
2/2

Steps to Reproduce:
1. Boot up a RHEL guest.
# /usr/libexec/qemu-kvm -M rhel6.5.0 -cpu SandyBridge -enable-kvm -m 4096 -smp 4,sockets=2,cores=2,threads=1 -no-kvm-pit-reinjection -name sluo -uuid 43425b70-86e5-4664-bf2c-3b76699b8bec -rtc base=localtime,clock=host,driftfix=slew -device virtio-serial-pci,id=virtio-serial0,max_ports=16,vectors=0,bus=pci.0,addr=0x3 -chardev socket,id=channel1,path=/tmp/helloworld1,server,nowait -device virtserialport,chardev=channel1,name=com.redhat.rhevm.vdsm.1,bus=virtio-serial0.0,id=port1,nr=1 -chardev socket,id=channel2,path=/tmp/helloworld2,server,nowait -device virtserialport,chardev=channel2,name=com.redhat.rhevm.vdsm.2,bus=virtio-serial0.0,id=port2,nr=2 -drive file=/home/RHEL-Server-6.4-64.qcow2,if=none,id=drive-system-disk,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop,serial="QEMU-DISK1" -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-system-disk,id=system-disk,bootindex=1 -device virtio-balloon-pci,id=ballooning,bus=pci.0,addr=0x5 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -netdev tap,id=hostnet0,vhost=off,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=2C:41:38:B6:40:21,bus=pci.0,addr=0x6,bootindex=2 -k en-us -boot menu=on -qmp tcp:0:4444,server,nowait -serial unix:/tmp/ttyS0,server,nowait -vnc :1 -spice port=5931,disable-ticketing -monitor stdio
2. Log in to the guest and check the crashkernel setup.
# dmesg | grep -i crash
Command line: ro root=/dev/mapper/VolGroup-LogVol_root rd_NO_LUKS console=tty0 console=ttyS0,115200 LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 rd_LVM_LV=VolGroup/LogVol_swap crashkernel=auto rd_LVM_LV=VolGroup/LogVol_root  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM
Reserving 129MB of memory at 48MB for crashkernel (System RAM: 4608MB)
Kernel command line: ro root=/dev/mapper/VolGroup-LogVol_root rd_NO_LUKS console=tty0 console=ttyS0,115200 LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 rd_LVM_LV=VolGroup/LogVol_swap crashkernel=129M@0M rd_LVM_LV=VolGroup/LogVol_root  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM
crash memory driver: version 1.1

3. Run 'echo c > /proc/sysrq-trigger' in the guest.

Actual results:
After step 3, the guest hangs and fails to reboot. I will attach the guest kernel log later.

Expected results:
The guest should print a call trace, generate a vmcore, and reboot successfully.

Additional info:
Comment 1 Sibiao Luo 2013-08-28 06:45:29 EDT
Created attachment 791294
guest kernel logs.
Comment 2 Sibiao Luo 2013-08-28 07:16:52 EDT
(In reply to Sibiao Luo from comment #0)
> 2.login guest and check it.
# service kdump start
Kdump already running                                      [  OK  ]
# service kdump status
Kdump is operational
> 3.send 'echo c > /proc/sysrq-trigger' in guest.
> 
I added one step between steps 2 and 3 that starts the kdump service in the guest, then retested. The issue still occurs: the guest hangs and fails to reboot.
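
The step 2 check from the description can be turned into a small script-friendly test. This is only a sketch: the helper name crash_reservation_mb is hypothetical, and it assumes the dmesg line has the same "Reserving ...MB of memory at ...MB for crashkernel" wording shown in this report.

```shell
# Hypothetical helper: print the crashkernel reservation size in MB,
# parsed from dmesg-style text on stdin. Prints nothing if no line matches.
# Assumes the message format seen in this report's guest logs.
crash_reservation_mb() {
    sed -n 's/.*Reserving \([0-9][0-9]*\)MB of memory at [0-9][0-9]*MB for crashkernel.*/\1/p' | head -n 1
}

# Usage inside a guest:
#   dmesg | crash_reservation_mb
```

An empty result would suggest that no memory was reserved for the crash kernel, in which case kdump cannot capture a vmcore regardless of the service status.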
Comment 3 langfang 2013-08-28 20:11:56 EDT
I did not hit the problem on a RHEL 6.4.z guest: "echo c > /proc/sysrq-trigger" works well. However, I do hit it when I trigger the crash with "taskset -c 1 echo c > /proc/sysrq-trigger".

Version:
Host:
# uname -r
2.6.32-414.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.398.el6.x86_64
# rpm -q seabios
seabios-0.6.1.2-28.el6.x86_64

Guest:
 2.6.32-358.20.1.el6.x86_64 

Steps:
1. Boot a guest
 /usr/libexec/qemu-kvm -name RHEL6.4.z -M rhel6.5.0 -m 4G -realtime mlock=off -smp 2,maxcpus=8 -uuid f15b9d0c-559d-47fa-a04f-08886831f4ef -nodefconfig -nodefaults -monitor stdio -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/root/rhel6.4-z-64.qcow2,if=none,id=drive-virtio-0-0,media=disk,format=qcow2,cache=none,werror=stop,rerror=stop -device virtio-scsi-pci,id=scsi -device scsi-hd,drive=drive-virtio-0-0,id=drive-scsi,bootindex=1 -netdev tap,id=hostnet0,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:9e:3e:ea,bus=pci.0,addr=0x6 -device usb-tablet,id=input0 -spice port=8000,disable-ticketing -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -drive file=/root/RHEL6.4-20130130.0-Server-x86_64-DVD1.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,drive=drive-ide0-1-0,id=ide0-1-0 -boot menu=on -qmp tcp:0:5555,server,nowait -serial unix:/tmp/tty0,server,nowait

2. In the guest, make sure kdump is configured.

    # dmesg | grep crashkernel
Command line: ro root=/dev/mapper/VolGroup-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD rd_LVM_LV=VolGroup/lv_swap SYSFONT=latarcyrheb-sun16 crashkernel=128M rd_LVM_LV=VolGroup/lv_root  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM console=tty0 console=ttyS0,115200 rhgb quiet
Reserving 128MB of memory at 48MB for crashkernel (System RAM: 4608MB)
Kernel command line: ro root=/dev/mapper/VolGroup-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD rd_LVM_LV=VolGroup/lv_swap SYSFONT=latarcyrheb-sun16 crashkernel=128M rd_LVM_LV=VolGroup/lv_root  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM console=tty0 console=ttyS0,115200 rhgb quiet

3. Trigger a crash:

   # taskset -c 1 echo c > /proc/sysrq-trigger

Results:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8133dc76>] sysrq_handle_crash+0x16/0x20
PGD 117973067 PUD 11a3dd067 PMD 0 
Oops: 0002 [#1] SMP 
last sysfs file: /sys/devices/pci0000:00/0000:00:01.1/host1/target1:0:0/1:0:0:0/evt_media_change
CPU 1 
Modules linked in: nls_utf8 fuse autofs4 sunrpc 8021q garp stp llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 microcode sg virtio_balloon virtio_net virtio_console i2c_piix4 i2c_core ext4 jbd2 mbcache sd_mod crc_t10dif virtio_scsi sr_mod cdrom virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]

Pid: 3434, comm: echo Not tainted 2.6.32-358.20.1.el6.x86_64 #1 Red Hat KVM
RIP: 0010:[<ffffffff8133dc76>]  [<ffffffff8133dc76>] sysrq_handle_crash+0x16/0x20
RSP: 0018:ffff88011793de18  EFLAGS: 00010096
RAX: 0000000000000010 RBX: 0000000000000063 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000063
RBP: ffff88011793de18 R08: ffffffff81c07800 R09: 0000000000000000
R10: 00000000ffffffff R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff81afff20 R14: 0000000000000286 R15: 0000000000000004
FS:  00007fce2e1cb700(0000) GS:ffff880028240000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000001179cf000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process echo (pid: 3434, threadinfo ffff88011793c000, task ffff8801178dd500)
Stack:
 ffff88011793de68 ffffffff8133df32 ffff8801178dd500 ffff880100000000
<d> 0000000000000022 0000000000000002 ffff88011cdb3780 00007fce2e1d9000
<d> 0000000000000002 fffffffffffffffb ffff88011793de98 ffffffff8133dfee
Call Trace:
 [<ffffffff8133df32>] __handle_sysrq+0x132/0x1a0
 [<ffffffff8133dfee>] write_sysrq_trigger+0x4e/0x50
 [<ffffffff811e98be>] proc_reg_write+0x7e/0xc0
 [<ffffffff81181368>] vfs_write+0xb8/0x1a0
 [<ffffffff81181c61>] sys_write+0x51/0x90
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Code: d0 88 81 e3 db fd 81 c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00 c7 05 5d ce 75 00 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 c9 c3 55 48 89 e5 0f 1f 44 00 00 8d 47 
RIP  [<ffffffff8133dc76>] sysrq_handle_crash+0x16/0x20
 RSP <ffff88011793de18>
CR2: 0000000000000000
---[ end trace bc0537e789f08a48 ]---
Kernel panic - not syncing: Fatal exception
Pid: 3434, comm: echo Tainted: G      D    ---------------    2.6.32-358.20.1.el6.x86_64 #1
Call Trace:
 [<ffffffff8150da2a>] ? panic+0xa7/0x16f
 [<ffffffff81511c54>] ? oops_end+0xe4/0x100
 [<ffffffff81046c1b>] ? no_context+0xfb/0x260
 [<ffffffff81046ea5>] ? __bad_area_nosemaphore+0x125/0x1e0
 [<ffffffff81282796>] ? __const_udelay+0x46/0x50
 [<ffffffff81046fce>] ? bad_area+0x4e/0x60
 [<ffffffff81047780>] ? __do_page_fault+0x3d0/0x480
 [<ffffffff8106e585>] ? __call_console_drivers+0x75/0x90
 [<ffffffff8109ca9f>] ? up+0x2f/0x50
 [<ffffffff8106e5ea>] ? _call_console_drivers+0x4a/0x80
 [<ffffffff8106ecff>] ? release_console_sem+0x1cf/0x220
 [<ffffffff81513b7e>] ? do_page_fault+0x3e/0xa0
 [<ffffffff81510f35>] ? page_fault+0x25/0x30
 [<ffffffff8133dc76>] ? sysrq_handle_crash+0x16/0x20
 [<ffffffff8133df32>] ? __handle_sysrq+0x132/0x1a0
 [<ffffffff8133dfee>] ? write_sysrq_trigger+0x4e/0x50
 [<ffffffff811e98be>] ? proc_reg_write+0x7e/0xc0
 [<ffffffff81181368>] ? vfs_write+0xb8/0x1a0
 [<ffffffff81181c61>] ? sys_write+0x51/0x90
 [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
Comment 4 Radim Krčmář 2013-09-09 12:25:37 EDT
I could not reproduce either:
 - does `cat /sys/kernel/kexec_crash_loaded` print "1"?
 - was anything done before the write to sysrq?
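
The first question can be checked in a script before triggering the crash. A minimal sketch, assuming a POSIX shell in the guest; the function name crash_kernel_loaded is hypothetical, and the path is a parameter only so the logic can be tried outside a guest.

```shell
# Hypothetical helper: succeed only if a crash (kdump) kernel is loaded.
# Defaults to the sysfs file mentioned above; a different path may be
# passed for testing.
crash_kernel_loaded() {
    f=${1:-/sys/kernel/kexec_crash_loaded}
    [ -r "$f" ] && [ "$(cat "$f")" = "1" ]
}

# Usage in a guest (the trigger itself is left commented out on purpose):
#   crash_kernel_loaded && echo c > /proc/sysrq-trigger
```

If the function fails right before the sysrq write, the panic would be expected to end in a hang rather than a kdump reboot.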


---
This could happen if 'kexec_mutex' was already taken when we crash; 'crash_kexec()' then behaves as if the image was not loaded, which is what we see above.

'kexec_mutex' is taken in these other cases:
 - reading/writing of 'kernel/kexec_crash_size' (in sysfs)
 - loading kexec image (kexec_load syscall)
 - rebooting with kexec (reboot syscall)
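
The behavior described above can be modeled in user space. This is an analogy only, not kernel code: a directory created with mkdir stands in for 'kexec_mutex' (mkdir fails if the directory already exists, like a failed trylock), and the function name, arguments, and messages are all hypothetical.

```shell
# User-space model of the crash_kexec() logic described above.
#   $1: lock directory standing in for kexec_mutex
#   $2: "1" if a crash image is loaded, anything else otherwise
try_crash_kexec() {
    lockdir=$1
    image_loaded=$2
    # Try-lock: do not wait if someone else already holds the lock.
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "lock busy: treated as no crash image loaded"
        return 1
    fi
    if [ "$image_loaded" = 1 ]; then
        # In the kernel this path jumps to the crash kernel and never returns.
        echo "booting crash kernel"
        rc=0
    else
        echo "no crash image: plain panic"
        rc=1
    fi
    rmdir "$lockdir"
    return $rc
}
```

Both failure paths end the same way for the caller, which is why a held lock and a missing image look identical in the guest log.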
Comment 5 Radim Krčmář 2013-09-20 08:25:19 EDT
I think the kexec kernel was not loaded; please reopen if you still hit it.
