| Summary: | Using 'echo c > /proc/sysrq-trigger' to trigger a guest crash causes the guest to hang and fail to reboot | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Sibiao Luo <sluo> |
| Component: | qemu-kvm | Assignee: | Radim Krčmář <rkrcmar> |
| Status: | CLOSED NOTABUG | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 6.5 | CC: | acathrow, bsarathy, chayang, flang, juzhang, michen, mkenneth, qzhang, virt-maint, xfu |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-09-20 12:25:19 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Sibiao Luo
2013-08-28 10:43:56 UTC
Created attachment 791294: guest kernel logs.
(In reply to Sibiao Luo from comment #0)
> Description of problem:
> Using 'echo c > /proc/sysrq-trigger' to trigger a guest crash, the guest prints
> a call trace but then hangs and fails to reboot.
>
> Version-Release number of selected component (if applicable):
> host info:
> # uname -r && rpm -q qemu-kvm
> 2.6.32-413.el6.x86_64
> qemu-kvm-0.12.1.2-2.398.el6.x86_64
> guest info:
> kernel-2.6.32-413.el6.x86_64
>
> How reproducible:
> 2/2
>
> Steps to Reproduce:
> 1. Boot up a RHEL guest.
> # /usr/libexec/qemu-kvm -M rhel6.5.0 -cpu SandyBridge -enable-kvm -m 4096 \
>   -smp 4,sockets=2,cores=2,threads=1 -no-kvm-pit-reinjection -name sluo \
>   -uuid 43425b70-86e5-4664-bf2c-3b76699b8bec \
>   -rtc base=localtime,clock=host,driftfix=slew \
>   -device virtio-serial-pci,id=virtio-serial0,max_ports=16,vectors=0,bus=pci.0,addr=0x3 \
>   -chardev socket,id=channel1,path=/tmp/helloworld1,server,nowait \
>   -device virtserialport,chardev=channel1,name=com.redhat.rhevm.vdsm.1,bus=virtio-serial0.0,id=port1,nr=1 \
>   -chardev socket,id=channel2,path=/tmp/helloworld2,server,nowait \
>   -device virtserialport,chardev=channel2,name=com.redhat.rhevm.vdsm.2,bus=virtio-serial0.0,id=port2,nr=2 \
>   -drive file=/home/RHEL-Server-6.4-64.qcow2,if=none,id=drive-system-disk,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop,serial="QEMU-DISK1" \
>   -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-system-disk,id=system-disk,bootindex=1 \
>   -device virtio-balloon-pci,id=ballooning,bus=pci.0,addr=0x5 \
>   -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 \
>   -netdev tap,id=hostnet0,vhost=off,script=/etc/qemu-ifup \
>   -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=2C:41:38:B6:40:21,bus=pci.0,addr=0x6,bootindex=2 \
>   -k en-us -boot menu=on -qmp tcp:0:4444,server,nowait \
>   -serial unix:/tmp/ttyS0,server,nowait -vnc :1 \
>   -spice port=5931,disable-ticketing -monitor stdio
> 2. Log in to the guest and check it.
> # dmesg | grep -i crash
> Command line: ro root=/dev/mapper/VolGroup-LogVol_root rd_NO_LUKS console=tty0 console=ttyS0,115200 LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 rd_LVM_LV=VolGroup/LogVol_swap crashkernel=auto rd_LVM_LV=VolGroup/LogVol_root KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM
> Reserving 129MB of memory at 48MB for crashkernel (System RAM: 4608MB)
> Kernel command line: ro root=/dev/mapper/VolGroup-LogVol_root rd_NO_LUKS console=tty0 console=ttyS0,115200 LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 rd_LVM_LV=VolGroup/LogVol_swap crashkernel=129M@0M rd_LVM_LV=VolGroup/LogVol_root KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM
> crash memory driver: version 1.1

# service kdump start
Kdump already running                                      [  OK  ]
# service kdump status
Kdump is operational

> 3. Run 'echo c > /proc/sysrq-trigger' in the guest.
>
> Added a step between steps 2 and 3 that starts the kdump service in the guest and retested; still hit this issue: the guest hangs and fails to reboot.

I did not hit the problem on a RHEL 6.4.z guest: "echo c > /proc/sysrq-trigger" works well. However, I do hit it when triggering the crash with "taskset -c 1 echo c > /proc/sysrq-trigger".
Version:
Host:
# uname -r
2.6.32-414.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.398.el6.x86_64
# rpm -q seabios
seabios-0.6.1.2-28.el6.x86_64
Guest:
2.6.32-358.20.1.el6.x86_64
Steps:
1. Boot a guest:
# /usr/libexec/qemu-kvm -name RHEL6.4.z -M rhel6.5.0 -m 4G -realtime mlock=off -smp 2,maxcpus=8 \
  -uuid f15b9d0c-559d-47fa-a04f-08886831f4ef -nodefconfig -nodefaults -monitor stdio \
  -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
  -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
  -drive file=/root/rhel6.4-z-64.qcow2,if=none,id=drive-virtio-0-0,media=disk,format=qcow2,cache=none,werror=stop,rerror=stop \
  -device virtio-scsi-pci,id=scsi \
  -device scsi-hd,drive=drive-virtio-0-0,id=drive-scsi,bootindex=1 \
  -netdev tap,id=hostnet0,script=/etc/qemu-ifup \
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:9e:3e:ea,bus=pci.0,addr=0x6 \
  -device usb-tablet,id=input0 \
  -spice port=8000,disable-ticketing -vga qxl \
  -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864 \
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 \
  -drive file=/root/RHEL6.4-20130130.0-Server-x86_64-DVD1.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw \
  -device ide-drive,drive=drive-ide0-1-0,id=ide0-1-0 \
  -boot menu=on -qmp tcp:0:5555,server,nowait -serial unix:/tmp/tty0,server,nowait
2. In the guest, make sure kdump is configured:
# dmesg | grep crashkernel
Command line: ro root=/dev/mapper/VolGroup-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD rd_LVM_LV=VolGroup/lv_swap SYSFONT=latarcyrheb-sun16 crashkernel=128M rd_LVM_LV=VolGroup/lv_root KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM console=tty0 console=ttyS0,115200 rhgb quiet
Reserving 128MB of memory at 48MB for crashkernel (System RAM: 4608MB)
Kernel command line: ro root=/dev/mapper/VolGroup-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD rd_LVM_LV=VolGroup/lv_swap SYSFONT=latarcyrheb-sun16 crashkernel=128M rd_LVM_LV=VolGroup/lv_root KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM console=tty0 console=ttyS0,115200 rhgb quiet
3. Trigger a crash:
# taskset -c 1 echo c > /proc/sysrq-trigger
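Step 2 only confirms that memory was reserved for a crash kernel (the `crashkernel=128M` boot option). A capture kernel must also be loaded into that reservation, which is normally the kdump service's job, and the guest kernel reports this in `/sys/kernel/kexec_crash_loaded` ("1" when loaded, "0" otherwise). The following is a minimal C sketch of that extra check; the helper name `read_kexec_crash_loaded` is hypothetical, and the path is passed as a parameter so the function can be exercised against an ordinary file.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: reads a kexec_crash_loaded-style flag file.
 * Returns 1 if a crash (capture) kernel is loaded, 0 if it is not,
 * and -1 if the file is missing or does not contain 0 or 1.
 * On a guest, pass "/sys/kernel/kexec_crash_loaded".
 */
int read_kexec_crash_loaded(const char *path)
{
    assert(path != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int value = -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);

    /* The kernel prints a single 0 or 1; anything else is treated as unknown. */
    return (value == 0 || value == 1) ? value : -1;
}
```

On the guest, any result other than 1 from `read_kexec_crash_loaded("/sys/kernel/kexec_crash_loaded")` means the sysrq crash will only panic the kernel instead of booting into the kdump kernel; it is the same information as running `cat` on that file.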
Results:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8133dc76>] sysrq_handle_crash+0x16/0x20
PGD 117973067 PUD 11a3dd067 PMD 0
Oops: 0002 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:01.1/host1/target1:0:0/1:0:0:0/evt_media_change
CPU 1
Modules linked in: nls_utf8 fuse autofs4 sunrpc 8021q garp stp llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 microcode sg virtio_balloon virtio_net virtio_console i2c_piix4 i2c_core ext4 jbd2 mbcache sd_mod crc_t10dif virtio_scsi sr_mod cdrom virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
Pid: 3434, comm: echo Not tainted 2.6.32-358.20.1.el6.x86_64 #1 Red Hat KVM
RIP: 0010:[<ffffffff8133dc76>] [<ffffffff8133dc76>] sysrq_handle_crash+0x16/0x20
RSP: 0018:ffff88011793de18 EFLAGS: 00010096
RAX: 0000000000000010 RBX: 0000000000000063 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000063
RBP: ffff88011793de18 R08: ffffffff81c07800 R09: 0000000000000000
R10: 00000000ffffffff R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff81afff20 R14: 0000000000000286 R15: 0000000000000004
FS: 00007fce2e1cb700(0000) GS:ffff880028240000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000001179cf000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process echo (pid: 3434, threadinfo ffff88011793c000, task ffff8801178dd500)
Stack:
ffff88011793de68 ffffffff8133df32 ffff8801178dd500 ffff880100000000
<d> 0000000000000022 0000000000000002 ffff88011cdb3780 00007fce2e1d9000
<d> 0000000000000002 fffffffffffffffb ffff88011793de98 ffffffff8133dfee
Call Trace:
[<ffffffff8133df32>] __handle_sysrq+0x132/0x1a0
[<ffffffff8133dfee>] write_sysrq_trigger+0x4e/0x50
[<ffffffff811e98be>] proc_reg_write+0x7e/0xc0
[<ffffffff81181368>] vfs_write+0xb8/0x1a0
[<ffffffff81181c61>] sys_write+0x51/0x90
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Code: d0 88 81 e3 db fd 81 c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00 c7 05 5d ce 75 00 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 c9 c3 55 48 89 e5 0f 1f 44 00 00 8d 47
RIP [<ffffffff8133dc76>] sysrq_handle_crash+0x16/0x20
RSP <ffff88011793de18>
CR2: 0000000000000000
---[ end trace bc0537e789f08a48 ]---
Kernel panic - not syncing: Fatal exception
Pid: 3434, comm: echo Tainted: G D --------------- 2.6.32-358.20.1.el6.x86_64 #1
Call Trace:
[<ffffffff8150da2a>] ? panic+0xa7/0x16f
[<ffffffff81511c54>] ? oops_end+0xe4/0x100
[<ffffffff81046c1b>] ? no_context+0xfb/0x260
[<ffffffff81046ea5>] ? __bad_area_nosemaphore+0x125/0x1e0
[<ffffffff81282796>] ? __const_udelay+0x46/0x50
[<ffffffff81046fce>] ? bad_area+0x4e/0x60
[<ffffffff81047780>] ? __do_page_fault+0x3d0/0x480
[<ffffffff8106e585>] ? __call_console_drivers+0x75/0x90
[<ffffffff8109ca9f>] ? up+0x2f/0x50
[<ffffffff8106e5ea>] ? _call_console_drivers+0x4a/0x80
[<ffffffff8106ecff>] ? release_console_sem+0x1cf/0x220
[<ffffffff81513b7e>] ? do_page_fault+0x3e/0xa0
[<ffffffff81510f35>] ? page_fault+0x25/0x30
[<ffffffff8133dc76>] ? sysrq_handle_crash+0x16/0x20
[<ffffffff8133df32>] ? __handle_sysrq+0x132/0x1a0
[<ffffffff8133dfee>] ? write_sysrq_trigger+0x4e/0x50
[<ffffffff811e98be>] ? proc_reg_write+0x7e/0xc0
[<ffffffff81181368>] ? vfs_write+0xb8/0x1a0
[<ffffffff81181c61>] ? sys_write+0x51/0x90
[<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
I could not reproduce it either:
- Does `cat /sys/kernel/kexec_crash_loaded` print "1"?
- Was anything done before the write to sysrq?

---

This could happen if 'kexec_mutex' was already taken when the guest crashed; 'crash_kexec()' then behaves as if the image was not loaded, which is what we see above.
'kexec_mutex' is also taken in these cases:
- reading/writing 'kernel/kexec_crash_size' (in sysfs)
- loading a kexec image (kexec_load syscall)
- rebooting with kexec (reboot syscall)

I think the kexec crash kernel was not loaded; please reopen if you still hit it.
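The explanation above turns on how crash_kexec() takes kexec_mutex. In kernels of this era it uses mutex_trylock() rather than a blocking lock, so if another path holds the mutex at the moment of the crash, crash_kexec() simply returns and panic() carries on exactly as when no crash image is loaded (and, unless a panic timeout is set, the guest then stays halted). The userspace C sketch below models only that decision with a pthread mutex. `crash_kexec_model`, `run_scenario`, and the boolean standing in for the loaded image are hypothetical names, and none of this is the actual kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace stand-ins for kernel state; a simulation, not kernel code. */
static pthread_mutex_t kexec_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool kexec_crash_image_loaded = false;

enum crash_outcome {
    JUMPED_TO_CAPTURE_KERNEL, /* kdump kernel would take over and save a vmcore */
    FELL_THROUGH_TO_PANIC     /* crash_kexec() returned; panic() carries on */
};

/*
 * Simplified model of crash_kexec(): it only *tries* the mutex, because a
 * crashing CPU must not block on a lock whose holder may never run again.
 */
static enum crash_outcome crash_kexec_model(void)
{
    enum crash_outcome outcome = FELL_THROUGH_TO_PANIC;

    if (pthread_mutex_trylock(&kexec_mutex) == 0) {
        if (kexec_crash_image_loaded)
            outcome = JUMPED_TO_CAPTURE_KERNEL; /* real kernel: machine_kexec() */
        pthread_mutex_unlock(&kexec_mutex);
    }
    /* If trylock failed, the result is the same as "no image loaded". */
    return outcome;
}

/* Thread body standing in for the CPU that executes the sysrq-c crash. */
static void *crashing_cpu(void *arg)
{
    enum crash_outcome *out = arg;
    *out = crash_kexec_model();
    return NULL;
}

/*
 * Hypothetical driver: image_loaded mirrors kexec_crash_loaded, and
 * mutex_held_elsewhere simulates another path (sysfs kexec_crash_size,
 * kexec_load, or a kexec reboot) holding kexec_mutex at crash time.
 */
enum crash_outcome run_scenario(bool image_loaded, bool mutex_held_elsewhere)
{
    enum crash_outcome outcome = FELL_THROUGH_TO_PANIC;
    pthread_t cpu;

    kexec_crash_image_loaded = image_loaded;

    /* The calling thread plays the "other path" that holds kexec_mutex. */
    if (mutex_held_elsewhere)
        pthread_mutex_lock(&kexec_mutex);

    int rc = pthread_create(&cpu, NULL, crashing_cpu, &outcome);
    assert(rc == 0); /* sketch: no fallback if thread creation fails */
    if (rc == 0)
        pthread_join(cpu, NULL);

    if (mutex_held_elsewhere)
        pthread_mutex_unlock(&kexec_mutex);

    return outcome;
}
```

In this model `run_scenario(true, false)` yields `JUMPED_TO_CAPTURE_KERNEL`, while `run_scenario(true, true)` and `run_scenario(false, false)` both yield `FELL_THROUGH_TO_PANIC`, which is why "mutex already taken" and "no kexec kernel loaded" would look identical from the console: a panic message followed by silence. Build with `-pthread`.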