Description of problem:
Setting up kdump in the guest and triggering a crash through the /proc/sysrq-trigger interface causes the guest to hang, and the qemu process takes up nearly 100% CPU.

`strace' output of the qemu process:
...
select(19, [4 6 8 11 12 14 16 18], [], [], {0, 999000}) = 1 (in [16], left {0, 995000})
read(16, "\16\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
rt_sigaction(SIGALRM, NULL, {0x4079b0, ~[KILL STOP RTMIN RT_1], SA_RESTORER, 0x3a4960e4c0}, 8) = 0
write(5, "\0", 1) = 1
read(16, 0x7fff4e7abe50, 128) = -1 EAGAIN (Resource temporarily unavailable)
clock_gettime(CLOCK_MONOTONIC, {1129889, 697487611}) = 0
clock_gettime(CLOCK_MONOTONIC, {1129889, 697542611}) = 0
clock_gettime(CLOCK_MONOTONIC, {1129889, 697597611}) = 0
select(19, [4 6 8 11 12 14 16 18], [], [], {1, 0}) = 1 (in [4], left {1, 0})
read(4, "\0", 512) = 1
read(4, 0x7fff4e7abcd0, 512) = -1 EAGAIN (Resource temporarily unavailable)
clock_gettime(CLOCK_MONOTONIC, {1129889, 697856611}) = 0
clock_gettime(CLOCK_MONOTONIC, {1129889, 697935611}) = 0
timer_gettime(0, {it_interval={0, 0}, it_value={0, 0}}) = 0
timer_settime(0, 0, {it_interval={0, 0}, it_value={0, 3686000}}, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {1129889, 698134611}) = 0
clock_gettime(CLOCK_MONOTONIC, {1129889, 698207611}) = 0
select(19, [4 6 8 11 12 14 16 18], [], [], {1, 0}) = 1 (in [16], left {0, 996000})
read(16, "\16\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
rt_sigaction(SIGALRM, NULL, {0x4079b0, ~[KILL STOP RTMIN RT_1], SA_RESTORER, 0x3a4960e4c0}, 8) = 0
write(5, "\0", 1) = 1
read(16, 0x7fff4e7abe50, 128) = -1 EAGAIN (Resource temporarily unavailable)
clock_gettime(CLOCK_MONOTONIC, {1129889, 702510611}) = 0
clock_gettime(CLOCK_MONOTONIC, {1129889, 702569611}) = 0
clock_gettime(CLOCK_MONOTONIC, {1129889, 702623611}) = 0
select(19, [4 6 8 11 12 14 16 18], [], [], {1, 0}) = 1 (in [4], left {1, 0})
read(4, "\0", 512) = 1
read(4, 0x7fff4e7abcd0, 512) = -1 EAGAIN (Resource temporarily unavailable)
clock_gettime(CLOCK_MONOTONIC, {1129889, 702883611}) = 0
clock_gettime(CLOCK_MONOTONIC, {1129889, 702956611}) = 0
timer_gettime(0, {it_interval={0, 0}, it_value={0, 0}}) = 0
timer_settime(0, 0, {it_interval={0, 0}, it_value={0, 3686000}}, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {1129889, 703135611}) = 0
clock_gettime(CLOCK_MONOTONIC, {1129889, 703189611}) = 0
...

Is this related to timer IRQs? The following message was found while the guest was booting:
WARNING calibrate_APIC_clock: the APIC timer calibration may be wrong.

Some information from the guest:

# cat /proc/cmdline
ro root=LABEL=/ rhgb quiet crashkernel=128M@16M 3 console=tty0 console=ttyS0,115200

# dmesg | grep -i memory
Memory: 1925200k/2097088k available (2575k kernel code, 171436k reserved, 1298k data, 212k init)
Freeing initrd memory: 2565k freed
Total HugeTLB memory allocated, 0
Non-volatile memory driver v1.2
Freeing unused kernel memory: 212k freed

[root@localhost ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          1882        252       1630          0         14        167
-/+ buffers/cache:         70       1812
Swap:         2047          0       2047
[root@localhost ~]#

NOTE: the output of `free -m' shows that the 128M of memory has been reserved for the capture kernel.

Version-Release number of selected component (if applicable):
kvm-83-147.el5
kmod-kvm-83-147.el5
kvm-tools-83-147.el5
etherboot-zroms-kvm-5.4.4-13.el5
kvm-qemu-img-83-147.el5
kvm-debuginfo-83-147.el5
Guest kernel: 2.6.18-185.el5

How reproducible:
Always

Steps to Reproduce:
1. Boot the guest:
#/root/devel/features/sr-iov/client/tests/kvm/qemu -name vm1 -monitor tcp:0:6001,server,nowait -drive file=/root/devel/features/sr-iov/client/tests/kvm/images/RHEL-Server-5.4-64.qcow2,if=ide,boot=on -net nic,vlan=0,model=e1000,macaddr=00:AE:70:2A:9D:00 -net tap,vlan=0,ifname=e1000_0_6001,script=/root/devel/features/sr-iov/client/tests/kvm/scripts/qemu-ifup-switch,downscript=no -m 2048 -smp 1 -soundhw ac97 -usbdevice tablet -rtc-td-hack -no-hpet -cpu qemu64,+sse2 -no-kvm-pit-reinjection -vnc :0
2. Set up kdump in the guest and reboot.
3. Trigger a crash through /proc/sysrq-trigger:
echo c > /proc/sysrq-trigger

Actual results:
The guest hangs before actually dumping, and the qemu process takes up nearly 100% CPU on the host.

Expected results:
The system should boot into the capture kernel.

Additional info:
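As a cross-check on the reservation mentioned in the description, the crashkernel= parameter can be pulled out of the kernel command line and compared with what was intended. The following is only a minimal sketch: parse_crashkernel is a hypothetical helper name (not part of kexec-tools or kdump), and it handles only the SIZE@OFFSET form seen in this report.

```shell
# Hypothetical helper: print "SIZE OFFSET" from a crashkernel=SIZE@OFFSET argument.
# $1 is a kernel command line string, e.g. the contents of /proc/cmdline.
parse_crashkernel() {
    # Rely on word splitting to walk the space-separated arguments.
    for arg in $1; do
        case "$arg" in
            crashkernel=*@*)
                value=${arg#crashkernel=}          # strip the parameter name
                echo "${value%@*} ${value#*@}"     # part before '@', part after '@'
                return 0
                ;;
        esac
    done
    return 1    # no crashkernel=SIZE@OFFSET argument found
}

# Command line copied from the description above.
cmdline='ro root=LABEL=/ rhgb quiet crashkernel=128M@16M 3 console=tty0 console=ttyS0,115200'
parse_crashkernel "$cmdline"
```

On a running guest the same function could be fed "$(cat /proc/cmdline)"; a non-zero exit status would suggest the reservation was never requested at boot.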
Based on the information in this bug report, it isn't clear whether a 32-bit or a 64-bit guest kernel was tried. It is clear, however, that this is a kexec bug and NOT a timer IRQ related issue. I'll attempt to reproduce on both 32-bit and 64-bit regardless.
Tested 32-bit: it appeared to boot into the crash kernel. I've never actually done this before, so I'm not 100% sure what to expect, but the system rebooted again right after that. Unfortunately, I can't log in because of an unrelated disk space issue, but the system did not get stuck.
Worth noting: my guest kernel is 2.6.18-164.el5, so it is possible this is some kind of regression.
Tested 64-bit: crash kernel is working just fine. I'll try with RHEL-5.5 beta guest kernels to rule out a regression, but it looks like this bug might have already been squashed - could have been a reboot issue or something fixed in KVM since 83-147.
The 32-bit crash kernel works fine after upgrading to the RHEL-5.5 beta (2.6.18-194.el5). So does the 64-bit one. So it's not a kernel regression, nor is the bug present in recent KVM. Closing as unable to reproduce.