Description of problem:
A KVM guest whose image is on an iSCSI disk is running on the host. When kexec is then used to start a new kernel on the host, the host sometimes produces a call trace. With a KVM guest installed on a local disk, the issue did not occur in about 7 attempts.

Version-Release number of selected component (if applicable):
kernel-2.6.18-301.el5
kernel-2.6.18-302.el5
kvm-83-246.el5

How reproducible:
Sometimes

Steps to Reproduce:
1. Make sure two kernel versions are installed on the host.
# rpm -qa | grep kernel
kernel-2.6.18-301.el5
kernel-2.6.18-302.el5

2. Start the kvm process on the host (a KVM guest installed on an iSCSI disk).
/usr/libexec/qemu-kvm -no-hpet -no-kvm-pit-reinjection -usbdevice tablet -rtc-td-hack -startdate now -name 2k8r2-64 -smp 2,cores=2 -k en-us -m 2048 -boot c -net nic,vlan=1,macaddr=00:1a:4a:40:1a:16,model=virtio -net tap,vlan=1,ifname=virtio_10_1,script=/etc/qemu-ifup,downscript=no -drive file=/dev/vgtest-qzhang/lvtest-2k8r2,media=disk,if=ide,cache=off,format=qcow2,werror=stop -cpu qemu64,+sse2 -M rhel5.6.0 -uuid 3d292e57-5fb4-42f8-8b96-b7d84c016e96 -notify all -balloon none -monitor stdio -vnc :10

3. On the host, get the current kernel command line.
# cat /proc/cmdline
ro root=/dev/VolGroup00/LogVol00 crashkernel=128M@16M console=tty0 console=ttyS0,115200nb

4. Load a new kernel.
# uname -r
2.6.18-302.el5
# kexec -l /boot/vmlinuz-2.6.18-301.el5 --append="ro root=/dev/VolGroup00/LogVol00 crashkernel=128M@16M console=tty0 console=ttyS0,115200nb" --initrd=/boot/initrd-2.6.18-301.el5.img

5. Run the currently loaded kernel.
# kexec -e

6. If the host starts the new kernel (kernel-301) successfully, repeat steps 1-5 to boot the host into kernel-302.

7. Repeat steps 1-6.

Actual results:
The host produced a call trace. I hit this issue on the 2nd kexec attempt.

Expected results:
The host works well and boots into the new kernel successfully.
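The load step above passes the running host's command line to kexec by hand. A minimal sketch of how that command could be assembled from a kernel version and a command-line string follows. The function name build_kexec_load is hypothetical, and the /boot/vmlinuz-<version> and /boot/initrd-<version>.img paths are assumed to follow the layout shown in the steps. The helper only prints the command; it does not run kexec.

```shell
#!/bin/sh
# Hypothetical helper (not part of kexec-tools): print the "kexec -l" command
# for a given kernel version, reusing a supplied kernel command line.
# Assumes the /boot/vmlinuz-<version> and /boot/initrd-<version>.img naming
# seen in the reproduction steps.
build_kexec_load() {
    version="$1"   # e.g. 2.6.18-301.el5
    cmdline="$2"   # normally the contents of /proc/cmdline
    printf 'kexec -l /boot/vmlinuz-%s --append="%s" --initrd=/boot/initrd-%s.img\n' \
        "$version" "$cmdline" "$version"
}

# Example: print the command for the -301 kernel using the host's own cmdline.
if [ -r /proc/cmdline ]; then
    build_kexec_load "2.6.18-301.el5" "$(cat /proc/cmdline)"
fi
```

Printing the command rather than executing it makes it easy to check that the --append string matches /proc/cmdline before running kexec -e.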
Additional info:
Call trace:
localhost.localdomain login: Synchronizing SCSI cache for disk sdi:
 connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4299488421, last ping 4299493421, now 4299498421
 connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4299488422, last ping 4299493422, now 4299498432
 connection3:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4299488422, last ping 4299493422, now 4299498442
 connection5:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4299488423, last ping 4299493423, now 4299498453
 connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4299488424, last ping 4299493424, now 4299498463
 connection7:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4299488425, last ping 4299493425, now 4299498474
 connection8:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4299488426, last ping 4299493426, now 4299498484
 connection6:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4299490185, last ping 4299495185, now 4299500185
FAILED status = 0, message = 00, host = 15, driver = 00
<5>Synchronizing SCSI cache for disk sdh:
FAILED status = 0, message = 00, host = 15, driver = 00
<5>Synchronizing SCSI cache for disk sdg:
FAILED status = 0, message = 00, host = 15, driver = 00
<5>Synchronizing SCSI cache for disk sdf:
FAILED status = 0, message = 00, host = 15, driver = 00
<5>Synchronizing SCSI cache for disk sde:
FAILED status = 0, message = 00, host = 15, driver = 00
<5>Synchronizing SCSI cache for disk sdd:
FAILED status = 0, message = 00, host = 15, driver = 00
<5>Synchronizing SCSI cache for disk sdc:
FAILED status = 0, message = 00, host = 15, driver = 00
<5>Synchronizing SCSI cache for disk sdb:
FAILED status = 0, message = 00, host = 15, driver = 00
<5>Synchronizing SCSI cache for disk sda:
Starting new kernel
INFO: task events/2:16 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
events/2 D 0000000000000000 0 16 1 17 15 (L-TLB)
 ffff810127ff3d30 0000000000000046 0000000300000000 0000000000000000
 0000000000000000 000000000000000a ffff810127fce7e0 ffff810127c81820
 000007fd05e545fa 00000000000007b9 ffff810127fce9c8 0000000227ff3e10
Call Trace:
 [<ffffffff80240a78>] linkwatch_event+0x0/0x30
 [<ffffffff80063171>] wait_for_completion+0x79/0xa2
 [<ffffffff8008ee54>] default_wake_function+0x0/0xe
 [<ffffffff800a178e>] synchronize_rcu+0x30/0x36
 [<ffffffff800a12ca>] wakeme_after_rcu+0x0/0x9
 [<ffffffff80248585>] dev_deactivate+0x82/0xb1
 [<ffffffff80240a3a>] __linkwatch_run_queue+0x1a1/0x1df
 [<ffffffff80240aa2>] linkwatch_event+0x2a/0x30
 [<ffffffff8004d293>] run_workqueue+0x9e/0xfb
 [<ffffffff80049a9c>] worker_thread+0x0/0x122
 [<ffffffff80049b8c>] worker_thread+0xf0/0x122
 [<ffffffff8008ee54>] default_wake_function+0x0/0xe
 [<ffffffff8003265f>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff80032561>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11

INFO: task events/2:16 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
events/2 D 0000000000000000 0 16 1 17 15 (L-TLB)
 ffff810127ff3d30 0000000000000046 0000000300000000 0000000000000000
 0000000000000000 000000000000000a ffff810127fce7e0 ffff810127c81820
 000007fd05e545fa 00000000000007b9 ffff810127fce9c8 0000000227ff3e10
Call Trace:
 [<ffffffff80240a78>] linkwatch_event+0x0/0x30
 [<ffffffff80063171>] wait_for_completion+0x79/0xa2
 [<ffffffff8008ee54>] default_wake_function+0x0/0xe
 [<ffffffff800a178e>] synchronize_rcu+0x30/0x36
 [<ffffffff800a12ca>] wakeme_after_rcu+0x0/0x9
 [<ffffffff80248585>] dev_deactivate+0x82/0xb1
 [<ffffffff80240a3a>] __linkwatch_run_queue+0x1a1/0x1df
 [<ffffffff80240aa2>] linkwatch_event+0x2a/0x30
 [<ffffffff8004d293>] run_workqueue+0x9e/0xfb
 [<ffffffff80049a9c>] worker_thread+0x0/0x122
 [<ffffffff80049b8c>] worker_thread+0xf0/0x122
 [<ffffffff8008ee54>] default_wake_function+0x0/0xe
 [<ffffffff8003265f>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff80032561>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11
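
When comparing runs (for example iSCSI-backed versus local-disk guests), it may help to count the two message types in the console output above. The following is a rough sketch, assuming the serial console output was saved to a file. The function name summarize_console_log and the output format are hypothetical. It uses grep -o so that occurrences are counted even if several messages share one line.

```shell
#!/bin/sh
# Hypothetical helper: count iSCSI ping timeouts and hung-task reports in a
# saved console log. The patterns match the message text seen in the trace.
summarize_console_log() {
    log="$1"
    timeouts=$(grep -o 'ping timeout of [0-9]* secs expired' "$log" | wc -l)
    hung=$(grep -o 'blocked for more than [0-9]* seconds' "$log" | wc -l)
    # $(( )) strips any padding that some wc implementations add.
    echo "iscsi_ping_timeouts=$((timeouts)) hung_task_reports=$((hung))"
}

# Usage (the file name is an assumption):
#   summarize_console_log /tmp/console.log
```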
Please test kdump not kexec directly by using a modified kdump.conf with a dump target. We don't support kexec direct reboot.
If you tried the above and the problem occurred, please re-open it.
(In reply to comment #1)
> Please test kdump not kexec directly by using a modified kdump.conf with a dump
> target. We don't support kexec direct reboot.

Is it unsupported only on virt, or on real HW too? What is the reason we do not support it? It is very useful for quickly rebooting machines with a very slow BIOS.
Let me clarify. The kexec() syscall is supported, but as I said, it is hard to get right manually on different machines because of the complicated arguments required, whereas kdump takes care of those arguments automatically. The use case you described is not really what customers/partners will use AFAICT, so it is more of a convenience debugging feature for kernel hackers, similar to kgdb and the kernel-debug variants, which is simply a lower priority in terms of engineering resources. But again, if kdump works and direct kexec does not, it is likely because the arguments were not set correctly.