Description of problem:
My virtual server has been running for some days, and today one of my virtual machines stopped responding. I was only able to capture the following.

Lots of these messages in the dmesg output:

vcpu not ready for apic_round_robin
vcpu not ready for apic_round_robin
vcpu not ready for apic_round_robin

The virtual machine (qemu-kvm process) was using 100% CPU. A second virtual machine works without problems. The problematic virtual machine stopped responding to everything: no ping, no communication over the serial console, ...

Searching the internet, I found this message is often associated with CPU frequency scaling, but I have neither cpuspeed nor cpufreq installed on the host or the guest.

Version-Release number of selected component (if applicable):
The host is Fedora 11:
kernel-2.6.29.5-191.fc11.x86_64
qemu-kvm-0.10.5-3.fc11.x86_64
The guest is Fedora 10.

How reproducible:
I can't reproduce this.
The problem appears to be in the virtual machine itself: the guest hits a kernel oops and then panics. This happens too often, approximately once per day. Here is part of the output from the virtual serial console (full log attached):

BUG: unable to handle kernel paging request at fff82000
IP: [<c048a9f8>] __bounce_end_io_read+0x88/0xf8
Oops: 0002 [#1] SMP
Modules linked in: ipv6 nf_conntrack_netbios_ns virtio_balloon floppy virtio_net pcspkr joydev i2c_piix4 i2c_core virtio_pci virtio_ring virtio_blk virtio [last unloaded: scsi_wait_scan]
Pid: 27956, comm: httpd Not tainted (2.6.27.25-170.2.72.fc10.i686.PAE #1)
EIP: 0060:[<c048a9f8>] EFLAGS: 00210086 CPU: 2
EIP is at __bounce_end_io_read+0x88/0xf8
EAX: fff82000 EBX: e936ae00 ECX: 00000400 EDX: 00001000
ESI: ea808000 EDI: fff82000 EBP: c08c5f00 ESP: c08c5edc
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process httpd (pid: 27956, ti=c08c5000 task=e9940000 task.ti=e9571000)

I am now trying the latest fedora-testing kernel. If this still fails, I will try switching to another disk backend (not virtio-blk).

This looks like the following bug:
http://bugzilla.kernel.org/show_bug.cgi?id=12405
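When comparing several oops reports like the one above (for example, across the two affected servers), it can help to pull out just the key fields. The following is a minimal sketch, not part of the original report: the function name `summarize_oops` and the regular expressions are assumptions written against the lines quoted above, and other oops formats may need different patterns.

```python
import re

# Excerpt of the oops quoted in the comment above.
OOPS = """\
BUG: unable to handle kernel paging request at fff82000
IP: [<c048a9f8>] __bounce_end_io_read+0x88/0xf8
Oops: 0002 [#1] SMP
Pid: 27956, comm: httpd Not tainted (2.6.27.25-170.2.72.fc10.i686.PAE #1)
"""

# Hypothetical patterns, matched against the i686 oops format shown above.
PATTERNS = {
    "fault_address": r"unable to handle kernel paging request at ([0-9a-f]+)",
    "symbol": r"IP: \[<[0-9a-f]+>\] (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+",
    "comm": r"comm: (\S+)",
    "kernel": r"\(([^ ]+) #\d+\)",
}


def summarize_oops(text):
    """Return a dict of key oops fields; a field is None if not found."""
    summary = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        summary[key] = match.group(1) if match else None
    return summary


if __name__ == "__main__":
    for key, value in summarize_oops(OOPS).items():
        print(f"{key}: {value}")
```

Two reports that share the same `symbol` (here `__bounce_end_io_read`) are likely the same failure, even if the process name and fault address differ.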
Created attachment 351427 [details] Kernel oops
Created attachment 351552 [details] 2nd server with kernel 2.6.29.5-84.fc10.i686.PAE

Same problem with 2.6.29.5-84.fc10.i686.PAE. Kernel oops attached.
Here is a possible solution:
http://marc.info/?l=kvm&m=124757839015712&w=2

I think Christoph is referring to this patch:
http://kerneltrap.org/mailarchive/linux-kvm/2009/6/20/6063133

In 2.6.27 there is no

blk_queue_max_phys_segments(vblk->disk->queue, vblk->sg_elems-2);
blk_queue_max_hw_segments(vblk->disk->queue, vblk->sg_elems-2);

or

blk_queue_max_sectors(vblk->disk->queue, -1U);

and I am not sure whether we should add these too. Is anybody experienced with this?
A similar problem occurs with only 4 IDE and 1 SCSI virtual drives, except that nothing is printed on the serial console. The server simply stops responding to ping; it is dead and I can only destroy it.

Current KVM command line:

/usr/bin/qemu-kvm -S -M pc -m 4096 -smp 4 -name www -uuid f3c8e927-cda6-af7a-5ba5-388e5871c601 -monitor pty -pidfile /var/run/libvirt/qemu//www.pid -boot c -drive file=/dev/vg1/www_root,if=ide,index=0,boot=on -drive file=/dev/vg1/www_swap,if=ide,index=1 -drive file=/dev/vg1/www_home,if=ide,index=2 -drive file=/dev/vg1/www_log,if=ide,index=3 -drive file=/dev/vg1/www_git,if=scsi,index=4 -net nic,macaddr=00:16:3e:23:eb:23,vlan=0,model=virtio -net tap,fd=18,vlan=0 -net nic,macaddr=00:16:3e:07:c3:fe,vlan=1,model=virtio -net tap,fd=20,vlan=1 -serial pty -parallel none -usb -usbdevice tablet -vnc 127.0.0.1:1

Curiously, this happens mostly around midnight, usually between 23:55 and 00:05 local time. No special job runs locally at that time; maybe some job is run over the network.
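To confirm which disk backend each guest drive uses, the `-drive` options can be extracted from a command line like the one above. This is a small sketch under stated assumptions: `list_drives` is a hypothetical helper, `CMDLINE` is an excerpt of the command line above (not the full invocation), and the parsing assumes each `-drive` value is a comma-separated list of `key=value` pairs, as it is here.

```python
import shlex

# Excerpt of the qemu-kvm command line from the comment above.
CMDLINE = (
    "/usr/bin/qemu-kvm -S -M pc -m 4096 -smp 4 -name www "
    "-drive file=/dev/vg1/www_root,if=ide,index=0,boot=on "
    "-drive file=/dev/vg1/www_swap,if=ide,index=1 "
    "-drive file=/dev/vg1/www_home,if=ide,index=2 "
    "-drive file=/dev/vg1/www_log,if=ide,index=3 "
    "-drive file=/dev/vg1/www_git,if=scsi,index=4 "
    "-net nic,macaddr=00:16:3e:23:eb:23,vlan=0,model=virtio"
)


def list_drives(cmdline):
    """Return one dict of key=value options per -drive argument."""
    args = shlex.split(cmdline)
    drives = []
    for position, arg in enumerate(args[:-1]):
        if arg == "-drive":
            # The option string follows the flag, e.g. "file=...,if=ide,index=0".
            pairs = (item.split("=", 1) for item in args[position + 1].split(","))
            drives.append(dict(pairs))
    return drives


if __name__ == "__main__":
    for drive in list_drives(CMDLINE):
        print(drive["index"], drive["if"], drive["file"])
```

For this guest the output lists four `ide` drives and one `scsi` drive, so none of the disks go through virtio-blk in this configuration.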
The last problem, with the IDE drives, was caused by swap space being disabled on the host. After enabling it, I still have problems with the virtio driver. I am now trying a patched 2.6.29 kernel.
Created attachment 354832 [details] This should be applied to kernel-2.6.29.5-84, or may be newer too.

This patch should be applied to kernel-2.6.29.5-84, and possibly to newer kernels too. My system with this patch applied has now been running for more than 4 days without problems, so it looks like the patch works.
Created attachment 354834 [details] This should be applied to kernel-2.6.29.5-84, or may be newer too.

My system with this patch applied has now been running for more than 4 days without problems, so it looks like the patch works.
Committed upstream as of a few days ago: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=4eff3cae9c9809720c636e64bc72f212258e0bd5 Tacked onto the end of our F11 and F10 2.6.29.x kernel builds, and Chuck is working on adding it to an F10 2.6.27.x kernel build.
Chuck, can I ask you to build this fc10 kernel from CVS? Thank you.
Fix went in F-11 kernel-2.6.29.6-216 and F-10 kernel-2.6.27.28-170.2.74
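To check whether a given guest or host kernel is at least as new as the builds named above, a rough comparison can be scripted. This is only a sketch: `has_fix` and `version_key` are hypothetical helpers, the comparison is a plain numeric tuple comparison and not RPM's real version ordering, and it deliberately answers "unknown" (`None`) for kernels from a different base series than the fixed build listed for that Fedora release.

```python
import re

# First fixed builds named in the comment above, keyed by Fedora dist tag.
FIXED_BUILDS = {
    "fc11": "2.6.29.6-216",
    "fc10": "2.6.27.28-170.2.74",
}


def version_key(version):
    """Turn '2.6.27.28-170.2.74' into a tuple of ints for comparison."""
    return tuple(int(part) for part in re.split(r"[.-]", version))


def has_fix(release):
    """Return True/False for a `uname -r` style string, or None if unknown."""
    match = re.match(r"([\d.]+-[\d.]+)\.(fc\d+)", release)
    if not match or match.group(2) not in FIXED_BUILDS:
        return None
    current = version_key(match.group(1))
    fixed = version_key(FIXED_BUILDS[match.group(2)])
    # Only compare within the same base series (e.g. 2.6.27 vs 2.6.27).
    if current[:3] != fixed[:3]:
        return None
    return current >= fixed


if __name__ == "__main__":
    for release in ("2.6.27.25-170.2.72.fc10.i686.PAE",
                    "2.6.27.29-170.2.78.fc10.i686.PAE",
                    "2.6.29.5-191.fc11.x86_64"):
        print(release, has_fix(release))
```

The base-series check matters here because the F10 2.6.29.5-84 kernel tested earlier in this report still showed the oops, even though its version number is higher than the fixed F10 2.6.27.x build.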
2.5 days of uptime on my machine; it looks like this bug has been fixed with:

[root@mail ~]# uname -a
Linux mail.inver.sk 2.6.27.29-170.2.78.fc10.i686.PAE #1 SMP Fri Jul 31 04:28:25 EDT 2009 i686 i686 i386 GNU/Linux
[root@mail ~]# uptime
08:48:20 up 2 days, 13:40, 1 user, load average: 0.89, 0.37, 0.18

Tested on 2 machines.
kernel-2.6.27.29-170.2.78.fc10 has been submitted as an update for Fedora 10. http://admin.fedoraproject.org/updates/kernel-2.6.27.29-170.2.78.fc10
kernel-2.6.27.29-170.2.78.fc10 has been pushed to the Fedora 10 stable repository. If problems still persist, please make note of it in this bug report.