Created attachment 337100 [details]
dmesg from virtual machine

Description of problem:
The virtual machine gets stuck after migration. One CPU is at 100% load, the other one idles. The virtual machine can be pinged; ssh login is not always possible. If the machine is stopped on the migration target and continued on the migration source, it recovers after some time without load.

Version-Release number of selected component (if applicable):
kvm-84-1.el5.x86_64.rpm
kvm-kmod-84-1.el5.x86_64.rpm
qemu-0.9.1-11.el5.x86_64.rpm
qemu-img-0.9.1-11.el5.x86_64.rpm
kvm and kvm-kmod were compiled from the SourceForge sources and installed from the built rpms; qemu and qemu-img are from the EPEL repository.

How reproducible:
Start a kvm virtual machine on host A, then start a kvm virtual machine with the same parameters on host B in incoming mode. Migrate the virtual machine from host A to host B and watch the kvm process go to 100% after the migration finishes. You may have to wait up to 5 seconds or run an ls or similar in the virtual machine. After stopping the virtual machine on host B and continuing it on host A, the virtual machine recovers.

Steps to Reproduce:
1. start the kvm virtual machine on host A
kvm -hda /dev/disk/by-path/ip-192.168.1.1:3260-iscsi-rr010:01-lun-5 -smp 2 -m 1024 -boot c -net nic,macaddr=00:16:3e:69:93:f5,model=rtl8139 -net tap,ifname=vnet0 -k en-us -monitor unix:/etc/kvm/rr019v2/run/monitor,server,nowait -pidfile /etc/kvm/rr019v2/run/pid -vnc 127.0.0.1:0
2. start the kvm virtual machine in incoming mode on host B
kvm -hda /dev/disk/by-path/ip-192.168.1.1:3260-iscsi-rr010:01-lun-5 -smp 2 -m 1024 -boot c -net nic,macaddr=00:16:3e:69:93:f5,model=rtl8139 -net tap,ifname=vnet0 -k de -monitor unix:/etc/kvm/rr019v2/run/monitor,server,nowait -pidfile /etc/kvm/rr019v2/run/pid -S -incoming tcp:192.168.1.102:4444 -vnc 127.0.0.1:0
3. wait until the kvm virtual machine shows up, e.g. until you can log in via ssh
4. migrate the kvm virtual machine from host A to host B
nc -U /etc/kvm/rr019v2/run/monitor
(qemu) migrate tcp:192.168.1.102:4444
(qemu) info migrate
info migrate
Migration status: completed

> on host B:
rr016# top -d 1 -p 30193
30193 root  15   0 1161m 1.0g 1916 S 96.0 27.2  10:49.70 kvm

> in the kvm virtual machine:
rr019v2# dmesg
(... lots of ...)
BUG: soft lockup - CPU#0 stuck for 10s! [bash:1747]
CPU 0:
Modules linked in: ipv6 xfrm_nalgo crypto_api dm_mirror dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac lp floppy i2c_piix4 8139too virtio_pci parport_pc i2c_core ide_cd 8139cp virtio_ring parport serio_raw mii virtio cdrom pcspkr dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 1747, comm: bash Not tainted 2.6.18-128.el5 #1
RIP: 0010:[<ffffffff80022a1c>]  [<ffffffff80022a1c>] flush_tlb_others+0x8c/0xbc
RSP: 0018:ffff81003c68dc38  EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffff810001575200 RCX: ffff810001575208
RDX: 0000000000000018 RSI: 00000000000000ff RDI: ffff810001575200
RBP: 00000000b71ebafe R08: 0000000000000003 R09: 000000000000003e
R10: ffff81003c68dbd8 R11: 00000000b71ebafe R12: ffff810001575200
R13: 0000000000000000 R14: ffffffff8000c30d R15: ffff81003c68dca8
FS:  0000000000000000(0000) GS:ffffffff803ac000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000006b2578 CR3: 000000003c68a000 CR4: 00000000000006e0

Call Trace:
 [<ffffffff80022a29>] flush_tlb_others+0x99/0xbc
 [<ffffffff80075c88>] flush_tlb_mm+0xca/0xd5
 [<ffffffff80039ae2>] exit_mmap+0xad/0xf3
 [<ffffffff8003bc07>] mmput+0x30/0x83
 [<ffffffff8002be3a>] flush_old_exec+0x7b4/0xb08
 [<ffffffff8000b464>] vfs_read+0x13c/0x171
 [<ffffffff80018211>] load_elf_binary+0x478/0x181a
 [<ffffffff800e12a8>] get_arg_page+0x3c/0x95
 [<ffffffff800178ee>] copy_strings+0x1ef/0x200
 [<ffffffff8003f2a5>] search_binary_handler+0xbb/0x26d
 [<ffffffff8003e83a>] do_execve+0x16a/0x1f7
 [<ffffffff8005492d>] sys_execve+0x36/0x4c
 [<ffffffff8005d4d3>] stub_execve+0x67/0xb0
5. on host B stop the virtual machine
rr016# nc -U /etc/kvm/rr019v2/run/monitor
(qemu) quit
6. on host A continue the virtual machine
rr017# nc -U /etc/kvm/rr019v2/run/monitor
(qemu) c
7. on host A check the process
rr017# top -d 1 -p 4573
4573 root  15   0 1161m 1.0g 934m R  0.0 35.3   1:19.96 kvm
8. log into the vm via ssh and fetch the dmesg and messages

Actual results:
The kvm virtual machine gets stuck; a logged-in ssh session is slow to respond or gets stuck too. Ping is possible. A new ssh console is sometimes possible, sometimes not. After stopping the virtual machine on the target host and continuing it on the source host, everything is fine.

Expected results:
The kvm virtual machine migrates flawlessly and interaction is possible.

Additional info:
This applies to all of my tested hosts, among them:

rr016# grep "model name" /proc/cpuinfo
model name      : Intel(R) Core(TM)2 Duo CPU     T8300  @ 2.40GHz
model name      : Intel(R) Core(TM)2 Duo CPU     T8300  @ 2.40GHz
rr016# grep "MemTotal" /proc/meminfo
MemTotal:      3977960 kB
rr016# brctl show
bridge name     bridge id               STP enabled     interfaces
sw0             8000.002186521ea8       no              vnet0
                                                        eth0

rr017# grep "model name" /proc/cpuinfo
model name      : Intel(R) Core(TM)2 Duo CPU     E6550  @ 2.33GHz
model name      : Intel(R) Core(TM)2 Duo CPU     E6550  @ 2.33GHz
rr017# grep "MemTotal" /proc/meminfo
MemTotal:      3062956 kB
rr017# brctl show
bridge name     bridge id               STP enabled     interfaces
sw0             8000.001e3728a1c2       no              vnet0
                                                        eth0

rr019# grep "model name" /proc/cpuinfo
model name      : Quad-Core AMD Opteron(tm) Processor 2344 HE
model name      : Quad-Core AMD Opteron(tm) Processor 2344 HE
model name      : Quad-Core AMD Opteron(tm) Processor 2344 HE
model name      : Quad-Core AMD Opteron(tm) Processor 2344 HE
model name      : Quad-Core AMD Opteron(tm) Processor 2344 HE
model name      : Quad-Core AMD Opteron(tm) Processor 2344 HE
model name      : Quad-Core AMD Opteron(tm) Processor 2344 HE
model name      : Quad-Core AMD Opteron(tm) Processor 2344 HE
rr019# grep "MemTotal" /proc/meminfo
MemTotal:      4047848 kB
rr019# brctl show
bridge name     bridge id               STP enabled     interfaces
sw0             8000.00e08176899e       no              vnet0
                                                        eth0
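Step 4 above polls "info migrate" by hand until the status changes. As a minimal sketch of how that check could be scripted, the following defines a hypothetical helper, parse_migrate_status (not a qemu command), that maps the monitor's "info migrate" output to a simple status word; in practice its input would come from the monitor socket used in this report.

```shell
#!/bin/sh
# parse_migrate_status is a hypothetical helper for this report, not part
# of qemu: it reduces "info migrate" monitor output to one status word.
parse_migrate_status() {
    case $1 in
        *"Migration status: completed"*) echo completed ;;
        *"Migration status: failed"*)    echo failed ;;
        *"Migration status: active"*)    echo active ;;
        *)                               echo unknown ;;
    esac
}

# In practice the input would come from the monitor socket, e.g.:
#   printf 'info migrate\n' | nc -U /etc/kvm/rr019v2/run/monitor
parse_migrate_status "Migration status: completed"   # prints: completed
```

A loop around this (send "migrate tcp:...", then poll until the helper reports "completed" or "failed") would reproduce the manual procedure of step 4.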
Created attachment 337101 [details]
/var/log/messages of virtual machine
From the virtual machine:

rr019v2# grep "model name" /proc/cpuinfo
model name      : QEMU Virtual CPU version 0.9.1
model name      : QEMU Virtual CPU version 0.9.1
rr019v2# grep "MemTotal" /proc/meminfo
MemTotal:      1026536 kB
rr019v2# free
             total       used       free     shared    buffers     cached
Mem:       1026536      87380     939156          0      27056      32352
-/+ buffers/cache:      27972     998564
Swap:       524280          0     524280

All operations were performed as root on the hosts.
rr016# uname -a
Linux rr016 2.6.18-128.1.1.el5 #1 SMP Mon Jan 26 13:58:24 EST 2009 x86_64 x86_64 x86_64 GNU/Linux
rr017# uname -a
Linux rr017 2.6.18-128.el5 #1 SMP Wed Dec 17 11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
rr019# uname -a
Linux rr019 2.6.18-92.1.22.el5 #1 SMP Tue Dec 16 11:57:43 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

selinux is disabled on all hosts and in the virtual machine; iptables is shut off too, to avoid side effects.
Also tried on rr016 with the 2.6.18-128.el5 kernel, but to no avail.

rr016# uname -a
Linux rr016 2.6.18-128.el5 #1 SMP Wed Dec 17 11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
# modinfo kvm
filename:       /lib/modules/2.6.18-128.el5/extra/kvm.ko
license:        GPL
author:         Qumranet
version:        kvm-84
srcversion:     D964574B5665D21B64CD65A
depends:
vermagic:       2.6.18-128.el5 SMP mod_unload gcc-4.1
parm:           oos_shadow:bool
parm:           msi2intx:bool
# modinfo kvm_intel
filename:       /lib/modules/2.6.18-128.el5/extra/kvm-intel.ko
license:        GPL
author:         Qumranet
version:        kvm-84
srcversion:     4829C8B5FA311860FEA4B9A
depends:        kvm
vermagic:       2.6.18-128.el5 SMP mod_unload gcc-4.1
parm:           bypass_guest_pf:bool
parm:           enable_vpid:bool
parm:           flexpriority_enabled:bool
parm:           enable_ept:bool
parm:           emulate_invalid_guest_state:bool
The problem exists with -smp 1 for the virtual machine, too.

[root@rr019v2 ~]# dmesg
BUG: soft lockup - CPU#0 stuck for 10s! [hald-addon-stor:1568]
CPU 0:
Modules linked in: ipv6 xfrm_nalgo crypto_api dm_mirror dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac lp floppy i2c_piix4 ide_cd pcspkr 8139too cdrom i2c_core 8139cp parport_pc mii virtio_pci parport serio_raw virtio_ring virtio dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 1568, comm: hald-addon-stor Not tainted 2.6.18-128.el5 #1
RIP: 0010:[<ffffffff8000ec28>]  [<ffffffff8000ec28>] ide_do_request+0x30f/0x78d
RSP: 0018:ffffffff80425d78  EFLAGS: 00000246
RAX: 0000000000204108 RBX: ffff81003fd18480 RCX: ffff81003fd18480
RDX: ffff810000000000 RSI: ffff81003fd18480 RDI: 000000000000000f
RBP: ffffffff80425cf0 R08: 000000003ff98000 R09: 0000000000000000
R10: ffff81003fd18480 R11: 0000000000000110 R12: ffffffff8005dc8e
R13: ffffffff804cb918 R14: ffffffff800774da R15: ffffffff80425cf0
FS:  00002b7ca95d36e0(0000) GS:ffffffff803ac000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aded67c7000 CR3: 000000003aed4000 CR4: 00000000000006e0

Call Trace:
 <IRQ>  [<ffffffff8000eba8>] ide_do_request+0x28f/0x78d
 [<ffffffff88206103>] :ide_cd:cdrom_decode_status+0x31c/0x347
 [<ffffffff882066fe>] :ide_cd:cdrom_pc_intr+0x27/0x21c
 [<ffffffff8000d4f5>] ide_intr+0x1af/0x1df
 [<ffffffff80010a46>] handle_IRQ_event+0x51/0xa6
 [<ffffffff800b7ade>] __do_IRQ+0xa4/0x103
 [<ffffffff8006c95d>] do_IRQ+0xe7/0xf5
 [<ffffffff8005d615>] ret_from_intr+0x0/0xa
 [<ffffffff80059cd6>] ide_outsw+0x0/0x9
 [<ffffffff80011f84>] __do_softirq+0x51/0x133
 [<ffffffff8005e2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006cada>] do_softirq+0x2c/0x85
 [<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff80059cd6>] ide_outsw+0x0/0x9
 [<ffffffff80059cde>] ide_outsw+0x8/0x9
 [<ffffffff801cd618>] atapi_output_bytes+0x23/0x5e
 [<ffffffff882066d7>] :ide_cd:cdrom_pc_intr+0x0/0x21c
 [<ffffffff88206d19>] :ide_cd:cdrom_transfer_packet_command+0xb0/0xdb
 [<ffffffff88206d91>] :ide_cd:cdrom_do_pc_continuation+0x0/0x2b
 [<ffffffff882055ad>] :ide_cd:cdrom_start_packet_command+0x14f/0x15b
 [<ffffffff8000eec4>] ide_do_request+0x5ab/0x78d
 [<ffffffff8013cb30>] elv_insert+0xd6/0x1f7
 [<ffffffff800414bc>] ide_do_drive_cmd+0xc0/0x116
 [<ffffffff882035f7>] :ide_cd:cdrom_queue_packet_command+0x46/0xe2
 [<ffffffff80059cde>] ide_outsw+0x8/0x9
 [<ffffffff801cc987>] ide_init_drive_cmd+0x10/0x24
 [<ffffffff88203b0f>] :ide_cd:cdrom_check_status+0x62/0x71
 [<ffffffff8013e034>] blk_end_sync_rq+0x0/0x2e
 [<ffffffff88203b3a>] :ide_cd:ide_cdrom_check_media_change_real+0x1c/0x37
 [<ffffffff881d7076>] :cdrom:media_changed+0x44/0x74
 [<ffffffff800df8d7>] check_disk_change+0x1f/0x50
 [<ffffffff881db33b>] :cdrom:cdrom_open+0x8ef/0x93c
 [<ffffffff8000cbb6>] do_lookup+0x65/0x1e6
 [<ffffffff8000d0d4>] dput+0x2c/0x114
 [<ffffffff8000a3be>] __link_path_walk+0xdf8/0xf42
 [<ffffffff8002c77e>] mntput_no_expire+0x19/0x89
 [<ffffffff8000e881>] link_path_walk+0xd3/0xe5
 [<ffffffff80063db6>] do_nanosleep+0x47/0x70
 [<ffffffff8000d0d4>] dput+0x2c/0x114
 [<ffffffff80057987>] kobject_get+0x12/0x17
 [<ffffffff80140caf>] get_disk+0x3f/0x81
 [<ffffffff8005a659>] exact_lock+0xc/0x14
 [<ffffffff801b8f11>] kobj_lookup+0x132/0x19b
 [<ffffffff88203e8d>] :ide_cd:idecd_open+0x9f/0xd0
 [<ffffffff800dff49>] do_open+0xa2/0x30f
 [<ffffffff800e040a>] blkdev_open+0x0/0x4f
 [<ffffffff800e042d>] blkdev_open+0x23/0x4f
 [<ffffffff8001e4f2>] __dentry_open+0xd9/0x1dc
 [<ffffffff80026f1f>] do_filp_open+0x2a/0x38
 [<ffffffff80063db6>] do_nanosleep+0x47/0x70
 [<ffffffff8000d0d4>] dput+0x2c/0x114
 [<ffffffff800198ab>] do_sys_open+0x44/0xbe
 [<ffffffff8005d116>] system_call+0x7e/0x83

But the CPU does not go up to 100% and the machine is usable. Migrating back from B to A also works; the lockup is thrown once, but the virtual machine is still usable.
Maybe there is a problem with the threads, and they should be pinned to a specific CPU?
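As a quick experiment along those lines, the kvm threads could be pinned with taskset. The sketch below only prints the taskset commands it would run (a dry run, so nothing is changed on the host); emit_pin_cmds is a hypothetical helper, and the thread IDs are illustrative. In practice the thread IDs of a kvm process come from /proc/<pid>/task.

```shell
#!/bin/sh
# Dry-run sketch: print one "taskset -pc <cpu> <tid>" command per kvm
# thread, assigning threads to host CPUs round-robin from CPU 0.
# emit_pin_cmds is a hypothetical helper name for this report.
emit_pin_cmds() {
    cpu=0
    for tid in "$@"; do
        echo "taskset -pc $cpu $tid"
        cpu=$((cpu + 1))
    done
}

# e.g. for two kvm threads 30194 and 30195 (illustrative TIDs;
# real ones would be listed via: ls /proc/30193/task):
emit_pin_cmds 30194 30195
```

Piping the output through sh (after checking it) would apply the pinning; whether that actually avoids the lockup is exactly the open question above.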
Okay, it works at least for CentOS 5.2 with the 2.6.18-92.el5 kernel in i686 for the virtual machine.

rr019v3# uname -a
Linux rr019v3 2.6.18-92.el5 #1 SMP Tue Jun 10 18:49:47 EDT 2008 i686 i686 i386 GNU/Linux

Migrating to and fro works without problems. 2.6.18-128.1.1.el5 x86_64 from Red Hat does not work for the virtual machine. noapic, nmi-watchdog=1 or pci=noirqpoll does not help with -128 or -128.1.1.
Kernel 2.6.18-128.el5.i686 for the virtual machine from RHEL also works without problems (new install) on a 2.6.18-128.el5.x86_64 host.
I have tested the following: the command stated above for the virtual machine, plus additionally:

-no-kvm-irqchip         => doesn't work either, same as without this parameter
-no-kvm-pit             => doesn't work either, same as without this parameter
-no-kvm-pit-reinjection => doesn't work either, same as without this parameter
-no-kvm                 => works like i686, without any error messages in dmesg, to and fro between my hosts
-tdf                    => doesn't work either, same as without this parameter

So, anyone any ideas?
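The test matrix above can be iterated mechanically, one migration run per extra flag. In this sketch, run_migration_test is a hypothetical placeholder for the manual steps 1-4 of this report; here it is stubbed to only print which variant would be exercised, so the sketch is safe to run anywhere.

```shell
#!/bin/sh
# Stub standing in for the full manual reproduction (steps 1-4 above):
# it only reports which kvm flag variant would be tested.
run_migration_test() {
    echo "testing: kvm ... $1"
}

# The flag variants tried in this report:
for flag in -no-kvm-irqchip -no-kvm-pit -no-kvm-pit-reinjection -no-kvm -tdf; do
    run_migration_test "$flag"
done
```

Only the -no-kvm variant behaved (it avoids the in-kernel module entirely), which points the suspicion at kvm itself rather than at qemu's migration code.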
The problem still exists - or rather persists - with kvm-85, released today.
I am getting the same results. My system config is:

# uname -a
Linux h39 2.6.18-128.1.6.el5 #1 SMP Wed Apr 1 09:10:25 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
# modinfo kvm
filename:       /lib/modules/2.6.18-128.1.6.el5/extra/kvm/kvm.ko
license:        GPL
author:         Qumranet
version:        kvm-85
srcversion:     F75E598A4C8C6749972C7BB
depends:
vermagic:       2.6.18-128.1.6.el5 SMP mod_unload gcc-4.1
parm:           oos_shadow:bool
# modinfo kvm_intel
filename:       /lib/modules/2.6.18-128.1.6.el5/extra/kvm/kvm-intel.ko
license:        GPL
author:         Qumranet
version:        kvm-85
srcversion:     C039A0000B33711A2AA2703
depends:        kvm
vermagic:       2.6.18-128.1.6.el5 SMP mod_unload gcc-4.1
parm:           bypass_guest_pf:bool
parm:           vpid:bool
parm:           flexpriority:bool
parm:           ept:bool
parm:           emulate_invalid_guest_state:bool

I am using libvirtd to manage the system.
Seems like the problem is gone with the RHEL 5.4 beta and kvm from the virtualization channel.

On both hosts:
Installed Packages
etherboot-zroms-kvm.x86_64    5.4.4-10.el5         installed
kernel.x86_64                 2.6.18-128.1.14.el5  installed
kernel.x86_64                 2.6.18-128.1.16.el5  installed
kernel.x86_64                 2.6.18-155.el5       installed
kmod-kvm.x86_64               83-80.el5            installed
kvm.x86_64                    83-80.el5            installed
libvirt.x86_64                0.6.3-11.el5         installed
redhat-release.x86_64         5Server-5.4.0.2      installed

On the vm:
Installed Packages
kernel.x86_64                 2.6.18-128.1.6.el5   installed

The disk image is on iscsi as before.

rr017# virsh migrate --live rr019v4 qemu+tcp://192.168.1.20/system tcp:192.168.1.20:4444
rr019v4# dmesg
(empty)
rr019v4# cat /proc/cpuinfo | grep processor
processor       : 0
processor       : 1

No process is going up to 100%. And back:

virsh migrate --live rr019v4 qemu+tcp://192.168.1.17/system tcp:192.168.1.17:4444
rr019v4# dmesg
(empty)

I would close this bug with reason "NEXTRELEASE" if I were able to do so.