RHEL-6 x86_64 KVM guest mounting ext4 over iSCSI. This server is a Linux mirror server running rsync, lighttpd and vsftpd. After several hours of uptime the following happens, with the second BUG repeating several times per second. Please let me know if you need more information.

Jul 24 21:05:40 mirror kernel: BUG: scheduling while atomic: swapper/0/0x00000100
Jul 24 21:05:40 mirror kernel: Modules linked in: autofs4 sunrpc sg sd_mod crc_t10dif nf_conntrack_ftp ipt_REJECT xt_helper nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_mirror dm_region_hash dm_log microcode virtio_balloon virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk virtio_pci virtio_ring virtio ata_generic pata_acpi ata_piix dm_mod [last unloaded: speedstep_lib]
Jul 24 21:05:40 mirror kernel: Pid: 0, comm: swapper Not tainted 2.6.32-131.6.1.el6.x86_64 #1
Jul 24 21:05:40 mirror kernel: Call Trace:
Jul 24 21:05:40 mirror kernel: [<ffffffff81055cb6>] ? __schedule_bug+0x66/0x70
Jul 24 21:05:40 mirror kernel: [<ffffffff814db1b2>] ? thread_return+0x5d9/0x777
Jul 24 21:05:40 mirror kernel: [<ffffffff81093284>] ? hrtimer_start_range_ns+0x14/0x20
Jul 24 21:05:40 mirror kernel: [<ffffffff81009ebe>] ? cpu_idle+0xee/0x110
Jul 24 21:05:40 mirror kernel: [<ffffffff814c305a>] ? rest_init+0x7a/0x80
Jul 24 21:05:40 mirror kernel: [<ffffffff81bbdf28>] ? start_kernel+0x41d/0x429
Jul 24 21:05:40 mirror kernel: [<ffffffff81bbd33a>] ? x86_64_start_reservations+0x125/0x129
Jul 24 21:05:40 mirror kernel: [<ffffffff81bbd438>] ? x86_64_start_kernel+0xfa/0x109
Jul 24 21:05:40 mirror kernel: NOHZ: local_softirq_pending 20a
Jul 24 21:05:40 mirror kernel: BUG: scheduling while atomic: swapper/0/0x00000100
Jul 24 21:05:40 mirror kernel: Modules linked in: autofs4 sunrpc sg sd_mod crc_t10dif nf_conntrack_ftp ipt_REJECT xt_helper nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_mirror dm_region_hash dm_log microcode virtio_balloon virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk virtio_pci virtio_ring virtio ata_generic pata_acpi ata_piix dm_mod [last unloaded: speedstep_lib]
Jul 24 21:05:40 mirror kernel: Pid: 0, comm: swapper Not tainted 2.6.32-131.6.1.el6.x86_64 #1
Jul 24 21:05:40 mirror kernel: Call Trace:
Jul 24 21:05:40 mirror kernel: [<ffffffff81055cb6>] ? __schedule_bug+0x66/0x70
Jul 24 21:05:40 mirror kernel: [<ffffffff814db1b2>] ? thread_return+0x5d9/0x777
Jul 24 21:05:40 mirror kernel: [<ffffffff81095005>] ? sched_clock_local+0x25/0x90
Jul 24 21:05:40 mirror kernel: [<ffffffff8109e2ab>] ? tick_nohz_stop_idle+0x3b/0x50
Jul 24 21:05:40 mirror kernel: [<ffffffff81009ebe>] ? cpu_idle+0xee/0x110
Jul 24 21:05:40 mirror kernel: [<ffffffff814c305a>] ? rest_init+0x7a/0x80
Jul 24 21:05:40 mirror kernel: [<ffffffff81bbdf28>] ? start_kernel+0x41d/0x429
Jul 24 21:05:40 mirror kernel: [<ffffffff81bbd33a>] ? x86_64_start_reservations+0x125/0x129
Jul 24 21:05:40 mirror kernel: [<ffffffff81bbd438>] ? x86_64_start_kernel+0xfa/0x109
http://mirror.ancl.hawaii.edu/ This issue is crippling our new Fedora mirror server. Please help!
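For readers decoding the BUG lines above: the "swapper/0/0x00000100" suffix printed by __schedule_bug() is comm/pid/preempt_count, and the NOHZ line prints a hex bitmask of pending softirqs. The Python sketch below decodes both. It assumes the upstream 2.6.32 x86 preempt_count layout (8 preempt bits, 8 softirq bits, 10 hardirq bits, 1 NMI bit, PREEMPT_ACTIVE = 0x10000000) and the upstream 2.6.32 softirq numbering; the RHEL-6 kernel may carry backports that change either, and all helper names are hypothetical.

```python
import re

# Assumed upstream 2.6.32 x86 preempt_count layout; other kernels may differ.
PREEMPT_BITS, SOFTIRQ_BITS, HARDIRQ_BITS, NMI_BITS = 8, 8, 10, 1
SOFTIRQ_SHIFT = PREEMPT_BITS                  # 8
HARDIRQ_SHIFT = SOFTIRQ_SHIFT + SOFTIRQ_BITS  # 16
NMI_SHIFT = HARDIRQ_SHIFT + HARDIRQ_BITS      # 26
PREEMPT_ACTIVE = 0x10000000                   # x86 definition in that era

# Assumed upstream 2.6.32 softirq numbering (bit n = softirq number n).
SOFTIRQ_NAMES = ["HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "BLOCK_IOPOLL",
                 "TASKLET", "SCHED", "HRTIMER", "RCU"]

# "BUG: scheduling while atomic: <comm>/<pid>/0x<preempt_count>"
BUG_RE = re.compile(r"BUG: scheduling while atomic: (.+)/(\d+)/0x([0-9a-fA-F]+)")


def _field(value, shift, bits):
    """Extract an unsigned bit field from value."""
    return (value >> shift) & ((1 << bits) - 1)


def parse_bug_line(line):
    """Return (comm, pid, preempt_count) from a BUG line, or None if absent."""
    m = BUG_RE.search(line)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), int(m.group(3), 16)


def decode_preempt_count(value):
    """Split a preempt_count value into its counters under the assumed layout."""
    return {
        "preempt": _field(value, 0, PREEMPT_BITS),
        "softirq": _field(value, SOFTIRQ_SHIFT, SOFTIRQ_BITS),
        "hardirq": _field(value, HARDIRQ_SHIFT, HARDIRQ_BITS),
        "nmi": _field(value, NMI_SHIFT, NMI_BITS),
        "preempt_active": bool(value & PREEMPT_ACTIVE),
    }


def decode_softirq_pending(mask):
    """Names of pending softirqs in a 'NOHZ: local_softirq_pending' mask."""
    return [name for bit, name in enumerate(SOFTIRQ_NAMES) if mask & (1 << bit)]
```

Under these assumptions, 0x00000100 means the idle task (pid 0) reached schedule() with a softirq/bottom-half count of 1, the 0x00010000 value reported later in this thread means a hardirq count of 1, 0x10000002 (bug 726877) is PREEMPT_ACTIVE plus a preempt count of 2, and the mask 20a means TIMER, NET_RX and RCU softirqs were pending.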
Workaround: Dropping the kvm guest to one VCPU seems to prevent this problem.
Warren, could you add some info on the configuration of the system that sees this issue? What kinds of storage are attached? What kind of network interface? How much traffic?
Host Hardware
=============
HP DL360 G6
Intel Xeon E5540 with 6GB RAM (4 cores with hyperthreading)
Netgear ReadyNAS 4200 serving a 12TB array over iSCSI
/dev/sda: Hewlett-Packard Company Smart Array G6 controllers (rev 01)
eth0: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
eth1: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)

Mirror VM
=========
(virtio_net) eth0 http://mirror.ancl.hawaii.edu
(virtio_net) eth1 (private interface to NAS)
(virtio_blk) / filesystem on local SCSI disk
(iscsi) /srv/mirror 12TB ext4 formatted iSCSI accessed via eth1

Workload
========
The mirror runs several rsync processes from cron to sync data from Fedora, EPEL, CentOS, Scientific Linux, The Document Foundation, Debian, Ubuntu and more. Meanwhile, lighttpd, vsftpd and the rsync daemon serve mirror content to clients. All mirror traffic traverses both eth0 and eth1, as /srv/mirror content is stored on the ext4-formatted iSCSI NAS array.
(In reply to comment #5)
> eth0: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
> eth1: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)

What iSCSI driver are you using: the software iSCSI driver, iscsi_tcp, or do your Broadcom cards have iSCSI offload enabled, with bnx2i handling offloaded iSCSI? By default we use iscsi_tcp. While you are logged into the iSCSI target, you can run

iscsiadm -m session -P 3 | grep transport

to see whether bnx2i is being used.
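A minimal Python sketch of the same check, assuming each session in the "iscsiadm -m session -P 3" output carries a line such as "Iface Transport: tcp" (software iscsi_tcp) or "Iface Transport: bnx2i" (offload). The exact labels can vary between iscsi-initiator-utils versions, so the match is case-insensitive; the function names are hypothetical.

```python
import re
import subprocess

# Assumption: one "Iface Transport: <name>" line per session in -P 3 output.
TRANSPORT_RE = re.compile(r"^\s*Iface Transport:\s*(\S+)", re.IGNORECASE | re.MULTILINE)


def session_transports(output):
    """Return the transport name of every session found in the iscsiadm output."""
    return TRANSPORT_RE.findall(output)


def uses_bnx2i(output):
    """True if any logged-in session uses the bnx2i offload transport."""
    return any(t.lower() == "bnx2i" for t in session_transports(output))


def read_session_details():
    """Run iscsiadm (requires an active session) and return its stdout."""
    result = subprocess.run(
        ["iscsiadm", "-m", "session", "-P", "3"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On a machine with an active session, uses_bnx2i(read_session_details()) returning False would be consistent with the software iscsi_tcp setup described in the reply that follows.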
iscsi_tcp. As noted above, the iscsi initiator is running inside the VM, communicating over virtio_net to the iscsi target, so it wouldn't be able to use hardware offloaded iscsi.
Note: "BUG: scheduling while atomic: swapper/0/0x10000002" also shows up in bug 726877 for 3.1.0-0.rc0... kernels, with no virtio_net and no iscsi.
(In reply to comment #8)
> Note: "BUG: scheduling while atomic: swapper/0/0x10000002" shows also in bug
> 726877 for 3.1.0-0.rc0... kernels with no virtio_net and no iscsi.

This is most likely a different bug entirely.

I switched the RHEL-6.1 VM back to 3 VCPUs, but this time changed virtual eth0 and eth1 from virtio_net to e1000. For the past hour, several simultaneous streams both to and from the server have been traversing eth0 via rsync and httpd, reading and writing /srv/mirror over eth1 to the iSCSI array. So far the kernel bug has not triggered, so it would seem this is a bug specific to virtio_net. Riel identified similar bugs fixed in other network drivers, but not in virtio_net?
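If the guest is managed by libvirt (the thread does not say so), the two knobs tried here, the vCPU count and the NIC model, live in the domain XML as the vcpu element and the model element inside each interface. The sketch below, using only xml.etree.ElementTree, rewrites a domain XML string accordingly; apply_workaround and the example domain are hypothetical. In practice the same edit is usually made with virsh edit, and such persistent changes typically take effect only after the guest is shut down and started again. Later comments report the bug recurring with e1000 and with one vCPU, so treat this as an experiment rather than a fix.

```python
import xml.etree.ElementTree as ET


def apply_workaround(domain_xml, vcpus=None, nic_model=None):
    """Return domain_xml with the vCPU count and/or every NIC model replaced.

    vcpus:     e.g. 1 to drop the guest to a single vCPU
    nic_model: e.g. "e1000" instead of "virtio"
    """
    root = ET.fromstring(domain_xml)
    if vcpus is not None:
        vcpu = root.find("vcpu")
        if vcpu is None:  # create the element if the domain lacks one
            vcpu = ET.SubElement(root, "vcpu")
        vcpu.text = str(vcpus)
    if nic_model is not None:
        for iface in root.findall("./devices/interface"):
            model = iface.find("model")
            if model is None:
                model = ET.SubElement(iface, "model")
            model.set("type", nic_model)
    return ET.tostring(root, encoding="unicode")


# Hypothetical domain resembling the mirror VM (3 vCPUs, virtio NICs).
example = """<domain type='kvm'>
  <name>mirror</name>
  <vcpu>3</vcpu>
  <devices>
    <interface type='network'><source network='default'/><model type='virtio'/></interface>
    <interface type='network'><source network='private'/><model type='virtio'/></interface>
  </devices>
</domain>"""

print(apply_workaround(example, nic_model="e1000"))
```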
Does anyone plan to fix this? Without virtio, I/O performance is very poor.
Finally seeing this here as well, but oddly on a machine that has no iSCSI, only a lot of network traffic. Kernel 2.6.32-131.12.1.el6.x86_64 here.

kernel BUG: scheduling while atomic: swapper/0/0x00010000
Modules linked in: tun ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter iptable_raw ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6table_raw ip6_tables ipv6 ext3 jbd virtio_balloon virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mod [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper Not tainted 2.6.32-131.12.1.el6.x86_64 #1
Call Trace:
[<ffffffff81055cb6>] ? __schedule_bug+0x66/0x70
[<ffffffff814db2e2>] ? thread_return+0x5d9/0x777
[<ffffffff81095085>] ? sched_clock_local+0x25/0x90
[<ffffffff8109e32b>] ? tick_nohz_stop_idle+0x3b/0x50
[<ffffffff81009ebe>] ? cpu_idle+0xee/0x110
[<ffffffff814c318a>] ? rest_init+0x7a/0x80
[<ffffffff81c1df28>] ? start_kernel+0x41d/0x429
[<ffffffff81c1d33a>] ? x86_64_start_reservations+0x125/0x129
[<ffffffff81c1d438>] ? x86_64_start_kernel+0xfa/0x109
Switching the machine to e1000 didn't seem to help. The problem recurs every few days. :(
Kevin, with virtio_net but one VCPU does it still occur?
Not sure. I just set it to 1 vcpu (although it still has e1000 in place). Will know in a day or two. ;)
I've got exactly the same issue on one VPS (of the 4 on this host). The host server is an Intel SR1690WBR with two Xeon E5606 CPUs and 24GB RAM. The VPS has 3 vcores, 8GB RAM and a DRBD disk.

# uname -a
Linux v1.xxxx.xx 2.6.32-131.6.1.el6.x86_64 #1 SMP Tue Jul 12 17:14:50 CDT 2011 x86_64 x86_64 x86_64 GNU/Linux

Restarting fixes the issue for a day or two, but it's not a good solution.
I'll try running it on one vcpu, but I'm not sure it will cope with the heavy load.
Hit it again with 1 vcpu and e1000. Moved it to 1 vcpu and virtio now. Seems to hit about once a day here.
The issue recurs with 1 vcpu and virtio as well. We are going to try a 32-bit guest. It seems all the other reports here are with 64-bit guests?
Yes, we use 64-bit guests. As for us, the problem VPS has been working for about 3 days on one vcpu without problems. But: 1) there is another VPS on this host with 3 vcpus and the same kernel, and it has been working just fine for about 4 months; 2) the problem VPS worked without problems on 2 vcpus for about 1 month before the issue appeared.
Yes, the issue recurs with 1 vcpu for me too, so it doesn't depend on the number of vcpus. Please notice that in all cases the logs contain the message [last unloaded: speedstep_lib].
We rebuilt our affected guest as 32-bit, and it has been up and working fine for almost 2 days now (with 4 vcpus and virtio). So this could well be an x86_64-specific bug.
This appears to be a duplicate of bug 683658, which was fixed in kernel-2.6.32-174.el6. Note that this is a HOST-side bug. You need to upgrade your host kernel to avoid this bug showing up in guests.
(In reply to comment #22)
> This appears to be a duplicate of bug 683658, which was fixed in
> kernel-2.6.32-174.el6
>
> Note that this is a HOST side bug. You need to upgrade your host kernel to
> avoid this bug showing up in guests.

Fully agree, I remember this one.

Dear reporters, please try the 6.1.z host kernel, kernel-2.6.32-131.11.1.el6. For the time being I'll close this as a duplicate; please reopen if this kernel does not solve it.

*** This bug has been marked as a duplicate of bug 683658 ***
Hi all. We updated the kernel to 2.6.32-131.12.1.el6.x86_64, but the issue was not solved.
(In reply to comment #25)
> Hi all.
> We updated the kernel to 2.6.32-131.12.1.el6.x86_64,but the issue was not
> solved.

Did you update the _host_ kernel, not the guest one?
(In reply to comment #25)
> We updated the kernel to 2.6.32-131.12.1.el6.x86_64,but the issue was not
> solved.

From what I see, we do not yet have an errata for this, so the test kernel kernel-2.6.32-131.11.1.el6 has to be installed to fix the problem with 6.1.z. Another option is to use the RHEL 6.2 beta kernel; the patch went in there before 2.6.32-173 was tagged.
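Because the fix lives in the host kernel, it can help to check the host's release string (uname -r on the KVM host, or platform.release() in Python there, not inside the guest) against the builds named in this thread. A minimal sketch, assuming RHEL-style release strings and taking the thresholds from the comments here: kernel-2.6.32-174.el6 for the 6.2 stream (the conservative choice, since the patch reportedly landed before 173 was tagged) and the kernel-2.6.32-131.11.1.el6 test kernel for 6.1.z. It also assumes later 131.x z-stream builds carry the fix, which the reports of success with a 2.6.32-131.12.1 host suggest but nobody here confirms. The helper names are hypothetical.

```python
import re

# Thresholds taken from comments in this thread (hypothetical encoding).
FIXED_MAINLINE = "2.6.32-174.el6"       # RHEL 6.2 stream
FIXED_Z_STREAM = "2.6.32-131.11.1.el6"  # RHEL 6.1.z test kernel


def release_key(release):
    """Turn e.g. '2.6.32-131.12.1.el6.x86_64' into (2, 6, 32, 131, 12, 1).

    Everything from the '.el' dist tag onward is dropped; a leading
    'kernel-' package prefix is ignored because only digit runs are kept.
    """
    core = release.split(".el", 1)[0]
    return tuple(int(part) for part in re.findall(r"\d+", core))


def host_kernel_has_fix(release):
    """Guess whether a *host* kernel release includes the bug 683658 fix."""
    key = release_key(release)
    if key[:4] == (2, 6, 32, 131):  # 6.1.z stream
        return key >= release_key(FIXED_Z_STREAM)
    return key >= release_key(FIXED_MAINLINE)
```

Comparing integer tuples rather than raw strings matters: compared character by character, "2.6.32-71" would sort above "2.6.32-131".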
As for me, upgrading the host kernel to 2.6.32-131.12.1.el6.x86_64 solved this issue: at least two weeks without crashes. Thanks for the help!
(In reply to comment #27)
> (In reply to comment #25)
> > We updated the kernel to 2.6.32-131.12.1.el6.x86_64,but the issue was not
> > solved.
>
> From what I see we do not yet have an errata for this.
> So the testkernel kernel-2.6.32-131.11.1.el6 has to be installed to fix the
> problem with 6.1.z .
> Another option is to use the rhel6.2 beta kernel, the patch went in there
> bevore 2.6.32-173 was tagged.

Thanks, after updating the host kernel to 2.6.32-131.12.1.el6.x86_64, the issue is solved.