Bug 455097

Summary: 2.6.26-0.124.rc9.git5.fc10.x86_64 oops in new_slab in kvm guest
Product: [Fedora] Fedora
Component: kvm
Version: 10
Hardware: x86_64
OS: Linux
Status: CLOSED DUPLICATE
Severity: high
Priority: medium
Reporter: Roland Dreier <rolandd>
Assignee: Glauber Costa <gcosta>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: bashton, berrange, clalance, gcosta, kernel-maint, markmc, mtosatti, quintela, virt-maint, xen-maint
Doc Type: Bug Fix
Last Closed: 2009-03-04 21:50:37 UTC

Description Roland Dreier 2008-07-11 22:17:14 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9) Gecko/2008061017 Firefox/3.0

Description of problem:
I have a Fedora 9 x86_64 image plus the rawhide kernel 2.6.26-0.115.rc9.git2.fc10 running as a guest on a kvm-70/upstream 2.6.26-rc9 host.  The guest just got the following oops on the serial console while it was mostly idle, with just an ssh login shell running, not doing anything:

BUG: unable to handle kernel paging request at ffff81003e066000
IP: [<ffffffff810b1630>] new_slab+0x279/0x2f1
PGD 8063 PUD 9063 PMD 3e62d163 PTE 800000003e066160
Oops: 0002 [1] SMP DEBUG_PAGEALLOC
CPU 0
Modules linked in: bridge bnep rfcomm l2cap bluetooth fuse sunrpc ipt_REJECT nf_conntrack_ipv4 iptable_filter ip_tables ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables x_tables ipv6 loop dm_multipath ppdev sr_mod cdrom snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device parport_pc snd_pcm_oss parport snd_mixer_oss virtio_net snd_pcm floppy snd_timer ata_generic snd soundcore snd_page_alloc pcspkr i2c_piix4 ata_piix i2c_core pata_acpi dm_snapshot dm_zero dm_mirror dm_log dm_mod virtio_blk virtio_pci virtio_ring virtio ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: freq_table]
Pid: 2617, comm: dmesg Not tainted 2.6.26-0.124.rc9.git5.fc10.x86_64 #1
RIP: 0010:[<ffffffff810b1630>]  [<ffffffff810b1630>] new_slab+0x279/0x2f1
RSP: 0018:ffffffff816b0990  EFLAGS: 00010016
RAX: 002000000000205a RBX: ffffe20001742640 RCX: 0000000000002000
RDX: 0000000000000001 RSI: 0000000000002000 RDI: ffff81003e066000
RBP: ffffffff816b09c0 R08: ffffffff816b0760 R09: 0000000000000086
R10: 0000000000000000 R11: 0000000000000001 R12: 00000000ffffffff
R13: 0000000000004020 R14: ffff81003e066000 R15: ffff81003fb06000
FS:  00007f8de59836f0(0000) GS:ffffffff81492000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff81003e066000 CR3: 000000003180d000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process dmesg (pid: 2617, threadinfo ffff810031912000, task ffff81003d93a5b0)
Stack:  ffffffff816b09a0 ffffe20000c78e40 0000000000000000 ffff81000107c420
 ffff81003fb06000 0000000000000020 ffffffff816b0a20 ffffffff810b1c55
 ffffffff81254ba0 00000020ffffffff ffff81003dc2dbe0 ffffffff81253823
Call Trace:
 <IRQ>  [<ffffffff810b1c55>] __slab_alloc+0x273/0x490
 [<ffffffff81254ba0>] ? __alloc_skb+0x42/0x135
 [<ffffffff81253823>] ? skb_queue_head+0x40/0x49
 [<ffffffff810b1ee9>] kmem_cache_alloc_node+0x77/0xde
 [<ffffffff81254ba0>] ? __alloc_skb+0x42/0x135
 [<ffffffff81254ba0>] __alloc_skb+0x42/0x135
 [<ffffffff81255635>] __netdev_alloc_skb+0x31/0x50
 [<ffffffffa01131c4>] :virtio_net:try_fill_recv+0x53/0x10b
 [<ffffffffa0113d15>] :virtio_net:virtnet_poll+0x22a/0x2e0
 [<ffffffff81258366>] ? net_rx_action+0x73/0x22d
 [<ffffffff812583da>] net_rx_action+0xe7/0x22d
 [<ffffffff8103e7d6>] __do_softirq+0x77/0x101
 [<ffffffff8100d65c>] call_softirq+0x1c/0x28
 [<ffffffff8100e965>] do_softirq+0x4d/0xb0
 [<ffffffff8103e29b>] irq_exit+0x4e/0x8f
 [<ffffffff8100ec85>] do_IRQ+0x147/0x169
 [<ffffffff8100c732>] ret_from_intr+0x0/0x1e
 <EOI>  [<ffffffff81097294>] ? unmap_vmas+0x3e7/0x876
 [<ffffffff8109722d>] ? unmap_vmas+0x380/0x876
 [<ffffffff8109b872>] ? exit_mmap+0x7c/0xf3
 [<ffffffff81036bc6>] ? mmput+0x42/0x9e
 [<ffffffff8103aca1>] ? exit_mm+0xe6/0xef
 [<ffffffff8103c7d4>] ? do_exit+0x27b/0x8d4
 [<ffffffff8107c0c7>] ? audit_syscall_entry+0x126/0x15a
 [<ffffffff8107bd98>] ? audit_syscall_exit+0x331/0x353
 [<ffffffff8103cea6>] ? do_group_exit+0x79/0xa9
 [<ffffffff8103cee8>] ? sys_exit_group+0x12/0x14
 [<ffffffff8100c2c7>] ? tracesys+0xd5/0xda


Code: 10 49 8b 07 f6 c4 08 74 24 48 8b 03 31 d2 f6 c4 20 74 06 8b 93 b8 00 00 00 88 d1 be 00 10 00 00 b0 5a 48 d3 e6 4c 89 f7 48 89 f1 <f3> aa 4d 89 f5 4d 89 f4 eb 21 4c 89 ea 48 89 de 4c 89 ff e8 9a
RIP  [<ffffffff810b1630>] new_slab+0x279/0x2f1
 RSP <ffffffff816b0990>
CR2: ffff81003e066000
---[ end trace 6acba83846156d5a ]---


Version-Release number of selected component (if applicable):
2.6.26-0.115.rc9.git2.fc10

How reproducible:
Sometimes


Steps to Reproduce:
This is the second oops I've seen, but the first one I've caught after setting up a serial console.


Comment 1 Roland Dreier 2008-07-15 18:36:49 UTC
Got a similar-looking crash on 2.6.26-136.fc10.x86_64 (running in a VM on the same host system):

BUG: unable to handle kernel paging request at ffff8100375c0000
IP: [<ffffffff810b17d4>] new_slab+0x279/0x2f1
PGD 8063 PUD 9063 PMD 38222163 PTE 80000000375c0160
Oops: 0002 [1] SMP DEBUG_PAGEALLOC
CPU 0
Modules linked in: bridge bnep rfcomm l2cap bluetooth fuse sunrpc ipt_REJECT nf_conntrack_ipv4 iptable_filter ip_tables ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables x_tables ipv6 loop dm_multipath ppdev sr_mod cdrom snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm virtio_net parport_pc parport floppy snd_timer ata_generic snd soundcore snd_page_alloc pcspkr ata_piix pata_acpi i2c_piix4 i2c_core dm_snapshot dm_zero dm_mirror dm_log dm_mod virtio_blk virtio_pci virtio_ring virtio ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: freq_table]
Pid: 2916, comm: sendmail Not tainted 2.6.26-136.fc10.x86_64 #1
RIP: 0010:[<ffffffff810b17d4>]  [<ffffffff810b17d4>] new_slab+0x279/0x2f1
RSP: 0018:ffffffff816b0990  EFLAGS: 00010016
RAX: 002000000000205a RBX: ffffe200014c2800 RCX: 0000000000008000
RDX: 0000000000000003 RSI: 0000000000008000 RDI: ffff8100375c0000
RBP: ffffffff816b09c0 R08: ffffffff816b0760 R09: 0000000000000086
R10: 0000000000000000 R11: 0000000000000001 R12: 00000000ffffffff
R13: 0000000000004020 R14: ffff8100375c0000 R15: ffffffff81531860
FS:  00007f70e567a7a0(0000) GS:ffffffff81492000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff8100375c0000 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process sendmail (pid: 2916, threadinfo ffff8100341fe000, task ffff8100318025b0)
Stack:  ffffffff816b09a0 ffffe20001553400 0000000000000000 ffff81000107bf80
 ffffffff81531860 0000000000000020 ffffffff816b0a20 ffffffff810b1df9
 ffffffff81255863 00000020ffffffff ffffffff816b0a10 ffffffff810595dd
Call Trace:
 <IRQ>  [<ffffffff810b1df9>] __slab_alloc+0x273/0x490
 [<ffffffff81255863>] ? __netdev_alloc_skb+0x31/0x4e
 [<ffffffff810595dd>] ? mark_held_locks+0x5c/0x77
 [<ffffffff810b2e33>] __kmalloc_node_track_caller+0x9f/0x103
 [<ffffffff81255863>] ? __netdev_alloc_skb+0x31/0x4e
 [<ffffffff81254dec>] __alloc_skb+0x6f/0x135
 [<ffffffff81255863>] __netdev_alloc_skb+0x31/0x4e
 [<ffffffffa01111c4>] :virtio_net:try_fill_recv+0x53/0x10b
 [<ffffffffa0111d15>] :virtio_net:virtnet_poll+0x22a/0x2e0
 [<ffffffff81258592>] ? net_rx_action+0x73/0x22d
 [<ffffffff81258606>] net_rx_action+0xe7/0x22d
 [<ffffffff8103e7ea>] __do_softirq+0x77/0x101
 [<ffffffff8100d64c>] call_softirq+0x1c/0x28
 [<ffffffff8100e955>] do_softirq+0x4d/0xb0
 [<ffffffff8103e2af>] irq_exit+0x4e/0x8f
 [<ffffffff8100ec75>] do_IRQ+0x147/0x169
 [<ffffffff8100c732>] ret_from_intr+0x0/0x1e
 <EOI>  [<ffffffff81097438>] ? unmap_vmas+0x3e7/0x876
 [<ffffffff810973d1>] ? unmap_vmas+0x380/0x876
 [<ffffffff8109ba16>] ? exit_mmap+0x7c/0xf3
 [<ffffffff81036bda>] ? mmput+0x42/0x9e
 [<ffffffff8103acb5>] ? exit_mm+0xe6/0xef
 [<ffffffff8103c7e8>] ? do_exit+0x27b/0x8d4
 [<ffffffff8107c26b>] ? audit_syscall_entry+0x126/0x15a
 [<ffffffff8107bf3c>] ? audit_syscall_exit+0x331/0x353
 [<ffffffff8103ceba>] ? do_group_exit+0x79/0xa9
 [<ffffffff8103cefc>] ? sys_exit_group+0x12/0x14
 [<ffffffff8100c2c2>] ? tracesys+0xd0/0xd5


Code: 10 49 8b 07 f6 c4 08 74 24 48 8b 03 31 d2 f6 c4 20 74 06 8b 93 b8 00 00 00 88 d1 be 00 10 00 00 b0 5a 48 d3 e6 4c 89 f7 48 89 f1 <f3> aa 4d 89 f5 4d 89 f4 eb 21 4c 89 ea 48 89 de 4c 89 ff e8 9a
RIP  [<ffffffff810b17d4>] new_slab+0x279/0x2f1
 RSP <ffffffff816b0990>
CR2: ffff8100375c0000
---[ end trace 42efba9b37ce41f3 ]---
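
A note on reading the two register dumps so far (the one in the Description and the one in this comment). In both Code: lines the faulting bytes `<f3> aa` are a `rep stos` byte store, and the `b0 5a` a few bytes earlier loads 0x5a into AL, which matches the SLUB POISON_INUSE fill value. The sketch below is only an illustration and not something posted by the reporter: the helper name `describe_fill` is hypothetical, 4 KiB pages are assumed, and treating RDX as the page order is an inference from the `88 d1` / `48 d3 e6` bytes (copy DL to CL, then shift left by CL). It checks whether RCX still holds the full fill length and RDI equals CR2, which would mean the very first byte written to the new slab faulted.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages, matching "be 00 10 00 00" in the Code: line


def describe_fill(rdi, rcx, cr2, rdx):
    """Interpret the registers of a faulting `rep stos` byte fill (hypothetical helper).

    rdi: address of the next byte to be written
    rcx: number of bytes still to be written
    cr2: faulting address reported by the oops
    rdx: shift count copied into CL, assumed here to be the page order
    """
    total = PAGE_SIZE << rdx   # length the fill was asked to write
    written = total - rcx      # bytes stored before the fault
    return {
        "order": rdx,
        "total_bytes": total,
        "bytes_written": written,
        "faulted_on_first_byte": written == 0 and rdi == cr2,
    }


# Register values copied from the Description and from Comment 1.
oops_description = describe_fill(rdi=0xffff81003e066000, rcx=0x2000,
                                 cr2=0xffff81003e066000, rdx=1)
oops_comment_1 = describe_fill(rdi=0xffff8100375c0000, rcx=0x8000,
                               cr2=0xffff8100375c0000, rdx=3)

print(oops_description)
print(oops_comment_1)
```

If that reading is right, neither fill got as far as writing a single byte, so the freshly allocated pages themselves were not writable through the kernel mapping at that moment.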


Comment 2 Mark McLoughlin 2008-08-06 18:19:57 UTC
Roland: seen this lately with more recent rawhide kernels? Can you confirm you weren't seeing it with the stock F9 kernel?

Haven't seen any reports of this upstream ...

Both oopses seem to show slab corruption when virtio_net tries to allocate more skbs from an interrupt that preempted an exiting process while it was freeing its vmas.

It'd be interesting to see if it still triggers with e.g. DEBUG_PAGEALLOC or SLUB_DEBUG_ON disabled.
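
For anyone following along, the "Oops: 0002" value and the PTE shown in each dump can be decoded by hand. The sketch below is an illustration only: the function names are hypothetical, and just the low three error-code bits and a handful of x86-64 PTE flag bits are covered. A not-present, kernel-mode write fault on a page whose PTE still carries other flags would be consistent with DEBUG_PAGEALLOC having unmapped the page, though that alone does not say why the allocator handed it out.

```python
def decode_error_code(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "page_present": bool(code & 0x1),  # False: the page was not present
        "write": bool(code & 0x2),         # True: the access was a write
        "user_mode": bool(code & 0x4),     # False: the fault happened in kernel mode
    }


# A subset of x86-64 page-table entry flag bits, keyed by bit position.
PTE_FLAGS = {
    0: "PRESENT",
    1: "RW",
    2: "USER",
    5: "ACCESSED",
    6: "DIRTY",
    8: "GLOBAL",
    63: "NX",
}


def decode_pte_flags(pte):
    """Return the names of the known flag bits that are set in a PTE value."""
    return [name for bit, name in sorted(PTE_FLAGS.items()) if (pte >> bit) & 1]


# Values copied from the first oops in the Description.
print(decode_error_code(0x0002))
print(decode_pte_flags(0x800000003e066160))
```

Neither PRESENT nor RW appears in the decoded PTE flags, which lines up with the error code reporting a write to a page that is not present.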

Comment 3 Roland Dreier 2008-08-09 23:20:28 UTC
Sorry for the slow response.

Anyway, I tried updating to 2.6.27-0.226.rc1.git5.fc10.x86_64 and I've found that it's difficult to even get the VM to boot reliably (same host -- kvm 72 with kernel post-2.6.27-rc2 latest git, including upstream kvm module).  For example I just got this early in boot:

BUG: unable to handle kernel paging request at ffff8800335d1f58
IP: [<ffffffff8100ed1f>] copy_thread+0x47/0x1ae
PGD 202063 PUD 206063 PMD 33aca163 PTE 335d1160
Oops: 0002 [1] SMP DEBUG_PAGEALLOC
CPU 0
Modules linked in: dm_snapshot dm_zero dm_mirror dm_log dm_mod virtio_blk virtio_pci virtio_ring virtio ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
Pid: 801, comm: udevd Tainted: G S        2.6.27-0.226.rc1.git5.fc10.x86_64 #1
RIP: 0010:[<ffffffff8100ed1f>]  [<ffffffff8100ed1f>] copy_thread+0x47/0x1ae
RSP: 0018:ffff88002f9f5da8  EFLAGS: 00010286
RAX: ffff8800335d2000 RBX: ffff88002f9fcb20 RCX: 000000000000002a
RDX: 00007fff16698fe0 RSI: ffff88002f9f5f58 RDI: ffff8800335d1f58
RBP: ffff88002f9f5dd8 R08: ffff88002f9fcb20 R09: ffff88002f9f5f58
R10: 0000000000000046 R11: ffff88002f9f5c58 R12: ffff88002f9f8000
R13: ffff8800335d1f58 R14: 0000000001200011 R15: 0000000001200011
FS:  00007fe70e671780(0000) GS:ffffffff814f7380(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff8800335d1f58 CR3: 000000002f8ee000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process udevd (pid: 801, threadinfo ffff88002f9f4000, task ffff88002f9f8000)
Stack:  ffff88002f988568 0000000000000000 0000000000000000 ffff88002f9fcb20
 0000000000000000 ffff88002f9fcdb8 ffff88002f9f5e78 ffffffff8104289e
 ffff88002f9f5f58 0000000000000000 ffffffffff5fc0b0 ffffffff81010bbe
Call Trace:
 [<ffffffff8104289e>] copy_process+0xc6d/0x13d7
 [<ffffffff81010bbe>] ? restore_args+0x0/0x30
 [<ffffffff81043111>] do_fork+0x109/0x250
 [<ffffffff810cd120>] ? fd_install+0x5b/0x64
 [<ffffffff810d57c4>] ? do_pipe_flags+0xb5/0x110
 [<ffffffff8101034a>] ? system_call_fastpath+0x16/0x1b
 [<ffffffff8100e622>] sys_clone+0x28/0x2a
 [<ffffffff81010867>] ptregscall_common+0x67/0xb0

Comment 4 Roland Dreier 2008-08-11 21:54:36 UTC
I booted 2.6.27-0.226.rc1.git5.fc10.x86_64 with "slub_debug=FPZ" and got the following oops after leaving the VM idle for a while:

BUG: unable to handle kernel paging request at ffff880012c08000
IP: [<ffffffff810c86c9>] new_slab+0x158/0x1cb
PGD 202063 PUD 206063 PMD 19d06163 PTE 12c08160
Oops: 0002 [1] SMP DEBUG_PAGEALLOC
CPU 0 
Modules linked in: bridge stp rfcomm bnep l2cap bluetooth fuse sunrpc ipt_REJECT nf_conntrack_ipv4 iptable_filter ip_tables ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables x_tables ipv6 loop dm_multipath sr_mod cdrom ppdev snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device virtio_net floppy snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr ata_generic parport_pc parport i2c_piix4 ata_piix i2c_core pata_acpi dm_snapshot dm_zero dm_mirror dm_log dm_mod virtio_blk virtio_pci virtio_ring virtio ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: freq_table]
Pid: 2689, comm: 0logwatch Tainted: G S        2.6.27-0.226.rc1.git5.fc10.x86_64 #1
RIP: 0010:[<ffffffff810c86c9>]  [<ffffffff810c86c9>] new_slab+0x158/0x1cb
RSP: 0018:ffffffff816e3980  EFLAGS: 00010016
RAX: 000000000000005a RBX: ffffe20000708300 RCX: 0000000000008000
RDX: 000000000000005a RSI: 0000000000008000 RDI: ffff880012c08000
RBP: ffffffff816e39b0 R08: ffffffff816e3760 R09: ffffffff816e37b0
R10: 0000000000000046 R11: 0000000000000001 R12: 0000000000004020
R13: 000000000003000f R14: ffffffff814f4708 R15: ffff880012c08000
FS:  00007f94bb3956f0(0000) GS:ffffffff814f7380(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff880012c08000 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process 0logwatch (pid: 2689, threadinfo ffff88002d984000, task ffff88002ec1cac0)
Stack:  ffffffff816e39b0 ffffe200011ab100 0000000000000000 ffff88000107dd50
 ffffffff814f4708 0000000000000020 ffffffff816e3a10 ffffffff810c8cd1
 ffffffff812814a4 00000020ffffffff ffff88002ec1cac0 ffffffff814f4708
Call Trace:
 <IRQ>  [<ffffffff810c8cd1>] __slab_alloc+0x267/0x3eb
 [<ffffffff812814a4>] ? __netdev_alloc_skb+0x36/0x52
 [<ffffffff810c9c9e>] __kmalloc_node_track_caller+0xa4/0x108
 [<ffffffff812814a4>] ? __netdev_alloc_skb+0x36/0x52
 [<ffffffff812809f6>] __alloc_skb+0x74/0x13a
 [<ffffffff812814a4>] __netdev_alloc_skb+0x36/0x52
 [<ffffffffa014328a>] try_fill_recv+0x5f/0x1c5 [virtio_net]
 [<ffffffffa014408f>] virtnet_poll+0x2c4/0x386 [virtio_net]
 [<ffffffff812845c4>] net_rx_action+0xff/0x246
 [<ffffffff81048ff1>] __do_softirq+0x83/0x10e
 [<ffffffff81011d8c>] call_softirq+0x1c/0x28
 [<ffffffff81013051>] do_softirq+0x52/0xb5
 [<ffffffff81048b99>] irq_exit+0x53/0xa2
 [<ffffffff81013380>] do_IRQ+0x14c/0x16e
 [<ffffffff81010a93>] ret_from_intr+0x0/0x2e
 <EOI>  [<ffffffff81026d5a>] ? native_flush_tlb_global+0x47/0x56
 [<ffffffff8102d4c1>] ? kernel_map_pages+0x11a/0x12d
 [<ffffffff810a16f9>] ? free_hot_cold_page+0xb6/0x193
 [<ffffffff810a1804>] ? __pagevec_free+0x2e/0x42
 [<ffffffff810a4de6>] ? release_pages+0x183/0x1ef
 [<ffffffff810b85a8>] ? free_pages_and_swap_cache+0x5c/0x77
 [<ffffffff810acba4>] ? unmap_vmas+0x5fb/0x87c
 [<ffffffff810b0f4e>] ? exit_mmap+0x91/0x10a
 [<ffffffff8104153f>] ? mmput+0x47/0xa3
 [<ffffffff810453d4>] ? exit_mm+0x10d/0x118
 [<ffffffff8104704b>] ? do_exit+0x2a0/0x904
 [<ffffffff8104772d>] ? do_group_exit+0x7e/0xae
 [<ffffffff81047774>] ? sys_exit_group+0x17/0x19
 [<ffffffff8101034a>] ? system_call_fastpath+0x16/0x1b


Code: c1 e0 0c 4c 8d 3c 10 49 8b 06 f6 c4 08 74 1e 48 89 df e8 6d d3 ff ff be 00 10 00 00 89 c1 b2 5a 48 d3 e6 4c 89 ff 88 d0 48 89 f1 <f3> aa 4d 89 fd 4d 89 fc eb 21 4c 89 ea 48 89 de 4c 89 f7 e8 30 
RIP  [<ffffffff810c86c9>] new_slab+0x158/0x1cb
 RSP <ffffffff816e3980>
CR2: ffff880012c08000
---[ end trace 39747af2df17e80b ]---
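
For reference, the letters in the "slub_debug=FPZ" option used for this boot each select one SLUB debugging feature. The sketch below spells them out; it is an illustration rather than kernel code, the function name `explain_slub_debug` is hypothetical, and the flag table is written from memory of the kernel's SLUB documentation, so it is worth checking against the documentation for the kernel in use.

```python
# Flag letters as described in the kernel's SLUB documentation (from memory).
SLUB_DEBUG_FLAGS = {
    "F": "sanity checks",
    "Z": "red zoning",
    "P": "poisoning (object and padding)",
    "U": "user tracking (alloc and free)",
    "T": "trace",
}


def explain_slub_debug(option):
    """Map each flag letter of a slub_debug= boot option to a description."""
    name, _, value = option.partition("=")
    if name != "slub_debug":
        raise ValueError("not a slub_debug option: " + option)
    flags = value.split(",")[0]  # anything after a comma names specific slabs
    return [(ch, SLUB_DEBUG_FLAGS.get(ch, "unknown flag")) for ch in flags]


for letter, meaning in explain_slub_debug("slub_debug=FPZ"):
    print(letter, meaning)
```

With poisoning enabled, the `<f3> aa` store and the 0x5a in RAX in the dump above would again be the fill of a newly allocated slab, as in the earlier oopses.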

Comment 5 Mark McLoughlin 2008-11-11 17:25:41 UTC
Re-assigning kvm.ko bugs to the kvm package for easier tracking

Comment 6 Bug Zapper 2008-11-26 02:33:05 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle.
Changing version to '10'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 7 Marcelo Tosatti 2009-03-04 21:50:37 UTC

*** This bug has been marked as a duplicate of bug 480822 ***