Bug 507545

Summary: xen kernel -- soft lockup
Product: Red Hat Enterprise Linux 5 Reporter: Ioannis Aslanidis <iaslanidis>
Component: kernel-xenAssignee: Rik van Riel <riel>
Status: CLOSED DUPLICATE QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high Docs Contact:
Priority: low    
Version: 5.2CC: clalance, pbonzini, shane, xen-maint
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-06-23 17:17:26 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 514491    

Description Ioannis Aslanidis 2009-06-23 09:31:42 UTC
Description of problem:
Not often (but still happens every now and then), xen guest kernels report a "kernel bug: soft lockup". When this happens, the xen guest stops responding completely. The only way to fix this is to stop xend in the xen host and start it over; doing this 'fixes' the issue until the next lockup.


Version-Release number of selected component (if applicable):
release                : 2.6.18-92.el5xen
version                : #1 SMP Tue Apr 29 13:45:57 EDT 2008


How reproducible:
Run a xen guest in a xen host for months and it will eventually happen. The tendency I see is every three to six months.


Steps to Reproduce:
1. Set up a xen host and a xen guest as usual and have it running with no interruption.
  
Actual results:
The xen guest stops responding completely and requires to stop and start over xend in order to go back to normalcy. Once normalcy is reestablished, a "kernel bug: soft lockup" can be found in the message log.

Expected results:
The xen guest should never stop responding.

Additional info:

The funny thing about this is that I have seen three xen guests go down like this (information for each one below) in the same day (21 Jun 2009). Additionally, I have been discussing this with other colleagues and acquaintances and they told me that they experienced the exact same issue the same precise day (coincidence?).

Machine #1:

BUG: soft lockup detected on CPU#0!
 [<c1044e2f>] softlockup_tick+0xaa/0xc1
 [<c10087b3>] timer_interrupt+0x552/0x59f
 [<c121589a>] _spin_lock_irqsave+0x12/0x17
 [<c1126fe6>] __add_entropy_words+0x56/0x18b
 [<c10450a5>] handle_IRQ_event+0x1e/0x47
 [<c1046408>] handle_level_irq+0x93/0xdf
 [<c1046375>] handle_level_irq+0x0/0xdf
 [<c10068d8>] do_IRQ+0xb5/0xdd
 [<c103227e>] autoremove_wake_function+0x15/0x35
 [<c114a9b1>] evtchn_do_upcall+0x5f/0x97
 [<c1005006>] hypervisor_callback+0x46/0x50
 [<c102007b>] copy_process+0x1063/0x10df
 [<c114a2ec>] force_evtchn_callback+0xa/0xc
 [<c10219d2>] vprintk+0x2de/0x2e8
 [<c10c7297>] mls_sid_to_context+0x1f8/0x242
 [<c10c44a6>] context_struct_to_string+0x172/0x183
 [<c10219fb>] printk+0x1f/0xa1
 [<c103f961>] audit_log_vformat+0x90/0x140
 [<c103f653>] audit_log_end+0xcd/0xf1
 [<c10b8cfc>] avc_audit+0xcc3/0xcce
 [<c10b93cf>] inode_has_perm+0x5b/0x63
 [<c10761c4>] __d_lookup+0xbc/0xee
 [<c10b8d55>] avc_has_perm+0x4e/0x58
 [<c10b93cf>] inode_has_perm+0x5b/0x63
 [<c1079f5b>] mntput_no_expire+0x11/0x6e
 [<c106fbbc>] link_path_walk+0xa9/0xb3
 [<c10bdac1>] selinux_inode_getattr+0x50/0x58
 [<c106a438>] vfs_getattr+0x26/0x53
 [<c106a529>] vfs_stat_fd+0x2e/0x40
 [<c107668f>] dput+0x31/0xfd
 [<c1079f5b>] mntput_no_expire+0x11/0x6e
 [<c106a5c8>] sys_stat64+0xf/0x23
 [<c10686e2>] __fput+0x146/0x170
 [<c1216bbe>] do_page_fault+0x0/0xc2e
 [<c1004e48>] syscall_call+0x7/0xb
 =======================

Machine #2:

DMA per-cpu:
CPU    0: Hot: hi:  186, btch:  31 usd:  24   Cold: hi:   62, btch:  15 usd:  55
CPU    1: Hot: hi:  186, btch:  31 usd:  65   Cold: hi:   62, btch:  15 usd:   1
CPU    2: Hot: hi:  186, btch:  31 usd:  27   Cold: hi:   62, btch:  15 usd:  30
CPU    3: Hot: hi:  186, btch:  31 usd:  90   Cold: hi:   62, btch:  15 usd:  46
Active:32 inactive:2 dirty:0 writeback:0 unstable:0 free:686 slab:124322 mapped:2 pagetables:751
DMA free:2744kB min:2904kB low:3628kB high:4356kB active:128kB inactive:8kB present:528320kB pages_scanned:5408735 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0
DMA: 0*4kB 5*8kB 1*16kB 0*32kB 2*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2744kB
Swap cache: add 5217573, delete 5217579, find 40858241/41734223, race 82+1019
Free swap  = 374856kB
Total swap = 522104kB
printk: 43749 messages suppressed.
nrpe invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
 [<c044af9e>] out_of_memory+0x69/0x18b
 [<c044c418>] __alloc_pages+0x220/0x2aa
 [<c044d7e0>] __do_page_cache_readahead+0xc5/0x1cc
 [<c0421123>] getnstimeofday+0xd/0x21
 [<c06120ef>] _spin_unlock_irqrestore+0x8/0x16
 [<c0442e8d>] delayacct_end+0x70/0x77
 [<c044a3ea>] filemap_nopage+0x14d/0x319
 [<c0447e19>] __lock_page+0x58/0x5e
 [<c04551cd>] __handle_mm_fault+0x313/0x10b5
 [<c0613a92>] do_page_fault+0x6e0/0xc1b
 [<c05aa23b>] sys_send+0x37/0x3b
 [<c06120ef>] _spin_unlock_irqrestore+0x8/0x16
 [<c0408189>] do_gettimeofday+0x1d7/0x1ec
 [<c040687f>] do_IRQ+0xc6/0xdb
 [<c06133b2>] do_page_fault+0x0/0xc1b
 [<c061221d>] error_code+0x35/0x3c
 [<c0610033>] netlbl_cipsov4_listall_cb+0x1d2/0x1f7
 =======================
DMA per-cpu:
CPU    0: Hot: hi:  186, btch:  31 usd:  24   Cold: hi:   62, btch:  15 usd:  55
CPU    1: Hot: hi:  186, btch:  31 usd:  65   Cold: hi:   62, btch:  15 usd:   1
CPU    2: Hot: hi:  186, btch:  31 usd:  27   Cold: hi:   62, btch:  15 usd:  30
CPU    3: Hot: hi:  186, btch:  31 usd:  90   Cold: hi:   62, btch:  15 usd:  46
Active:34 inactive:0 dirty:0 writeback:0 unstable:0 free:686 slab:124322 mapped:2 pagetables:751
DMA free:2744kB min:2904kB low:3628kB high:4356kB active:136kB inactive:0kB present:528320kB pages_scanned:5527869 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0
DMA: 0*4kB 5*8kB 1*16kB 0*32kB 2*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2744kB
Swap cache: add 5217573, delete 5217579, find 40858241/41734223, race 82+1019
Free swap  = 374856kB
Total swap = 522104kB
printk: 43725 messages suppressed.
nrpe invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
 [<c044af9e>] out_of_memory+0x69/0x18b
 [<c044c418>] __alloc_pages+0x220/0x2aa
 [<c044d7e0>] __do_page_cache_readahead+0xc5/0x1cc
 [<c0442e8d>] delayacct_end+0x70/0x77
 [<c0447e27>] sync_page+0x0/0x3b
 [<c0442f85>] __delayacct_blkio_end+0x5b/0x5e
 [<c0611213>] __wait_on_bit_lock+0x4b/0x52
 [<c0447e19>] __lock_page+0x58/0x5e
 [<c044a3ea>] filemap_nopage+0x14d/0x319
 [<c04551cd>] __handle_mm_fault+0x313/0x10b5
 [<c040687f>] do_IRQ+0xc6/0xdb
 [<c06133b2>] do_page_fault+0x0/0xc1b
 [<c0404ff2>] hypervisor_callback+0x46/0x50
 [<c06133b2>] do_page_fault+0x0/0xc1b
 [<c0613a92>] do_page_fault+0x6e0/0xc1b
 [<c06120ef>] _spin_unlock_irqrestore+0x8/0x16
 [<c0408189>] do_gettimeofday+0x1d7/0x1ec
 [<c06133b2>] do_page_fault+0x0/0xc1b
 [<c061221d>] error_code+0x35/0x3c
 [<c0610033>] netlbl_cipsov4_listall_cb+0x1d2/0x1f7
 =======================
DMA per-cpu:
CPU    0: Hot: hi:  186, btch:  31 usd:  23   Cold: hi:   62, btch:  15 usd:  55
CPU    1: Hot: hi:  186, btch:  31 usd:  65   Cold: hi:   62, btch:  15 usd:   1
CPU    2: Hot: hi:  186, btch:  31 usd:  27   Cold: hi:   62, btch:  15 usd:  30
CPU    3: Hot: hi:  186, btch:  31 usd:  90   Cold: hi:   62, btch:  15 usd:  46
Active:32 inactive:2 dirty:0 writeback:0 unstable:0 free:686 slab:124322 mapped:2 pagetables:751
DMA free:2744kB min:2904kB low:3628kB high:4356kB active:128kB inactive:8kB present:528320kB pages_scanned:5651095 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0
DMA: 0*4kB 5*8kB 1*16kB 0*32kB 2*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2744kB
Swap cache: add 5217573, delete 5217579, find 40858241/41734223, race 82+1019
Free swap  = 374856kB
Total swap = 522104kB
printk: 43722 messages suppressed.
nrpe invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
 [<c044af9e>] out_of_memory+0x69/0x18b
 [<c044c418>] __alloc_pages+0x220/0x2aa
 [<c044d7e0>] __do_page_cache_readahead+0xc5/0x1cc
 [<c0421123>] getnstimeofday+0xd/0x21
 [<c06120ef>] _spin_unlock_irqrestore+0x8/0x16
 [<c0442e8d>] delayacct_end+0x70/0x77
 [<c044a3ea>] filemap_nopage+0x14d/0x319
 [<c0447e19>] __lock_page+0x58/0x5e
 [<c04551cd>] __handle_mm_fault+0x313/0x10b5
 [<c0613a92>] do_page_fault+0x6e0/0xc1b
 [<c05aa23b>] sys_send+0x37/0x3b
 [<c06120ef>] _spin_unlock_irqrestore+0x8/0x16
 [<c0408189>] do_gettimeofday+0x1d7/0x1ec
 [<c040687f>] do_IRQ+0xc6/0xdb
 [<c06133b2>] do_page_fault+0x0/0xc1b
 [<c061221d>] error_code+0x35/0x3c
 [<c0610033>] netlbl_cipsov4_listall_cb+0x1d2/0x1f7
 =======================
DMA per-cpu:
CPU    0: Hot: hi:  186, btch:  31 usd:  22   Cold: hi:   62, btch:  15 usd:  55
CPU    1: Hot: hi:  186, btch:  31 usd:  65   Cold: hi:   62, btch:  15 usd:   1
CPU    2: Hot: hi:  186, btch:  31 usd:  27   Cold: hi:   62, btch:  15 usd:  30
CPU    3: Hot: hi:  186, btch:  31 usd:  90   Cold: hi:   62, btch:  15 usd:  46
Active:32 inactive:2 dirty:0 writeback:0 unstable:0 free:686 slab:124324 mapped:2 pagetables:751
DMA free:2744kB min:2904kB low:3628kB high:4356kB active:128kB inactive:8kB present:528320kB pages_scanned:5765043 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0
DMA: 0*4kB 5*8kB 1*16kB 0*32kB 2*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2744kB
Swap cache: add 5217573, delete 5217579, find 40858241/41734223, race 82+1019
Free swap  = 374856kB
Total swap = 522104kB
printk: 43707 messages suppressed.
httpd invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
 [<c044af9e>] out_of_memory+0x69/0x18b
 [<c044c418>] __alloc_pages+0x220/0x2aa
 [<c044d7e0>] __do_page_cache_readahead+0xc5/0x1cc
 [<c0421123>] getnstimeofday+0xd/0x21
 [<c06120ef>] _spin_unlock_irqrestore+0x8/0x16
 [<c0442e8d>] delayacct_end+0x70/0x77
 [<c044a3ea>] filemap_nopage+0x14d/0x319
 [<c0447e19>] __lock_page+0x58/0x5e
 [<c04551cd>] __handle_mm_fault+0x313/0x10b5
 [<c06133b2>] do_page_fault+0x0/0xc1b
 [<c0404ff2>] hypervisor_callback+0x46/0x50
 [<c0613a92>] do_page_fault+0x6e0/0xc1b
 [<c06133b2>] do_page_fault+0x0/0xc1b
 [<c061221d>] error_code+0x35/0x3c
 [<c0610033>] netlbl_cipsov4_listall_cb+0x1d2/0x1f7
 =======================
DMA per-cpu:
CPU    0: Hot: hi:  186, btch:  31 usd:  21   Cold: hi:   62, btch:  15 usd:  55
CPU    1: Hot: hi:  186, btch:  31 usd:  65   Cold: hi:   62, btch:  15 usd:   1
CPU    2: Hot: hi:  186, btch:  31 usd:  27   Cold: hi:   62, btch:  15 usd:  30
CPU    3: Hot: hi:  186, btch:  31 usd:  90   Cold: hi:   62, btch:  15 usd:  46
Active:2 inactive:0 dirty:0 writeback:0 unstable:0 free:684 slab:124327 mapped:2 pagetables:751
DMA free:2736kB min:2904kB low:3628kB high:4356kB active:120kB inactive:0kB present:528320kB pages_scanned:5885989 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0
DMA: 0*4kB 4*8kB 1*16kB 0*32kB 2*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2736kB
Swap cache: add 5217573, delete 5217579, find 40858241/41734223, race 82+1019
Free swap  = 374856kB
Total swap = 522104kB
printk: 43722 messages suppressed.
nrpe invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
 [<c044af9e>] out_of_memory+0x69/0x18b
 [<c044c418>] __alloc_pages+0x220/0x2aa
 [<c044d7e0>] __do_page_cache_readahead+0xc5/0x1cc
 [<c0421123>] getnstimeofday+0xd/0x21
 [<c06120ef>] _spin_unlock_irqrestore+0x8/0x16
 [<c0442e8d>] delayacct_end+0x70/0x77
 [<c044a3ea>] filemap_nopage+0x14d/0x319
 [<c0447e19>] __lock_page+0x58/0x5e
 [<c04551cd>] __handle_mm_fault+0x313/0x10b5
 [<c0613a92>] do_page_fault+0x6e0/0xc1b
 [<c05aa23b>] sys_send+0x37/0x3b
 [<c06120ef>] _spin_unlock_irqrestore+0x8/0x16
 [<c0408189>] do_gettimeofday+0x1d7/0x1ec
 [<c040687f>] do_IRQ+0xc6/0xdb
 [<c06133b2>] do_page_fault+0x0/0xc1b
 [<c061221d>] error_code+0x35/0x3c
 [<c0610033>] netlbl_cipsov4_listall_cb+0x1d2/0x1f7
 =======================
DMA per-cpu:
CPU    0: Hot: hi:  186, btch:  31 usd:  19   Cold: hi:   62, btch:  15 usd:  55
CPU    1: Hot: hi:  186, btch:  31 usd:  65   Cold: hi:   62, btch:  15 usd:   1
CPU    2: Hot: hi:  186, btch:  31 usd:  27   Cold: hi:   62, btch:  15 usd:  30
CPU    3: Hot: hi:  186, btch:  31 usd:  90   Cold: hi:   62, btch:  15 usd:  46
Active:34 inactive:0 dirty:0 writeback:0 unstable:0 free:684 slab:124329 mapped:2 pagetables:751
DMA free:2736kB min:2904kB low:3628kB high:4356kB active:136kB inactive:0kB present:528320kB pages_scanned:6001841 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0
DMA: 0*4kB 4*8kB 1*16kB 0*32kB 2*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2736kB
Swap cache: add 5217573, delete 5217579, find 40858241/41734223, race 82+1019
Free swap  = 374856kB
Total swap = 522104kB
Out of memory: kill process 26145 (httpd) score 2129 or a child
Killed process 26145 (httpd)
Out of memory: kill process 1399 (flush) score 2038 or a child
Killed process 1399 (flush)

Machine #3:

BUG: soft lockup - CPU#3 stuck for 597s! [rndc:27063]

Pid: 27063, comm:                 rndc
EIP: 0061:[<c0401227>] CPU: 3
EIP is at 0xc0401227
 EFLAGS: 00200246    Not tainted  (2.6.18-92.1.10.el5xen #1)
EAX: 00030001 EBX: 00000000 ECX: 00000000 EDX: f54160c0
ESI: c90594ac EDI: c0772fb9 EBP: 000000d9 DS: 007b ES: 007b
CR0: 8005003b CR2: 00d57598 CR3: 04d68000 CR4: 00000660
 [<c054940c>] force_evtchn_callback+0xa/0xc
 [<c0421f70>] vprintk+0x2b1/0x2bb
 [<c04ca3b6>] mls_sid_to_context+0x1f8/0x242
 [<c0421f93>] printk+0x19/0x9c
 [<c043f9cf>] audit_log_vformat+0x8d/0x166
 [<c043f67f>] audit_log_end+0xcd/0xf6
 [<c04bbf6e>] avc_audit+0xacd/0xad8
 [<c04bbfb3>] avc_has_perm+0x3a/0x44
 [<c04bc5ad>] inode_has_perm+0x54/0x5c
 [<c04bbfb3>] avc_has_perm+0x3a/0x44
 [<c04bc634>] file_has_perm+0x7f/0x88
 [<c04bed62>] selinux_bprm_post_apply_creds+0x195/0x3b9
 [<c04c100d>] selinux_bprm_apply_creds+0x11/0xab
 [<c0477211>] compute_creds+0xbe/0xc1
 [<c049426f>] load_elf_binary+0xea3/0x15e7
 [<c046d5df>] do_sync_read+0xb6/0xf1
 [<c0476072>] search_binary_handler+0x99/0x219
 [<c0477a4f>] do_execve+0x158/0x1f5
 [<c040337d>] sys_execve+0x2a/0x4a
 [<c0405413>] syscall_call+0x7/0xb
 =======================

Comment 1 Rik van Riel 2009-06-25 14:51:33 UTC
The soft lockup looks like it could be just a side effect of running out of memory (and the kernel desperately looking for something to free):

Active:32 inactive:2 dirty:0 writeback:0 unstable:0 free:686 slab:124322
mapped:2 pagetables:751
DMA free:2744kB min:2904kB low:3628kB high:4356kB active:128kB inactive:8kB
present:528320kB pages_scanned:5408735 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0
DMA: 0*4kB 5*8kB 1*16kB 0*32kB 2*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB
0*4096kB = 2744kB
Swap cache: add 5217573, delete 5217579, find 40858241/41734223, race 82+1019
Free swap  = 374856kB
Total swap = 522104kB
printk: 43749 messages suppressed.
nrpe invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0

As you can see, 497288 out of 528320 kB is taken up by the slab!

It looks like something is leaking data structures into the slab.

Could you give us a snapshot of /proc/slabinfo, or slabtop output, before a system gets into trouble?

Comment 2 Ioannis Aslanidis 2009-06-25 15:16:46 UTC
The out of memory part seems to have happened only to 1 of the 3 machines. In any case, I can give you a slab snapshot of what it looks like right now. I do not know how I could get a snapshot just before the problem.

# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
ip_vs_conn             0      0    256   15    1 : tunables  120   60    8 : slabdata      0      0      0                                                                         
fib6_nodes             7    113     32  113    1 : tunables  120   60    8 : slabdata      1      1      0                                                                         
ip6_dst_cache          7     15    256   15    1 : tunables  120   60    8 : slabdata      1      1      0                                                                         
ndisc_cache            1     15    256   15    1 : tunables  120   60    8 : slabdata      1      1      0                                                                         
RAWv6                  7     10    768    5    1 : tunables   54   27    8 : slabdata      2      2      0                                                                         
UDPLITEv6              0      0    768    5    1 : tunables   54   27    8 : slabdata      0      0      0                                                                         
UDPv6                  1      5    768    5    1 : tunables   54   27    8 : slabdata      1      1      0                                                                         
tw_sock_TCPv6          0      0    256   15    1 : tunables  120   60    8 : slabdata      0      0      0                                                                         
request_sock_TCPv6      0      0    128   30    1 : tunables  120   60    8 : slabdata      0      0      0                                                                        
TCPv6                  0      0   1408    5    2 : tunables   24   12    8 : slabdata      0      0      0                                                                         
ip_fib_alias          14    113     32  113    1 : tunables  120   60    8 : slabdata      1      1      0                                                                         
ip_fib_hash           14    113     32  113    1 : tunables  120   60    8 : slabdata      1      1      0                                                                         
nf_nat:help          622    686    284   14    1 : tunables   54   27    8 : slabdata     49     49      0                                                                         
nf_nat:base            0      0    252   15    1 : tunables  120   60    8 : slabdata      0      0      0                                                                         
nf_conntrack_expect      0      0    148   26    1 : tunables  120   60    8 : slabdata      0      0      0                                                                       
nf_conntrack:basic      0      0    220   18    1 : tunables  120   60    8 : slabdata      0      0      0                                                                        
dm_tio                 0      0     16  203    1 : tunables  120   60    8 : slabdata      0      0      0                                                                         
dm_io                  0      0     20  169    1 : tunables  120   60    8 : slabdata      0      0      0                                                                         
avtab_node         61745  61915     16  203    1 : tunables  120   60    8 : slabdata    305    305      0                                                                         
jbd_4k                11     11   4096    1    1 : tunables   24   12    8 : slabdata     11     11      0                                                                         
ext3_inode_cache   20652  20652    596    6    1 : tunables   54   27    8 : slabdata   3442   3442      0                                                                         
ext3_xattr          1787   2418     48   78    1 : tunables  120   60    8 : slabdata     31     31      0                                                                         
journal_handle        84    169     20  169    1 : tunables  120   60    8 : slabdata      1      1      0                                                                         
journal_head         324    432     52   72    1 : tunables  120   60    8 : slabdata      6      6     24                                                                         
revoke_table           2    254     12  254    1 : tunables  120   60    8 : slabdata      1      1      0                                                                         
revoke_record          0      0     16  203    1 : tunables  120   60    8 : slabdata      0      0      0                                                                         
UNIX                 160    174    640    6    1 : tunables   54   27    8 : slabdata     29     29      0                                                                         
flow_cache             0      0    128   30    1 : tunables  120   60    8 : slabdata      0      0      0                                                                         
cfq_ioc_pool           0      0     92   42    1 : tunables  120   60    8 : slabdata      0      0      0                                                                         
cfq_pool               0      0     96   40    1 : tunables  120   60    8 : slabdata      0      0      0                                                                         
mqueue_inode_cache      1      6    640    6    1 : tunables   54   27    8 : slabdata      1      1      0                                                                        
isofs_inode_cache      0      0    448    9    1 : tunables   54   27    8 : slabdata      0      0      0                                                                         
ext2_inode_cache       0      0    580    7    1 : tunables   54   27    8 : slabdata      0      0      0                                                                         
ext2_xattr             0      0     48   78    1 : tunables  120   60    8 : slabdata      0      0      0                                                                         
dnotify_cache          0      0     20  169    1 : tunables  120   60    8 : slabdata      0      0      0                                                                         
dquot                  0      0    256   15    1 : tunables  120   60    8 : slabdata      0      0      0                                                                                                                        
eventpoll_pwq          0      0     36  101    1 : tunables  120   60    8 : slabdata      0      0      0                                                                                                                        
eventpoll_epi          0      0    128   30    1 : tunables  120   60    8 : slabdata      0      0      0                                                                                                                        
inotify_event_cache      0      0     28  127    1 : tunables  120   60    8 : slabdata      0      0      0                                                                                                                      
inotify_watch_cache      1     92     40   92    1 : tunables  120   60    8 : slabdata      1      1      0                                                                                                                      
kioctx                 0      0    256   15    1 : tunables  120   60    8 : slabdata      0      0      0                                                                                                                        
kiocb                  0      0    256   15    1 : tunables  120   60    8 : slabdata      0      0      0                                                                                                                        
fasync_cache           0      0     16  203    1 : tunables  120   60    8 : slabdata      0      0      0                                                                                                                        
shmem_inode_cache    187    203    536    7    1 : tunables   54   27    8 : slabdata     29     29      0                                                                                                                        
posix_timers_cache      0      0    100   39    1 : tunables  120   60    8 : slabdata      0      0      0                                                                                                                       
uid_cache             23     59     64   59    1 : tunables  120   60    8 : slabdata      1      1      0                                                                                                                        
ip_mrt_cache           0      0    128   30    1 : tunables  120   60    8 : slabdata      0      0      0                                                                                                                        
UDP-Lite               0      0    640    6    1 : tunables   54   27    8 : slabdata      0      0      0                                                                                                                        
tcp_bind_bucket       13    203     16  203    1 : tunables  120   60    8 : slabdata      1      1      0                                                                                                                        
inet_peer_cache      113    295     64   59    1 : tunables  120   60    8 : slabdata      5      5      0                                                                                                                        
secpath_cache          0      0     32  113    1 : tunables  120   60    8 : slabdata      0      0      0                                                                                                                        
xfrm_dst_cache         0      0    384   10    1 : tunables   54   27    8 : slabdata      0      0      0                                                                                                                        
ip_dst_cache        1725   1725    256   15    1 : tunables  120   60    8 : slabdata    115    115      0                                                                                                                        
arp_cache             14     45    256   15    1 : tunables  120   60    8 : slabdata      3      3      0                                                                                                                        
RAW                    8     12    640    6    1 : tunables   54   27    8 : slabdata      2      2      0                                                                                                                        
UDP                   38     66    640    6    1 : tunables   54   27    8 : slabdata     11     11      0                                                                                                                        
tw_sock_TCP          141    165    256   15    1 : tunables  120   60    8 : slabdata     11     11      0                                                                                                                        
request_sock_TCP      27     30    128   30    1 : tunables  120   60    8 : slabdata      1      1      0                                                                                                                        
TCP                   96     96   1280    3    1 : tunables   24   12    8 : slabdata     32     32      0                                                                                                                        
blkdev_ioc             9    127     28  127    1 : tunables  120   60    8 : slabdata      1      1      0                                                                                                                        
blkdev_queue          19     20   1000    4    1 : tunables   54   27    8 : slabdata      5      5      0                                                                                                                        
blkdev_requests       61     63    184   21    1 : tunables  120   60    8 : slabdata      3      3      0                                                                                                                        
biovec-256             7      8   3072    2    2 : tunables   24   12    8 : slabdata      4      4      0                                                                                                                        
biovec-128             7     10   1536    5    2 : tunables   24   12    8 : slabdata      2      2      0                                                                                                                        
biovec-64              7     10    768    5    1 : tunables   54   27    8 : slabdata      2      2      0                                                                                                                        
biovec-16              7     15    256   15    1 : tunables  120   60    8 : slabdata      1      1      0                                                                                                                        
biovec-4               7     59     64   59    1 : tunables  120   60    8 : slabdata      1      1      0                                                                                                                        
biovec-1             147    203     16  203    1 : tunables  120   60    8 : slabdata      1      1     30                                                                                                                        
bio                  358    390    128   30    1 : tunables  120   60    8 : slabdata     13     13      0                                                                                                                        
sock_inode_cache     313    322    512    7    1 : tunables   54   27    8 : slabdata     46     46      0
skbuff_fclone_cache     66     80    384   10    1 : tunables   54   27    8 : slabdata      8      8     13
skbuff_head_cache    564    690    256   15    1 : tunables  120   60    8 : slabdata     46     46      0
xen-skb-65536          0      0  65536    1   16 : tunables    8    4    0 : slabdata      0      0      0
xen-skb-32768          0      0  32768    1    8 : tunables    8    4    0 : slabdata      0      0      0
xen-skb-16384          0      0  16384    1    4 : tunables    8    4    0 : slabdata      0      0      0
xen-skb-8192           0      0   8192    1    2 : tunables    8    4    0 : slabdata      0      0      0
xen-skb-4096           2      3   4096    1    1 : tunables   24   12    8 : slabdata      2      3      0
xen-skb-2048         452    464   2048    2    1 : tunables   24   12    8 : slabdata    232    232      0
xen-skb-512           61     88    512    8    1 : tunables   54   27    8 : slabdata     11     11      0
file_lock_cache       50     72    108   36    1 : tunables  120   60    8 : slabdata      2      2      0
delayacct_cache      135    252     60   63    1 : tunables  120   60    8 : slabdata      4      4      0
taskstats_cache        2     14    272   14    1 : tunables   54   27    8 : slabdata      1      1      0
proc_inode_cache     511    513    436    9    1 : tunables   54   27    8 : slabdata     57     57      0
sigqueue              52     54    144   27    1 : tunables  120   60    8 : slabdata      2      2      0
radix_tree_node     6445   6513    288   13    1 : tunables   54   27    8 : slabdata    501    501      0
bdev_cache            20     21    576    7    1 : tunables   54   27    8 : slabdata      3      3      0
sysfs_dir_cache     1939   2016     44   84    1 : tunables  120   60    8 : slabdata     24     24      0
mnt_cache             24     60    128   30    1 : tunables  120   60    8 : slabdata      2      2      0
inode_cache          699    756    420    9    1 : tunables   54   27    8 : slabdata     84     84      0
dentry_cache       30316  30402    144   27    1 : tunables  120   60    8 : slabdata   1126   1126      0
filp                1629   2380    192   20    1 : tunables  120   60    8 : slabdata    119    119    384
names_cache           10     10   4096    1    1 : tunables   24   12    8 : slabdata     10     10      0
avc_node             538    864     52   72    1 : tunables  120   60    8 : slabdata     12     12      0
selinux_inode_security  22354  23048     56   67    1 : tunables  120   60    8 : slabdata    344    344      0
key_jar               30     60    128   30    1 : tunables  120   60    8 : slabdata      2      2      0
idr_layer_cache      134    145    136   29    1 : tunables  120   60    8 : slabdata      5      5      0
buffer_head       112256 144050     56   67    1 : tunables  120   60    8 : slabdata   2150   2150     30
mm_struct             88     88    512    8    1 : tunables   54   27    8 : slabdata     11     11      0
vm_area_struct      2252   2552     88   44    1 : tunables  120   60    8 : slabdata     58     58    204
fs_cache              82    177     64   59    1 : tunables  120   60    8 : slabdata      3      3      0
files_cache           70    110    384   10    1 : tunables   54   27    8 : slabdata     11     11      0
signal_cache          93    126    448    9    1 : tunables   54   27    8 : slabdata     14     14      0
sighand_cache         93     93   1344    3    1 : tunables   24   12    8 : slabdata     31     31      0
task_struct          120    120   1424    5    2 : tunables   24   12    8 : slabdata     24     24      0
anon_vma            1100   1305     24  145    1 : tunables  120   60    8 : slabdata      9      9     24
pgd                   43     43   4096    1    1 : tunables   24   12    8 : slabdata     43     43      0
pmd                  172    172   4096    1    1 : tunables   24   12    8 : slabdata    172    172      0
pid                  136    303     36  101    1 : tunables  120   60    8 : slabdata      3      3      0
size-131072(DMA)       0      0 131072    1   32 : tunables    8    4    0 : slabdata      0      0      0
size-131072            0      0 131072    1   32 : tunables    8    4    0 : slabdata      0      0      0
size-65536(DMA)        0      0  65536    1   16 : tunables    8    4    0 : slabdata      0      0      0
size-65536             1      1  65536    1   16 : tunables    8    4    0 : slabdata      1      1      0
size-32768(DMA)        0      0  32768    1    8 : tunables    8    4    0 : slabdata      0      0      0
size-32768             3      3  32768    1    8 : tunables    8    4    0 : slabdata      3      3      0
size-16384(DMA)        0      0  16384    1    4 : tunables    8    4    0 : slabdata      0      0      0
size-16384             1      1  16384    1    4 : tunables    8    4    0 : slabdata      1      1      0
size-8192(DMA)         0      0   8192    1    2 : tunables    8    4    0 : slabdata      0      0      0
size-8192              6      6   8192    1    2 : tunables    8    4    0 : slabdata      6      6      0
size-4096(DMA)         0      0   4096    1    1 : tunables   24   12    8 : slabdata      0      0      0
size-4096            113    113   4096    1    1 : tunables   24   12    8 : slabdata    113    113      0
size-2048(DMA)         0      0   2048    2    1 : tunables   24   12    8 : slabdata      0      0      0
size-2048            112    112   2048    2    1 : tunables   24   12    8 : slabdata     56     56      0
size-1024(DMA)         0      0   1024    4    1 : tunables   54   27    8 : slabdata      0      0      0
size-1024            136    136   1024    4    1 : tunables   54   27    8 : slabdata     34     34      0
size-512(DMA)          0      0    512    8    1 : tunables   54   27    8 : slabdata      0      0      0
size-512             568    568    512    8    1 : tunables   54   27    8 : slabdata     71     71      0
size-256(DMA)          0      0    256   15    1 : tunables  120   60    8 : slabdata      0      0      0
size-256             329    345    256   15    1 : tunables  120   60    8 : slabdata     23     23      0
size-128(DMA)          0      0    128   30    1 : tunables  120   60    8 : slabdata      0      0      0
size-64(DMA)           0      0     64   59    1 : tunables  120   60    8 : slabdata      0      0      0
size-32(DMA)           0      0     32  113    1 : tunables  120   60    8 : slabdata      0      0      0
size-32            33938  35030     32  113    1 : tunables  120   60    8 : slabdata    310    310     24
size-128            3016   3060    128   30    1 : tunables  120   60    8 : slabdata    102    102      0
size-64             1972   2537     64   59    1 : tunables  120   60    8 : slabdata     43     43     22
kmem_cache           134    150    256   15    1 : tunables  120   60    8 : slabdata     10     10      0

 Active / Total Objects (% used)    : 309634 / 350925 (88.2%)
 Active / Total Slabs (% used)      : 10022 / 10022 (100.0%)
 Active / Total Caches (% used)     : 85 / 135 (63.0%)
 Active / Total Size (% used)       : 33927.17K / 36679.28K (92.5%)
 Minimum / Average / Maximum Object : 0.01K / 0.10K / 128.00K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
144050 112161  77%    0.05K   2150       67      8600K buffer_head
 61915  61745  99%    0.02K    305      203      1220K avtab_node
 35030  33723  96%    0.03K    310      113      1240K size-32
 30402  30368  99%    0.14K   1126       27      4504K dentry_cache
 23048  22346  96%    0.05K    344       67      1376K selinux_inode_security
 20652  20650  99%    0.58K   3442        6     13768K ext3_inode_cache
  6513   6445  98%    0.28K    501       13      2004K radix_tree_node
  3060   2933  95%    0.12K    102       30       408K size-128
  2552   2012  78%    0.09K     58       44       232K vm_area_struct
  2537   1942  76%    0.06K     43       59       172K size-64
  2418   1787  73%    0.05K     31       78       124K ext3_xattr
  2380   1188  49%    0.19K    119       20       476K filp
  2016   1939  96%    0.04K     24       84        96K sysfs_dir_cache
  1845   1824  98%    0.25K    123       15       492K ip_dst_cache
  1305    900  68%    0.02K      9      145        36K anon_vma
   864    520  60%    0.05K     12       72        48K avc_node
   756    683  90%    0.41K     84        9       336K inode_cache
   690    544  78%    0.25K     46       15       184K skbuff_head_cache
   686    621  90%    0.28K     49       14       196K nf_nat:help
   568    530  93%    0.50K     71        8       284K size-512
   513    497  96%    0.43K     57        9       228K proc_inode_cache
   464    446  96%    2.00K    232        2       928K xen-skb-2048
   360    322  89%    0.12K     12       30        48K bio
   360    162  45%    0.05K      5       72        20K journal_head
   345    282  81%    0.25K     23       15        92K size-256
   322    310  96%    0.50K     46        7       184K sock_inode_cache
   303     98  32%    0.04K      3      101        12K pid
   295    115  38%    0.06K      5       59        20K inet_peer_cache
   254      2   0%    0.01K      1      254         4K revoke_table
   252     98  38%    0.06K      4       63        16K delayacct_cache
   203     39  19%    0.02K      1      203         4K biovec-1
   203     13   6%    0.02K      1      203         4K tcp_bind_bucket
   203    175  86%    0.52K     29        7       116K shmem_inode_cache
   177     42  23%    0.06K      3       59        12K fs_cache
   174    147  84%    0.62K     29        6       116K UNIX
   169    118  69%    0.02K      1      169         4K journal_handle
   165    165 100%    0.25K     11       15        44K tw_sock_TCP
   150    134  89%    0.25K     10       15        40K kmem_cache
   146    146 100%    4.00K    146        1       584K pmd
   145    119  82%    0.13K      5       29        20K idr_layer_cache
   132     94  71%    1.00K     33        4       132K size-1024
   127      9   7%    0.03K      1      127         4K blkdev_ioc
   126     78  61%    0.44K     14        9        56K signal_cache
   115    106  92%    1.39K     23        5       184K task_struct
   113     14  12%    0.03K      1      113         4K ip_fib_hash
   113     14  12%    0.03K      1      113         4K ip_fib_alias
   113      7   6%    0.03K      1      113         4K fib6_nodes
   112    112 100%    2.00K     56        2       224K size-2048
   110     39  35%    0.38K     11       10        44K files_cache
    98     98 100%    4.00K     98        1       392K size-4096
    96     56  58%    0.50K     12        8        48K mm_struct
    96     96 100%    1.25K     32        3       128K TCP
    93     80  86%    1.31K     31        3       124K sighand_cache
    92      1   1%    0.04K      1       92         4K inotify_watch_cache
    88     57  64%    0.50K     11        8        44K xen-skb-512
    80     66  82%    0.38K      8       10        32K skbuff_fclone_cache
    72     72 100%    0.11K      2       36         8K file_lock_cache
    66     38  57%    0.62K     11        6        44K UDP
    60     17  28%    0.12K      2       30         8K key_jar

Comment 4 Shane 2009-09-30 06:56:15 UTC
Same symptoms here. We have several hosts running just one guest per host and it has happened on many (7 so far this week), the guests each had around 9 -10 days uptime, the host uptimes varied from 53 - 60 days.

/etc/init.d/xend reload made the guests responsive and the xm console command work again.

This is not new, it has been happening for a long time now, over many kernel updates.

Our hosts are currently 2.6.18-128.2.1.el5xen and the guests are 2.6.18-164.el5xen, but this same thing happened when both the host and guests were running matching kernels.

Here's the output of one the guests after running /etc/init.d/xend reload:

conntrack_ftp: partial 227  3149096871+13
BUG: soft lockup - CPU#0 stuck for 193s! [pure-ftpd:16657]
Pid: 16657, comm:            pure-ftpd
EIP: 0061:[<c0401227>] CPU: 0
EIP is at 0xc0401227
 EFLAGS: 00200246    Not tainted  (2.6.18-164.el5xen #1)
EAX: 00030001 EBX: 00000000 ECX: 00000000 EDX: f5416000
ESI: ec102eec EDI: c078704a EBP: 0000002d DS: 007b ES: 007b
CR0: 8005003b CR2: bfade434 CR3: 011dc000 CR4: 00000660
 [<c0554fa0>] force_evtchn_callback+0xa/0xc
 [<c041ffaa>] vprintk+0x2e5/0x2ef
 [<c0618f80>] _spin_lock_irqsave+0x8/0x28
 [<c0427500>] lock_timer_base+0x15/0x2f
 [<c0427611>] __mod_timer+0x99/0xa3
 [<ee28f628>] help+0x436/0x4aa [ip_conntrack_ftp]
 [<c041ffcd>] printk+0x19/0x9c
 [<ee28f628>] help+0x436/0x4aa [ip_conntrack_ftp]
 [<ee28f493>] help+0x2a1/0x4aa [ip_conntrack_ftp]
 [<ee28f1dd>] try_rfc959+0x0/0x15 [ip_conntrack_ftp]
 [<ee26608d>] ip_conntrack_help+0x27/0x34 [ip_conntrack]
 [<c05d2058>] nf_iterate+0x30/0x61
 [<c05db03f>] ip_finish_output+0x0/0x1db
 [<c05d217e>] nf_hook_slow+0x3a/0x90
 [<c05db03f>] ip_finish_output+0x0/0x1db
 [<c05dc310>] ip_output+0x82/0x266
 [<c05db03f>] ip_finish_output+0x0/0x1db
 [<c05dbc70>] ip_queue_xmit+0x3d0/0x40f
 [<c04701ae>] do_sync_write+0xb6/0xf1
 [<c042fef7>] autoremove_wake_function+0x0/0x2d
 [<c05e9abf>] tcp_transmit_skb+0x5c7/0x5f5
 [<c043ada1>] do_acct_process+0x5da/0x5ff
 [<c05eb3b2>] __tcp_push_pending_frames+0x685/0x74a
 [<c0618fc0>] _spin_lock_bh+0x8/0x18
 [<c04241b8>] local_bh_enable+0x5/0x81
 [<c05b4a53>] lock_sock+0x8e/0x96
 [<c05ec15c>] tcp_send_fin+0x131/0x139
 [<c05e1e39>] tcp_close+0x224/0x520
 [<c05f9383>] inet_release+0x43/0x48
 [<c05b2d70>] sock_release+0x11/0x86
 [<c05b2e0b>] sock_close+0x26/0x2a
 [<c04712b3>] __fput+0x9c/0x167
 [<c046ecd9>] filp_close+0x4e/0x54
 [<c0420fec>] put_files_struct+0x65/0xa7
 [<c0422296>] do_exit+0x26d/0x7a0
 [<c0422843>] sys_exit_group+0x0/0xd
 [<c0405413>] syscall_call+0x7/0xb
 =======================

Comment 6 Paolo Bonzini 2010-06-23 17:17:26 UTC
(Some) of the dumps include the OOM killer.  Given the relatively long host uptimes, I think this is a well-known memory leak in xenstore---some customers have seen it with libvirt after only 10-20 days of uptime.

*** This bug has been marked as a duplicate of bug 606919 ***

Comment 7 Chris Lalancette 2010-07-19 13:53:19 UTC
Clearing out old flags for reporting purposes.

Chris Lalancette