Bug 507545
| Summary: | xen kernel -- soft lockup | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Ioannis Aslanidis <iaslanidis> |
| Component: | kernel-xen | Assignee: | Rik van Riel <riel> |
| Status: | CLOSED DUPLICATE | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| Severity: | high | Docs Contact: | |
| Priority: | low | | |
| Version: | 5.2 | CC: | clalance, pbonzini, shane, xen-maint |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2010-06-23 17:17:26 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 514491 | | |
Description
Ioannis Aslanidis
2009-06-23 09:31:42 UTC
The soft lockup looks like it could be just a side effect of running out of memory (and the kernel desperately looking for something to free):

Active:32 inactive:2 dirty:0 writeback:0 unstable:0 free:686 slab:124322 mapped:2 pagetables:751
DMA free:2744kB min:2904kB low:3628kB high:4356kB active:128kB inactive:8kB present:528320kB pages_scanned:5408735 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0
DMA: 0*4kB 5*8kB 1*16kB 0*32kB 2*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2744kB
Swap cache: add 5217573, delete 5217579, find 40858241/41734223, race 82+1019
Free swap  = 374856kB
Total swap = 522104kB
printk: 43749 messages suppressed.
nrpe invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0

As you can see, 497288 out of 528320 kB is taken up by the slab! It looks like something is leaking data structures into the slab. Could you give us a snapshot of /proc/slabinfo, or slabtop output, before a system gets into trouble?

The out-of-memory part seems to have happened on only 1 of the 3 machines. In any case, I can give you a slab snapshot of what it looks like right now. I do not know how I could get a snapshot just before the problem.
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
ip_vs_conn 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
fib6_nodes 7 113 32 113 1 : tunables 120 60 8 : slabdata 1 1 0
ip6_dst_cache 7 15 256 15 1 : tunables 120 60 8 : slabdata 1 1 0
ndisc_cache 1 15 256 15 1 : tunables 120 60 8 : slabdata 1 1 0
RAWv6 7 10 768 5 1 : tunables 54 27 8 : slabdata 2 2 0
UDPLITEv6 0 0 768 5 1 : tunables 54 27 8 : slabdata 0 0 0
UDPv6 1 5 768 5 1 : tunables 54 27 8 : slabdata 1 1 0
tw_sock_TCPv6 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
request_sock_TCPv6 0 0 128 30 1 : tunables 120 60 8 : slabdata 0 0 0
TCPv6 0 0 1408 5 2 : tunables 24 12 8 : slabdata 0 0 0
ip_fib_alias 14 113 32 113 1 : tunables 120 60 8 : slabdata 1 1 0
ip_fib_hash 14 113 32 113 1 : tunables 120 60 8 : slabdata 1 1 0
nf_nat:help 622 686 284 14 1 : tunables 54 27 8 : slabdata 49 49 0
nf_nat:base 0 0 252 15 1 : tunables 120 60 8 : slabdata 0 0 0
nf_conntrack_expect 0 0 148 26 1 : tunables 120 60 8 : slabdata 0 0 0
nf_conntrack:basic 0 0 220 18 1 : tunables 120 60 8 : slabdata 0 0 0
dm_tio 0 0 16 203 1 : tunables 120 60 8 : slabdata 0 0 0
dm_io 0 0 20 169 1 : tunables 120 60 8 : slabdata 0 0 0
avtab_node 61745 61915 16 203 1 : tunables 120 60 8 : slabdata 305 305 0
jbd_4k 11 11 4096 1 1 : tunables 24 12 8 : slabdata 11 11 0
ext3_inode_cache 20652 20652 596 6 1 : tunables 54 27 8 : slabdata 3442 3442 0
ext3_xattr 1787 2418 48 78 1 : tunables 120 60 8 : slabdata 31 31 0
journal_handle 84 169 20 169 1 : tunables 120 60 8 : slabdata 1 1 0
journal_head 324 432 52 72 1 : tunables 120 60 8 : slabdata 6 6 24
revoke_table 2 254 12 254 1 : tunables 120 60 8 : slabdata 1 1 0
revoke_record 0 0 16 203 1 : tunables 120 60 8 : slabdata 0 0 0
UNIX 160 174 640 6 1 : tunables 54 27 8 : slabdata 29 29 0
flow_cache 0 0 128 30 1 : tunables 120 60 8 : slabdata 0 0 0
cfq_ioc_pool 0 0 92 42 1 : tunables 120 60 8 : slabdata 0 0 0
cfq_pool 0 0 96 40 1 : tunables 120 60 8 : slabdata 0 0 0
mqueue_inode_cache 1 6 640 6 1 : tunables 54 27 8 : slabdata 1 1 0
isofs_inode_cache 0 0 448 9 1 : tunables 54 27 8 : slabdata 0 0 0
ext2_inode_cache 0 0 580 7 1 : tunables 54 27 8 : slabdata 0 0 0
ext2_xattr 0 0 48 78 1 : tunables 120 60 8 : slabdata 0 0 0
dnotify_cache 0 0 20 169 1 : tunables 120 60 8 : slabdata 0 0 0
dquot 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
eventpoll_pwq 0 0 36 101 1 : tunables 120 60 8 : slabdata 0 0 0
eventpoll_epi 0 0 128 30 1 : tunables 120 60 8 : slabdata 0 0 0
inotify_event_cache 0 0 28 127 1 : tunables 120 60 8 : slabdata 0 0 0
inotify_watch_cache 1 92 40 92 1 : tunables 120 60 8 : slabdata 1 1 0
kioctx 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
kiocb 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
fasync_cache 0 0 16 203 1 : tunables 120 60 8 : slabdata 0 0 0
shmem_inode_cache 187 203 536 7 1 : tunables 54 27 8 : slabdata 29 29 0
posix_timers_cache 0 0 100 39 1 : tunables 120 60 8 : slabdata 0 0 0
uid_cache 23 59 64 59 1 : tunables 120 60 8 : slabdata 1 1 0
ip_mrt_cache 0 0 128 30 1 : tunables 120 60 8 : slabdata 0 0 0
UDP-Lite 0 0 640 6 1 : tunables 54 27 8 : slabdata 0 0 0
tcp_bind_bucket 13 203 16 203 1 : tunables 120 60 8 : slabdata 1 1 0
inet_peer_cache 113 295 64 59 1 : tunables 120 60 8 : slabdata 5 5 0
secpath_cache 0 0 32 113 1 : tunables 120 60 8 : slabdata 0 0 0
xfrm_dst_cache 0 0 384 10 1 : tunables 54 27 8 : slabdata 0 0 0
ip_dst_cache 1725 1725 256 15 1 : tunables 120 60 8 : slabdata 115 115 0
arp_cache 14 45 256 15 1 : tunables 120 60 8 : slabdata 3 3 0
RAW 8 12 640 6 1 : tunables 54 27 8 : slabdata 2 2 0
UDP 38 66 640 6 1 : tunables 54 27 8 : slabdata 11 11 0
tw_sock_TCP 141 165 256 15 1 : tunables 120 60 8 : slabdata 11 11 0
request_sock_TCP 27 30 128 30 1 : tunables 120 60 8 : slabdata 1 1 0
TCP 96 96 1280 3 1 : tunables 24 12 8 : slabdata 32 32 0
blkdev_ioc 9 127 28 127 1 : tunables 120 60 8 : slabdata 1 1 0
blkdev_queue 19 20 1000 4 1 : tunables 54 27 8 : slabdata 5 5 0
blkdev_requests 61 63 184 21 1 : tunables 120 60 8 : slabdata 3 3 0
biovec-256 7 8 3072 2 2 : tunables 24 12 8 : slabdata 4 4 0
biovec-128 7 10 1536 5 2 : tunables 24 12 8 : slabdata 2 2 0
biovec-64 7 10 768 5 1 : tunables 54 27 8 : slabdata 2 2 0
biovec-16 7 15 256 15 1 : tunables 120 60 8 : slabdata 1 1 0
biovec-4 7 59 64 59 1 : tunables 120 60 8 : slabdata 1 1 0
biovec-1 147 203 16 203 1 : tunables 120 60 8 : slabdata 1 1 30
bio 358 390 128 30 1 : tunables 120 60 8 : slabdata 13 13 0
sock_inode_cache 313 322 512 7 1 : tunables 54 27 8 : slabdata 46 46 0
skbuff_fclone_cache 66 80 384 10 1 : tunables 54 27 8 : slabdata 8 8 13
skbuff_head_cache 564 690 256 15 1 : tunables 120 60 8 : slabdata 46 46 0
xen-skb-65536 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0
xen-skb-32768 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0
xen-skb-16384 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0
xen-skb-8192 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
xen-skb-4096 2 3 4096 1 1 : tunables 24 12 8 : slabdata 2 3 0
xen-skb-2048 452 464 2048 2 1 : tunables 24 12 8 : slabdata 232 232 0
xen-skb-512 61 88 512 8 1 : tunables 54 27 8 : slabdata 11 11 0
file_lock_cache 50 72 108 36 1 : tunables 120 60 8 : slabdata 2 2 0
delayacct_cache 135 252 60 63 1 : tunables 120 60 8 : slabdata 4 4 0
taskstats_cache 2 14 272 14 1 : tunables 54 27 8 : slabdata 1 1 0
proc_inode_cache 511 513 436 9 1 : tunables 54 27 8 : slabdata 57 57 0
sigqueue 52 54 144 27 1 : tunables 120 60 8 : slabdata 2 2 0
radix_tree_node 6445 6513 288 13 1 : tunables 54 27 8 : slabdata 501 501 0
bdev_cache 20 21 576 7 1 : tunables 54 27 8 : slabdata 3 3 0
sysfs_dir_cache 1939 2016 44 84 1 : tunables 120 60 8 : slabdata 24 24 0
mnt_cache 24 60 128 30 1 : tunables 120 60 8 : slabdata 2 2 0
inode_cache 699 756 420 9 1 : tunables 54 27 8 : slabdata 84 84 0
dentry_cache 30316 30402 144 27 1 : tunables 120 60 8 : slabdata 1126 1126 0
filp 1629 2380 192 20 1 : tunables 120 60 8 : slabdata 119 119 384
names_cache 10 10 4096 1 1 : tunables 24 12 8 : slabdata 10 10 0
avc_node 538 864 52 72 1 : tunables 120 60 8 : slabdata 12 12 0
selinux_inode_security 22354 23048 56 67 1 : tunables 120 60 8 : slabdata 344 344 0
key_jar 30 60 128 30 1 : tunables 120 60 8 : slabdata 2 2 0
idr_layer_cache 134 145 136 29 1 : tunables 120 60 8 : slabdata 5 5 0
buffer_head 112256 144050 56 67 1 : tunables 120 60 8 : slabdata 2150 2150 30
mm_struct 88 88 512 8 1 : tunables 54 27 8 : slabdata 11 11 0
vm_area_struct 2252 2552 88 44 1 : tunables 120 60 8 : slabdata 58 58 204
fs_cache 82 177 64 59 1 : tunables 120 60 8 : slabdata 3 3 0
files_cache 70 110 384 10 1 : tunables 54 27 8 : slabdata 11 11 0
signal_cache 93 126 448 9 1 : tunables 54 27 8 : slabdata 14 14 0
sighand_cache 93 93 1344 3 1 : tunables 24 12 8 : slabdata 31 31 0
task_struct 120 120 1424 5 2 : tunables 24 12 8 : slabdata 24 24 0
anon_vma 1100 1305 24 145 1 : tunables 120 60 8 : slabdata 9 9 24
pgd 43 43 4096 1 1 : tunables 24 12 8 : slabdata 43 43 0
pmd 172 172 4096 1 1 : tunables 24 12 8 : slabdata 172 172 0
pid 136 303 36 101 1 : tunables 120 60 8 : slabdata 3 3 0
size-131072(DMA) 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0
size-131072 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0
size-65536(DMA) 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0
size-65536 1 1 65536 1 16 : tunables 8 4 0 : slabdata 1 1 0
size-32768(DMA) 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0
size-32768 3 3 32768 1 8 : tunables 8 4 0 : slabdata 3 3 0
size-16384(DMA) 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0
size-16384 1 1 16384 1 4 : tunables 8 4 0 : slabdata 1 1 0
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
size-8192 6 6 8192 1 2 : tunables 8 4 0 : slabdata 6 6 0
size-4096(DMA) 0 0 4096 1 1 : tunables 24 12 8 : slabdata 0 0 0
size-4096 113 113 4096 1 1 : tunables 24 12 8 : slabdata 113 113 0
size-2048(DMA) 0 0 2048 2 1 : tunables 24 12 8 : slabdata 0 0 0
size-2048 112 112 2048 2 1 : tunables 24 12 8 : slabdata 56 56 0
size-1024(DMA) 0 0 1024 4 1 : tunables 54 27 8 : slabdata 0 0 0
size-1024 136 136 1024 4 1 : tunables 54 27 8 : slabdata 34 34 0
size-512(DMA) 0 0 512 8 1 : tunables 54 27 8 : slabdata 0 0 0
size-512 568 568 512 8 1 : tunables 54 27 8 : slabdata 71 71 0
size-256(DMA) 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
size-256 329 345 256 15 1 : tunables 120 60 8 : slabdata 23 23 0
size-128(DMA) 0 0 128 30 1 : tunables 120 60 8 : slabdata 0 0 0
size-64(DMA) 0 0 64 59 1 : tunables 120 60 8 : slabdata 0 0 0
size-32(DMA) 0 0 32 113 1 : tunables 120 60 8 : slabdata 0 0 0
size-32 33938 35030 32 113 1 : tunables 120 60 8 : slabdata 310 310 24
size-128 3016 3060 128 30 1 : tunables 120 60 8 : slabdata 102 102 0
size-64 1972 2537 64 59 1 : tunables 120 60 8 : slabdata 43 43 22
kmem_cache 134 150 256 15 1 : tunables 120 60 8 : slabdata 10 10 0
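The raw slabinfo dump above can be reduced to per-cache memory totals with a short awk sketch. This assumes the 2.6-era slabinfo column layout shown in the header line (field 6 is pagesperslab, field 15 is num_slabs, pages are 4 kB); the function name `slab_kb` and the file name are illustrative:

```shell
# slab_kb DUMPFILE — list caches by approximate memory use from a saved
# /proc/slabinfo dump. kB per cache = num_slabs * pagesperslab * 4.
slab_kb() {
    awk '/^#/ { next }                       # skip the header comment line
         NF >= 16 { printf "%8d kB  %s\n", $15 * $6 * 4, $1 }' "$1" |
        sort -rn
}

# usage: slab_kb slabinfo.txt | head       # biggest caches first
```

Applied to the dump above this reproduces, for example, the 13768 kB that slabtop reports for ext3_inode_cache (3442 slabs x 1 page x 4 kB).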
Active / Total Objects (% used) : 309634 / 350925 (88.2%)
Active / Total Slabs (% used) : 10022 / 10022 (100.0%)
Active / Total Caches (% used) : 85 / 135 (63.0%)
Active / Total Size (% used) : 33927.17K / 36679.28K (92.5%)
Minimum / Average / Maximum Object : 0.01K / 0.10K / 128.00K
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
144050 112161 77% 0.05K 2150 67 8600K buffer_head
61915 61745 99% 0.02K 305 203 1220K avtab_node
35030 33723 96% 0.03K 310 113 1240K size-32
30402 30368 99% 0.14K 1126 27 4504K dentry_cache
23048 22346 96% 0.05K 344 67 1376K selinux_inode_security
20652 20650 99% 0.58K 3442 6 13768K ext3_inode_cache
6513 6445 98% 0.28K 501 13 2004K radix_tree_node
3060 2933 95% 0.12K 102 30 408K size-128
2552 2012 78% 0.09K 58 44 232K vm_area_struct
2537 1942 76% 0.06K 43 59 172K size-64
2418 1787 73% 0.05K 31 78 124K ext3_xattr
2380 1188 49% 0.19K 119 20 476K filp
2016 1939 96% 0.04K 24 84 96K sysfs_dir_cache
1845 1824 98% 0.25K 123 15 492K ip_dst_cache
1305 900 68% 0.02K 9 145 36K anon_vma
864 520 60% 0.05K 12 72 48K avc_node
756 683 90% 0.41K 84 9 336K inode_cache
690 544 78% 0.25K 46 15 184K skbuff_head_cache
686 621 90% 0.28K 49 14 196K nf_nat:help
568 530 93% 0.50K 71 8 284K size-512
513 497 96% 0.43K 57 9 228K proc_inode_cache
464 446 96% 2.00K 232 2 928K xen-skb-2048
360 322 89% 0.12K 12 30 48K bio
360 162 45% 0.05K 5 72 20K journal_head
345 282 81% 0.25K 23 15 92K size-256
322 310 96% 0.50K 46 7 184K sock_inode_cache
303 98 32% 0.04K 3 101 12K pid
295 115 38% 0.06K 5 59 20K inet_peer_cache
254 2 0% 0.01K 1 254 4K revoke_table
252 98 38% 0.06K 4 63 16K delayacct_cache
203 39 19% 0.02K 1 203 4K biovec-1
203 13 6% 0.02K 1 203 4K tcp_bind_bucket
203 175 86% 0.52K 29 7 116K shmem_inode_cache
177 42 23% 0.06K 3 59 12K fs_cache
174 147 84% 0.62K 29 6 116K UNIX
169 118 69% 0.02K 1 169 4K journal_handle
165 165 100% 0.25K 11 15 44K tw_sock_TCP
150 134 89% 0.25K 10 15 40K kmem_cache
146 146 100% 4.00K 146 1 584K pmd
145 119 82% 0.13K 5 29 20K idr_layer_cache
132 94 71% 1.00K 33 4 132K size-1024
127 9 7% 0.03K 1 127 4K blkdev_ioc
126 78 61% 0.44K 14 9 56K signal_cache
115 106 92% 1.39K 23 5 184K task_struct
113 14 12% 0.03K 1 113 4K ip_fib_hash
113 14 12% 0.03K 1 113 4K ip_fib_alias
113 7 6% 0.03K 1 113 4K fib6_nodes
112 112 100% 2.00K 56 2 224K size-2048
110 39 35% 0.38K 11 10 44K files_cache
98 98 100% 4.00K 98 1 392K size-4096
96 56 58% 0.50K 12 8 48K mm_struct
96 96 100% 1.25K 32 3 128K TCP
93 80 86% 1.31K 31 3 124K sighand_cache
92 1 1% 0.04K 1 92 4K inotify_watch_cache
88 57 64% 0.50K 11 8 44K xen-skb-512
80 66 82% 0.38K 8 10 32K skbuff_fclone_cache
72 72 100% 0.11K 2 36 8K file_lock_cache
66 38 57% 0.62K 11 6 44K UDP
60 17 28% 0.12K 2 30 8K key_jar
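One way to address "I do not know how I could get a snapshot just before the problem" is a periodic capture job that keeps recent /proc/slabinfo readings on disk, so the state preceding an OOM survives. A minimal sketch; the function name, output directory, interval, and retention count are all illustrative assumptions:

```shell
# take_slab_snapshot [SRC] [OUTDIR] — copy one slabinfo reading to a
# timestamped file. Run from cron (e.g. every 10 minutes) so the snapshots
# taken just before a lockup/OOM are preserved on disk.
take_slab_snapshot() {
    src=${1:-/proc/slabinfo}                 # what to capture
    outdir=${2:-/var/log/slab-snapshots}     # illustrative location
    mkdir -p "$outdir"
    cp "$src" "$outdir/slabinfo-$(date +%Y%m%d-%H%M%S)"
    # keep only the newest 144 snapshots (~24 h at 10-minute intervals)
    ls -1t "$outdir"/slabinfo-* 2>/dev/null | tail -n +145 | xargs -r rm -f
}
```

When a machine later gets into trouble, the last file in the directory is the "before" snapshot the earlier comment asked for.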
Same symptoms here. We have several hosts running just one guest per host, and it has happened on many of them (7 so far this week). The guests each had around 9-10 days of uptime; host uptimes varied from 53 to 60 days. /etc/init.d/xend reload made the guests responsive and the xm console command work again. This is not new: it has been happening for a long time now, over many kernel updates. Our hosts are currently on 2.6.18-128.2.1.el5xen and the guests on 2.6.18-164.el5xen, but the same thing happened when both host and guests were running matching kernels.

Here is the output of one of the guests after running /etc/init.d/xend reload:

conntrack_ftp: partial 227 3149096871+13
BUG: soft lockup - CPU#0 stuck for 193s! [pure-ftpd:16657]
Pid: 16657, comm: pure-ftpd
EIP: 0061:[<c0401227>] CPU: 0
EIP is at 0xc0401227
EFLAGS: 00200246 Not tainted (2.6.18-164.el5xen #1)
EAX: 00030001 EBX: 00000000 ECX: 00000000 EDX: f5416000
ESI: ec102eec EDI: c078704a EBP: 0000002d
DS: 007b ES: 007b
CR0: 8005003b CR2: bfade434 CR3: 011dc000 CR4: 00000660
 [<c0554fa0>] force_evtchn_callback+0xa/0xc
 [<c041ffaa>] vprintk+0x2e5/0x2ef
 [<c0618f80>] _spin_lock_irqsave+0x8/0x28
 [<c0427500>] lock_timer_base+0x15/0x2f
 [<c0427611>] __mod_timer+0x99/0xa3
 [<ee28f628>] help+0x436/0x4aa [ip_conntrack_ftp]
 [<c041ffcd>] printk+0x19/0x9c
 [<ee28f628>] help+0x436/0x4aa [ip_conntrack_ftp]
 [<ee28f493>] help+0x2a1/0x4aa [ip_conntrack_ftp]
 [<ee28f1dd>] try_rfc959+0x0/0x15 [ip_conntrack_ftp]
 [<ee26608d>] ip_conntrack_help+0x27/0x34 [ip_conntrack]
 [<c05d2058>] nf_iterate+0x30/0x61
 [<c05db03f>] ip_finish_output+0x0/0x1db
 [<c05d217e>] nf_hook_slow+0x3a/0x90
 [<c05db03f>] ip_finish_output+0x0/0x1db
 [<c05dc310>] ip_output+0x82/0x266
 [<c05db03f>] ip_finish_output+0x0/0x1db
 [<c05dbc70>] ip_queue_xmit+0x3d0/0x40f
 [<c04701ae>] do_sync_write+0xb6/0xf1
 [<c042fef7>] autoremove_wake_function+0x0/0x2d
 [<c05e9abf>] tcp_transmit_skb+0x5c7/0x5f5
 [<c043ada1>] do_acct_process+0x5da/0x5ff
 [<c05eb3b2>] __tcp_push_pending_frames+0x685/0x74a
 [<c0618fc0>] _spin_lock_bh+0x8/0x18
 [<c04241b8>] local_bh_enable+0x5/0x81
 [<c05b4a53>] lock_sock+0x8e/0x96
 [<c05ec15c>] tcp_send_fin+0x131/0x139
 [<c05e1e39>] tcp_close+0x224/0x520
 [<c05f9383>] inet_release+0x43/0x48
 [<c05b2d70>] sock_release+0x11/0x86
 [<c05b2e0b>] sock_close+0x26/0x2a
 [<c04712b3>] __fput+0x9c/0x167
 [<c046ecd9>] filp_close+0x4e/0x54
 [<c0420fec>] put_files_struct+0x65/0xa7
 [<c0422296>] do_exit+0x26d/0x7a0
 [<c0422843>] sys_exit_group+0x0/0xd
 [<c0405413>] syscall_call+0x7/0xb
 =======================

Some of the dumps include the OOM killer. Given the relatively long host uptimes, I think this is a well-known memory leak in xenstore; some customers have seen it with libvirt after only 10-20 days of uptime.

*** This bug has been marked as a duplicate of bug 606919 ***

Clearing out old flags for reporting purposes.

Chris Lalancette
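Since the closing comment points at a memory leak in xenstore, one way to confirm it on a running host is to log the daemon's resident size over time. A hedged sketch; the daemon name `xenstored`, the helper name, and the log path are assumptions to adjust for your install:

```shell
# sum_rss_kb — sum an RSS column (kB, one value per line) read on stdin,
# e.g. the output of `ps -o rss=`. Prints 0 on empty input.
sum_rss_kb() { awk '{ s += $1 } END { print s + 0 }'; }

# Log xenstored's total RSS once; run from cron to see growth over days:
#   rss=$(ps -o rss= -C xenstored | sum_rss_kb)
#   echo "$(date -u +%FT%TZ) xenstored_rss_kb=$rss" >> /var/log/xenstored-rss.log
```

A steadily climbing value over the 50+ day host uptimes reported above would be consistent with the suspected leak.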