Bug 507545
| Summary: | xen kernel -- soft lockup | | |
| --- | --- | --- | --- |
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Ioannis Aslanidis <iaslanidis> |
| Component: | kernel-xen | Assignee: | Rik van Riel <riel> |
| Status: | CLOSED DUPLICATE | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| Severity: | high | Docs Contact: | |
| Priority: | low | | |
| Version: | 5.2 | CC: | clalance, pbonzini, shane, xen-maint |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2010-06-23 17:17:26 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 514491 | | |
Description

Ioannis Aslanidis 2009-06-23 09:31:42 UTC

The soft lockup looks like it could be just a side effect of running out of memory (and the kernel desperately looking for something to free):

Active:32 inactive:2 dirty:0 writeback:0 unstable:0 free:686 slab:124322 mapped:2 pagetables:751
DMA free:2744kB min:2904kB low:3628kB high:4356kB active:128kB inactive:8kB present:528320kB pages_scanned:5408735 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0
DMA: 0*4kB 5*8kB 1*16kB 0*32kB 2*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2744kB
Swap cache: add 5217573, delete 5217579, find 40858241/41734223, race 82+1019
Free swap = 374856kB
Total swap = 522104kB
printk: 43749 messages suppressed.
nrpe invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0

As you can see, 497288 out of 528320 kB is taken up by the slab! It looks like something is leaking data structures into the slab. Could you give us a snapshot of /proc/slabinfo, or slabtop output, before a system gets into trouble?

The out-of-memory part seems to have happened on only one of the three machines. In any case, I can give you a slab snapshot of what it looks like right now. I do not know how I could get a snapshot just before the problem.
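One way to get a snapshot from just before the problem is to copy /proc/slabinfo aside on a timer and keep a short rolling history. A minimal sketch follows; the paths, the one-minute interval, and the retention count are illustrative choices, not something from this report:

```shell
#!/bin/sh
# Periodically copy the slab statistics aside so the state just before an
# OOM can be reconstructed afterwards.
# SRC, LOGDIR, the interval, and the retention count are illustrative.
SRC=${SRC:-/proc/slabinfo}
LOGDIR=${LOGDIR:-/tmp/slab-snapshots}

snapshot() {
    mkdir -p "$LOGDIR"
    ts=$(date +%Y%m%d-%H%M%S)
    cp "$SRC" "$LOGDIR/slabinfo.$ts"
    # Keep only the newest 120 snapshots (about 2 hours at one per minute).
    ls -1t "$LOGDIR"/slabinfo.* 2>/dev/null | tail -n +121 | xargs -r rm -f
}

# To run continuously (commented out so the script is safe to source):
# while :; do snapshot; sleep 60; done
```

After a lockup, the last few files in the log directory show which caches were growing in the minutes before the OOM killer fired.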
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
ip_vs_conn 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
fib6_nodes 7 113 32 113 1 : tunables 120 60 8 : slabdata 1 1 0
ip6_dst_cache 7 15 256 15 1 : tunables 120 60 8 : slabdata 1 1 0
ndisc_cache 1 15 256 15 1 : tunables 120 60 8 : slabdata 1 1 0
RAWv6 7 10 768 5 1 : tunables 54 27 8 : slabdata 2 2 0
UDPLITEv6 0 0 768 5 1 : tunables 54 27 8 : slabdata 0 0 0
UDPv6 1 5 768 5 1 : tunables 54 27 8 : slabdata 1 1 0
tw_sock_TCPv6 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
request_sock_TCPv6 0 0 128 30 1 : tunables 120 60 8 : slabdata 0 0 0
TCPv6 0 0 1408 5 2 : tunables 24 12 8 : slabdata 0 0 0
ip_fib_alias 14 113 32 113 1 : tunables 120 60 8 : slabdata 1 1 0
ip_fib_hash 14 113 32 113 1 : tunables 120 60 8 : slabdata 1 1 0
nf_nat:help 622 686 284 14 1 : tunables 54 27 8 : slabdata 49 49 0
nf_nat:base 0 0 252 15 1 : tunables 120 60 8 : slabdata 0 0 0
nf_conntrack_expect 0 0 148 26 1 : tunables 120 60 8 : slabdata 0 0 0
nf_conntrack:basic 0 0 220 18 1 : tunables 120 60 8 : slabdata 0 0 0
dm_tio 0 0 16 203 1 : tunables 120 60 8 : slabdata 0 0 0
dm_io 0 0 20 169 1 : tunables 120 60 8 : slabdata 0 0 0
avtab_node 61745 61915 16 203 1 : tunables 120 60 8 : slabdata 305 305 0
jbd_4k 11 11 4096 1 1 : tunables 24 12 8 : slabdata 11 11 0
ext3_inode_cache 20652 20652 596 6 1 : tunables 54 27 8 : slabdata 3442 3442 0
ext3_xattr 1787 2418 48 78 1 : tunables 120 60 8 : slabdata 31 31 0
journal_handle 84 169 20 169 1 : tunables 120 60 8 : slabdata 1 1 0
journal_head 324 432 52 72 1 : tunables 120 60 8 : slabdata 6 6 24
revoke_table 2 254 12 254 1 : tunables 120 60 8 : slabdata 1 1 0
revoke_record 0 0 16 203 1 : tunables 120 60 8 : slabdata 0 0 0
UNIX 160 174 640 6 1 : tunables 54 27 8 : slabdata 29 29 0
flow_cache 0 0 128 30 1 : tunables 120 60 8 : slabdata 0 0 0
cfq_ioc_pool 0 0 92 42 1 : tunables 120 60 8 : slabdata 0 0 0
cfq_pool 0 0 96 40 1 : tunables 120 60 8 : slabdata 0 0 0
mqueue_inode_cache 1 6 640 6 1 : tunables 54 27 8 : slabdata 1 1 0
isofs_inode_cache 0 0 448 9 1 : tunables 54 27 8 : slabdata 0 0 0
ext2_inode_cache 0 0 580 7 1 : tunables 54 27 8 : slabdata 0 0 0
ext2_xattr 0 0 48 78 1 : tunables 120 60 8 : slabdata 0 0 0
dnotify_cache 0 0 20 169 1 : tunables 120 60 8 : slabdata 0 0 0
dquot 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
eventpoll_pwq 0 0 36 101 1 : tunables 120 60 8 : slabdata 0 0 0
eventpoll_epi 0 0 128 30 1 : tunables 120 60 8 : slabdata 0 0 0
inotify_event_cache 0 0 28 127 1 : tunables 120 60 8 : slabdata 0 0 0
inotify_watch_cache 1 92 40 92 1 : tunables 120 60 8 : slabdata 1 1 0
kioctx 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
kiocb 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
fasync_cache 0 0 16 203 1 : tunables 120 60 8 : slabdata 0 0 0
shmem_inode_cache 187 203 536 7 1 : tunables 54 27 8 : slabdata 29 29 0
posix_timers_cache 0 0 100 39 1 : tunables 120 60 8 : slabdata 0 0 0
uid_cache 23 59 64 59 1 : tunables 120 60 8 : slabdata 1 1 0
ip_mrt_cache 0 0 128 30 1 : tunables 120 60 8 : slabdata 0 0 0
UDP-Lite 0 0 640 6 1 : tunables 54 27 8 : slabdata 0 0 0
tcp_bind_bucket 13 203 16 203 1 : tunables 120 60 8 : slabdata 1 1 0
inet_peer_cache 113 295 64 59 1 : tunables 120 60 8 : slabdata 5 5 0
secpath_cache 0 0 32 113 1 : tunables 120 60 8 : slabdata 0 0 0
xfrm_dst_cache 0 0 384 10 1 : tunables 54 27 8 : slabdata 0 0 0
ip_dst_cache 1725 1725 256 15 1 : tunables 120 60 8 : slabdata 115 115 0
arp_cache 14 45 256 15 1 : tunables 120 60 8 : slabdata 3 3 0
RAW 8 12 640 6 1 : tunables 54 27 8 : slabdata 2 2 0
UDP 38 66 640 6 1 : tunables 54 27 8 : slabdata 11 11 0
tw_sock_TCP 141 165 256 15 1 : tunables 120 60 8 : slabdata 11 11 0
request_sock_TCP 27 30 128 30 1 : tunables 120 60 8 : slabdata 1 1 0
TCP 96 96 1280 3 1 : tunables 24 12 8 : slabdata 32 32 0
blkdev_ioc 9 127 28 127 1 : tunables 120 60 8 : slabdata 1 1 0
blkdev_queue 19 20 1000 4 1 : tunables 54 27 8 : slabdata 5 5 0
blkdev_requests 61 63 184 21 1 : tunables 120 60 8 : slabdata 3 3 0
biovec-256 7 8 3072 2 2 : tunables 24 12 8 : slabdata 4 4 0
biovec-128 7 10 1536 5 2 : tunables 24 12 8 : slabdata 2 2 0
biovec-64 7 10 768 5 1 : tunables 54 27 8 : slabdata 2 2 0
biovec-16 7 15 256 15 1 : tunables 120 60 8 : slabdata 1 1 0
biovec-4 7 59 64 59 1 : tunables 120 60 8 : slabdata 1 1 0
biovec-1 147 203 16 203 1 : tunables 120 60 8 : slabdata 1 1 30
bio 358 390 128 30 1 : tunables 120 60 8 : slabdata 13 13 0
sock_inode_cache 313 322 512 7 1 : tunables 54 27 8 : slabdata 46 46 0
skbuff_fclone_cache 66 80 384 10 1 : tunables 54 27 8 : slabdata 8 8 13
skbuff_head_cache 564 690 256 15 1 : tunables 120 60 8 : slabdata 46 46 0
xen-skb-65536 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0
xen-skb-32768 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0
xen-skb-16384 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0
xen-skb-8192 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
xen-skb-4096 2 3 4096 1 1 : tunables 24 12 8 : slabdata 2 3 0
xen-skb-2048 452 464 2048 2 1 : tunables 24 12 8 : slabdata 232 232 0
xen-skb-512 61 88 512 8 1 : tunables 54 27 8 : slabdata 11 11 0
file_lock_cache 50 72 108 36 1 : tunables 120 60 8 : slabdata 2 2 0
delayacct_cache 135 252 60 63 1 : tunables 120 60 8 : slabdata 4 4 0
taskstats_cache 2 14 272 14 1 : tunables 54 27 8 : slabdata 1 1 0
proc_inode_cache 511 513 436 9 1 : tunables 54 27 8 : slabdata 57 57 0
sigqueue 52 54 144 27 1 : tunables 120 60 8 : slabdata 2 2 0
radix_tree_node 6445 6513 288 13 1 : tunables 54 27 8 : slabdata 501 501 0
bdev_cache 20 21 576 7 1 : tunables 54 27 8 : slabdata 3 3 0
sysfs_dir_cache 1939 2016 44 84 1 : tunables 120 60 8 : slabdata 24 24 0
mnt_cache 24 60 128 30 1 : tunables 120 60 8 : slabdata 2 2 0
inode_cache 699 756 420 9 1 : tunables 54 27 8 : slabdata 84 84 0
dentry_cache 30316 30402 144 27 1 : tunables 120 60 8 : slabdata 1126 1126 0
filp 1629 2380 192 20 1 : tunables 120 60 8 : slabdata 119 119 384
names_cache 10 10 4096 1 1 : tunables 24 12 8 : slabdata 10 10 0
avc_node 538 864 52 72 1 : tunables 120 60 8 : slabdata 12 12 0
selinux_inode_security 22354 23048 56 67 1 : tunables 120 60 8 : slabdata 344 344 0
key_jar 30 60 128 30 1 : tunables 120 60 8 : slabdata 2 2 0
idr_layer_cache 134 145 136 29 1 : tunables 120 60 8 : slabdata 5 5 0
buffer_head 112256 144050 56 67 1 : tunables 120 60 8 : slabdata 2150 2150 30
mm_struct 88 88 512 8 1 : tunables 54 27 8 : slabdata 11 11 0
vm_area_struct 2252 2552 88 44 1 : tunables 120 60 8 : slabdata 58 58 204
fs_cache 82 177 64 59 1 : tunables 120 60 8 : slabdata 3 3 0
files_cache 70 110 384 10 1 : tunables 54 27 8 : slabdata 11 11 0
signal_cache 93 126 448 9 1 : tunables 54 27 8 : slabdata 14 14 0
sighand_cache 93 93 1344 3 1 : tunables 24 12 8 : slabdata 31 31 0
task_struct 120 120 1424 5 2 : tunables 24 12 8 : slabdata 24 24 0
anon_vma 1100 1305 24 145 1 : tunables 120 60 8 : slabdata 9 9 24
pgd 43 43 4096 1 1 : tunables 24 12 8 : slabdata 43 43 0
pmd 172 172 4096 1 1 : tunables 24 12 8 : slabdata 172 172 0
pid 136 303 36 101 1 : tunables 120 60 8 : slabdata 3 3 0
size-131072(DMA) 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0
size-131072 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0
size-65536(DMA) 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0
size-65536 1 1 65536 1 16 : tunables 8 4 0 : slabdata 1 1 0
size-32768(DMA) 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0
size-32768 3 3 32768 1 8 : tunables 8 4 0 : slabdata 3 3 0
size-16384(DMA) 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0
size-16384 1 1 16384 1 4 : tunables 8 4 0 : slabdata 1 1 0
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
size-8192 6 6 8192 1 2 : tunables 8 4 0 : slabdata 6 6 0
size-4096(DMA) 0 0 4096 1 1 : tunables 24 12 8 : slabdata 0 0 0
size-4096 113 113 4096 1 1 : tunables 24 12 8 : slabdata 113 113 0
size-2048(DMA) 0 0 2048 2 1 : tunables 24 12 8 : slabdata 0 0 0
size-2048 112 112 2048 2 1 : tunables 24 12 8 : slabdata 56 56 0
size-1024(DMA) 0 0 1024 4 1 : tunables 54 27 8 : slabdata 0 0 0
size-1024 136 136 1024 4 1 : tunables 54 27 8 : slabdata 34 34 0
size-512(DMA) 0 0 512 8 1 : tunables 54 27 8 : slabdata 0 0 0
size-512 568 568 512 8 1 : tunables 54 27 8 : slabdata 71 71 0
size-256(DMA) 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
size-256 329 345 256 15 1 : tunables 120 60 8 : slabdata 23 23 0
size-128(DMA) 0 0 128 30 1 : tunables 120 60 8 : slabdata 0 0 0
size-64(DMA) 0 0 64 59 1 : tunables 120 60 8 : slabdata 0 0 0
size-32(DMA) 0 0 32 113 1 : tunables 120 60 8 : slabdata 0 0 0
size-32 33938 35030 32 113 1 : tunables 120 60 8 : slabdata 310 310 24
size-128 3016 3060 128 30 1 : tunables 120 60 8 : slabdata 102 102 0
size-64 1972 2537 64 59 1 : tunables 120 60 8 : slabdata 43 43 22
kmem_cache 134 150 256 15 1 : tunables 120 60 8 : slabdata 10 10 0

Active / Total Objects (% used) : 309634 / 350925 (88.2%)
Active / Total Slabs (% used) : 10022 / 10022 (100.0%)
Active / Total Caches (% used) : 85 / 135 (63.0%)
Active / Total Size (% used) : 33927.17K / 36679.28K (92.5%)
Minimum / Average / Maximum Object : 0.01K / 0.10K / 128.00K

OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
144050 112161 77% 0.05K 2150 67 8600K buffer_head
61915 61745 99% 0.02K 305 203 1220K avtab_node
35030 33723 96% 0.03K 310 113 1240K size-32
30402 30368 99% 0.14K 1126 27 4504K dentry_cache
23048 22346 96% 0.05K 344 67 1376K selinux_inode_security
20652 20650 99% 0.58K 3442 6 13768K ext3_inode_cache
6513 6445 98% 0.28K 501 13 2004K radix_tree_node
3060 2933 95% 0.12K 102 30 408K size-128
2552 2012 78% 0.09K 58 44 232K vm_area_struct
2537 1942 76% 0.06K 43 59 172K size-64
2418 1787 73% 0.05K 31 78 124K ext3_xattr
2380 1188 49% 0.19K 119 20 476K filp
2016 1939 96% 0.04K 24 84 96K sysfs_dir_cache
1845 1824 98% 0.25K 123 15 492K ip_dst_cache
1305 900 68% 0.02K 9 145 36K anon_vma
864 520 60% 0.05K 12 72 48K avc_node
756 683 90% 0.41K 84 9 336K inode_cache
690 544 78% 0.25K 46 15 184K skbuff_head_cache
686 621 90% 0.28K 49 14 196K nf_nat:help
568 530 93% 0.50K 71 8 284K size-512
513 497 96% 0.43K 57 9 228K proc_inode_cache
464 446 96% 2.00K 232 2 928K xen-skb-2048
360 322 89% 0.12K 12 30 48K bio
360 162 45% 0.05K 5 72 20K journal_head
345 282 81% 0.25K 23 15 92K size-256
322 310 96% 0.50K 46 7 184K sock_inode_cache
303 98 32% 0.04K 3 101 12K pid
295 115 38% 0.06K 5 59 20K inet_peer_cache
254 2 0% 0.01K 1 254 4K revoke_table
252 98 38% 0.06K 4 63 16K delayacct_cache
203 39 19% 0.02K 1 203 4K biovec-1
203 13 6% 0.02K 1 203 4K tcp_bind_bucket
203 175 86% 0.52K 29 7 116K shmem_inode_cache
177 42 23% 0.06K 3 59 12K fs_cache
174 147 84% 0.62K 29 6 116K UNIX
169 118 69% 0.02K 1 169 4K journal_handle
165 165 100% 0.25K 11 15 44K tw_sock_TCP
150 134 89% 0.25K 10 15 40K kmem_cache
146 146 100% 4.00K 146 1 584K pmd
145 119 82% 0.13K 5 29 20K idr_layer_cache
132 94 71% 1.00K 33 4 132K size-1024
127 9 7% 0.03K 1 127 4K blkdev_ioc
126 78 61% 0.44K 14 9 56K signal_cache
115 106 92% 1.39K 23 5 184K task_struct
113 14 12% 0.03K 1 113 4K ip_fib_hash
113 14 12% 0.03K 1 113 4K ip_fib_alias
113 7 6% 0.03K 1 113 4K fib6_nodes
112 112 100% 2.00K 56 2 224K size-2048
110 39 35% 0.38K 11 10 44K files_cache
98 98 100% 4.00K 98 1 392K size-4096
96 56 58% 0.50K 12 8 48K mm_struct
96 96 100% 1.25K 32 3 128K TCP
93 80 86% 1.31K 31 3 124K sighand_cache
92 1 1% 0.04K 1 92 4K inotify_watch_cache
88 57 64% 0.50K 11 8 44K xen-skb-512
80 66 82% 0.38K 8 10 32K skbuff_fclone_cache
72 72 100% 0.11K 2 36 8K file_lock_cache
66 38 57% 0.62K 11 6 44K UDP
60 17 28% 0.12K 2 30 8K key_jar

Same symptoms here. We have several hosts running just one guest per host, and it has happened on many (7 so far this week). The guests each had around 9-10 days of uptime; the host uptimes varied from 53-60 days. /etc/init.d/xend reload made the guests responsive and made the xm console command work again. This is not new; it has been happening for a long time now, over many kernel updates.
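The reload workaround described above could be automated with a small watchdog that probes the management interface and reloads xend when it hangs. A sketch under stated assumptions: the probe command, the 30-second timeout (via coreutils timeout(1)), treating any failure as "stuck", and the reload command are all illustrative, not part of this report:

```shell
#!/bin/sh
# Watchdog sketch: probe the management interface; if it hangs or fails,
# run the reload that restored the guests. All parameters are assumptions.
check_and_reload() {
    probe=$1     # e.g. "xm list"
    reload=$2    # e.g. "/etc/init.d/xend reload"
    if timeout 30 $probe >/dev/null 2>&1; then
        echo "responsive"
    else
        echo "stuck: reloading"
        $reload
    fi
}

# On an affected dom0 this might be run from cron as (illustrative):
# check_and_reload "xm list" "/etc/init.d/xend reload"
```

Note that this only papers over the symptom; the underlying leak still has to be found.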
Our hosts are currently 2.6.18-128.2.1.el5xen and the guests are 2.6.18-164.el5xen, but this same thing happened when both the host and guests were running matching kernels. Here is the output from one of the guests after running /etc/init.d/xend reload:

conntrack_ftp: partial 227 3149096871+13
BUG: soft lockup - CPU#0 stuck for 193s! [pure-ftpd:16657]
Pid: 16657, comm: pure-ftpd
EIP: 0061:[<c0401227>] CPU: 0
EIP is at 0xc0401227
EFLAGS: 00200246 Not tainted (2.6.18-164.el5xen #1)
EAX: 00030001 EBX: 00000000 ECX: 00000000 EDX: f5416000
ESI: ec102eec EDI: c078704a EBP: 0000002d
DS: 007b ES: 007b
CR0: 8005003b CR2: bfade434 CR3: 011dc000 CR4: 00000660
[<c0554fa0>] force_evtchn_callback+0xa/0xc
[<c041ffaa>] vprintk+0x2e5/0x2ef
[<c0618f80>] _spin_lock_irqsave+0x8/0x28
[<c0427500>] lock_timer_base+0x15/0x2f
[<c0427611>] __mod_timer+0x99/0xa3
[<ee28f628>] help+0x436/0x4aa [ip_conntrack_ftp]
[<c041ffcd>] printk+0x19/0x9c
[<ee28f628>] help+0x436/0x4aa [ip_conntrack_ftp]
[<ee28f493>] help+0x2a1/0x4aa [ip_conntrack_ftp]
[<ee28f1dd>] try_rfc959+0x0/0x15 [ip_conntrack_ftp]
[<ee26608d>] ip_conntrack_help+0x27/0x34 [ip_conntrack]
[<c05d2058>] nf_iterate+0x30/0x61
[<c05db03f>] ip_finish_output+0x0/0x1db
[<c05d217e>] nf_hook_slow+0x3a/0x90
[<c05db03f>] ip_finish_output+0x0/0x1db
[<c05dc310>] ip_output+0x82/0x266
[<c05db03f>] ip_finish_output+0x0/0x1db
[<c05dbc70>] ip_queue_xmit+0x3d0/0x40f
[<c04701ae>] do_sync_write+0xb6/0xf1
[<c042fef7>] autoremove_wake_function+0x0/0x2d
[<c05e9abf>] tcp_transmit_skb+0x5c7/0x5f5
[<c043ada1>] do_acct_process+0x5da/0x5ff
[<c05eb3b2>] __tcp_push_pending_frames+0x685/0x74a
[<c0618fc0>] _spin_lock_bh+0x8/0x18
[<c04241b8>] local_bh_enable+0x5/0x81
[<c05b4a53>] lock_sock+0x8e/0x96
[<c05ec15c>] tcp_send_fin+0x131/0x139
[<c05e1e39>] tcp_close+0x224/0x520
[<c05f9383>] inet_release+0x43/0x48
[<c05b2d70>] sock_release+0x11/0x86
[<c05b2e0b>] sock_close+0x26/0x2a
[<c04712b3>] __fput+0x9c/0x167
[<c046ecd9>] filp_close+0x4e/0x54
[<c0420fec>] put_files_struct+0x65/0xa7
[<c0422296>] do_exit+0x26d/0x7a0
[<c0422843>] sys_exit_group+0x0/0xd
[<c0405413>] syscall_call+0x7/0xb
=======================

(Some) of the dumps include the OOM killer.

Given the relatively long host uptimes, I think this is a well-known memory leak in xenstore; some customers have seen it with libvirt after only 10-20 days of uptime.

*** This bug has been marked as a duplicate of bug 606919 ***

Clearing out old flags for reporting purposes.

Chris Lalancette
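Since the closing comment points at a suspected xenstore memory leak, one way to catch affected hosts before guests lock up is to watch the daemon's resident set size. A minimal sketch; the process name `xenstored` and the 100 MB warning threshold are assumptions for illustration, not values from this bug:

```shell
#!/bin/sh
# Sum the resident set size (kB) of all processes with the given name,
# printing 0 if none are running. Uses procps ps(1) -C / -o rss=.
rss_kb() {
    ps -C "$1" -o rss= | awk '{s += $1} END {print s + 0}'
}

# Warn when the daemon grows past an illustrative 100 MB threshold.
warn_if_large() {
    kb=$(rss_kb "$1")
    if [ "$kb" -gt 102400 ]; then
        echo "WARN: $1 rss=${kb}kB"
    fi
}

# Typical use on a dom0 (illustrative):
# warn_if_large xenstored
```

Run from cron, a steadily climbing RSS would corroborate the leak and identify hosts worth reloading before they reach the OOM state shown above.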