Red Hat Bugzilla – Bug 910104
OOM issues when running many raid -> thin pool conversions w/ lvmetad running
Last modified: 2013-11-21 18:20:31 EST
Description of problem:
./snapper_thinp -e raid_to_pool_conversion

============================================================
Iteration 2549 of 10000 started at Sat Feb 9 23:14:51 CST 2013
============================================================
SCENARIO - [raid_to_pool_conversion]
Create raid volumes and convert them to pool and pool meta volumes
lvcreate --type raid1 -m 1 -L 100M -n to_pool_convert snapper_thinp
  WARNING: Failed to connect to lvmetad: Connection refused. Falling back to internal scanning.
  No input from event server.
  snapper_thinp-to_pool_convert: event registration failed: No such file or directory
  snapper_thinp/to_pool_convert: mirror segment monitoring function failed.
lvcreate --type raid1 -m 1 -L 100M -n to_pool_meta_convert snapper_thinp
  WARNING: Failed to connect to lvmetad: Connection refused. Falling back to internal scanning.
  No input from event server.
  snapper_thinp-to_pool_meta_convert: event registration failed: Input/output error
  snapper_thinp/to_pool_meta_convert: mirror segment monitoring function failed.
lvconvert --thinpool snapper_thinp/to_pool_convert --poolmetadata to_pool_meta_convert
  WARNING: Failed to connect to lvmetad: Connection refused. Falling back to internal scanning.
  No input from event server.
  snapper_thinp-to_pool_convert_tdata: event registration failed: Input/output error
  snapper_thinp/to_pool_convert_tdata: mirror segment monitoring function failed.
  Failed to monitor to_pool_convert_tdata
  No input from event server.
  No input from event server.
  snapper_thinp-to_pool_convert_tmeta: event registration failed: Input/output error
  snapper_thinp/to_pool_convert_tmeta: mirror segment monitoring function failed.
  No input from event server.
  No input from event server.
  snapper_thinp-to_pool_convert-tpool: event registration failed: Input/output error
  snapper_thinp/to_pool_convert: thin-pool segment monitoring function failed.
lvcreate --virtualsize 500M --thinpool snapper_thinp/to_pool_convert -n origin
  WARNING: Failed to connect to lvmetad: Connection refused. Falling back to internal scanning.
  No input from event server.
  No input from event server.
  snapper_thinp-to_pool_convert_tdata: event registration failed: Input/output error
  snapper_thinp/to_pool_convert_tdata: mirror segment monitoring function failed.
  Failed to monitor to_pool_convert_tdata
  No input from event server.
  No input from event server.
  snapper_thinp-to_pool_convert_tmeta: event registration failed: Input/output error
  snapper_thinp/to_pool_convert_tmeta: mirror segment monitoring function failed.
  Ignoring out-of-sequence reply from dmeventd. Expected 5312:0 but received 5241:11
  Success
  Ignoring out-of-sequence reply from dmeventd. Expected 5312:0 but received 5241:11
  HELLO HELLO 1
  Ignoring out-of-sequence reply from dmeventd. Expected 5312:0 but received 5241:11
  HELLO HELLO 1
  Ignoring out-of-sequence reply from dmeventd. Expected 5312:0 but received 5241:11
  HELLO HELLO 1
  Ignoring out-of-sequence reply from dmeventd. Expected 5312:0 but received 5241:11
  HELLO HELLO 1
Making snapshot of origin volume
  WARNING: Failed to connect to lvmetad: Connection refused. Falling back to internal scanning.
lvcreate -s /dev/snapper_thinp/origin -n snap_of_pool_convert
  WARNING: Failed to connect to lvmetad: Connection refused. Falling back to internal scanning.
Removing volume snapper_thinp/snap_of_pool_convert
Removing thin origin and other virtual thin volumes
  WARNING: Failed to connect to lvmetad: Connection refused. Falling back to internal scanning.
Removing thinpool snapper_thinp/to_pool_convert
  WARNING: Failed to connect to lvmetad: Connection refused. Falling back to internal scanning.
qarshd[18260]: Running cmdline: lvcreate -s /dev/snapper_thinp/origin -n snap_of_pool_convert
kernel: lvcreate invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0, oom_score_adj=0
kernel: lvcreate cpuset=/ mems_allowed=0
kernel: Pid: 18261, comm: lvcreate Not tainted 2.6.32-354.el6.x86_64 #1
kernel: Call Trace:
kernel: [<ffffffff810cb5d1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
kernel: [<ffffffff8111cd10>] ? dump_header+0x90/0x1b0
kernel: [<ffffffff810cc671>] ? cpuset_mems_allowed_intersects+0x21/0x30
kernel: [<ffffffff8111d192>] ? oom_kill_process+0x82/0x2a0
kernel: [<ffffffff8111d0d1>] ? select_bad_process+0xe1/0x120
kernel: [<ffffffff8111d5d0>] ? out_of_memory+0x220/0x3c0
kernel: [<ffffffff8112c27c>] ? __alloc_pages_nodemask+0x8ac/0x8d0
kernel: [<ffffffff8116088a>] ? alloc_pages_vma+0x9a/0x150
kernel: [<ffffffff81143cfb>] ? handle_pte_fault+0x76b/0xb50
kernel: [<ffffffff8104baa7>] ? pte_alloc_one+0x37/0x50
kernel: [<ffffffff8117b379>] ? do_huge_pmd_anonymous_page+0xb9/0x380
kernel: [<ffffffff8114431a>] ? handle_mm_fault+0x23a/0x310
kernel: [<ffffffff810474c9>] ? __do_page_fault+0x139/0x480
kernel: [<ffffffff811488ea>] ? vma_merge+0x29a/0x3e0
kernel: [<ffffffff81149cac>] ? do_brk+0x26c/0x350
kernel: [<ffffffff81512c6e>] ? do_page_fault+0x3e/0xa0
kernel: [<ffffffff81510025>] ? page_fault+0x25/0x30
kernel: Mem-Info:
kernel: Node 0 DMA per-cpu:
kernel: CPU 0: hi: 0, btch: 1 usd: 0
kernel: CPU 1: hi: 0, btch: 1 usd: 0
kernel: Node 0 DMA32 per-cpu:
kernel: CPU 0: hi: 186, btch: 31 usd: 37
kernel: CPU 1: hi: 186, btch: 31 usd: 0
kernel: active_anon:22125 inactive_anon:22556 isolated_anon:0
kernel: active_file:14 inactive_file:14 isolated_file:0
kernel: unevictable:209319 dirty:0 writeback:4 unstable:0
kernel: free:13246 slab_reclaimable:2524 slab_unreclaimable:99611
kernel: mapped:901 shmem:57 pagetables:3627 bounce:0
kernel: Node 0 DMA free:8340kB min:332kB low:412kB high:496kB active_anon:2296kB inactive_anon:3828kB active_file:20kB inactive_file:0kB unevictable:580kB isolated(anon):0kB isolated(file):0kB present:15252kB mlocked:580kB dirty:0kB writeback:0kB mapped:36kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:140kB kernel_stack:8kB pagetables:24kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
kernel: lowmem_reserve[]: 0 2004 2004 2004
kernel: Node 0 DMA32 free:44644kB min:44720kB low:55900kB high:67080kB active_anon:86204kB inactive_anon:86396kB active_file:36kB inactive_file:56kB unevictable:836696kB isolated(anon):0kB isolated(file):0kB present:2052308kB mlocked:765156kB dirty:0kB writeback:16kB mapped:3568kB shmem:228kB slab_reclaimable:10096kB slab_unreclaimable:398304kB kernel_stack:1792kB pagetables:14484kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:179 all_unreclaimable? yes
kernel: lowmem_reserve[]: 0 0 0 0
kernel: Node 0 DMA: 7*4kB 34*8kB 4*16kB 4*32kB 5*64kB 5*128kB 1*256kB 1*512kB 2*1024kB 2*2048kB 0*4096kB = 8364kB
kernel: Node 0 DMA32: 289*4kB 122*8kB 841*16kB 404*32kB 76*64kB 20*128kB 10*256kB 4*512kB 2*1024kB 1*2048kB 0*4096kB = 44644kB
kernel: 3651 total pagecache pages
kernel: 2691 pages in swap cache
kernel: Swap cache stats: add 5761846, delete 5759155, find 15220058/15731145
kernel: Free swap = 0kB
kernel: Total swap = 4128760kB
kernel: 524284 pages RAM
kernel: 43654 pages reserved
kernel: 3800 pages shared
kernel: 462483 pages non-shared
kernel: [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
kernel: [  610]     0   610     2765      150   0     -17         -1000 udevd
kernel: [ 6719]     0  6719   454812   209418   0     -17         -1000 dmeventd
kernel: [ 7181]     0  7181     2279        1   0       0             0 dhclient
kernel: [ 7227]     0  7227    23293       32   1     -17         -1000 auditd
kernel: [ 7243]     0  7243    64047      828   1       0             0 rsyslogd
kernel: [ 7270]     0  7270     2704       73   1       0             0 irqbalance
kernel: [ 7284]    32  7284     4743       17   0       0             0 rpcbind
kernel: [ 7302]    29  7302     5836        1   0       0             0 rpc.statd
kernel: [ 7335]     0  7335     6290        1   0       0             0 rpc.idmapd
kernel: [ 7428]    81  7428     7942        1   1       0             0 dbus-daemon
kernel: [ 7458]     0  7458     1019        0   1       0             0 acpid
kernel: [ 7467]    68  7467     6513      227   0       0             0 hald
kernel: [ 7468]     0  7468     4526        1   1       0             0 hald-runner
kernel: [ 7496]     0  7496     5055        1   1       0             0 hald-addon-inpu
kernel: [ 7511]    68  7511     4451        1   1       0             0 hald-addon-acpi
kernel: [ 7528]     0  7528    16029        0   1     -17         -1000 sshd
kernel: [ 7536]     0  7536     5523       86   0       0             0 xinetd
kernel: [ 7612]     0  7612    19677       41   1       0             0 master
kernel: [ 7621]    89  7621    19740       35   0       0             0 qmgr
kernel: [ 7636]     0  7636    27545        1   1       0             0 abrtd
kernel: [ 7644]     0  7644    29302       70   1       0             0 crond
kernel: [ 7655]     0  7655     5363        5   0       0             0 atd
kernel: [ 7668]     0  7668    25972        1   0       0             0 rhsmcertd
kernel: [ 7690]     0  7690     1015        1   0       0             0 mingetty
kernel: [ 7692]     0  7692     1015        1   0       0             0 mingetty
kernel: [ 7694]     0  7694     1015        1   0       0             0 mingetty
kernel: [ 7696]     0  7696     1015        1   0       0             0 mingetty
kernel: [ 7698]     0  7698    19275        1   1       0             0 login
kernel: [ 7699]     0  7699     1015        1   0       0             0 mingetty
kernel: [ 7701]     0  7701     1015        1   0       0             0 mingetty
kernel: [ 7710]     0  7710    24466        1   0       0             0 sshd
kernel: [ 7720]     0  7720   258417        1   1       0             0 console-kit-dae
kernel: [ 7786]     0  7786    27083        1   1       0             0 bash
kernel: [ 7800]     0  7800    25234        1   0       0             0 tail
kernel: [ 7803]     0  7803    27084        1   0       0             0 bash
kernel: [ 9054]     0  9054  1309944    32103   1       0             0 lvmetad
kernel: [ 9057]     0  9057     1012        9   1       0             0 btimed
kernel: [14565]    89 14565    19697      161   1       0             0 pickup
kernel: [18212]     0 18212     2832      362   0     -17         -1000 udevd
kernel: [18225]     0 18225     2832      328   1     -17         -1000 udevd
kernel: [18260]     0 18260     4091      204   1       0             0 qarshd
kernel: [18261]     0 18261    37967     8040   1       0             0 lvcreate
kernel: [18263]     0 18263     2766      257   0     -17         -1000 udevd
kernel: [18264]     0 18264     2766      257   1     -17         -1000 udevd
kernel: [18265]     0 18265     2766      257   1     -17         -1000 udevd
kernel: [18266]     0 18266     2766      257   1     -17         -1000 udevd
kernel: [18267]     0 18267     2766      257   0     -17         -1000 udevd
kernel: [18268]     0 18268     2766      257   0     -17         -1000 udevd
kernel: [18277]     0 18277     2764      251   0     -17         -1000 udevd
kernel: [18278]     0 18278     2766      257   0     -17         -1000 udevd
kernel: Out of memory: Kill process 9054 (lvmetad) score 672 or sacrifice child
kernel: Killed process 9054, UID 0, (lvmetad) total-vm:5239776kB, anon-rss:128120kB, file-rss:292kB

Version-Release number of selected component (if applicable):
2.6.32-354.el6.x86_64
lvm2-2.02.98-9.el6 BUILT: Wed Jan 23 10:06:55 CST 2013
lvm2-libs-2.02.98-9.el6 BUILT: Wed Jan 23 10:06:55 CST 2013
lvm2-cluster-2.02.98-9.el6 BUILT: Wed Jan 23 10:06:55 CST 2013
udev-147-2.43.el6 BUILT: Thu Oct 11 05:59:38 CDT 2012
device-mapper-1.02.77-9.el6 BUILT: Wed Jan 23 10:06:55 CST 2013
device-mapper-libs-1.02.77-9.el6 BUILT: Wed Jan 23 10:06:55 CST 2013
device-mapper-event-1.02.77-9.el6 BUILT: Wed Jan 23 10:06:55 CST 2013
device-mapper-event-libs-1.02.77-9.el6 BUILT: Wed Jan 23 10:06:55 CST 2013
cmirror-2.02.98-9.el6 BUILT: Wed Jan 23 10:06:55 CST 2013
The dmeventd memory size keeps growing as well; it looks like it's leaking memory (note its rss of 209418 pages in the OOM report above, second only to lvmetad).
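For tracking suspected growth like this between test iterations, a minimal sketch (not part of the original report; the `dmeventd` usage shown in the comment is hypothetical) that samples a process's resident set size from /proc:

```shell
#!/bin/sh
# Read a process's resident set size (VmRSS, in kB) from /proc/<pid>/status.
rss_kb() {
    awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

# Hypothetical usage against a running dmeventd, sampling once a minute:
#   pid=$(pidof dmeventd)
#   while sleep 60; do echo "$(date +%s) $(rss_kb "$pid")"; done
rss_kb $$    # demonstrate on the current shell's own PID
```

Logging one sample per test iteration makes a leak show up as a monotonically rising series rather than having to wait for the OOM killer.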
Also be aware of Bug 956769 (MD RAID1/10 are leaking memory when they are stopped), although it would take quite a few iterations for that to become a problem...
Ignoring out-of-sequence reply from dmeventd. Expected 5312:0 but received 5241:11
HELLO HELLO 1
This report seems to fall in between:

commit 15fdd5c90dda7f00f691668c13d5401206d22021
Date:   Wed Jan 16 11:09:37 2013 +0100

and

commit 95372a852bbcacc9f194324832b94fcf1493f7c5
Date:   Wed Apr 3 13:46:12 2013 +0200

    lvmetad: Fix a memory leak introduced in 15fdd5c90dd.

so I believe this was fixed in 95372a852bbcacc9f194324832b94fcf1493f7c5. Either way, I can't reproduce the problem, and if there were still a leak in lvmetad, we'd still be seeing many failures like this one.
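Whether a given build contains such a fix can be checked with `git merge-base --is-ancestor`. A self-contained sketch (the throwaway repo, commit messages, and `v_release` tag are stand-ins; against the real lvm2 tree you would test commit 95372a852bbcacc9f194324832b94fcf1493f7c5 against the release tag you built from):

```shell
#!/bin/sh
# Demonstrate checking whether a release tag contains a given fix commit.
set -e
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" -c user.email=qa@example.com -c user.name=qa \
    commit -q --allow-empty -m "lvmetad: fix memory leak (stand-in)"
fix=$(git -C "$repo" rev-parse HEAD)   # stand-in for the fix commit
git -C "$repo" -c user.email=qa@example.com -c user.name=qa \
    commit -q --allow-empty -m "later release commit"
git -C "$repo" tag v_release           # stand-in for the build's tag
# Exit status 0 means the fix is an ancestor of (i.e. included in) the tag.
if git -C "$repo" merge-base --is-ancestor "$fix" v_release; then
    msg="fix included"
else
    msg="fix missing"
fi
echo "$msg"
rm -rf "$repo"
```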
So please re-test this one. If it's already been fixed, we'll attach it to the errata for documentation reasons.
I'm not seeing this issue any more. Marking verified in the latest rpms.

2.6.32-422.el6.x86_64
lvm2-2.02.100-6.el6 BUILT: Wed Oct 16 14:26:00 CEST 2013
lvm2-libs-2.02.100-6.el6 BUILT: Wed Oct 16 14:26:00 CEST 2013
lvm2-cluster-2.02.100-6.el6 BUILT: Wed Oct 16 14:26:00 CEST 2013
udev-147-2.50.el6 BUILT: Fri Oct 11 12:58:10 CEST 2013
device-mapper-1.02.79-6.el6 BUILT: Wed Oct 16 14:26:00 CEST 2013
device-mapper-libs-1.02.79-6.el6 BUILT: Wed Oct 16 14:26:00 CEST 2013
device-mapper-event-1.02.79-6.el6 BUILT: Wed Oct 16 14:26:00 CEST 2013
device-mapper-event-libs-1.02.79-6.el6 BUILT: Wed Oct 16 14:26:00 CEST 2013
cmirror-2.02.100-6.el6 BUILT: Wed Oct 16 14:26:00 CEST 2013

============================================================
Iteration 1412 of 10000 started at Fri Oct 18 20:42:30 CEST 2013
============================================================
SCENARIO - [raid1_to_pool_conversion]
Create raid1 volumes and convert them to pool and pool meta volumes
lvcreate --type raid1 -m 1 -L 100M -n to_pool_convert snapper_thinp
lvcreate --type raid1 -m 1 -L 100M -n to_pool_meta_convert snapper_thinp
lvconvert --thinpool snapper_thinp/to_pool_convert --poolmetadata to_pool_meta_convert
  device-mapper: remove ioctl on failed: Device or resource busy
  device-mapper: remove ioctl on failed: Device or resource busy
lvcreate --virtualsize 500M --thinpool snapper_thinp/to_pool_convert -n origin
lvcreate --virtualsize 500M --thinpool snapper_thinp/to_pool_convert -n other1
lvcreate --virtualsize 500M --thinpool snapper_thinp/to_pool_convert -n other2
lvcreate --virtualsize 500M --thinpool snapper_thinp/to_pool_convert -n other3
lvcreate --virtualsize 500M --thinpool snapper_thinp/to_pool_convert -n other4
lvcreate --virtualsize 500M --thinpool snapper_thinp/to_pool_convert -n other5
Making snapshot of origin volume
lvcreate -K -s /dev/snapper_thinp/origin -n snap_of_pool_convert
Removing volume snapper_thinp/snap_of_pool_convert
Removing thin origin and other virtual thin volumes
Removing thinpool snapper_thinp/to_pool_convert

SCENARIO - [raid10_to_pool_conversion]
Create raid10 volumes and convert them to pool and pool meta volumes
lvcreate --type raid10 -m 1 -L 100M -n to_pool_convert snapper_thinp
lvcreate --type raid10 -m 1 -L 100M -n to_pool_meta_convert snapper_thinp
lvconvert --thinpool snapper_thinp/to_pool_convert --poolmetadata to_pool_meta_convert
  device-mapper: remove ioctl on failed: Device or resource busy
  device-mapper: remove ioctl on failed: Device or resource busy
lvcreate --virtualsize 500M -T snapper_thinp/to_pool_convert -n origin
lvcreate --virtualsize 500M -T snapper_thinp/to_pool_convert -n other1
lvcreate --virtualsize 500M -T snapper_thinp/to_pool_convert -n other2
lvcreate --virtualsize 500M -T snapper_thinp/to_pool_convert -n other3
lvcreate --virtualsize 500M -T snapper_thinp/to_pool_convert -n other4
lvcreate --virtualsize 500M -T snapper_thinp/to_pool_convert -n other5
Making snapshot of origin volume
lvcreate -K -s /dev/snapper_thinp/origin -n snap_of_pool_convert
Removing volume snapper_thinp/snap_of_pool_convert
Removing thin origin and other virtual thin volumes
Removing thinpool snapper_thinp/to_pool_convert
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1704.html