Bug 910104 - OOM issues when running many raid -> thin pool conversions w/ lvmetad running
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.4
Hardware: x86_64 Linux
Priority: medium  Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Petr Rockai
QA Contact: Cluster QE
Depends On:
Blocks:
Reported: 2013-02-11 15:18 EST by Corey Marthaler
Modified: 2013-11-21 18:20 EST
CC List: 11 users

See Also:
Fixed In Version: lvm2-2.02.100-1.el6
Doc Type: Bug Fix
Doc Text:
Cause: Cached metadata in lvmetad could be leaked under some circumstances during metadata updates.
Consequence: Memory use of lvmetad could grow continually over long periods of time, possibly resulting in out-of-memory conditions.
Fix: The leak has been fixed.
Result: The memory used by lvmetad is proportional to the amount of metadata it holds at any given time and can no longer grow without bound over time.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-11-21 18:20:31 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments

None
Description Corey Marthaler 2013-02-11 15:18:42 EST
Description of problem:
./snapper_thinp -e raid_to_pool_conversion

============================================================
Iteration 2549 of 10000 started at Sat Feb  9 23:14:51 CST 2013
============================================================
SCENARIO - [raid_to_pool_conversion]
Create raid volumes and convert them to pool and pool meta volumes
lvcreate --type raid1 -m 1 -L 100M -n to_pool_convert snapper_thinp
  WARNING: Failed to connect to lvmetad: Connection refused. Falling back to internal scanning.
  No input from event server.
  snapper_thinp-to_pool_convert: event registration failed: No such file or directory
  snapper_thinp/to_pool_convert: mirror segment monitoring function failed.
lvcreate --type raid1 -m 1 -L 100M -n to_pool_meta_convert snapper_thinp
  WARNING: Failed to connect to lvmetad: Connection refused. Falling back to internal scanning.
  No input from event server.
  snapper_thinp-to_pool_meta_convert: event registration failed: Input/output error
  snapper_thinp/to_pool_meta_convert: mirror segment monitoring function failed.
lvconvert --thinpool snapper_thinp/to_pool_convert --poolmetadata to_pool_meta_convert
  WARNING: Failed to connect to lvmetad: Connection refused. Falling back to internal scanning.
  No input from event server.
  snapper_thinp-to_pool_convert_tdata: event registration failed: Input/output error
  snapper_thinp/to_pool_convert_tdata: mirror segment monitoring function failed.
  Failed to monitor to_pool_convert_tdata
  No input from event server.
  No input from event server.
  snapper_thinp-to_pool_convert_tmeta: event registration failed: Input/output error
  snapper_thinp/to_pool_convert_tmeta: mirror segment monitoring function failed.
  No input from event server.
  No input from event server.
  snapper_thinp-to_pool_convert-tpool: event registration failed: Input/output error
  snapper_thinp/to_pool_convert: thin-pool segment monitoring function failed.
lvcreate --virtualsize 500M --thinpool snapper_thinp/to_pool_convert -n origin
  WARNING: Failed to connect to lvmetad: Connection refused. Falling back to internal scanning.
  No input from event server.
  No input from event server.
  snapper_thinp-to_pool_convert_tdata: event registration failed: Input/output error
  snapper_thinp/to_pool_convert_tdata: mirror segment monitoring function failed.
  Failed to monitor to_pool_convert_tdata
  No input from event server.
  No input from event server.
  snapper_thinp-to_pool_convert_tmeta: event registration failed: Input/output error
  snapper_thinp/to_pool_convert_tmeta: mirror segment monitoring function failed.
  Ignoring out-of-sequence reply from dmeventd. Expected 5312:0 but received 5241:11 Success
  Ignoring out-of-sequence reply from dmeventd. Expected 5312:0 but received 5241:11 HELLO HELLO 1
  Ignoring out-of-sequence reply from dmeventd. Expected 5312:0 but received 5241:11 HELLO HELLO 1
  Ignoring out-of-sequence reply from dmeventd. Expected 5312:0 but received 5241:11 HELLO HELLO 1
  Ignoring out-of-sequence reply from dmeventd. Expected 5312:0 but received 5241:11 HELLO HELLO 1
Making snapshot of origin volume
  WARNING: Failed to connect to lvmetad: Connection refused. Falling back to internal scanning.
lvcreate -s /dev/snapper_thinp/origin -n snap_of_pool_convert
  WARNING: Failed to connect to lvmetad: Connection refused. Falling back to internal scanning.
Removing volume snapper_thinp/snap_of_pool_convert
Removing thin origin and other virtual thin volumes
  WARNING: Failed to connect to lvmetad: Connection refused. Falling back to internal scanning.
Removing thinpool snapper_thinp/to_pool_convert
  WARNING: Failed to connect to lvmetad: Connection refused. Falling back to internal scanning.
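For reference, the scenario above boils down to repeating one conversion cycle. A distilled loop would look something like this (just a sketch, not the actual snapper_thinp script; it assumes a VG named snapper_thinp with enough free space and use_lvmetad=1):

# Repeat the raid1 -> thin pool conversion cycle; lvmetad memory use
# should stay flat across iterations.
for i in $(seq 1 10000); do
    lvcreate --type raid1 -m 1 -L 100M -n to_pool_convert snapper_thinp
    lvcreate --type raid1 -m 1 -L 100M -n to_pool_meta_convert snapper_thinp
    lvconvert --yes --thinpool snapper_thinp/to_pool_convert \
              --poolmetadata to_pool_meta_convert
    lvcreate --virtualsize 500M --thinpool snapper_thinp/to_pool_convert -n origin
    lvremove -f snapper_thinp/origin snapper_thinp/to_pool_convert
done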



qarshd[18260]: Running cmdline: lvcreate -s /dev/snapper_thinp/origin -n snap_of_pool_convert
kernel: lvcreate invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0, oom_score_adj=0
kernel: lvcreate cpuset=/ mems_allowed=0
kernel: Pid: 18261, comm: lvcreate Not tainted 2.6.32-354.el6.x86_64 #1
kernel: Call Trace:
kernel: [<ffffffff810cb5d1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
kernel: [<ffffffff8111cd10>] ? dump_header+0x90/0x1b0
kernel: [<ffffffff810cc671>] ? cpuset_mems_allowed_intersects+0x21/0x30
kernel: [<ffffffff8111d192>] ? oom_kill_process+0x82/0x2a0
kernel: [<ffffffff8111d0d1>] ? select_bad_process+0xe1/0x120
kernel: [<ffffffff8111d5d0>] ? out_of_memory+0x220/0x3c0
kernel: [<ffffffff8112c27c>] ? __alloc_pages_nodemask+0x8ac/0x8d0
kernel: [<ffffffff8116088a>] ? alloc_pages_vma+0x9a/0x150
kernel: [<ffffffff81143cfb>] ? handle_pte_fault+0x76b/0xb50
kernel: [<ffffffff8104baa7>] ? pte_alloc_one+0x37/0x50
kernel: [<ffffffff8117b379>] ? do_huge_pmd_anonymous_page+0xb9/0x380
kernel: [<ffffffff8114431a>] ? handle_mm_fault+0x23a/0x310
kernel: [<ffffffff810474c9>] ? __do_page_fault+0x139/0x480
kernel: [<ffffffff811488ea>] ? vma_merge+0x29a/0x3e0
kernel: [<ffffffff81149cac>] ? do_brk+0x26c/0x350
kernel: [<ffffffff81512c6e>] ? do_page_fault+0x3e/0xa0
kernel: [<ffffffff81510025>] ? page_fault+0x25/0x30
kernel: Mem-Info:
kernel: Node 0 DMA per-cpu:
kernel: CPU    0: hi:    0, btch:   1 usd:   0
kernel: CPU    1: hi:    0, btch:   1 usd:   0
kernel: Node 0 DMA32 per-cpu:
kernel: CPU    0: hi:  186, btch:  31 usd:  37
kernel: CPU    1: hi:  186, btch:  31 usd:   0
kernel: active_anon:22125 inactive_anon:22556 isolated_anon:0
kernel: active_file:14 inactive_file:14 isolated_file:0
kernel: unevictable:209319 dirty:0 writeback:4 unstable:0
kernel: free:13246 slab_reclaimable:2524 slab_unreclaimable:99611
kernel: mapped:901 shmem:57 pagetables:3627 bounce:0
kernel: Node 0 DMA free:8340kB min:332kB low:412kB high:496kB active_anon:2296kB inactive_anon:3828kB active_file:20kB inactive_file:0kB unevictable:580kB isolated(anon):0kB isolated(file):0kB present:15252kB mlocked:580kB dirty:0kB writeback:0kB mapped:36kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:140kB kernel_stack:8kB pagetables:24kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
kernel: lowmem_reserve[]: 0 2004 2004 2004
kernel: Node 0 DMA32 free:44644kB min:44720kB low:55900kB high:67080kB active_anon:86204kB inactive_anon:86396kB active_file:36kB inactive_file:56kB unevictable:836696kB isolated(anon):0kB isolated(file):0kB present:2052308kB mlocked:765156kB dirty:0kB writeback:16kB mapped:3568kB shmem:228kB slab_reclaimable:10096kB slab_unreclaimable:398304kB kernel_stack:1792kB pagetables:14484kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:179 all_unreclaimable? yes
kernel: lowmem_reserve[]: 0 0 0 0
kernel: Node 0 DMA: 7*4kB 34*8kB 4*16kB 4*32kB 5*64kB 5*128kB 1*256kB 1*512kB 2*1024kB 2*2048kB 0*4096kB = 8364kB
kernel: Node 0 DMA32: 289*4kB 122*8kB 841*16kB 404*32kB 76*64kB 20*128kB 10*256kB 4*512kB 2*1024kB 1*2048kB 0*4096kB = 44644kB
kernel: 3651 total pagecache pages
kernel: 2691 pages in swap cache
kernel: Swap cache stats: add 5761846, delete 5759155, find 15220058/15731145
kernel: Free swap  = 0kB
kernel: Total swap = 4128760kB
kernel: 524284 pages RAM
kernel: 43654 pages reserved
kernel: 3800 pages shared
kernel: 462483 pages non-shared
kernel: [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
kernel: [  610]     0   610     2765      150   0     -17         -1000 udevd
kernel: [ 6719]     0  6719   454812   209418   0     -17         -1000 dmeventd
kernel: [ 7181]     0  7181     2279        1   0       0             0 dhclient
kernel: [ 7227]     0  7227    23293       32   1     -17         -1000 auditd
kernel: [ 7243]     0  7243    64047      828   1       0             0 rsyslogd
kernel: [ 7270]     0  7270     2704       73   1       0             0 irqbalance
kernel: [ 7284]    32  7284     4743       17   0       0             0 rpcbind
kernel: [ 7302]    29  7302     5836        1   0       0             0 rpc.statd
kernel: [ 7335]     0  7335     6290        1   0       0             0 rpc.idmapd
kernel: [ 7428]    81  7428     7942        1   1       0             0 dbus-daemon
kernel: [ 7458]     0  7458     1019        0   1       0             0 acpid
kernel: [ 7467]    68  7467     6513      227   0       0             0 hald
kernel: [ 7468]     0  7468     4526        1   1       0             0 hald-runner
kernel: [ 7496]     0  7496     5055        1   1       0             0 hald-addon-inpu
kernel: [ 7511]    68  7511     4451        1   1       0             0 hald-addon-acpi
kernel: [ 7528]     0  7528    16029        0   1     -17         -1000 sshd
kernel: [ 7536]     0  7536     5523       86   0       0             0 xinetd
kernel: [ 7612]     0  7612    19677       41   1       0             0 master
kernel: [ 7621]    89  7621    19740       35   0       0             0 qmgr
kernel: [ 7636]     0  7636    27545        1   1       0             0 abrtd
kernel: [ 7644]     0  7644    29302       70   1       0             0 crond
kernel: [ 7655]     0  7655     5363        5   0       0             0 atd
kernel: [ 7668]     0  7668    25972        1   0       0             0 rhsmcertd
kernel: [ 7690]     0  7690     1015        1   0       0             0 mingetty
kernel: [ 7692]     0  7692     1015        1   0       0             0 mingetty
kernel: [ 7694]     0  7694     1015        1   0       0             0 mingetty
kernel: [ 7696]     0  7696     1015        1   0       0             0 mingetty
kernel: [ 7698]     0  7698    19275        1   1       0             0 login
kernel: [ 7699]     0  7699     1015        1   0       0             0 mingetty
kernel: [ 7701]     0  7701     1015        1   0       0             0 mingetty
kernel: [ 7710]     0  7710    24466        1   0       0             0 sshd
kernel: [ 7720]     0  7720   258417        1   1       0             0 console-kit-dae
kernel: [ 7786]     0  7786    27083        1   1       0             0 bash
kernel: [ 7800]     0  7800    25234        1   0       0             0 tail
kernel: [ 7803]     0  7803    27084        1   0       0             0 bash
kernel: [ 9054]     0  9054  1309944    32103   1       0             0 lvmetad
kernel: [ 9057]     0  9057     1012        9   1       0             0 btimed
kernel: [14565]    89 14565    19697      161   1       0             0 pickup
kernel: [18212]     0 18212     2832      362   0     -17         -1000 udevd
kernel: [18225]     0 18225     2832      328   1     -17         -1000 udevd
kernel: [18260]     0 18260     4091      204   1       0             0 qarshd
kernel: [18261]     0 18261    37967     8040   1       0             0 lvcreate
kernel: [18263]     0 18263     2766      257   0     -17         -1000 udevd
kernel: [18264]     0 18264     2766      257   1     -17         -1000 udevd
kernel: [18265]     0 18265     2766      257   1     -17         -1000 udevd
kernel: [18266]     0 18266     2766      257   1     -17         -1000 udevd
kernel: [18267]     0 18267     2766      257   0     -17         -1000 udevd
kernel: [18268]     0 18268     2766      257   0     -17         -1000 udevd
kernel: [18277]     0 18277     2764      251   0     -17         -1000 udevd
kernel: [18278]     0 18278     2766      257   0     -17         -1000 udevd
kernel: Out of memory: Kill process 9054 (lvmetad) score 672 or sacrifice child
kernel: Killed process 9054, UID 0, (lvmetad) total-vm:5239776kB, anon-rss:128120kB, file-rss:292kB
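The lvmetad growth is easy to watch while the loop runs; a minimal sketch (process name taken from the OOM table above; the sampling interval is arbitrary):

# Print lvmetad RSS and VSZ (in kB) once a minute; steadily rising
# numbers here are the leak this bug describes.
while sleep 60; do
    ps -o rss=,vsz= -C lvmetad
done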



Version-Release number of selected component (if applicable):
2.6.32-354.el6.x86_64
lvm2-2.02.98-9.el6    BUILT: Wed Jan 23 10:06:55 CST 2013
lvm2-libs-2.02.98-9.el6    BUILT: Wed Jan 23 10:06:55 CST 2013
lvm2-cluster-2.02.98-9.el6    BUILT: Wed Jan 23 10:06:55 CST 2013
udev-147-2.43.el6    BUILT: Thu Oct 11 05:59:38 CDT 2012
device-mapper-1.02.77-9.el6    BUILT: Wed Jan 23 10:06:55 CST 2013
device-mapper-libs-1.02.77-9.el6    BUILT: Wed Jan 23 10:06:55 CST 2013
device-mapper-event-1.02.77-9.el6    BUILT: Wed Jan 23 10:06:55 CST 2013
device-mapper-event-libs-1.02.77-9.el6    BUILT: Wed Jan 23 10:06:55 CST 2013
cmirror-2.02.98-9.el6    BUILT: Wed Jan 23 10:06:55 CST 2013
Comment 1 Zdenek Kabelac 2013-03-05 04:53:17 EST
Judging by its memory size in the process table, dmeventd looks like it is leaking memory.
Comment 3 Jonathan Earl Brassow 2013-05-09 14:50:30 EDT
Also be aware:
 Bug 956769 - MD RAID1/10 are leaking memory when they are stopped

That said, it would take quite a few iterations for this to become a problem...
Comment 4 Alasdair Kergon 2013-10-07 17:55:49 EDT
 Ignoring out-of-sequence reply from dmeventd. Expected 5312:0 but received 5241:11 HELLO HELLO 1
Comment 5 Petr Rockai 2013-10-11 05:41:40 EDT
This report seems to fall in between these two commits:

commit 15fdd5c90dda7f00f691668c13d5401206d22021
Date:   Wed Jan 16 11:09:37 2013 +0100

and

commit 95372a852bbcacc9f194324832b94fcf1493f7c5
Date:   Wed Apr 3 13:46:12 2013 +0200

    lvmetad: Fix a memory leak introduced in 15fdd5c90dd.

so I believe this was fixed in 95372a852bbcacc9f194324832b94fcf1493f7c5. Either way, I can't reproduce the problem, and if there were still a leak in lvmetad, we'd still be seeing many failures like this one.
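For anyone tracking where that fix landed, the first upstream release tag containing it can be found from an lvm2 git checkout (a sketch; assumes the commit hash above):

# Ask git which release tag first contains the fix commit.
git describe --contains 95372a852bbcacc9f194324832b94fcf1493f7c5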
Comment 6 Alasdair Kergon 2013-10-14 14:00:56 EDT
So please re-test this one.

If it's already been fixed, we'll attach it to the errata for documentation reasons.
Comment 9 Corey Marthaler 2013-10-18 14:49:18 EDT
I'm not seeing this issue anymore. Marking it verified with the latest rpms.

2.6.32-422.el6.x86_64
lvm2-2.02.100-6.el6    BUILT: Wed Oct 16 14:26:00 CEST 2013
lvm2-libs-2.02.100-6.el6    BUILT: Wed Oct 16 14:26:00 CEST 2013
lvm2-cluster-2.02.100-6.el6    BUILT: Wed Oct 16 14:26:00 CEST 2013
udev-147-2.50.el6    BUILT: Fri Oct 11 12:58:10 CEST 2013
device-mapper-1.02.79-6.el6    BUILT: Wed Oct 16 14:26:00 CEST 2013
device-mapper-libs-1.02.79-6.el6    BUILT: Wed Oct 16 14:26:00 CEST 2013
device-mapper-event-1.02.79-6.el6    BUILT: Wed Oct 16 14:26:00 CEST 2013
device-mapper-event-libs-1.02.79-6.el6    BUILT: Wed Oct 16 14:26:00 CEST 2013
cmirror-2.02.100-6.el6    BUILT: Wed Oct 16 14:26:00 CEST 2013



============================================================
Iteration 1412 of 10000 started at Fri Oct 18 20:42:30 CEST 2013
============================================================
SCENARIO - [raid1_to_pool_conversion]
Create raid1 volumes and convert them to pool and pool meta volumes
lvcreate --type raid1 -m 1 -L 100M -n to_pool_convert snapper_thinp
lvcreate --type raid1 -m 1 -L 100M -n to_pool_meta_convert snapper_thinp
lvconvert --thinpool snapper_thinp/to_pool_convert --poolmetadata to_pool_meta_convert
  device-mapper: remove ioctl on  failed: Device or resource busy
  device-mapper: remove ioctl on  failed: Device or resource busy
lvcreate --virtualsize 500M --thinpool snapper_thinp/to_pool_convert -n origin
lvcreate --virtualsize 500M --thinpool snapper_thinp/to_pool_convert -n other1
lvcreate --virtualsize 500M --thinpool snapper_thinp/to_pool_convert -n other2
lvcreate --virtualsize 500M --thinpool snapper_thinp/to_pool_convert -n other3
lvcreate --virtualsize 500M --thinpool snapper_thinp/to_pool_convert -n other4
lvcreate --virtualsize 500M --thinpool snapper_thinp/to_pool_convert -n other5
Making snapshot of origin volume
lvcreate -K -s /dev/snapper_thinp/origin -n snap_of_pool_convert
Removing volume snapper_thinp/snap_of_pool_convert
Removing thin origin and other virtual thin volumes
Removing thinpool snapper_thinp/to_pool_convert


SCENARIO - [raid10_to_pool_conversion]
Create raid10 volumes and convert them to pool and pool meta volumes
lvcreate --type raid10 -m 1 -L 100M -n to_pool_convert snapper_thinp
lvcreate --type raid10 -m 1 -L 100M -n to_pool_meta_convert snapper_thinp
lvconvert --thinpool snapper_thinp/to_pool_convert --poolmetadata to_pool_meta_convert
  device-mapper: remove ioctl on  failed: Device or resource busy
  device-mapper: remove ioctl on  failed: Device or resource busy
lvcreate --virtualsize 500M -T snapper_thinp/to_pool_convert -n origin
lvcreate --virtualsize 500M -T snapper_thinp/to_pool_convert -n other1


lvcreate --virtualsize 500M -T snapper_thinp/to_pool_convert -n other2
lvcreate --virtualsize 500M -T snapper_thinp/to_pool_convert -n other3
lvcreate --virtualsize 500M -T snapper_thinp/to_pool_convert -n other4
lvcreate --virtualsize 500M -T snapper_thinp/to_pool_convert -n other5
Making snapshot of origin volume
lvcreate -K -s /dev/snapper_thinp/origin -n snap_of_pool_convert
Removing volume snapper_thinp/snap_of_pool_convert
Removing thin origin and other virtual thin volumes
Removing thinpool snapper_thinp/to_pool_convert
Comment 10 errata-xmlrpc 2013-11-21 18:20:31 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1704.html
