Bug 910104 - OOM issues when running many raid -> thin pool conversions w/ lvmetad running
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.4
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Petr Rockai
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-02-11 20:18 UTC by Corey Marthaler
Modified: 2013-11-21 23:20 UTC
CC List: 11 users

Fixed In Version: lvm2-2.02.100-1.el6
Doc Type: Bug Fix
Doc Text:
Cause: Cached metadata in lvmetad could be leaked under some circumstances during metadata updates. Consequence: Memory use of lvmetad could continually grow during long periods of time, possibly resulting in out of memory conditions. Fix: The leak has been fixed. Result: The memory used by lvmetad is proportional to the amount of metadata it holds at any given time, and can no longer grow without bound over time.
Clone Of:
Environment:
Last Closed: 2013-11-21 23:20:31 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2013:1704 0 normal SHIPPED_LIVE lvm2 bug fix and enhancement update 2013-11-20 21:52:01 UTC

Description Corey Marthaler 2013-02-11 20:18:42 UTC
Description of problem:
./snapper_thinp -e raid_to_pool_conversion

============================================================
Iteration 2549 of 10000 started at Sat Feb  9 23:14:51 CST 2013
============================================================
SCENARIO - [raid_to_pool_conversion]
Create raid volumes and convert them to pool and pool meta volumes
lvcreate --type raid1 -m 1 -L 100M -n to_pool_convert snapper_thinp
  WARNING: Failed to connect to lvmetad: Connection refused. Falling back to internal scanning.
  No input from event server.
  snapper_thinp-to_pool_convert: event registration failed: No such file or directory
  snapper_thinp/to_pool_convert: mirror segment monitoring function failed.
lvcreate --type raid1 -m 1 -L 100M -n to_pool_meta_convert snapper_thinp
  WARNING: Failed to connect to lvmetad: Connection refused. Falling back to internal scanning.
  No input from event server.
  snapper_thinp-to_pool_meta_convert: event registration failed: Input/output error
  snapper_thinp/to_pool_meta_convert: mirror segment monitoring function failed.
lvconvert --thinpool snapper_thinp/to_pool_convert --poolmetadata to_pool_meta_convert
  WARNING: Failed to connect to lvmetad: Connection refused. Falling back to internal scanning.
  No input from event server.
  snapper_thinp-to_pool_convert_tdata: event registration failed: Input/output error
  snapper_thinp/to_pool_convert_tdata: mirror segment monitoring function failed.
  Failed to monitor to_pool_convert_tdata
  No input from event server.
  No input from event server.
  snapper_thinp-to_pool_convert_tmeta: event registration failed: Input/output error
  snapper_thinp/to_pool_convert_tmeta: mirror segment monitoring function failed.
  No input from event server.
  No input from event server.
  snapper_thinp-to_pool_convert-tpool: event registration failed: Input/output error
  snapper_thinp/to_pool_convert: thin-pool segment monitoring function failed.
lvcreate --virtualsize 500M --thinpool snapper_thinp/to_pool_convert -n origin
  WARNING: Failed to connect to lvmetad: Connection refused. Falling back to internal scanning.
  No input from event server.
  No input from event server.
  snapper_thinp-to_pool_convert_tdata: event registration failed: Input/output error
  snapper_thinp/to_pool_convert_tdata: mirror segment monitoring function failed.
  Failed to monitor to_pool_convert_tdata
  No input from event server.
  No input from event server.
  snapper_thinp-to_pool_convert_tmeta: event registration failed: Input/output error
  snapper_thinp/to_pool_convert_tmeta: mirror segment monitoring function failed.
  Ignoring out-of-sequence reply from dmeventd. Expected 5312:0 but received 5241:11 Success
  Ignoring out-of-sequence reply from dmeventd. Expected 5312:0 but received 5241:11 HELLO HELLO 1
  Ignoring out-of-sequence reply from dmeventd. Expected 5312:0 but received 5241:11 HELLO HELLO 1
  Ignoring out-of-sequence reply from dmeventd. Expected 5312:0 but received 5241:11 HELLO HELLO 1
  Ignoring out-of-sequence reply from dmeventd. Expected 5312:0 but received 5241:11 HELLO HELLO 1
Making snapshot of origin volume
  WARNING: Failed to connect to lvmetad: Connection refused. Falling back to internal scanning.
lvcreate -s /dev/snapper_thinp/origin -n snap_of_pool_convert
  WARNING: Failed to connect to lvmetad: Connection refused. Falling back to internal scanning.
Removing volume snapper_thinp/snap_of_pool_convert
Removing thin origin and other virtual thin volumes
  WARNING: Failed to connect to lvmetad: Connection refused. Falling back to internal scanning.
Removing thinpool snapper_thinp/to_pool_convert
  WARNING: Failed to connect to lvmetad: Connection refused. Falling back to internal scanning.



qarshd[18260]: Running cmdline: lvcreate -s /dev/snapper_thinp/origin -n snap_of_pool_convert
kernel: lvcreate invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0, oom_score_adj=0
kernel: lvcreate cpuset=/ mems_allowed=0
kernel: Pid: 18261, comm: lvcreate Not tainted 2.6.32-354.el6.x86_64 #1
kernel: Call Trace:
kernel: [<ffffffff810cb5d1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
kernel: [<ffffffff8111cd10>] ? dump_header+0x90/0x1b0
kernel: [<ffffffff810cc671>] ? cpuset_mems_allowed_intersects+0x21/0x30
kernel: [<ffffffff8111d192>] ? oom_kill_process+0x82/0x2a0
kernel: [<ffffffff8111d0d1>] ? select_bad_process+0xe1/0x120
kernel: [<ffffffff8111d5d0>] ? out_of_memory+0x220/0x3c0
kernel: [<ffffffff8112c27c>] ? __alloc_pages_nodemask+0x8ac/0x8d0
kernel: [<ffffffff8116088a>] ? alloc_pages_vma+0x9a/0x150
kernel: [<ffffffff81143cfb>] ? handle_pte_fault+0x76b/0xb50
kernel: [<ffffffff8104baa7>] ? pte_alloc_one+0x37/0x50
kernel: [<ffffffff8117b379>] ? do_huge_pmd_anonymous_page+0xb9/0x380
kernel: [<ffffffff8114431a>] ? handle_mm_fault+0x23a/0x310
kernel: [<ffffffff810474c9>] ? __do_page_fault+0x139/0x480
kernel: [<ffffffff811488ea>] ? vma_merge+0x29a/0x3e0
kernel: [<ffffffff81149cac>] ? do_brk+0x26c/0x350
kernel: [<ffffffff81512c6e>] ? do_page_fault+0x3e/0xa0
kernel: [<ffffffff81510025>] ? page_fault+0x25/0x30
kernel: Mem-Info:
kernel: Node 0 DMA per-cpu:
kernel: CPU    0: hi:    0, btch:   1 usd:   0
kernel: CPU    1: hi:    0, btch:   1 usd:   0
kernel: Node 0 DMA32 per-cpu:
kernel: CPU    0: hi:  186, btch:  31 usd:  37
kernel: CPU    1: hi:  186, btch:  31 usd:   0
kernel: active_anon:22125 inactive_anon:22556 isolated_anon:0
kernel: active_file:14 inactive_file:14 isolated_file:0
kernel: unevictable:209319 dirty:0 writeback:4 unstable:0
kernel: free:13246 slab_reclaimable:2524 slab_unreclaimable:99611
kernel: mapped:901 shmem:57 pagetables:3627 bounce:0
kernel: Node 0 DMA free:8340kB min:332kB low:412kB high:496kB active_anon:2296kB inactive_anon:3828kB active_file:20kB inactive_file:0kB unevictable:580kB isolated(anon):0kB isolated(file):0kB present:15252kB mlocked:580kB dirty:0kB writeback:0kB mapped:36kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:140kB kernel_stack:8kB pagetables:24kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
kernel: lowmem_reserve[]: 0 2004 2004 2004
kernel: Node 0 DMA32 free:44644kB min:44720kB low:55900kB high:67080kB active_anon:86204kB inactive_anon:86396kB active_file:36kB inactive_file:56kB unevictable:836696kB isolated(anon):0kB isolated(file):0kB present:2052308kB mlocked:765156kB dirty:0kB writeback:16kB mapped:3568kB shmem:228kB slab_reclaimable:10096kB slab_unreclaimable:398304kB kernel_stack:1792kB pagetables:14484kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:179 all_unreclaimable? yes
kernel: lowmem_reserve[]: 0 0 0 0
kernel: Node 0 DMA: 7*4kB 34*8kB 4*16kB 4*32kB 5*64kB 5*128kB 1*256kB 1*512kB 2*1024kB 2*2048kB 0*4096kB = 8364kB
kernel: Node 0 DMA32: 289*4kB 122*8kB 841*16kB 404*32kB 76*64kB 20*128kB 10*256kB 4*512kB 2*1024kB 1*2048kB 0*4096kB = 44644kB
kernel: 3651 total pagecache pages
kernel: 2691 pages in swap cache
kernel: Swap cache stats: add 5761846, delete 5759155, find 15220058/15731145
kernel: Free swap  = 0kB
kernel: Total swap = 4128760kB
kernel: 524284 pages RAM
kernel: 43654 pages reserved
kernel: 3800 pages shared
kernel: 462483 pages non-shared
kernel: [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
kernel: [  610]     0   610     2765      150   0     -17         -1000 udevd
kernel: [ 6719]     0  6719   454812   209418   0     -17         -1000 dmeventd
kernel: [ 7181]     0  7181     2279        1   0       0             0 dhclient
kernel: [ 7227]     0  7227    23293       32   1     -17         -1000 auditd
kernel: [ 7243]     0  7243    64047      828   1       0             0 rsyslogd
kernel: [ 7270]     0  7270     2704       73   1       0             0 irqbalance
kernel: [ 7284]    32  7284     4743       17   0       0             0 rpcbind
kernel: [ 7302]    29  7302     5836        1   0       0             0 rpc.statd
kernel: [ 7335]     0  7335     6290        1   0       0             0 rpc.idmapd
kernel: [ 7428]    81  7428     7942        1   1       0             0 dbus-daemon
kernel: [ 7458]     0  7458     1019        0   1       0             0 acpid
kernel: [ 7467]    68  7467     6513      227   0       0             0 hald
kernel: [ 7468]     0  7468     4526        1   1       0             0 hald-runner
kernel: [ 7496]     0  7496     5055        1   1       0             0 hald-addon-inpu
kernel: [ 7511]    68  7511     4451        1   1       0             0 hald-addon-acpi
kernel: [ 7528]     0  7528    16029        0   1     -17         -1000 sshd
kernel: [ 7536]     0  7536     5523       86   0       0             0 xinetd
kernel: [ 7612]     0  7612    19677       41   1       0             0 master
kernel: [ 7621]    89  7621    19740       35   0       0             0 qmgr
kernel: [ 7636]     0  7636    27545        1   1       0             0 abrtd
kernel: [ 7644]     0  7644    29302       70   1       0             0 crond
kernel: [ 7655]     0  7655     5363        5   0       0             0 atd
kernel: [ 7668]     0  7668    25972        1   0       0             0 rhsmcertd
kernel: [ 7690]     0  7690     1015        1   0       0             0 mingetty
kernel: [ 7692]     0  7692     1015        1   0       0             0 mingetty
kernel: [ 7694]     0  7694     1015        1   0       0             0 mingetty
kernel: [ 7696]     0  7696     1015        1   0       0             0 mingetty
kernel: [ 7698]     0  7698    19275        1   1       0             0 login
kernel: [ 7699]     0  7699     1015        1   0       0             0 mingetty
kernel: [ 7701]     0  7701     1015        1   0       0             0 mingetty
kernel: [ 7710]     0  7710    24466        1   0       0             0 sshd
kernel: [ 7720]     0  7720   258417        1   1       0             0 console-kit-dae
kernel: [ 7786]     0  7786    27083        1   1       0             0 bash
kernel: [ 7800]     0  7800    25234        1   0       0             0 tail
kernel: [ 7803]     0  7803    27084        1   0       0             0 bash
kernel: [ 9054]     0  9054  1309944    32103   1       0             0 lvmetad
kernel: [ 9057]     0  9057     1012        9   1       0             0 btimed
kernel: [14565]    89 14565    19697      161   1       0             0 pickup
kernel: [18212]     0 18212     2832      362   0     -17         -1000 udevd
kernel: [18225]     0 18225     2832      328   1     -17         -1000 udevd
kernel: [18260]     0 18260     4091      204   1       0             0 qarshd
kernel: [18261]     0 18261    37967     8040   1       0             0 lvcreate
kernel: [18263]     0 18263     2766      257   0     -17         -1000 udevd
kernel: [18264]     0 18264     2766      257   1     -17         -1000 udevd
kernel: [18265]     0 18265     2766      257   1     -17         -1000 udevd
kernel: [18266]     0 18266     2766      257   1     -17         -1000 udevd
kernel: [18267]     0 18267     2766      257   0     -17         -1000 udevd
kernel: [18268]     0 18268     2766      257   0     -17         -1000 udevd
kernel: [18277]     0 18277     2764      251   0     -17         -1000 udevd
kernel: [18278]     0 18278     2766      257   0     -17         -1000 udevd
kernel: Out of memory: Kill process 9054 (lvmetad) score 672 or sacrifice child
kernel: Killed process 9054, UID 0, (lvmetad) total-vm:5239776kB, anon-rss:128120kB, file-rss:292kB



Version-Release number of selected component (if applicable):
2.6.32-354.el6.x86_64
lvm2-2.02.98-9.el6    BUILT: Wed Jan 23 10:06:55 CST 2013
lvm2-libs-2.02.98-9.el6    BUILT: Wed Jan 23 10:06:55 CST 2013
lvm2-cluster-2.02.98-9.el6    BUILT: Wed Jan 23 10:06:55 CST 2013
udev-147-2.43.el6    BUILT: Thu Oct 11 05:59:38 CDT 2012
device-mapper-1.02.77-9.el6    BUILT: Wed Jan 23 10:06:55 CST 2013
device-mapper-libs-1.02.77-9.el6    BUILT: Wed Jan 23 10:06:55 CST 2013
device-mapper-event-1.02.77-9.el6    BUILT: Wed Jan 23 10:06:55 CST 2013
device-mapper-event-libs-1.02.77-9.el6    BUILT: Wed Jan 23 10:06:55 CST 2013
cmirror-2.02.98-9.el6    BUILT: Wed Jan 23 10:06:55 CST 2013
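The OOM dump above records lvmetad at total-vm:5239776kB after roughly 2500 iterations. When re-running a loop like this, one way to spot unbounded daemon growth is to sample VmRSS from /proc/&lt;pid&gt;/status between iterations; the sketch below is illustrative only (the `parse_vmrss` and `read_rss` helpers are not part of the snapper_thinp harness):

```python
import re

def parse_vmrss(status_text):
    """Return VmRSS in kB parsed from /proc/<pid>/status contents, or None."""
    m = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(m.group(1)) if m else None

def read_rss(pid):
    """Sample the current resident set size of a process, in kB."""
    with open("/proc/%d/status" % pid) as f:
        return parse_vmrss(f.read())

# Between test iterations, compare successive read_rss(lvmetad_pid) samples;
# a leak shows up as an RSS that climbs monotonically instead of plateauing.
```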

Comment 1 Zdenek Kabelac 2013-03-05 09:53:17 UTC
dmeventd's memory size looks like it's leaking memory.

Comment 3 Jonathan Earl Brassow 2013-05-09 18:50:30 UTC
Also be aware:
 Bug 956769 - MD RAID1/10 are leaking memory when they are stopped

Although it would take quite a few iterations for this to become a problem...

Comment 4 Alasdair Kergon 2013-10-07 21:55:49 UTC
 Ignoring out-of-sequence reply from dmeventd. Expected 5312:0 but received 5241:11 HELLO HELLO 1

Comment 5 Petr Rockai 2013-10-11 09:41:40 UTC
This report seems to fall in between:

commit 15fdd5c90dda7f00f691668c13d5401206d22021
Date:   Wed Jan 16 11:09:37 2013 +0100

and

commit 95372a852bbcacc9f194324832b94fcf1493f7c5
Date:   Wed Apr 3 13:46:12 2013 +0200

    lvmetad: Fix a memory leak introduced in 15fdd5c90dd.

so I believe this was fixed in 95372a852bbcacc9f194324832b94fcf1493f7c5. Either way, I can't reproduce the problem, and if there were still a leak in lvmetad, we'd still be seeing many failures like this one.
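Comment 5 brackets the leak between two upstream commits, so whether a given lvm2 checkout carries the fix reduces to a git ancestry test. A minimal sketch using `git merge-base --is-ancestor` (the `contains_fix` wrapper is an illustrative helper, not an lvm2 tool; run it inside a clone of the lvm2 repository):

```shell
# contains_fix SHA: report whether HEAD already contains commit SHA.
contains_fix() {
    if git merge-base --is-ancestor "$1" HEAD 2>/dev/null; then
        echo "fix present"
    else
        echo "fix missing"
    fi
}

# Example, inside an lvm2 checkout:
#   contains_fix 95372a852bbcacc9f194324832b94fcf1493f7c5
```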

Comment 6 Alasdair Kergon 2013-10-14 18:00:56 UTC
So please re-test this one.

If it's already been fixed, we'll attach it to the errata for documentation reasons.

Comment 9 Corey Marthaler 2013-10-18 18:49:18 UTC
I'm not seeing this issue any more. Marking verified in the latest rpms.

2.6.32-422.el6.x86_64
lvm2-2.02.100-6.el6    BUILT: Wed Oct 16 14:26:00 CEST 2013
lvm2-libs-2.02.100-6.el6    BUILT: Wed Oct 16 14:26:00 CEST 2013
lvm2-cluster-2.02.100-6.el6    BUILT: Wed Oct 16 14:26:00 CEST 2013
udev-147-2.50.el6    BUILT: Fri Oct 11 12:58:10 CEST 2013
device-mapper-1.02.79-6.el6    BUILT: Wed Oct 16 14:26:00 CEST 2013
device-mapper-libs-1.02.79-6.el6    BUILT: Wed Oct 16 14:26:00 CEST 2013
device-mapper-event-1.02.79-6.el6    BUILT: Wed Oct 16 14:26:00 CEST 2013
device-mapper-event-libs-1.02.79-6.el6    BUILT: Wed Oct 16 14:26:00 CEST 2013
cmirror-2.02.100-6.el6    BUILT: Wed Oct 16 14:26:00 CEST 2013



============================================================
Iteration 1412 of 10000 started at Fri Oct 18 20:42:30 CEST 2013
============================================================
SCENARIO - [raid1_to_pool_conversion]
Create raid1 volumes and convert them to pool and pool meta volumes
lvcreate --type raid1 -m 1 -L 100M -n to_pool_convert snapper_thinp
lvcreate --type raid1 -m 1 -L 100M -n to_pool_meta_convert snapper_thinp
lvconvert --thinpool snapper_thinp/to_pool_convert --poolmetadata to_pool_meta_convert
  device-mapper: remove ioctl on  failed: Device or resource busy
  device-mapper: remove ioctl on  failed: Device or resource busy
lvcreate --virtualsize 500M --thinpool snapper_thinp/to_pool_convert -n origin
lvcreate --virtualsize 500M --thinpool snapper_thinp/to_pool_convert -n other1
lvcreate --virtualsize 500M --thinpool snapper_thinp/to_pool_convert -n other2
lvcreate --virtualsize 500M --thinpool snapper_thinp/to_pool_convert -n other3
lvcreate --virtualsize 500M --thinpool snapper_thinp/to_pool_convert -n other4
lvcreate --virtualsize 500M --thinpool snapper_thinp/to_pool_convert -n other5
Making snapshot of origin volume
lvcreate -K -s /dev/snapper_thinp/origin -n snap_of_pool_convert
Removing volume snapper_thinp/snap_of_pool_convert
Removing thin origin and other virtual thin volumes
Removing thinpool snapper_thinp/to_pool_convert


SCENARIO - [raid10_to_pool_conversion]
Create raid10 volumes and convert them to pool and pool meta volumes
lvcreate --type raid10 -m 1 -L 100M -n to_pool_convert snapper_thinp
lvcreate --type raid10 -m 1 -L 100M -n to_pool_meta_convert snapper_thinp
lvconvert --thinpool snapper_thinp/to_pool_convert --poolmetadata to_pool_meta_convert
  device-mapper: remove ioctl on  failed: Device or resource busy
  device-mapper: remove ioctl on  failed: Device or resource busy
lvcreate --virtualsize 500M -T snapper_thinp/to_pool_convert -n origin
lvcreate --virtualsize 500M -T snapper_thinp/to_pool_convert -n other1


lvcreate --virtualsize 500M -T snapper_thinp/to_pool_convert -n other2
lvcreate --virtualsize 500M -T snapper_thinp/to_pool_convert -n other3
lvcreate --virtualsize 500M -T snapper_thinp/to_pool_convert -n other4
lvcreate --virtualsize 500M -T snapper_thinp/to_pool_convert -n other5
Making snapshot of origin volume
lvcreate -K -s /dev/snapper_thinp/origin -n snap_of_pool_convert
Removing volume snapper_thinp/snap_of_pool_convert
Removing thin origin and other virtual thin volumes
Removing thinpool snapper_thinp/to_pool_convert

Comment 10 errata-xmlrpc 2013-11-21 23:20:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1704.html

