Description of problem:
dmeventd monitoring a thinpool segfaulted (w/ lvmetad on and running).

Version-Release number of selected component (if applicable):
lvm2-2.02.98-6.16.el6

How reproducible:
???

Steps to Reproduce:
1. ???

Actual results:
segfault

Expected results:
pass

Additional info:

gdb:

Core was generated by `/sbin/dmeventd'.
Program terminated with signal 11, Segmentation fault.
#0  __pthread_kill (threadid=0, signo=14) at ../nptl/sysdeps/unix/sysv/linux/pthread_kill.c:43
43          pid_t tid = atomic_forced_read (pd->tid);
Missing separate debuginfos, use: debuginfo-install libgcc-4.4.7-3.el6.x86_64 libselinux-2.0.94-5.3.el6.x86_64 libsepol-2.0.41-4.el6.x86_64

(gdb) info threads
  2 Thread 0x7f4784ca57a0 (LWP 5577)  0x00007f4783d64637 in mlockall () at ../sysdeps/unix/syscall-template.S:82
* 1 Thread 0x7f4784ca4700 (LWP 5578)  __pthread_kill (threadid=0, signo=14) at ../nptl/sysdeps/unix/sysv/linux/pthread_kill.c:43

(gdb) bt full
#0  __pthread_kill (threadid=0, signo=14) at ../nptl/sysdeps/unix/sysv/linux/pthread_kill.c:43
        __x = <value optimized out>
        pd = 0x0
        tid = <value optimized out>
        val = <value optimized out>
#1  0x0000000000403529 in _timeout_thread (unused=0x0) at dmeventd.c:533
        thread = 0xa535e0
        __clframe = {__cancel_routine = 0x403451 <_exit_timeout>, __cancel_arg = 0x0, __do_it = 1, __cancel_type = 0}
        timeout = {tv_sec = 0, tv_nsec = 0}
        curr_time = 1356136940
#2  0x00007f4784475851 in start_thread (arg=0x7f4784ca4700) at pthread_create.c:301
        __res = <value optimized out>
        pd = 0x7f4784ca4700
        now = <value optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139945147254528, 3440892323784634456, 140736754832112, 139945147255232, 0, 3, -3364320029787213736, -3364318832364801960}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <value optimized out>
        pagesize_m1 = <value optimized out>
        sp = <value optimized out>
        freesize = <value optimized out>
#3  0x00007f4783d6790d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
No locals.

(gdb) thread 2
[Switching to thread 2 (Thread 0x7f4784ca57a0 (LWP 5577))]#0  0x00007f4783d64637 in mlockall () at ../sysdeps/unix/syscall-template.S:82
82      T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)

(gdb) bt full
#0  0x00007f4783d64637 in mlockall () at ../sysdeps/unix/syscall-template.S:82
No locals.
#1  0x00007f4782f5071e in _memlock_maps (cmd=0xa56d60, lock=LVM_MLOCK, mstats=0x7f47832069b8) at mm/memlock.c:244
        cn = 0x2
        line = 0xffffffffffffffff <Address 0xffffffffffffffff out of bounds>
        line_end = 0xa00000 <Address 0xa00000 out of bounds>
        len = 139945116500252
        n = 139945119277151
        ret = 1
#2  0x00007f4782f50bf5 in _lock_mem (cmd=0xa56d60) at mm/memlock.c:336
No locals.
#3  0x00007f4782f50f8f in _lock_mem_if_needed (cmd=0xa56d60) at mm/memlock.c:387
No locals.
#4  0x00007f4782f51186 in memlock_inc_daemon (cmd=0xa56d60) at mm/memlock.c:439
No locals.
#5  0x00007f4782fa6ebd in lvm2_run (handle=0xa56d60, cmdline=0x7f4783210feb "_memlock_inc") at lvmcmdlib.c:82
        argc = 1
        ret = 1
        oneoff = 0
        args = {0xaa2370 "_memlock_inc", 0xa60368 "\230\003\246", 0x0, 0x7f4782fab6e6 "", 0x0, 0xa604a0 "/etc/lvm/archive", 0x7fff00000002 <Address 0x7fff00000002 out of bounds>, 0x0, 0x0, 0x40 <Address 0x40 out of bounds>, 0x0, 0x0, 0x0, 0x3000000002 <Address 0x3000000002 out of bounds>, 0x7c0000005b <Address 0x7c0000005b out of bounds>, 0x6e00000077 <Address 0x6e00000077 out of bounds>, 0x10000000a <Address 0x10000000a out of bounds>, 0xa603c8 "/etc/lvm/backup", 0x10 <Address 0x10 out of bounds>, 0x7f478400de80 "", 0x7fffd44755c0 "\001", 0x0, 0x0, 0x7f4783cf9911 "H\205\300H\211\303t7\203=P}1", 0xa603c8 "/etc/lvm/backup", 0x10 <Address 0x10 out of bounds>, 0x4029b0 "1\355I\211\321^H\211\342H\203\344\360PTI\307\300\360i@", 0x7f4783d00042 "H\205\300t!H\211\352H\211\336H\213l$\020H\213\\$\bH\211\307H\203\304\030齇", 0x182f55d94 <Address 0x182f55d94 out of bounds>, 0xa56d60 "\300ӥ", 0x7fffd44751c0 "\004", 0x7f4782f06612 "\270\001", 0x7fffd44751c0 "\004", 0x7f478447c5b9 "H\211\320H\203\304\bH=\001\360\377\377s\001\303H\213\r\270\231 ", 0x0, 0x7f4782f4ff8e "\205\300t'L\215\005>\310\006", 0x7f4782fab6d5 "backup/backup_dir", 0xa604a0 "/etc/lvm/archive", 0x4 <Address 0x4 out of bounds>, 0xa56db0 "\316l\364", <incomplete sequence \360>, 0xa7d900 "\200m\245", 0x182fafa0d <Address 0x182fafa0d out of bounds>, 0x7fffd4475210 "pRG\324\377\177", 0x82ef3c8f <Address 0x82ef3c8f out of bounds>, 0x7fffd4475210 "pRG\324\377\177", 0x7f4782ef3cdd "\311\303UH\211\345H\203\354PH\211}\350\211u\344H\211U\330H\211M\320L\211EȋE\344\211\307\350\344\034\377\377\211E\374\203}", <incomplete sequence \374>, 0x0, 0xa56d60 "\300ӥ", 0x7fffd4475270 "\220RG\324\377\177", 0x7f4782ef486c "H\213E\350ǀ\004\001", 0x7fff0000006d <Address 0x7fff0000006d out of bounds>, 0xe71777d7f826c739 <Address 0xe71777d7f826c739 out of bounds>, 0x7fffffffffff <Address 0x7fffffffffff out of bounds>, 0x100000000 <Address 0x100000000 out of bounds>, 0x0, 0x4029b0 "1\355I\211\321^H\211\342H\203\344\360PTI\307\300\360i@", 0xa5d180 "\205&\255", <incomplete sequence \373>, 0xa56d60 "\300ӥ", 0x0, 0xa53670 "LVM-M2YDeVDBj9WZxIDhIZotbgvezdGhcRNmgcBJCGwc8GZlXDGtpATgOQJyBnTptkpi-tpool", 0x7fffd4475290 "\300RG\324\377\177", 0x7f4782f8b3f2 "\205\300t?H\213E\370H\211\307\350\061\232\366\377\350\071I\374\377L\215\005\331\310\003", 0x7fffffffffff <Address 0x7fffffffffff out of bounds>, 0xa56d60 "\300ӥ"}
        argv = 0x7fffd4475090
        cmdcopy = 0xaa2370 "_memlock_inc"
        cmd = 0xa56d60
#6  0x00007f4783210d81 in dmeventd_lvm2_init () at dmeventd_lvm.c:113
        r = 0
#7  0x00007f4783413f6b in register_device (device=0xa53800 "snapper_thinp-POOL-tpool", uuid=0xa53670 "LVM-M2YDeVDBj9WZxIDhIZotbgvezdGhcRNmgcBJCGwc8GZlXDGtpATgOQJyBnTptkpi-tpool", major=253, minor=4, private=0xa53658) at dmeventd_thin.c:405
        statemem = 0x0
        state = 0x0
#8  0x0000000000403cda in _do_register_device (thread=0xa535e0) at dmeventd.c:688
No locals.
#9  0x0000000000404691 in _register_for_event (message_data=0x7fffd4475400) at dmeventd.c:1009
        ret = 0
        thread = 0x0
        thread_new = 0xa535e0
        dso_data = 0xa53560
#10 0x0000000000405532 in _handle_request (msg=0x7fffd4475470, message_data=0x7fffd4475400) at dmeventd.c:1428
        requests = {{cmd = 2, f = 0x404473 <_register_for_event>}, {cmd = 3, f = 0x404737 <_unregister_for_event>}, {cmd = 4, f = 0x404b27 <_get_registered_device>}, {cmd = 5, f = 0x404b46 <_get_next_registered_device>}, {cmd = 6, f = 0x404b65 <_set_timeout>}, {cmd = 7, f = 0x404bb5 <_get_timeout>}, {cmd = 1, f = 0x404464 <_active>}, {cmd = 10, f = 0x40316e <_get_status>}}
        req = 0x6086c0
#11 0x00000000004056d3 in _do_process_request (msg=0x7fffd4475470) at dmeventd.c:1457
        ret = -417892393
        answer = 0x7b0000007b <Address 0x7b0000007b out of bounds>
        message_data = {id = 0xa522a0 "5548:3", dso_name = 0xa52180 "libdevmapper-event-lvm2thin.so", device_uuid = 0xa52240 "LVM-M2YDeVDBj9WZxIDhIZotbgvezdGhcRNmgcBJCGwc8GZlXDGtpATgOQJyBnTptkpi-tpool", events = {str = 0x2ff00 <Address 0x2ff00 out of bounds>, field = 196352}, timeout = {str = 0xa <Address 0xa out of bounds>, secs = 10}, msg = 0x7fffd4475470}
#12 0x000000000040579a in _process_request (fifos=0x7fffd44754c0) at dmeventd.c:1486
        die = 0
        msg = {cmd = 2, size = 0, data = 0x0}
#13 0x000000000040695d in main (argc=1, argv=0x7fffd44755c8) at dmeventd.c:2007
        opt = -1 '\377'
        fifos = {client = 5, server = 4, client_path = 0x406c2b "/var/run/dmeventd-client", server_path = 0x406c44 "/var/run/dmeventd-server"}

messages:

Dec 21 18:42:08 zaphodc1-node01 xinetd[1871]: START: qarsh pid=5547 from=::ffff:192.168.101.254
Dec 21 18:42:08 zaphodc1-node01 qarshd[5547]: Talking to peer 192.168.101.254:54730 (IPv4)
Dec 21 18:42:08 zaphodc1-node01 qarshd[5547]: Running cmdline: lvcreate --thinpool POOL -L 1G snapper_thinp
Dec 21 18:42:08 zaphodc1-node01 lvm[5548]: Logging initialised at Fri Dec 21 18:42:08 2012
Dec 21 18:42:08 zaphodc1-node01 lvm[5548]: Set umask from 0002 to 0077
Dec 21 18:42:08 zaphodc1-node01 lvm[5548]: Executing: /sbin/modprobe dm-thin-pool
Dec 21 18:42:08 zaphodc1-node01 lvm[5548]: Setting logging type to disk
Dec 21 18:42:08 zaphodc1-node01 lvm[5548]: Finding volume group "snapper_thinp"
Dec 21 18:42:08 zaphodc1-node01 lvm[5548]: Setting chunk size to 64.00 KiB.
Dec 21 18:42:08 zaphodc1-node01 lvm[5548]: Setting pool metadata size to 4.00 MiB.
Dec 21 18:42:08 zaphodc1-node01 lvm[5548]: Making thin pool POOL in VG snapper_thinp using segtype thin-pool
Dec 21 18:42:08 zaphodc1-node01 lvm[5548]: Archiving volume group "snapper_thinp" metadata (seqno 1).
Dec 21 18:42:08 zaphodc1-node01 lvm[5548]: Creating logical volume POOL
Dec 21 18:42:08 zaphodc1-node01 lvm[5548]: Found volume group "snapper_thinp"
Dec 21 18:42:08 zaphodc1-node01 lvm[5548]: activation/volume_list configuration setting not defined: Checking only host tags for snapper_thinp/POOL
Dec 21 18:42:08 zaphodc1-node01 lvm[5548]: Clearing start of logical volume "POOL"
Dec 21 18:42:08 zaphodc1-node01 lvm[5548]: Found volume group "snapper_thinp"
Dec 21 18:42:08 zaphodc1-node01 lvm[5548]: Creating logical volume POOL_tmeta
Dec 21 18:42:08 zaphodc1-node01 lvm[5548]: Creating logical volume POOL_tdata
Dec 21 18:42:09 zaphodc1-node01 lvm[5548]: Creating volume group backup "/etc/lvm/backup/snapper_thinp" (seqno 3).
Dec 21 18:42:09 zaphodc1-node01 lvm[5548]: Found volume group "snapper_thinp"
Dec 21 18:42:09 zaphodc1-node01 lvm[5548]: activation/volume_list configuration setting not defined: Checking only host tags for snapper_thinp/POOL
Dec 21 18:42:10 zaphodc1-node01 kernel: device-mapper: thin: Data device (dm-3) discard unsupported: Disabling discard passdown.
Dec 21 18:42:10 zaphodc1-node01 dmeventd[5577]: dmeventd ready for processing.
Dec 21 18:42:10 zaphodc1-node01 lvm[5548]: Monitoring snapper_thinp/POOL
Dec 21 18:42:16 zaphodc1-node01 lvm[5548]: No input from event server.
Dec 21 18:42:16 zaphodc1-node01 lvm[5548]: snapper_thinp-POOL-tpool: event registration failed: Input/output error
Dec 21 18:42:16 zaphodc1-node01 lvm[5548]: snapper_thinp/POOL: thin-pool segment monitoring function failed.
Dec 21 18:42:16 zaphodc1-node01 lvm[5548]: Creating volume group backup "/etc/lvm/backup/snapper_thinp" (seqno 3).
Dec 21 18:42:16 zaphodc1-node01 lvm[5548]: Logical volume "POOL" created
Dec 21 18:42:16 zaphodc1-node01 lvm[5548]: Wiping internal VG cache
Dec 21 18:42:16 zaphodc1-node01 xinetd[1871]: EXIT: qarsh status=0 pid=5547 duration=8(sec)
Dec 21 18:42:16 zaphodc1-node01 xinetd[1871]: START: qarsh pid=5581 from=::ffff:192.168.101.254
Dec 21 18:42:16 zaphodc1-node01 qarshd[5581]: Talking to peer 192.168.101.254:54731 (IPv4)
Dec 21 18:42:16 zaphodc1-node01 qarshd[5581]: Running cmdline: lvcreate --virtualsize 1G --thinpool snapper_thinp/POOL -n origin
Dec 21 18:42:16 zaphodc1-node01 lvm[5582]: Logging initialised at Fri Dec 21 18:42:16 2012
Dec 21 18:42:16 zaphodc1-node01 lvm[5582]: Set umask from 0002 to 0077
Dec 21 18:42:16 zaphodc1-node01 lvm[5582]: Setting logging type to disk
Dec 21 18:42:16 zaphodc1-node01 lvm[5582]: Finding volume group "snapper_thinp"
Dec 21 18:42:16 zaphodc1-node01 lvm[5582]: Making thin LV origin in pool POOL in VG snapper_thinp using segtype thin
Dec 21 18:42:16 zaphodc1-node01 lvm[5582]: Archiving volume group "snapper_thinp" metadata (seqno 3).
Dec 21 18:42:16 zaphodc1-node01 lvm[5582]: Creating logical volume origin
Dec 21 18:42:17 zaphodc1-node01 lvm[5582]: Creating volume group backup "/etc/lvm/backup/snapper_thinp" (seqno 4).
Dec 21 18:42:18 zaphodc1-node01 lvm[5582]: Found volume group "snapper_thinp"
Dec 21 18:42:18 zaphodc1-node01 lvm[5582]: Loading snapper_thinp-POOL_tdata table (253:3)
Dec 21 18:42:18 zaphodc1-node01 lvm[5582]: Suppressed snapper_thinp-POOL_tdata (253:3) identical table reload.
Dec 21 18:42:18 zaphodc1-node01 lvm[5582]: Loading snapper_thinp-POOL_tmeta table (253:2)
Dec 21 18:42:18 zaphodc1-node01 lvm[5582]: Suppressed snapper_thinp-POOL_tmeta (253:2) identical table reload.
Dec 21 18:42:18 zaphodc1-node01 lvm[5582]: Loading snapper_thinp-POOL-tpool table (253:4)
Dec 21 18:42:18 zaphodc1-node01 lvm[5582]: Suppressed snapper_thinp-POOL-tpool (253:4) identical table reload.
Dec 21 18:42:18 zaphodc1-node01 lvm[5582]: Loading snapper_thinp-POOL table (253:5)
Dec 21 18:42:18 zaphodc1-node01 lvm[5582]: Suppressed snapper_thinp-POOL (253:5) identical table reload.
Dec 21 18:42:23 zaphodc1-node01 kernel: dmeventd[5578]: segfault at 2d0 ip 00007f478447a670 sp 00007f4784ca3da8 error 4 in libpthread-2.12.so[7f478446e000+17000]
Dec 21 18:42:23 zaphodc1-node01 abrt[5585]: user_pwd:'/'
Dec 21 18:42:23 zaphodc1-node01 abrt[5585]: Locked '/var/spool/abrt/ccpp-2012-12-21-18:42:23-5577.new/.lock'
Dec 21 18:42:24 zaphodc1-node01 lvm[5582]: No input from event server.
Dec 21 18:42:24 zaphodc1-node01 lvm[5582]: Monitoring snapper_thinp/POOL
Dec 21 18:42:24 zaphodc1-node01 abrt[5585]: Unlocked '/var/spool/abrt/ccpp-2012-12-21-18:42:23-5577.new/.lock'
Dec 21 18:42:24 zaphodc1-node01 abrtd: Directory 'ccpp-2012-12-21-18:42:23-5577' creation detected
Dec 21 18:42:24 zaphodc1-node01 abrt[5585]: Saved core dump of pid 5577 (/sbin/dmeventd) to /var/spool/abrt/ccpp-2012-12-21-18:42:23-5577 (11661312 bytes)
Dec 21 18:42:24 zaphodc1-node01 abrt[5585]: cur_size:11669166 cap_size:1341128704, no (more) trimming
Dec 21 18:42:30 zaphodc1-node01 lvm[5582]: No input from event server.
Dec 21 18:42:30 zaphodc1-node01 lvm[5582]: snapper_thinp-POOL-tpool: event registration failed: Input/output error
Dec 21 18:42:30 zaphodc1-node01 lvm[5582]: snapper_thinp/POOL: thin-pool segment monitoring function failed.
Dec 21 18:42:30 zaphodc1-node01 lvm[5582]: Creating volume group backup "/etc/lvm/backup/snapper_thinp" (seqno 5).
Dec 21 18:42:31 zaphodc1-node01 lvm[5582]: Found volume group "snapper_thinp"
Dec 21 18:42:31 zaphodc1-node01 lvm[5582]: activation/volume_list configuration setting not defined: Checking only host tags for snapper_thinp/origin
Dec 21 18:42:31 zaphodc1-node01 dmeventd[5631]: dmeventd ready for processing.
Dec 21 18:42:31 zaphodc1-node01 lvm[5582]: Monitoring snapper_thinp/POOL
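For readers unfamiliar with the memlock path in the thread-2 backtrace above: before touching devices, dmeventd pins its memory so the monitoring code cannot page-fault. A minimal sketch of the generic POSIX mechanism follows - illustrative only, not lvm2's actual _memlock_maps()/_lock_mem() implementation:

#include <stdio.h>
#include <sys/mman.h>

/* Illustrative only: pin the whole address space, as a daemon would
 * before entering code that must not page-fault.  MCL_CURRENT forces
 * every currently mapped page to be faulted in and locked, so the
 * larger the address space (locale archives, hugepages, ...), the
 * longer -- and the more memory-hungry -- this call becomes. */
static int lock_daemon_memory(void)
{
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
                perror("mlockall");
                return 0;
        }
        return 1;
}

int main(void)
{
        return lock_daemon_memory() ? 0 : 1;
}

The cost of MCL_CURRENT scales with everything mapped into the process, which is why a ~100M locale archive (see the analysis below) can make this call both slow and memory-hungry.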
Though dmeventd was restarted, no warning was issued about the data in the pool being over quota. This has not happened except in this one case.
This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.
Looks like this is an "out of memory" issue. The hypothesis: dmeventd, while registering an event, calls mlockall() [1], which blocks(?), and the system attempts to kill it on timeout [2].

(gdb) info threads:
[1]   2 Thread 0x7f4784ca57a0 (LWP 5577)  0x00007f4783d64637 in mlockall () at ../sysdeps/unix/syscall-template.S:82
[2]   1 Thread 0x7f4784ca4700 (LWP 5578)  __pthread_kill (threadid=0, signo=14) at ../nptl/sysdeps/unix/sysv/linux/pthread_kill.c:43

There is a thread_status structure with a NULL thread member registered for the timeout. When the timeout is handled, the NULL pointer is dereferenced and dmeventd segfaults. The situation:

- there is no thread yet to deliver the signal to
- dmeventd may block indefinitely

The question is whether we could handle this better, and how: is it better to take down the whole daemon or (eventually) block forever? No other options come to mind.

Another issue is that dmeventd has quite a large memory footprint in this case, consuming over 100M including things like locales (I wonder whether these are locked in memory too?); addressing this would at least reduce the probability of running into the bug:

7f477d048000-7f4782ed9000 r--p 00000000 fd:00 262660   /usr/lib/locale/locale-archive

I have also seen a few anonymous hugepages allocated by another instance of dmeventd, and IIUC these are split and locked in memory as well.
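To make the segfault mechanics concrete: glibc's __pthread_kill() treats the pthread_t as a pointer to the thread descriptor and reads pd->tid from it (frame #0 of the core dump), so delivering the timeout signal (signo 14 = SIGALRM) to a not-yet-created thread dereferences NULL. A hypothetical sketch - struct and function names are illustrative, not the actual dmeventd code:

#include <pthread.h>
#include <signal.h>
#include <stdio.h>

/* Hypothetical model of the crash: an entry can be on the
 * timeout-registered list before its worker pthread exists,
 * so the pthread_t member may still be 0. */
struct thread_status {
        pthread_t thread;       /* 0 until pthread_create() has run */
};

static void process_timeout(struct thread_status *ts)
{
        /* Crashing version, as in frames #0/#1 of the core dump:
         * glibc casts the pthread_t to a thread descriptor and reads
         * pd->tid, so an id of 0 is a NULL dereference.
         *
         *      pthread_kill(ts->thread, SIGALRM);
         */

        /* Defensive version: skip entries whose worker thread has
         * not been created yet. */
        if (ts->thread)
                pthread_kill(ts->thread, SIGALRM);
        else
                fprintf(stderr, "skipping timeout for unstarted thread\n");
}

int main(void)
{
        struct thread_status ts = { .thread = 0 };

        process_timeout(&ts);   /* safely skipped instead of SIGSEGV */
        return 0;
}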
I just checked with a recent nightly lvm2 build (2.02.98-8.21), and the locale archive was still loaded.
dmeventd was resetting the LANG variable to the C locale - however, the system used the LC_* vars to override it - thus the locale archive got loaded anyway. Fixed upstream with this patch, which avoids locking the archive in memory:

https://www.redhat.com/archives/lvm-devel/2013-January/msg00036.html
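For reference, the precedence rule that bit us: LC_ALL overrides the per-category LC_* variables, which in turn override LANG, so resetting LANG alone cannot keep the locale archive from being mapped. A small demo of the standard POSIX behaviour (not dmeventd's code):

#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
        /* What dmeventd originally did: reset only LANG.  If the
         * environment still carries e.g. LC_CTYPE=en_US.UTF-8, the
         * setlocale() call below honours the LC_* variable and maps
         * /usr/lib/locale/locale-archive anyway. */
        setenv("LANG", "C", 1);
        printf("LANG=C only : %s\n", setlocale(LC_ALL, ""));

        /* Equivalent of the upstream fix: LC_ALL outranks every
         * LC_* variable and LANG, so the archive stays unmapped. */
        setenv("LC_ALL", "C", 1);
        printf("LC_ALL=C    : %s\n", setlocale(LC_ALL, ""));
        return 0;
}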
Any chance of getting this into 6.4? The impact of the change is close to zero; only sanity testing is required.

Impact of the issue:
- in the worst case, losing data when a snapshot is not monitored and resized, or a lost mirror leg is not replaced
- significant memory waste on low-RAM systems (VMs)
- and lastly, causing failures in our testing environment, increasing the noise/signal ratio and making it possible to miss real bugs

Also, after a dmeventd failure, any watched devices are not re-registered until touched manually (either by explicit registration or by calling lvconvert), so the error persists for a long time, increasing the potential to hit the issue.

This may be the issue behind Bug 889361 - I have always seen the segfault there accompanied by this issue, suggesting it may be low memory too.
The segfault would still occur in other low-memory situations; the LANG/LC_ALL setting is just one of the originators of the low-memory condition. The segfault itself should be fixed separately (the problem mentioned in comment #4). That would require more changes in the code - too late for 6.4 now. I'm proposing this change for 6.5 and opening bug #902806 for the LANG --> LC_ALL fix (which is still viable for 6.4).
Created attachment 685180 [details]
Proposal patch for locking changes

For some time my repo has had this patch, which needs review - it changes some of dmeventd's locking behavior.
We have seen this issue on OpenShift nodes. We may be able to test the patch.
So I've already started a fairly major rework of some 'core' components of dmeventd. With better logging, we will hopefully be able to capture bugs much better now.
I'm more or less convinced this was failing dmeventd pthread-processing code rather than an actual low-memory condition. This bug is going to be marked as resolved; if a similar issue appears with release 2.02.133 or higher, please open a new BZ with better traces.
Marking verified (SanityOnly) in the latest rpms. QA has not seen this issue in many iterations of this test.

2.6.32-621.el6.x86_64

lvm2-2.02.143-1.el6                              BUILT: Wed Feb 24 07:59:50 CST 2016
lvm2-libs-2.02.143-1.el6                         BUILT: Wed Feb 24 07:59:50 CST 2016
lvm2-cluster-2.02.143-1.el6                      BUILT: Wed Feb 24 07:59:50 CST 2016
udev-147-2.71.el6                                BUILT: Wed Feb 10 07:07:17 CST 2016
device-mapper-1.02.117-1.el6                     BUILT: Wed Feb 24 07:59:50 CST 2016
device-mapper-libs-1.02.117-1.el6                BUILT: Wed Feb 24 07:59:50 CST 2016
device-mapper-event-1.02.117-1.el6               BUILT: Wed Feb 24 07:59:50 CST 2016
device-mapper-event-libs-1.02.117-1.el6          BUILT: Wed Feb 24 07:59:50 CST 2016
device-mapper-persistent-data-0.6.2-0.1.rc5.el6  BUILT: Wed Feb 24 07:07:09 CST 2016
cmirror-2.02.143-1.el6                           BUILT: Wed Feb 24 07:59:50 CST 2016
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0964.html