Description of problem:
While attempting to reproduce bug 199433, I've been seeing the systems panic from memory exhaustion after my tests run for 24 to 72 hours. My script just deactivates and reactivates a variety of volumes (linears, stripes, mirrors) on 4 different VGs. Using top, I noticed the size of clvmd growing. Chris set up valgrind for me and we saw the following memory messages:

==20161== Thread 1:
==20161==
==20161== 33 bytes in 1 blocks are possibly lost in loss record 12 of 46
==20161==    at 0x4904A06: malloc (vg_replace_malloc.c:149)
==20161==    by 0x36D0E70581: strdup (in /lib64/tls/libc-2.3.4.so)
==20161==    by 0x38E7302849: (within /lib64/libdevmapper-event.so.1.02)
==20161==    by 0x38E730298F: dm_event_get_registered_device (in /lib64/libdevmapper-event.so.1.02)
==20161==    by 0x44B2BE: _target_monitored (mirrored.c:435)
==20161==    by 0x417FB9: monitor_dev_for_events (activate.c:714)
==20161==    by 0x4188D6: _lv_activate (activate.c:947)
==20161==    by 0x41892F: lv_activate (activate.c:956)
==20161==    by 0x4116CA: do_activate_lv (lvm-functions.c:232)
==20161==    by 0x4119B8: do_lock_lv (lvm-functions.c:346)
==20161==    by 0x40D2FD: do_command (clvmd-command.c:117)
==20161==    by 0x40FE12: process_remote_command (clvmd.c:1258)
==20161==
==20161==
==20161== 816 bytes in 3 blocks are possibly lost in loss record 37 of 46
==20161==    at 0x4905D27: calloc (vg_replace_malloc.c:279)
==20161==    by 0x36D0C0D5B2: _dl_allocate_tls (in /lib64/ld-2.3.4.so)
==20161==    by 0x36D1B06786: pthread_create@@GLIBC_2.2.5 (in /lib64/tls/libpthread-2.3.4.so)
==20161==    by 0x45B68C: dlm_ls_pthread_init (in /usr/src/redhat/BUILD/LVM2.2.02.21/daemons/clvmd/clvmd)
==20161==    by 0x415D13: _init_cluster (clvmd-cman.c:102)
==20161==    by 0x416979: init_cman_cluster (clvmd-cman.c:506)
==20161==    by 0x40DD12: main (clvmd.c:279)
==20161==
==20161==
==20161== 22,215 bytes in 990 blocks are definitely lost in loss record 44 of 46
==20161==    at 0x4904A06: malloc (vg_replace_malloc.c:149)
==20161==    by 0x38E7301B6E: (within /lib64/libdevmapper-event.so.1.02)
==20161==    by 0x38E7301F19: (within /lib64/libdevmapper-event.so.1.02)
==20161==    by 0x38E7302355: (within /lib64/libdevmapper-event.so.1.02)
==20161==    by 0x38E730290C: dm_event_get_registered_device (in /lib64/libdevmapper-event.so.1.02)
==20161==    by 0x44B2BE: _target_monitored (mirrored.c:435)
==20161==    by 0x417DA9: monitor_dev_for_events (activate.c:676)
==20161==    by 0x41862D: lv_deactivate (activate.c:874)
==20161==    by 0x411826: do_deactivate_lv (lvm-functions.c:293)
==20161==    by 0x4119CA: do_lock_lv (lvm-functions.c:350)
==20161==    by 0x40D2FD: do_command (clvmd-command.c:117)
==20161==    by 0x41048E: process_local_command (clvmd.c:1481)
==20161==
==20161==
==20161== 169,137 bytes in 2,639 blocks are definitely lost in loss record 46 of 46
==20161==    at 0x4904A06: malloc (vg_replace_malloc.c:149)
==20161==    by 0x36D0E70581: strdup (in /lib64/tls/libc-2.3.4.so)
==20161==    by 0x38E7108DE0: dm_asprintf (in /lib64/libdevmapper.so.1.02)
==20161==    by 0x38E7301EB4: (within /lib64/libdevmapper-event.so.1.02)
==20161==    by 0x38E73024F6: (within /lib64/libdevmapper-event.so.1.02)
==20161==    by 0x38E730290C: dm_event_get_registered_device (in /lib64/libdevmapper-event.so.1.02)
==20161==    by 0x44B2BE: _target_monitored (mirrored.c:435)
==20161==    by 0x417FB9: monitor_dev_for_events (activate.c:714)
==20161==    by 0x41862D: lv_deactivate (activate.c:874)
==20161==    by 0x411826: do_deactivate_lv (lvm-functions.c:293)
==20161==    by 0x4119CA: do_lock_lv (lvm-functions.c:350)
==20161==    by 0x40D2FD: do_command (clvmd-command.c:117)
==20161==
==20161== LEAK SUMMARY:
==20161==    definitely lost: 191,352 bytes in 3,629 blocks.
==20161==    possibly lost: 849 bytes in 4 blocks.
==20161==    still reachable: 167,954 bytes in 406 blocks.
==20161==    suppressed: 0 bytes in 0 blocks.
==20161== Reachable blocks (those to which a pointer was found) are not shown.
==20161== To see them, rerun with: --show-reachable=yes

Version-Release number of selected component (if applicable):
lvm2-cluster-2.02.21-3.el4

How reproducible:
Quite a few times now.
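As a cross-check on the valgrind output above, the per-record figures can be summed and compared with the LEAK SUMMARY totals. The following is only a minimal sketch: the function name `sum_loss_records` and the regular expression are assumptions made for illustration (they are not part of valgrind or LVM2), and the sample contains just the four loss-record header lines from this report.

```python
import re

# Matches memcheck loss-record headers such as:
#   ==20161== 22,215 bytes in 990 blocks are definitely lost in loss record 44 of 46
# (hypothetical helper pattern, written for this report's log format)
LOSS_RE = re.compile(
    r"==\d+==\s+([\d,]+) bytes in ([\d,]+) blocks are (definitely|possibly) lost"
)


def sum_loss_records(log_text):
    """Return {kind: (total_bytes, total_blocks)} summed over loss records."""
    totals = {}
    for size, blocks, kind in LOSS_RE.findall(log_text):
        prev_bytes, prev_blocks = totals.get(kind, (0, 0))
        totals[kind] = (
            prev_bytes + int(size.replace(",", "")),
            prev_blocks + int(blocks.replace(",", "")),
        )
    return totals


# The four loss-record headers quoted in this report.
SAMPLE = """\
==20161== 33 bytes in 1 blocks are possibly lost in loss record 12 of 46
==20161== 816 bytes in 3 blocks are possibly lost in loss record 37 of 46
==20161== 22,215 bytes in 990 blocks are definitely lost in loss record 44 of 46
==20161== 169,137 bytes in 2,639 blocks are definitely lost in loss record 46 of 46
"""

totals = sum_loss_records(SAMPLE)
print(totals["definitely"])  # (191352, 3629)
print(totals["possibly"])    # (849, 4)
```

Both sums agree with the LEAK SUMMARY, and both "definitely lost" records pass through dm_event_get_registered_device on the deactivation path.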
This is currently blocking the release of the cluster mirror and cluster products for 4.5, so setting all of the flags.
Created attachment 152914: libdevmapper-event-leak-fix.patch

Attaching a proposed patch that should fix the dmeventd-related leaks seen here (it should apply cleanly against current CVS of device-mapper and the current iteration of the 4.5 packages). The overall leak is approximately 500 bytes per activate/deactivate cycle per mirrored logical volume, which is not that severe for systems where activation/deactivation is rare (probably all but test machines fall into this category).
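To put the roughly 500 bytes per cycle figure in perspective, here is a rough back-of-envelope sketch. Only the 500-byte figure comes from the comment above; the workload numbers (10 mirrored LVs, 1,000 cycles, a 1 GiB budget) and the function names are hypothetical values chosen for illustration, not measurements from this bug.

```python
# Approximate leak per activate/deactivate cycle per mirrored LV
# (the figure quoted in the comment above).
LEAK_PER_CYCLE_BYTES = 500


def leaked_bytes(cycles, mirrored_lvs, per_cycle=LEAK_PER_CYCLE_BYTES):
    """Estimated bytes leaked after `cycles` activate/deactivate cycles."""
    return cycles * mirrored_lvs * per_cycle


def cycles_until(budget_bytes, mirrored_lvs, per_cycle=LEAK_PER_CYCLE_BYTES):
    """Whole cycles completed before the leak reaches `budget_bytes`."""
    return budget_bytes // (mirrored_lvs * per_cycle)


# Hypothetical test workload: 10 mirrored LVs cycled 1,000 times.
print(leaked_bytes(cycles=1000, mirrored_lvs=10))  # 5000000

# Hypothetical budget: cycles needed to leak 1 GiB with the same 10 LVs.
print(cycles_until(budget_bytes=1 << 30, mirrored_lvs=10))  # 214748
```

Under these assumed numbers the leak only matters for workloads that cycle volumes continuously, which is consistent with it showing up on test machines first.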
Setting a bunch of flags. This defect is blocking progress on cluster mirror testing and will need to be fixed for cluster mirrors to be usable for customers. At a minimum, we will need to do a z-stream or async errata release, and the cluster mirror packages will depend on this version of the libraries.
Created attachment 152919: libdevmapper-event-leak-fix.patch

Re-diffed with -p to get function names.
Fix verified in device-mapper-1.02.17-3.0.1.el4.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0284.html