Created attachment 1302231 [details]
valgrind output, lvmetad output, gdb full output

Description of problem:
lvmetad crashes with "Memory smash" during parallel LVM operations such as the lvs/vgs/vgscan/lvscan commands.

ERROR LOG:
13985 global info flags none reason none token filter:3239235440 update_pid 0
-> response = "OK"
-> global_invalid = 0
-> global_disable = 0
-> disable_reason = "none"
-> daemon_pid = 13960
-> token = "filter:3239235440"
-> update_cmd = ""
-> update_pid = 0
-> update_begin = 0
-> update_timeout = 0
->
<- request="get_global_info"
<- token="skip"
<- pid=13989
<- cmd="lvs"
13989 global info flags none reason none token filter:3239235440 update_pid 0
<- request="get_global_info"
<- token="skip"
<- pid=13985
<- cmd="lvs"
13985 global info flags none reason none token filter:3239235440 update_pid 0
-> response = "OK"
-> global_invalid = 0
-> global_disable = 0
-> disable_reason = "none"
-> daemon_pid = 13960
-> token = "filter:3239235440"
-> update_cmd = ""
-> update_pid = 0
-> update_begin = 0
-> update_timeout = 0
->
lvmetad: mm/dbg_malloc.c:271: dm_bounds_check_debug: Assertion `!"Memory smash"' failed.
lvmetad: mm/dbg_malloc.c:271: dm_bounds_check_debug: Assertion `!"Memory smash"' failed.
Aborted (core dumped)

(Full logs are attached)

Version-Release number of selected component (if applicable):
PKG_NAME=LVM2
PKG_VERSION=2.02.173
PKG_URL=LVM2.2.02.173.tgz
PKG_MD5=61cba056ac552f2d362600d494b1b8d9
PKG_DATE=2017-07-20

# lvs --version
  LVM version:     2.02.173(2) (2017-07-20)
  Library version: 1.02.142 (2017-07-20)
  Driver version:  4.34.0
  Configuration:   ./configure --build=none --host=mips64-linux-gnu --prefix=/usr --libdir=/usr/lib64 --enable-udev_rules enable_lvmetad=yes --sbindir=/usr/bin --enable-debug ac_cv_func_malloc_0_nonnull=yes ac_cv_func_realloc_0_nonnull=yes

This issue is also seen in 2.02.168 and 2.02.171.

How reproducible:
Run lvs/lvscan/vgscan in the background so that simultaneous queries are sent to lvmetad, which results in an lvmetad crash.

# for i in `seq 1 100`
> do
> lvs &
> done

Steps to Reproduce:
1. Run lvmetad -s /run/lvm/lvmetad.socket -f
2. Run lvs/vgs/vgscan in the background multiple times, as shown above

Actual results:
lvmetad dumps core. It should not; parallel threads need better handling.

From the gdb backtrace, it is clear that the unprotected linked list of struct memblock causes this core when dm_bounds_check_debug() tries to traverse it.

========== backtrace ==========
(gdb) bt full
#0  __GI_raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:58
        set = {__val = {0, 0, 1099373217776, 1099372078024, 0, 1099361920736, 89, 1099373170688, 1099373197016, 1099373170688, 1099361920600, 1099353538560, 733066163248, 733066163504, 15, 1099373217776}}
        err = <optimized out>
        pid = <optimized out>
        tid = <optimized out>
        ret = 0
#1  0x000000fff7a9ab2c in __GI_abort () at abort.c:89
        save_stage = 2
        act = {sa_flags = 0, __sigaction_handler = {sa_handler = 0xfff0001369, sa_sigaction = 0xfff0001369}, sa_mask = {__val = {1099243197300, 1099243197200, 1099243197300, 0, 0, 0, 0, 0, 1095216660480, 1099373217776, 1099373211096, 0, 1099373217776, 1099373217776, 18446744073709551615, 1099372989528}}, sa_restorer = 0xfff7cc20c8}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x000000fff7a9010c in __assert_fail_base (fmt=0xfff7bc8c60 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0xfff7cc20c8 "!\"Memory smash\"", file=0xfff7cc1f50 "mm/dbg_malloc.c", line=<optimized out>, function=<optimized out>) at assert.c:92
        str = 0xfff0001310 ""
        total = 4096
#3  0x000000fff7a901dc in __GI___assert_fail (assertion=0xfff7cc20c8 "!\"Memory smash\"", file=0xfff7cc1f50 "mm/dbg_malloc.c", line=<optimized out>, function=0xfff7cc20e8 "dm_bounds_check_debug") at assert.c:101
No locals.
#4  0x000000fff7ca6124 in dm_bounds_check_debug () from /usr/lib64/libdevmapper.so.1.02
No symbol table info available.
#5  0x000000fff7ca6574 in dm_bounds_check_wrapper () from /usr/lib64/libdevmapper.so.1.02
No symbol table info available.
#6  0x000000fff7ca5a94 in dm_free_aux () from /usr/lib64/libdevmapper.so.1.02
No symbol table info available.
#7  0x000000fff7ca6458 in dm_free_wrapper () from /usr/lib64/libdevmapper.so.1.02
No symbol table info available.
#8  0x000000aaae248ee4 in buffer_destroy ()

So we need to introduce a lock for this data structure in order to traverse the linked list safely:

struct memblock {
	struct memblock *prev, *next;	/* All allocated blocks are linked */
	size_t length;			/* Size of the requested block */
	int id;				/* Index of the block */
	const char *file;		/* File that allocated */
	int line;			/* Line that allocated */
	void *magic;			/* Address of this block */
} __attribute__((aligned(8)));

static struct memblock *_head = 0;
static struct memblock *_tail = 0;

This was confirmed by the valgrind output:

Thread 11:
Invalid read of size 1
   at 0x48E80DC: dm_bounds_check_debug (in /mnt/sysimg/usr/lib64/libdevmapper.so.1.02)
   by 0x48E856C: dm_bounds_check_wrapper (in /mnt/sysimg/usr/lib64/libdevmapper.so.1.02)
 Address 0x4c2d55f is 79 bytes inside a block of size 80 free'd
   at 0x484E75C: free (in /mnt/sysimg/usr/lib64/valgrind/vgpreload_memcheck-mips64-linux.so)
   by 0x48E7D20: dm_free_aux (in /mnt/sysimg/usr/lib64/libdevmapper.so.1.02)
 Block was alloc'd at
   at 0x484CF60: malloc (in /mnt/sysimg/usr/lib64/valgrind/vgpreload_memcheck-mips64-linux.so)
   by 0x48E76E0: dm_malloc_aux_debug (in /mnt/sysimg/usr/lib64/libdevmapper.so.1.02)

==14434== Process terminating with default action of signal 6 (SIGABRT): dumping core
==14434==    at 0x49BD04C: raise (raise.c:58)
==14434==    by 0x49BEB28: abort (abort.c:89)
==14434==    by 0x49B4108: __assert_fail_base (assert.c:92)
==14434==    by 0x49B41D8: __assert_fail (assert.c:101)
==14434==    by 0x48E8120: dm_bounds_check_debug (in /mnt/sysimg/usr/lib64/libdevmapper.so.1.02)
==14434==
==14434== HEAP SUMMARY:
==14434==     in use at exit: 32,789 bytes in 69 blocks
==14434==   total heap usage: 1,323 allocs, 1,254 frees, 1,240,256 bytes allocated

Expected results:
lvmetad should be able to handle parallel threads.

Additional info:
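To illustrate why a walk over the block list can end in a "Memory smash" assertion, here is a minimal single-threaded sketch of a guard-byte bounds check. It assumes the debug allocator writes a few guard bytes derived from the block id after each allocation and later verifies them by walking the list of blocks. The names `struct blk`, `guarded_malloc`, `guards_intact` and `guards_check_or_abort` are hypothetical; this is not the libdevmapper source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_LEN sizeof(unsigned long)	/* assumed guard size */

/* Hypothetical, simplified block header (not the real struct memblock). */
struct blk {
	struct blk *next;	/* all allocated blocks are linked */
	size_t length;		/* size requested by the caller */
	int id;			/* index of the block, also the guard value */
};

static struct blk *blk_head = NULL;
static int blk_next_id = 1;

/* Allocate header + payload + guard bytes and link the block in. */
void *guarded_malloc(size_t s)
{
	struct blk *b = malloc(sizeof(*b) + s + GUARD_LEN);

	if (!b)
		return NULL;
	b->length = s;
	b->id = blk_next_id++;
	b->next = blk_head;
	blk_head = b;
	/* Fill the guard area right behind the payload. */
	memset((char *)(b + 1) + s, (char)b->id, GUARD_LEN);
	return b + 1;
}

/* Walk every block; return 1 if all guards are intact, 0 otherwise. */
int guards_intact(void)
{
	struct blk *b;
	size_t i;

	for (b = blk_head; b; b = b->next) {
		const char *g = (const char *)(b + 1) + b->length;

		for (i = 0; i < GUARD_LEN; i++)
			if (g[i] != (char)b->id)
				return 0;
	}
	return 1;
}

/* Abort in the same style as the assertion seen in the log. */
void guards_check_or_abort(void)
{
	assert(guards_intact() && "Memory smash");
}
```

Nothing in this walk is protected against another thread unlinking or freeing a block in the middle of the loop. In that case the check would read memory that has already been freed, which is consistent with the 1-byte invalid read inside a freed block that valgrind reports above.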
Please don't use --enable-debug. It is designed for internal use, not for regular usage. The internal debugging code has known limitations with threads.
Hi, Thank you for the quick response. We enabled debug only after we discovered this issue, so it is also reproducible without the "--enable-debug" option. With Regards, Gururaj S
Can you please attach a backtrace of your crash from a normal build of lvmetad? Also, as a simple workaround, disable use_lvmetad in your lvm.conf. It's probably the best you can do ATM.
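A minimal sketch of that workaround, assuming the stock lvm.conf layout in which use_lvmetad sits in the global section (the rest of the file is left untouched):

```
global {
	use_lvmetad = 0
}
```

With this setting the LVM commands scan devices themselves instead of querying the daemon, so the parallel lvs/vgs runs no longer reach lvmetad.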
Hi,

We are not able to reproduce this with the LVM-2.02.173 package built without --enable-debug.

Anyway, purely from the code point of view, a lock could be introduced for this data structure:

struct memblock {
	struct memblock *prev, *next;	/* All allocated blocks are linked */
	size_t length;			/* Size of the requested block */
	int id;				/* Index of the block */
	const char *file;		/* File that allocated */
	int line;			/* Line that allocated */
	void *magic;			/* Address of this block */
} __attribute__((aligned(8)));

static struct memblock *_head = 0;
static struct memblock *_tail = 0;

Please provide your suggestion.

With Regards,
Gururaj S
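The locking idea could be sketched roughly as follows. This is only an illustration under stated assumptions: the struct is a cut-down stand-in for the memblock quoted above, and `list_lock`, `tracked_malloc`, `tracked_free` and `tracked_count` are hypothetical names, not libdevmapper functions. A single pthread mutex guards every insertion, removal and traversal of the list.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Cut-down stand-in for the memblock header quoted in this thread. */
struct memblock {
	struct memblock *prev, *next;	/* all allocated blocks are linked */
	size_t length;			/* size of the requested block */
};

static struct memblock *list_head = NULL;
static struct memblock *list_tail = NULL;
/* Hypothetical lock: every access to the list goes through it. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

void *tracked_malloc(size_t s)
{
	struct memblock *nb = malloc(sizeof(*nb) + s);

	if (!nb)
		return NULL;
	nb->length = s;
	nb->next = NULL;

	pthread_mutex_lock(&list_lock);
	nb->prev = list_tail;
	if (list_tail)
		list_tail->next = nb;
	else
		list_head = nb;
	list_tail = nb;
	pthread_mutex_unlock(&list_lock);

	return nb + 1;	/* payload starts right after the header */
}

void tracked_free(void *p)
{
	struct memblock *mb;

	if (!p)
		return;
	mb = (struct memblock *)p - 1;

	pthread_mutex_lock(&list_lock);
	assert(list_head && list_tail);	/* list cannot be empty here */
	if (mb->prev)
		mb->prev->next = mb->next;
	else
		list_head = mb->next;
	if (mb->next)
		mb->next->prev = mb->prev;
	else
		list_tail = mb->prev;
	pthread_mutex_unlock(&list_lock);

	free(mb);	/* released only after it is unlinked */
}

/* Any traversal, such as a bounds check, holds the same lock. */
size_t tracked_count(void)
{
	size_t n = 0;
	struct memblock *mb;

	pthread_mutex_lock(&list_lock);
	for (mb = list_head; mb; mb = mb->next)
		n++;
	pthread_mutex_unlock(&list_lock);
	return n;
}
```

The trade-off is that every allocation and free in every thread serializes on one mutex, which may be acceptable for a debug-only build but is a cost a production allocator would usually avoid.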
lvm2 developers are well aware of the limitations of the internal debugging code, which was developed in the previous millennium. Its original purpose was to track memory usage in non-pthreaded programs. Later, however, some threaded tools/daemons were written. The reason this code is no longer extended or developed is that far better tools are available these days (valgrind, clang memory sanitizer...). But you are possibly right that we could add more support for people outside the lvm2 team who use the internal debugging support, so it may be worth making it more pthread-aware.
Was this build actually compiling lvmetad but WITHOUT 'dmeventd'? (--enable-dmeventd was not passed to configure.) There is actually a bug in the Makefile: it passes 'DEBUG_MEM' when a pthreaded program is compiled, although DEBUG_MEM should normally be used ONLY for non-multithreaded programs.
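One way to picture the intended rule is a compile-time switch that turns the tracking allocator on only when the build is not threaded. The sketch below assumes a hypothetical macro `BUILD_HAS_PTHREADS` set by the build system for threaded programs; `DEBUG_MEM` is the name mentioned above, while `app_malloc`, `app_free` and `tracking_enabled` are made up for illustration and are not part of lvm2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Tracking is compiled in only for non-threaded debug builds.
 * BUILD_HAS_PTHREADS is a hypothetical macro for this sketch. */
#if defined(DEBUG_MEM) && !defined(BUILD_HAS_PTHREADS)
#define USE_TRACKING_ALLOC 1
#else
#define USE_TRACKING_ALLOC 0
#endif

#if USE_TRACKING_ALLOC
static size_t live_blocks;	/* unsynchronized, single-threaded use only */
#endif

void *app_malloc(size_t s)
{
	void *p = malloc(s);
#if USE_TRACKING_ALLOC
	if (p)
		live_blocks++;
#endif
	return p;
}

void app_free(void *p)
{
#if USE_TRACKING_ALLOC
	if (p) {
		assert(live_blocks > 0);
		live_blocks--;
	}
#endif
	free(p);
}

/* Report which variant was compiled in. */
int tracking_enabled(void)
{
	return USE_TRACKING_ALLOC;
}
```

With a guard of this kind, a threaded daemon built with debugging enabled would fall back to plain malloc/free instead of touching unsynchronized bookkeeping.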
Hi, Good to know this. I think it would be a good idea to have documentation on this. If it is already available, then great. Anyhow, we can close this thread if you have nothing further to discuss. With Regards, Gururaj S
Assuming this was addressed by commits:

https://www.redhat.com/archives/lvm-devel/2017-August/msg00006.html
https://www.redhat.com/archives/lvm-devel/2017-August/msg00010.html
https://www.redhat.com/archives/lvm-devel/2017-August/msg00009.html

to avoid compilation of the old, thread-unsafe memory debugging code when pthread code is compiled in.