Bug 1473596

Summary: lvmetad crashes with “memory smash” during lvm parallel operations
Product: [Community] LVM and device-mapper Reporter: Gururaj <gururaj.srk>
Component: lvm2 Assignee: Zdenek Kabelac <zkabelac>
lvm2 sub component: lvmetad QA Contact: cluster-qe <cluster-qe>
Status: POST --- Docs Contact:
Severity: unspecified    
Priority: unspecified CC: agk, gururaj.srk, heinzm, jbrassow, msnitzer, prajnoha, zkabelac
Version: 2.02.173 Flags: rule-engine: lvm-technical-solution?
rule-engine: lvm-test-coverage?
Target Milestone: ---   
Target Release: ---   
Hardware: mips64   
OS: Linux   
Type: Bug
Attachments:
Description Flags
valgrind output, lvmetad output, gdb full output none

Description Gururaj 2017-07-21 09:18:20 UTC
Created attachment 1302231
valgrind output, lvmetad output, gdb full output

Description of problem:
lvmetad crashes with “memory smash” during parallel LVM operations such as the lvs, vgs, vgscan, and lvscan commands.

ERROR LOG:
13985 global info flags none reason none token filter:3239235440 update_pid 0
-> response = "OK"
-> global_invalid = 0
-> global_disable = 0
-> disable_reason = "none"
-> daemon_pid = 13960
-> token = "filter:3239235440"
-> update_cmd = ""
-> update_pid = 0
-> update_begin = 0
-> update_timeout = 0
-> 
<- request="get_global_info"
<- token="skip"
<- pid=13989
<- cmd="lvs"
13989 global info flags none reason none token filter:3239235440 update_pid 0
<- request="get_global_info"
<- token="skip"
<- pid=13985
<- cmd="lvs"
13985 global info flags none reason none token filter:3239235440 update_pid 0
-> response = "OK"
-> global_invalid = 0
-> global_disable = 0
-> disable_reason = "none"
-> daemon_pid = 13960
-> token = "filter:3239235440"
-> update_cmd = ""
-> update_pid = 0
-> update_begin = 0
-> update_timeout = 0
-> 
lvmetad: mm/dbg_malloc.c:271: dm_bounds_check_debug: Assertion `!"Memory smash"' failed.
lvmetad: mm/dbg_malloc.c:271: dm_bounds_check_debug: Assertion `!"Memory smash"' failed.
Aborted (core dumped)
(Full logs are attached.)


Version-Release number of selected component (if applicable):

PKG_NAME=LVM2
PKG_VERSION=2.02.173
PKG_URL=LVM2.2.02.173.tgz
PKG_MD5=61cba056ac552f2d362600d494b1b8d9
PKG_DATE=2017-07-20


# lvs --version
  LVM version:     2.02.173(2) (2017-07-20)
  Library version: 1.02.142 (2017-07-20)
  Driver version:  4.34.0
  Configuration:   ./configure --build=none --host=mips64-linux-gnu --prefix=/usr --libdir=/usr/lib64 --enable-udev_rules enable_lvmetad=yes --sbindir=/usr/bin --enable-debug ac_cv_func_malloc_0_nonnull=yes ac_cv_func_realloc_0_nonnull=yes

This issue is also seen in 2.02.168 and 2.02.171.

How reproducible:
Run lvs/lvscan/vgscan in the background so that simultaneous queries are sent to lvmetad; this results in an lvmetad crash.

# for i in `seq 1 100`
> do
> lvs &
> done


Steps to Reproduce:
1. Run lvmetad -s /run/lvm/lvmetad.socket -f

2. Run lvs/vgs/vgscan in the background multiple times, as shown above.


Actual results:

lvmetad aborts and dumps core. It should not core dump; parallel threads need better handling.

From the gdb backtrace, it is clear that the core results from dm_bounds_check_debug() traversing the unprotected linked list of struct memblock.

==========
backtrace
==========
(gdb) bt full
#0  __GI_raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:58
        set = {__val = {0, 0, 1099373217776, 1099372078024, 0, 1099361920736, 89, 1099373170688, 1099373197016, 1099373170688, 1099361920600, 1099353538560, 733066163248, 
            733066163504, 15, 1099373217776}}
        err = <optimized out>
        pid = <optimized out>
        tid = <optimized out>
        ret = 0
#1  0x000000fff7a9ab2c in __GI_abort () at abort.c:89
        save_stage = 2
        act = {sa_flags = 0, __sigaction_handler = {sa_handler = 0xfff0001369, sa_sigaction = 0xfff0001369}, sa_mask = {__val = {1099243197300, 1099243197200, 1099243197300, 
              0, 0, 0, 0, 0, 1095216660480, 1099373217776, 1099373211096, 0, 1099373217776, 1099373217776, 18446744073709551615, 1099372989528}}, sa_restorer = 0xfff7cc20c8}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x000000fff7a9010c in __assert_fail_base (fmt=0xfff7bc8c60 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0xfff7cc20c8 "!\"Memory smash\"", 
    file=0xfff7cc1f50 "mm/dbg_malloc.c", line=<optimized out>, function=<optimized out>) at assert.c:92
        str = 0xfff0001310 ""
        total = 4096
#3  0x000000fff7a901dc in __GI___assert_fail (assertion=0xfff7cc20c8 "!\"Memory smash\"", file=0xfff7cc1f50 "mm/dbg_malloc.c", line=<optimized out>, 
    function=0xfff7cc20e8 "dm_bounds_check_debug") at assert.c:101
No locals.
#4  0x000000fff7ca6124 in dm_bounds_check_debug () from /usr/lib64/libdevmapper.so.1.02
No symbol table info available.
#5  0x000000fff7ca6574 in dm_bounds_check_wrapper () from /usr/lib64/libdevmapper.so.1.02
No symbol table info available.
#6  0x000000fff7ca5a94 in dm_free_aux () from /usr/lib64/libdevmapper.so.1.02
No symbol table info available.
#7  0x000000fff7ca6458 in dm_free_wrapper () from /usr/lib64/libdevmapper.so.1.02
No symbol table info available.
#8  0x000000aaae248ee4 in buffer_destroy ()


So we need to introduce a lock for this data structure in order to traverse the linked list safely:

struct memblock {
        struct memblock *prev, *next;   /* All allocated blocks are linked */
        size_t length;          /* Size of the requested block */
        int id;                 /* Index of the block */
        const char *file;       /* File that allocated */
        int line;               /* Line that allocated */
        void *magic;            /* Address of this block */
} __attribute__((aligned(8)));

static struct memblock *_head = 0;
static struct memblock *_tail = 0;

This was confirmed from the valgrind output. 

Thread 11:
Invalid read of size 1
   at 0x48E80DC: dm_bounds_check_debug (in /mnt/sysimg/usr/lib64/libdevmapper.so.1.02)
   by 0x48E856C: dm_bounds_check_wrapper (in /mnt/sysimg/usr/lib64/libdevmapper.so.1.02)
 Address 0x4c2d55f is 79 bytes inside a block of size 80 free'd
   at 0x484E75C: free (in /mnt/sysimg/usr/lib64/valgrind/vgpreload_memcheck-mips64-linux.so)
   by 0x48E7D20: dm_free_aux (in /mnt/sysimg/usr/lib64/libdevmapper.so.1.02)
 Block was alloc'd at
   at 0x484CF60: malloc (in /mnt/sysimg/usr/lib64/valgrind/vgpreload_memcheck-mips64-linux.so)
   by 0x48E76E0: dm_malloc_aux_debug (in /mnt/sysimg/usr/lib64/libdevmapper.so.1.02)

==14434== Process terminating with default action of signal 6 (SIGABRT): dumping core
==14434==    at 0x49BD04C: raise (raise.c:58)
==14434==    by 0x49BEB28: abort (abort.c:89)
==14434==    by 0x49B4108: __assert_fail_base (assert.c:92)
==14434==    by 0x49B41D8: __assert_fail (assert.c:101)
==14434==    by 0x48E8120: dm_bounds_check_debug (in /mnt/sysimg/usr/lib64/libdevmapper.so.1.02)
==14434==
==14434== HEAP SUMMARY:
==14434==     in use at exit: 32,789 bytes in 69 blocks
==14434==   total heap usage: 1,323 allocs, 1,254 frees, 1,240,256 bytes allocated
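
For readers unfamiliar with this kind of check, the following is a minimal sketch of a guard-byte bounds check over a linked list of allocations. It is not the libdevmapper source: the names sketch_block, sketch_malloc, sketch_bounds_check and the GUARD_BYTES value are assumptions made for illustration. The invalid read at offset 79 of an 80-byte block reported above would be consistent with a walker reading the trailing guard byte of a block that another thread had already freed, though that is an inference.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_BYTES 8   /* assumption: guard bytes appended to each block */

/* Simplified, hypothetical header; not the real struct memblock. */
struct sketch_block {
        struct sketch_block *prev, *next;   /* all live blocks are linked */
        size_t length;                      /* size requested by the caller */
        int id;                             /* index of the block */
};

static struct sketch_block *sketch_head, *sketch_tail;
static int sketch_next_id = 1;

/* Allocate header + user data + guard bytes and append to the list. */
static void *sketch_malloc(size_t n)
{
        struct sketch_block *b = malloc(sizeof(*b) + n + GUARD_BYTES);

        if (!b)
                return NULL;
        b->length = n;
        b->id = sketch_next_id++;
        b->next = NULL;
        b->prev = sketch_tail;
        if (sketch_tail)
                sketch_tail->next = b;
        else
                sketch_head = b;
        sketch_tail = b;
        /* Fill the guard area with a pattern derived from the block id. */
        memset((char *)(b + 1) + n, (char) b->id, GUARD_BYTES);
        return b + 1;
}

/* Walk every live block and verify its guard bytes; 1 means all intact. */
static int sketch_bounds_check(void)
{
        struct sketch_block *b;
        size_t i;

        for (b = sketch_head; b; b = b->next) {
                const char *guard = (const char *)(b + 1) + b->length;

                for (i = 0; i < GUARD_BYTES; i++)
                        if (guard[i] != (char) b->id)
                                return 0;   /* a debug build would assert here */
        }
        return 1;
}
```

Because the walk touches every block on the list, it is only safe if no other thread can unlink and free a block while the walk is in progress.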



Expected results:
lvmetad should be able to handle parallel threads.

Additional info:

Comment 1 Zdenek Kabelac 2017-07-21 09:30:13 UTC
Please don't use --enable-debug.

It's designed for internal use, not for regular usage.

Internal debugging has its known limitations with thread usage.

Comment 2 Gururaj 2017-07-21 09:33:45 UTC
Hi,

Thank you for the quick response.

We enabled debug only after we discovered this issue, so it is also reproducible without the "--enable-debug" option.

With Regards,
Gururaj S

Comment 4 Zdenek Kabelac 2017-07-21 09:48:25 UTC
Can you please attach a backtrace of your crash from a normal build of lvmetad?

Also, as a simple workaround, disable use_lvmetad in your lvm.conf.
It's probably the best you can do at the moment.

Comment 5 Gururaj 2017-07-21 10:40:39 UTC
Hi, 

We are not able to reproduce this with the LVM2 2.02.173 package built without --enable-debug.

Still, from the code's point of view, a lock could be introduced for this data structure:

struct memblock {
        struct memblock *prev, *next;   /* All allocated blocks are linked */
        size_t length;          /* Size of the requested block */
        int id;                 /* Index of the block */
        const char *file;       /* File that allocated */
        int line;               /* Line that allocated */
        void *magic;            /* Address of this block */
} __attribute__((aligned(8)));

static struct memblock *_head = 0;
static struct memblock *_tail = 0;
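
A minimal sketch of what such a lock might look like, assuming a pthread mutex guards every access to the list. The struct is a trimmed stand-in for the one quoted above, and list_lock, locked_add, locked_remove, locked_count and churn are hypothetical names, not existing libdevmapper functions.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Trimmed stand-in for struct memblock; only the fields needed here. */
struct sketch_memblock {
        struct sketch_memblock *prev, *next;
        size_t length;
};

static struct sketch_memblock *list_head = NULL;
static struct sketch_memblock *list_tail = NULL;
/* Hypothetical lock; nothing like it exists in the code quoted above. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Allocate a block and append it to the list under the lock. */
static struct sketch_memblock *locked_add(size_t length)
{
        struct sketch_memblock *mb = malloc(sizeof(*mb));

        if (!mb)
                return NULL;
        mb->length = length;
        mb->next = NULL;
        pthread_mutex_lock(&list_lock);
        mb->prev = list_tail;
        if (list_tail)
                list_tail->next = mb;
        else
                list_head = mb;
        list_tail = mb;
        pthread_mutex_unlock(&list_lock);
        return mb;
}

/* Unlink under the lock, then free once no walker can reach the block. */
static void locked_remove(struct sketch_memblock *mb)
{
        pthread_mutex_lock(&list_lock);
        if (mb->prev)
                mb->prev->next = mb->next;
        else
                list_head = mb->next;
        if (mb->next)
                mb->next->prev = mb->prev;
        else
                list_tail = mb->prev;
        pthread_mutex_unlock(&list_lock);
        free(mb);
}

/* Traverse the whole list under the lock, as a bounds check would. */
static size_t locked_count(void)
{
        size_t n = 0;
        struct sketch_memblock *mb;

        pthread_mutex_lock(&list_lock);
        for (mb = list_head; mb; mb = mb->next)
                n++;
        pthread_mutex_unlock(&list_lock);
        return n;
}

/* Worker body: add, walk and remove blocks, as concurrent requests might. */
static void *churn(void *arg)
{
        int i;

        (void) arg;
        for (i = 0; i < 1000; i++) {
                struct sketch_memblock *mb = locked_add(16);

                if (mb) {
                        (void) locked_count();
                        locked_remove(mb);
                }
        }
        return NULL;
}
```

Several threads could run churn() through pthread_create() at once. The lock only helps if every reader and writer of the list takes it, including any bounds-check traversal.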

Please provide your suggestion.

With Regards,
Gururaj S

Comment 6 Zdenek Kabelac 2017-07-25 09:19:31 UTC
lvm2 developers are well aware of the limitations of the internal debugging code, which was developed in the previous millennium.

The original purpose was to track memory usage in non-pthreaded programs.
However, some 'threaded' tools/daemons were written later.

The reason this code is no longer extended or developed is that there are simply far better tools available these days (be it valgrind, clang memory sanitizer...).

But you are possibly right that we may add more support for people outside the lvm2 team who tend to use the internal debugging support, so it may be worth making it more pthread-aware.

Comment 7 Zdenek Kabelac 2017-07-28 15:19:01 UTC
Was this build actually compiling lvmetad but WITHOUT 'dmeventd'?
(--enable-dmeventd was not passed to configure)

There is actually a bug in the Makefile: it passes 'DEBUG_MEM' when a pthreaded program is compiled, although that flag should normally be used ONLY for non-multithreaded programs.

Comment 8 Gururaj 2017-07-31 07:16:46 UTC
Hi,

Good to know this.

I think it would be good to have documentation on this. If it is already available, then great.

Anyhow, we can close this thread if you have nothing further to discuss.

With Regards,
Gururaj S

Comment 9 Zdenek Kabelac 2017-08-22 13:15:08 UTC
Assuming this was addressed by the following commits:


https://www.redhat.com/archives/lvm-devel/2017-August/msg00006.html
https://www.redhat.com/archives/lvm-devel/2017-August/msg00010.html
https://www.redhat.com/archives/lvm-devel/2017-August/msg00009.html


which avoid compiling the old thread-unsafe memory debugging code when pthread code is compiled in.
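
One way such a guard could be expressed in C is sketched below. This is an illustration only: the linked commits are not reproduced here and may well solve it differently (for example in the build system). DEBUG_MEM is the flag named in comment 7; SKETCH_PTHREAD_BUILD, SKETCH_USE_DEBUG_ALLOC and the two functions are hypothetical.

```c
#include <stdlib.h>

/*
 * Enable the debug allocator only when DEBUG_MEM is requested AND the
 * program is not built with pthreads. SKETCH_PTHREAD_BUILD is a
 * hypothetical macro standing in for "this is a pthreaded build".
 */
#if defined(DEBUG_MEM) && !defined(SKETCH_PTHREAD_BUILD)
#define SKETCH_USE_DEBUG_ALLOC 1
#else
#define SKETCH_USE_DEBUG_ALLOC 0
#endif

/* Report which allocator variant this translation unit was built with. */
static const char *sketch_allocator_name(void)
{
        return SKETCH_USE_DEBUG_ALLOC ? "debug" : "plain";
}

/* Allocation entry point; tracking code is compiled in only when safe. */
static void *sketch_alloc(size_t n)
{
#if SKETCH_USE_DEBUG_ALLOC
        /* Single-threaded builds only: list tracking would go here. */
#endif
        return malloc(n);
}
```

Compiled with neither macro defined, or with both, the sketch falls back to the plain allocator; only -DDEBUG_MEM on its own selects the debug path.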