Bug 730289

Summary: clvmd not working properly if multiple clients talk to clvmd (part1)
Product: Red Hat Enterprise Linux 6
Reporter: Milan Broz <mbroz>
Component: lvm2
Assignee: Zdenek Kabelac <zkabelac>
Status: CLOSED ERRATA
QA Contact: Corey Marthaler <cmarthal>
Severity: unspecified
Docs Contact:
Priority: high
Version: 6.2
CC: agk, cmarthal, coughlan, djansa, dwysocha, heinzm, jbrassow, nperic, prajnoha, prockai, pvrabec, thornber, zkabelac
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: lvm2-2.02.98-4.el6
Doc Type: Bug Fix
Doc Text:
Clvmd has been consuming a lot of memory to process every single request. In the current architecture each request invokes a thread, and by default each such thread allocates ~9MB of RAM for its stack. Since clvmd uses mlockall to keep all pages in memory, 200 parallel requests could easily grow the running binary to 2GB, leading either to the OOM killer or to a very slow system. The fix reduces the default thread stack size to 128KB, which is enough for the current version of lvm to handle all tasks. This leads to a massive reduction in the memory used at runtime by clvmd.
Story Points: ---
Clone Of:
: 883842 (view as bug list)
Environment:
Last Closed: 2013-02-21 08:03:11 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 756082

Description Milan Broz 2011-08-12 11:30:01 UTC
Description of problem:
This is a follow-up to bugs #697945 and #698393.

clvmd uses threads, and it seems that this interferes very badly with lvm's internal memory pre-allocation and locking.

A simple script like

 for j in $(seq 1 200); 
  do 
        lvs vg_test &
  done; 

causes OOM on a 2GB RAM system (even when using singlenode, so this is not a problem of the cluster backend).

With preallocation disabled it is slightly better; with lvm memory locking disabled, memory use is normal.


Version-Release number of selected component (if applicable):
lvm2-2.02.87-1.el6 (but the problem is present in all versions; 2.02.87 just contains
the memory corruption fix, see the bugs above).

Comment 2 Zdenek Kabelac 2011-08-12 20:47:27 UTC
I'll take a closer look at this one - but I think there has been no memory optimisation for multi-threaded code yet, as that is beyond the capability of simple memory-map locking, so clvmd basically works in 'mlockall()' mode with LANG=C.  Though I'm not really sure how large the lvm locks are - if you tried to run 200 lvm commands at the same time you would probably exhaust a 2GB machine as well - i.e. at ~30MB per lvm command, 6GB of memory would need to be mapped at the same time. So there needs to be some maximum limit on the number of clvmd threads completing the same command.
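
A cap like the one suggested here could be sketched with a counting semaphore; this is purely illustrative (not the clvmd implementation, and MAX_PARALLEL_WORKERS is a made-up name):

  /* Illustrative only: bound how many worker threads may process
   * requests at once, so N parallel clients cannot force N full
   * per-command memory footprints to be mapped simultaneously. */
  #include <semaphore.h>
  #include <stddef.h>

  #define MAX_PARALLEL_WORKERS 16      /* hypothetical limit */

  static sem_t worker_slots;

  static void *worker(void *arg)
  {
          (void)arg;
          sem_wait(&worker_slots);     /* block until a slot is free */
          /* ... process one client request ... */
          sem_post(&worker_slots);     /* release the slot */
          return NULL;
  }

  int init_worker_limit(void)
  {
          return sem_init(&worker_slots, 0, MAX_PARALLEL_WORKERS);
  }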

Comment 4 Zdenek Kabelac 2011-10-11 10:21:02 UTC
Patches to address thread stack memory consumption:

Reduce the preallocated stack size to 64KiB:
https://www.redhat.com/archives/lvm-devel/2011-October/msg00070.html

Set the thread stack size to 128KiB:
https://www.redhat.com/archives/lvm-devel/2011-October/msg00072.html

These reduce memory usage for an overloaded clvmd.

Comment 5 Zdenek Kabelac 2011-10-11 10:29:27 UTC
It's also worth noting that without this patch:

https://www.redhat.com/archives/lvm-devel/2011-October/msg00069.html

clvmd core dumps when it runs out of free file descriptors during a filter refresh operation, so the patch is rather mandatory here.

Comment 6 Zdenek Kabelac 2011-10-11 11:19:18 UTC
*** Bug 698393 has been marked as a duplicate of this bug. ***

Comment 9 Zdenek Kabelac 2012-04-24 13:14:31 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Clvmd has been consuming a lot of memory to process every single request. In the current architecture each request invokes a thread, and by default each such thread allocates ~9MB of RAM for its stack. Since clvmd uses mlockall to keep all pages in memory, 200 parallel requests could easily grow the running binary to 2GB, leading either to the OOM killer or to a very slow system.

The fix reduces the default thread stack size to 128KB, which is enough for the current version of lvm to handle all tasks. This leads to a massive reduction in the memory used at runtime by clvmd.
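
The reduction boils down to creating each per-request thread with an explicitly small stack instead of the default one; a minimal sketch of that general approach (not the actual clvmd source, names are illustrative):

  /* Sketch: spawn a worker thread with a 128KiB stack rather than the
   * ~9MB default mentioned above, which matters when mlockall() keeps
   * every stack page resident. Not the actual clvmd code. */
  #include <pthread.h>
  #include <stddef.h>

  #define WORKER_STACK_SIZE (128 * 1024)   /* 128KiB, >= PTHREAD_STACK_MIN */

  static void *process_request(void *arg)
  {
          /* ... handle one client request ... */
          (void)arg;
          return NULL;
  }

  int spawn_worker(pthread_t *tid, void *req)
  {
          pthread_attr_t attr;
          int ret;

          pthread_attr_init(&attr);
          pthread_attr_setstacksize(&attr, WORKER_STACK_SIZE);
          ret = pthread_create(tid, &attr, process_request, req);
          pthread_attr_destroy(&attr);

          return ret;
  }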

Comment 10 Nenad Peric 2012-04-25 09:22:45 UTC
Tested with:

lvm2-2.02.95-5.el6.x86_64
lvm2-cluster-2.02.95-5.el6.x86_64

cmirror-2.02.95-5.el6.x86_64

device-mapper-1.02.74-5.el6.x86_64
device-mapper-persistent-data-0.1.4-1.el6.x86_64
device-mapper-libs-1.02.74-5.el6.x86_64
device-mapper-event-libs-1.02.74-5.el6.x86_64
device-mapper-event-1.02.74-5.el6.x86_64

kernel-2.6.32-265.el6.x86_64




There is no OOM issue and the commands work normally with up to 160 iterations of the loop below.
But lvm locks up if the number of iterations is 170 (or more) with two commands run at the same time:


for i in {1..170}; do 
lvs &
vgs &
done

(11:00:42) [root@node01:~]$ jobs | grep "Running" | wc -l
340


The jobs kept sleeping in the background even after 5 minutes. 

None of the jobs finished and lvm commands did not return anything. 
The subsequent lvm commands just hang.

Comment 11 Peter Rajnoha 2012-04-25 12:04:54 UTC
(In reply to comment #10)
> The jobs kept sleeping in the background even after 5 minutes. 
> 
> None of the jobs finished and lvm commands did not return anything. 
> The subsequent lvm commands just hang.

Please try installing lvm2-debuginfo and then use gdb to attach to the hung process and grab a backtrace...

How many LVs/VGs did you use in your test?

Comment 12 Nenad Peric 2012-04-25 12:09:12 UTC
(07:20:23) [root@node01:~]$ lvs
  LV       VG       Attr     LSize    Pool Origin Data%  Move Log           Copy%  Convert
  lv_root  VolGroup -wi-ao--    8.52g                                                     
  lv_swap  VolGroup -wi-ao-- 1008.00m                                                     
  lvol0    my_vg    -wi-a---  252.00m                                                     
  mirrored my_vg    mwi-a-m-  500.00m                         mirrored_mlog 100.00        
  striped  my_vg    -wi-a---  504.00m                                                     


(07:20:30) [root@node01:~]$ vgs
  VG       #PV #LV #SN Attr   VSize VFree
  VolGroup   1   2   0 wz--n- 9.51g    0 
  my_vg      4   3   0 wz--nc 5.70g 3.98g


(07:20:33) [root@node01:~]$ pvs
  PV         VG       Fmt  Attr PSize   PFree  
  /dev/sdb1  my_vg    lvm2 a--    1.91g   1.66g
  /dev/sdb2  my_vg    lvm2 a--    1.91g   1.90g
  /dev/sde1           lvm2 a--    1.91g   1.91g
  /dev/sdf1           lvm2 a--    1.91g   1.91g
  /dev/sdh1  my_vg    lvm2 a--  968.00m 216.00m
  /dev/sdh2  my_vg    lvm2 a--  968.00m 216.00m
  /dev/vda2  VolGroup lvm2 a--    9.51g      0

Comment 13 Nenad Peric 2012-04-25 12:20:27 UTC
Well, I am not really sure which one gets stuck first, since none of the background processes actually returns anything.

So here is a backtrace of the first process:


#0  0x00007f08c8235360 in __read_nocancel () at ../sysdeps/unix/syscall-template.S:82
#1  0x000000000048ebb3 in read (inbuf=0x7fff81ed4260 "-", inlen=<value optimized out>, 
    retbuf=0x7fff81ed42b0) at /usr/include/bits/unistd.h:45
#2  _send_request (inbuf=0x7fff81ed4260 "-", inlen=<value optimized out>, 
    retbuf=0x7fff81ed42b0) at locking/cluster_locking.c:119
#3  0x000000000048f094 in _cluster_request (clvmd_cmd=<value optimized out>, 
    node=0x4c922b "", data=0x7fff81ed4310, len=14, response=0x7fff81ed4358, 
    num=0x7fff81ed4364) at locking/cluster_locking.c:223
#4  0x000000000048f60b in _lock_for_cluster (cmd=0x1ffd110, clvmd_cmd=45 '-', 
    flags=<value optimized out>, name=<value optimized out>)
    at locking/cluster_locking.c:376
#5  0x000000000048fb66 in _lock_resource (cmd=0x1ffd110, 
    resource=<value optimized out>, flags=256) at locking/cluster_locking.c:424
#6  0x000000000045c054 in _lock_vol (cmd=0x1ffd110, resource=<value optimized out>, 
    flags=256, lv_op=LV_NOOP) at locking/locking.c:387
#7  0x000000000045cbce in lock_vol (cmd=0x1ffd110, vol=0x4b97c4 "#sync_names", 
    flags=256) at locking/locking.c:465
#8  0x000000000047f91d in free_cmd_vgs (cmd_vgs=0x7fff81ed5780)
    at metadata/replicator_manip.c:594
#9  0x000000000042f800 in process_each_lv (cmd=<value optimized out>, argc=1, 
    argv=<value optimized out>, flags=0, handle=0x2047280, 
    process_single_lv=0x42cbb0 <_lvs_single>) at toollib.c:392
#10 0x000000000042c43b in _report (cmd=0x1ffd110, argc=0, argv=0x7fff81ed5b90, 
    report_type=LVS) at reporter.c:421
#11 0x000000000041fd52 in lvm_run_command (cmd=0x1ffd110, argc=0, argv=0x7fff81ed5b90)
    at lvmcmdline.c:1099
#12 0x0000000000422712 in lvm2_main (argc=1, argv=0x7fff81ed5b88) at lvmcmdline.c:1468
#13 0x00007f08c8179cdd in __libc_start_main (main=0x438380 <main>, argc=1, 
    ubp_av=0x7fff81ed5b88, init=<value optimized out>, fini=<value optimized out>, 
    rtld_fini=<value optimized out>, stack_end=0x7fff81ed5b78) at libc-start.c:226



If you need more backtrace lines let me know; I can put them on some pastebin.

Comment 14 Nenad Peric 2012-04-25 12:24:17 UTC
duh..
There was just one additional line which did not fit on my screen :)

#14 0x0000000000413c79 in _start ()

Comment 15 Peter Rajnoha 2012-04-25 13:15:12 UTC
...so it is waiting for an answer from clvmd. Could you get the same backtrace + debug from clvmd? It seems like it got overloaded, but it would be good to know where exactly in the code (I guess the clvmd debug log will show us).

Comment 16 Peter Rajnoha 2012-04-25 14:47:26 UTC
(...just for the record, attaching the clvmd backtrace I got from Nenad)

#0  0x00007f95f8be754d in read () at ../sysdeps/unix/syscall-template.S:82
#1  0x000000000040b64b in read (thisfd=0x14da8f0, buf=<value optimized out>,
    maxlen=<value optimized out>, csid=<value optimized out>,
    new_client=<value optimized out>) at /usr/include/bits/unistd.h:45
#2  local_pipe_callback (thisfd=0x14da8f0, buf=<value optimized out>,
    maxlen=<value optimized out>, csid=<value optimized out>,
    new_client=<value optimized out>) at clvmd.c:710
#3  0x000000000040a59f in main_loop (local_sock=<value optimized out>, cmd_timeout=60)
    at clvmd.c:900
#4  0x000000000040c3b7 in main (argc=<value optimized out>, argv=<value optimized out>)
    at clvmd.c:608


That's the first "read" in clvmd.c/local_pipe_callback fn...

Comment 17 Peter Rajnoha 2012-04-25 14:50:14 UTC
Running with the clvmd debug log enabled ended up with no hang, just this error message issued:

  cluster request failed: Device or resource busy
  Can't get lock for my_vg
  Skipping volume group my_vg

(which is OK if clvmd is under pressure; there was a patch specifically to issue such an error code in this situation, iirc)

Comment 18 Zdenek Kabelac 2012-04-25 15:06:56 UTC
Could you please attach full backtrace of all clvmd threads ?


(in gdb   -   "thread apply all bt full")

Comment 19 Nenad Peric 2012-04-25 16:13:02 UTC
Here's a pastebin which should sit there for a month:

http://pastebin.test.redhat.com/86662

There are a lot of threads so I pasted just a few of them.

Here's the list of threads when gdb is started (just a list):

http://pastebin.test.redhat.com/86663

As for threads themselves:

(gdb) info threads
  343 Thread 0x7f535501b700 (LWP 2611)  0x00007f5355a1e54d in read ()
    at ../sysdeps/unix/syscall-template.S:82
  342 Thread 0x7f5356ca1700 (LWP 2612)  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  341 Thread 0x7f5356b9f700 (LWP 2916)  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
.
.
.
.
  3 Thread 0x7f5335995700 (LWP 3376)  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  2 Thread 0x7f5335974700 (LWP 3377)  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
* 1 Thread 0x7f5356ca27a0 (LWP 2597)  0x00007f5355a1e54d in read ()
    at ../sysdeps/unix/syscall-template.S:82


Info about the process that owns the 1st thread (2597):

process 2597
cmdline = 'clvmd'
cwd = '/'
exe = '/usr/sbin/clvmd'
Process: 2597
Exec file: clvmd
State: T
Parent process: 1
Process group: 2597
Session id: 2597
TTY: 0
TTY owner process group: -1
Flags: 0x402140
Minor faults (no memory page): 23700
Minor faults, children: 1670
Major faults (memory page faults): 2
Major faults, children: 1
utime: 29
stime: 51
utime, children: 0
stime, children: 2
jiffies remaining in current time slice: 20
'nice' value: 0
jiffies until next timeout: 343
jiffies until next SIGALRM: 0
start time (jiffies since system boot): 15789
Virtual memory size: 637149184
Resident set size: 20459
rlim: 18446744073709551615
Start of text: 0x400000
End of text: 0x49f084
Start of stack: 0x7fffc0f65200

and the process info for thread 343:

(gdb) info proc 2611 stat
process 2611
cmdline = 'clvmd'
cwd = '/'
exe = '/usr/sbin/clvmd'
Process: 2611
Exec file: clvmd
State: T
Parent process: 1
Process group: 2597
Session id: 2597
TTY: 0
TTY owner process group: -1
Flags: 0x402040
Minor faults (no memory page): 23700
Minor faults, children: 1670
Major faults (memory page faults): 2
Major faults, children: 1
utime: 29
stime: 51
utime, children: 0
stime, children: 2
jiffies remaining in current time slice: 20
'nice' value: 0
jiffies until next timeout: 343
jiffies until next SIGALRM: 0
start time (jiffies since system boot): 16103
Virtual memory size: 637149184
Resident set size: 20459
rlim: 18446744073709551615
Start of text: 0x400000
End of text: 0x49f084
Start of stack: 0x7fffc0f65200

Comment 20 Peter Rajnoha 2012-04-26 09:17:32 UTC
(In reply to comment #19)
> Here's a pastebin which should sit there for a month:
> 
> http://pastebin.test.redhat.com/86662

This one's interesting, leading to libdlm and probably waiting for a lock (the other threads are in pthread_cond_wait):

Thread 343 (Thread 0x7f535501b700 (LWP 2611)):
#0  0x00007f5355a1e54d in read () at ../sysdeps/unix/syscall-template.S:82
No locals.
#1  0x00007f5355c2ede5 in ?? () from /usr/lib64/libdlm.so.3
No symbol table info available.
#2  0x00007f5355c2eeaf in ?? () from /usr/lib64/libdlm.so.3
No symbol table info available.
#3  0x00007f5355a17851 in start_thread (arg=0x7f535501b700) at pthread_create.c:301
        __res = <value optimized out>
        pd = 0x7f535501b700
        now = <value optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139995885188864,
                6297545747566397011, 139995895697920, 139995885189568, 0, 3,
                -6213900416108231085, -6213899588721998253}, mask_was_saved = 0}},
          priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0,
              canceltype = 0}}}
        not_first_call = <value optimized out>
        pagesize_m1 = <value optimized out>
        sp = <value optimized out>
        freesize = <value optimized out>
#4  0x00007f535576567d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Comment 21 Peter Rajnoha 2012-04-26 09:47:39 UTC
(In reply to comment #19)
> Here's a pastebin which should sit there for a month:
> 
> http://pastebin.test.redhat.com/86662
...
> As for threads themselves:
> .
> .
> .
> .
>   3 Thread 0x7f5335995700 (LWP 3376)  pthread_cond_wait@@GLIBC_2.3.2 ()
>     at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
>   2 Thread 0x7f5335974700 (LWP 3377)  pthread_cond_wait@@GLIBC_2.3.2 ()
>     at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
> * 1 Thread 0x7f5356ca27a0 (LWP 2597)  0x00007f5355a1e54d in read ()
>     at ../sysdeps/unix/syscall-template.S:82

Nenad, could you please do a full bt of "thread 1" (the one which is also in "read" call). This one is not in the bt log in the pastebin...

Comment 22 Peter Rajnoha 2012-04-26 09:55:58 UTC
(In reply to comment #21)
> > * 1 Thread 0x7f5356ca27a0 (LWP 2597)  0x00007f5355a1e54d in read ()
> >     at ../sysdeps/unix/syscall-template.S:82
> 
> Nenad, could you please do a full bt of "thread 1" (the one which is also in
> "read" call). This one is not in the bt log in the pastebin...

(When I try this on my machine, after installing the cluster debuginfo, I'm also getting the "read" in what seems to be the last thread:

#0  0x000000381e00e4ed in read () from /lib64/libpthread.so.0
#1  0x0000003820801de5 in read (fd=<value optimized out>) at /usr/include/bits/unistd.h:45
#2  do_dlm_dispatch_v6 (fd=<value optimized out>) at /usr/src/debug/cluster-3.0.12/dlm/libdlm/libdlm.c:529
#3  0x0000003820801eaf in dlm_recv_thread (lsinfo=0x18aa190) at /usr/src/debug/cluster-3.0.12/dlm/libdlm/libdlm.c:1180
#4  0x000000381e0077f1 in start_thread () from /lib64/libpthread.so.0
#5  0x000000381d4e5ccd in clone () from /lib64/libc.so.6

So that might be OK. But "thread 1" is clvmd's main_loop call and it shouldn't be in "read" unless it's receiving data from a client (which shouldn't happen, as the client itself is waiting on clvmd :)). So let's see what's the backtrace of your "thread 1" with the read call.

Comment 23 Peter Rajnoha 2012-04-26 10:05:59 UTC
(In reply to comment #22)
> So let's see what's the backtrace of your "thread 1" with the read call.

(ehm, well, that would be comment #16)

Comment 24 Nenad Peric 2012-04-26 11:23:56 UTC
This is not the same run (I had to restart the cluster to do some other tests) but here is the info of thread 1 now:


Thread 1 (Thread 0x7f6d864a87a0 (LWP 2476)):
#0  0x00007f6d8522454d in read () at ../sysdeps/unix/syscall-template.S:82
No locals.
#1  0x000000000040b64b in read (thisfd=0x1c04bf0, buf=<value optimized out>, 
    maxlen=<value optimized out>, csid=<value optimized out>, 
    new_client=<value optimized out>) at /usr/include/bits/unistd.h:45
No locals.
#2  local_pipe_callback (thisfd=0x1c04bf0, buf=<value optimized out>, 
    maxlen=<value optimized out>, csid=<value optimized out>, 
    new_client=<value optimized out>) at clvmd.c:710
        len = <value optimized out>
        buffer = "\000\000\004#sync_names\000\000\000\000\000\000\002\000\000\000\v\000\000\000\001\000I\026\000\000\000\005\000\000\000\000\000\000\000\000\000W\004#sync_names\000JOVGg0JH7yLSiYe6QOO3B4jIx85PyakBH0fYwGXQMqVTwSWpbhr0\000M\030,\r\000\000\000\000\355\060\231O\000\000\000\000M\030,\r\000\000\000\000\355\060\231O\000\000\000\000M\030,\r", '\000' <repeats 36 times>, "P\352\272\001\000\000\000\000\020J\323\\\377\177\000\000\254l\357\204m\177\000\000P\352\272\001\000\000\000\000\350Q\357\204m\177\000\000\000\000\000\000\000\000\000\000P\352\272\001\000\000\000\000\020J\323\\\377\177"...
        sock_client = 0x1be55d0
        status = -1
#3  0x000000000040a59f in main_loop (local_sock=<value optimized out>, cmd_timeout=60)
    at clvmd.c:900
        newfd = 0x0
        ret = <value optimized out>
        lastfd = <value optimized out>
        csid = "\000\000\000"
        select_status = <value optimized out>
        thisfd = <value optimized out>
        tv = {tv_sec = 59, tv_usec = 999872}
        quorate = <value optimized out>
        in = {fds_bits = {580401401977512192, 2486556012189384704, 
            4803089002626285568, 1120986464257, 1788211223540533416, 
            -3431742916056317951, -1936173725232639768, -6734567919800179253, 
            -8301811925027477502, 887230, 2147463168, 0, 0, 2494994193563254784, 
            -6329773539414375774, 2863311530}}
        ss = {__val = {16426, 0 <repeats 15 times>}}
#4  0x000000000040c3b7 in main (argc=<value optimized out>, argv=<value optimized out>)
    at clvmd.c:608
        local_sock = 6
        newfd = <value optimized out>
        delfd = <value optimized out>
        lvm_params = {excl_uuid = 0x1074030}
        opt = <value optimized out>
        cmd_timeout = 60
        start_timeout = 30
        cluster_iface = <value optimized out>
        ss = {__val = {18434, 0 <repeats 15 times>}}
        debug_opt = DEBUG_OFF
        debug_arg = DEBUG_OFF
        clusterwide_opt = 0
        old_mask = <value optimized out>
        ret = 1
        longopts = {{name = 0x475f23 "help", has_arg = 0, flag = 0x0, val = 104}, {
            name = 0x0, has_arg = 0, flag = 0x0, val = 0}}




and the last thread in this case is 342:


Thread 342 (Thread 0x7f6d84821700 (LWP 2490)):
#0  0x00007f6d8522454d in read () at ../sysdeps/unix/syscall-template.S:82
No locals.
#1  0x00007f6d85434de5 in read (fd=<value optimized out>)
    at /usr/include/bits/unistd.h:45
No locals.
#2  do_dlm_dispatch_v6 (fd=<value optimized out>)
    at /usr/src/debug/cluster-3.0.12.1/dlm/libdlm/libdlm.c:529
        resultbuf = "\006\000\000\000\000\000\000\000\001\000\000\000H\000\000\000\340\371@\000\000\000\000\000p,\r\203m\177\000\000\310,\r\203m\177\000\000\376\377\376\377\001\000\177\002", '\000' <repeats 55 times>
        result = 0x7f6d84820d90
        status = <value optimized out>
        astaddr = <value optimized out>
#3  0x00007f6d85434eaf in dlm_recv_thread (lsinfo=0x1baea50)
    at /usr/src/debug/cluster-3.0.12.1/dlm/libdlm/libdlm.c:1180
        lsi = 0x1baea50
#4  0x00007f6d8521d851 in start_thread (arg=0x7f6d84821700) at pthread_create.c:301
        __res = <value optimized out>
        pd = 0x7f6d84821700
        now = <value optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140108351280896, 
                5333901287343311301, 140108361789952, 140108351281600, 0, 3, 
                -5413647869769365051, -5413651424327710267}, mask_was_saved = 0}}, 
          priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, 
              canceltype = 0}}}
        not_first_call = <value optimized out>
        pagesize_m1 = <value optimized out>
        sp = <value optimized out>
        freesize = <value optimized out>
#5  0x00007f6d84f6b67d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
No locals.

Comment 25 Peter Rajnoha 2012-04-26 13:18:45 UTC
The block on "read" in the "local_pipe_callback" fn happens because there's no data to read, I assume. The strange thing here is that "local_pipe_callback", where that blocking happens, is called based on a previous "select" call in "main_loop". That's odd. Seems like some uncaught error code somewhere (select?). I'll try to investigate more...

Comment 26 Nenad Peric 2012-04-26 16:09:22 UTC
The problem is in the way clvmd gets started:

If the open files limit is set to 1024 when starting clvmd, the clvmd process hits that limit and gives out a few of the errors we noticed in one of the previous comments:

 cluster request failed: Device or resource busy

because it actually reached the open file limit, but it completes the run.

If, however, the open files limit is increased before starting clvmd (I increased it to 4096, but I think any number bigger than 1024 would do), it gets stuck as reported in my first posts.

This gives an easy way to reproduce it.
Hope this helps in tracking it down.
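
The 1024 threshold is suggestive: glibc's select() works on fixed-size fd_set bitmaps limited to FD_SETSIZE (1024) descriptors, so a select()-driven loop can misbehave once file descriptors above that value come into play. The sketch below only illustrates that generic pitfall under the assumption that it is what bites clvmd here; it is not a confirmed diagnosis and not clvmd code:

  /* Generic FD_SETSIZE pitfall, not clvmd source: FD_SET() on a
   * descriptor >= FD_SETSIZE (1024) writes outside the fd_set bitmap,
   * and select() cannot report events on such descriptors, so a loop
   * relying on it may hang or corrupt memory. */
  #include <sys/select.h>
  #include <stdio.h>

  int watch_fd(int fd)
  {
          fd_set rfds;

          if (fd >= FD_SETSIZE) {
                  /* poll()/epoll would be needed for descriptors this high */
                  fprintf(stderr, "fd %d exceeds FD_SETSIZE (%d)\n", fd, FD_SETSIZE);
                  return -1;
          }

          FD_ZERO(&rfds);
          FD_SET(fd, &rfds);

          /* blocks until fd is readable */
          return select(fd + 1, &rfds, NULL, NULL, NULL);
  }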

Comment 27 Peter Rajnoha 2012-04-27 13:11:32 UTC
(In reply to comment #26)
> If, however, the open files limit is increased before starting clvmd (I
> increased it to 4096, but I think any number bigger than 1024 would do), it
> gets stuck as reported in my first posts.
> 
> This gives an easy way to reproduce it.
> Hope this helps in tracking it down.

Yes, reproduced now. Thanks.

Comment 31 Zdenek Kabelac 2012-10-12 11:39:49 UTC
I believe this bug has been reasonably resolved.

Comment 33 Nenad Peric 2012-10-23 10:13:31 UTC
This BZ is not fixed.

LVM commands get deadlocked again in the same way.

Here's the setup:

core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 29479
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 4096
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 29479
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Packages installed:
lvm2-debuginfo-2.02.98-2.el6.x86_64
lvm2-libs-2.02.98-2.el6.x86_64
lvm2-devel-2.02.98-2.el6.x86_64
lvm2-2.02.98-2.el6.x86_64
lvm2-cluster-2.02.98-2.el6.x86_64
device-mapper-event-1.02.77-2.el6.x86_64
device-mapper-event-devel-1.02.77-2.el6.x86_64
device-mapper-1.02.77-2.el6.x86_64
device-mapper-event-libs-1.02.77-2.el6.x86_64
device-mapper-devel-1.02.77-2.el6.x86_64
device-mapper-libs-1.02.77-2.el6.x86_64

kernel-2.6.32-330.el6.x86_64

Executed:

for j in $(seq 1 200);
    do
       lvs vg_test&
       vgs&
    done


Output:
[1] 6579
[2] 6580
[3] 6581
[4] 6582
[5] 6583
[6] 6584
.
.
.
[399] 7191
[400] 7192
(05:14:09) [root@r6-node01:~]$   LV    VG      Attr      LSize   Pool Origin Data%  Move Log Copy%  Convert
  onelv vg_test -wi-a---- 500.00m                                           
  twolv vg_test -wi-a---- 500.00m                                           
  LV    VG      Attr      LSize   Pool Origin Data%  Move Log Copy%  Convert
  onelv vg_test -wi-ao--- 500.00m                                           
  twolv vg_test -wi-ao--- 500.00m                                           
  LV    VG      Attr      LSize   Pool Origin Data%  Move Log Copy%  Convert
  onelv vg_test -wi-ao--- 500.00m                                           
  twolv vg_test -wi-ao--- 500.00m                                           

and then it gets stuck. 
No further LVM commands can be executed.

Comment 34 Nenad Peric 2012-10-23 10:56:11 UTC
The minimum number of iterations needed to make it crash, as far as I have seen, is 170,
and as before the ulimit for open files should be greater than 1024.

If not, some of the threads give out exit code 5 and return:
" Can't get lock for VolGroup
  cluster request failed: Device or resource busy
"

Comment 35 Peter Rajnoha 2012-12-05 09:24:02 UTC
We need to look at this again. But it's too late now, moving to 6.5.

Comment 36 Zdenek Kabelac 2012-12-05 09:33:34 UTC
The memory used by clvmd has been reduced; the next problem here is in the handling of network socket connections.

However, the given use case of so many users contacting clvmd at the same time is not a 'real-life' situation - thus it should not really be a big problem for now. (The memory consumption has been a much bigger issue.)

Comment 37 Peter Rajnoha 2012-12-05 12:05:06 UTC
Moving this bug back to 6.4 - we'll split it into two parts - one for 6.4 with the things already resolved and one for 6.5 with the things still left to resolve.

Comment 38 Zdenek Kabelac 2012-12-05 12:50:19 UTC
I've run local tests with over 1000 lvs commands running in parallel -
and the test passed in 24 seconds on my 4GB laptop.

But there are a few things which could influence the overall process.

So we would need to know the amount of memory on the machine where the test is failing. If you want to run that many commands in parallel, you have to account for the quite large 'default' amount of preallocated memory for each lvm2 command.

The fact that I'm able to run 1000 lvm commands on my machine is because I use lowered limits for preallocated memory in lvm.conf (see the illustrative excerpt below).
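
For reference, the preallocation limits live in the "activation" section of lvm.conf (reserved_stack and reserved_memory, both in KiB); the values below are only illustrative, not the ones used in this test:

  # illustrative lvm.conf excerpt - values are examples only
  activation {
      # stack size in KiB reserved for use while devices are suspended
      reserved_stack = 64
      # memory in KiB reserved for use while devices are suspended
      reserved_memory = 8192
  }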

So after all, my assumption about why it is failing is that your box runs out of memory in a limited-memory environment - with a couple more GB you would be just fine. In my test it looks like clvmd itself is happy with around 60MB of real used memory, but since many concurrently running lvm commands take a lot of memory, you hit the memory problem anyway.


We will need to consider whether to start watching available system resources - but that's out of scope for 6.4.

Comment 39 Peter Rajnoha 2012-12-05 13:02:11 UTC
(In reply to comment #38)
> We will need to consider, if we will start to watch out for available system
> resources - but that's out of 6.4.

That can be now tracked by bug #883842.

Comment 41 Nenad Peric 2012-12-20 12:10:42 UTC
Even though it is maybe not a "normal" use-case scenario, this is still what happens and it should be noted:

If a system is running with an open file limit (ulimit -n) greater than 1024, clvmd is not capable of handling more than 340 parallel requests at a time (meaning, for example, 170 iterations of two commands).

What happens is that it just gets stuck after returning maybe 10 of them; the rest just sit and wait.

The amount of system memory has no visible effect on this behavior (tried with 4GB and 2GB) - just the number of open files allowed.

At the moment the commands stopped returning, this was the memory usage:

(06:11:13) [root@r6-node01:~]$ free
             total       used       free     shared    buffers     cached
Mem:       3793056     965256    2827800          0      15840     100088
-/+ buffers/cache:     849328    2943728
Swap:      2064376          0    2064376

Comment 42 Peter Rajnoha 2012-12-20 12:22:11 UTC
We'll document the limits in the technical note.

Comment 43 Nenad Peric 2013-01-16 15:11:02 UTC
Marking this Verified, on the condition that a technical note describing this behavior (ulimit > 1024) is added, as stated in Comment #42.

Comment 44 errata-xmlrpc 2013-02-21 08:03:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0501.html