Bug 698393 - clvmd crashes when attempting to create thousands of LVs
Status: CLOSED DUPLICATE of bug 730289
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.1
Hardware: x86_64 Linux
Priority: high  Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Milan Broz
QA Contact: Corey Marthaler
Reported: 2011-04-20 15:31 EDT by Corey Marthaler
Modified: 2013-02-28 23:10 EST
CC List: 10 users

Doc Type: Bug Fix
Last Closed: 2011-10-11 07:19:17 EDT

Attachments
coredump from taft-02 (4.63 MB, application/x-gzip)
2011-04-20 15:41 EDT, Corey Marthaler
Description Corey Marthaler 2011-04-20 15:31:32 EDT
Description of problem:
This may just be a duplicate of bug 697945.

I ran the following loop on every node in the cluster:

# 10 volume groups (b1..b10), 75 LVs each; every lvcreate runs in the background
for i in $(seq 1 10); do
  for j in $(seq 1 75); do
    lvcreate -n b_$j -L 12M b$i &
  done
done

clvmd[7062]: segfault at 7fff8d42b550 ip 000000000041140f sp 00007fff7d909d80 error 4 in clvmd[4000]
Apr 20 14:04:57 taft-02 kernel: clvmd[7062]: segfault at 7fff8d42b550 ip 000000000041140f sp 00007f]
Apr 20 14:05:13 taft-02 abrt[9866]: saved core dump of pid 7062 (/usr/sbin/clvmd) to /var/spool/abr)
Apr 20 14:05:13 taft-02 abrtd: Directory 'ccpp-1303326297-7062' creation detected
Apr 20 14:05:14 taft-02 abrt[9866]: size of '/var/spool/abrt' >= 1250 MB, deleting 'ccpp-1303233863'
Apr 20 14:05:14 taft-02 abrtd: Size of '/var/spool/abrt' >= 1000 MB, deleting 'ccpp-1303233863-1487'
Apr 20 14:05:14 taft-02 abrtd: New crash /var/spool/abrt/ccpp-1303326297-7062, processing


Core was generated by `clvmd -T30'.
Program terminated with signal 11, Segmentation fault.
#0  0x000000000041140f in main_loop (local_sock=<value optimized out>, cmd_timeout=60) at clvmd.c:860
860                                     if (FD_ISSET(thisfd->fd, &in)) {
Missing separate debuginfos, use: debuginfo-install clusterlib-3.0.12-41.el6.x86_64 corosynclib-1.2.3-36.el6.x86_64 glibc-2.12-1.25.el6.x86_64 libgcc-4.4.5-6.el6.x86_64 libselinux-2.0.94-5.el6.x86_64 libsepol-2.0.41-3.el6.x86_64 libudev-147-2.35.el6.x86_64
(gdb) bt
#0  0x000000000041140f in main_loop (local_sock=<value optimized out>, cmd_timeout=60) at clvmd.c:860
#1  0x0000000000412f61 in main (argc=<value optimized out>, argv=0x7fff7d90aa38) at clvmd.c:596


Version-Release number of selected component (if applicable):
2.6.32-131.0.1.el6.x86_64

lvm2-2.02.83-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
lvm2-libs-2.02.83-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
lvm2-cluster-2.02.83-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
udev-147-2.35.el6    BUILT: Wed Mar 30 07:32:05 CDT 2011
device-mapper-1.02.62-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
device-mapper-libs-1.02.62-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
device-mapper-event-1.02.62-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
device-mapper-event-libs-1.02.62-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
cmirror-2.02.83-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
Comment 1 Corey Marthaler 2011-04-20 15:41:22 EDT
Created attachment 493600 [details]
coredump from taft-02
Comment 2 Corey Marthaler 2011-04-20 16:34:15 EDT
This is easily reproducible. In fact I just hit it again on all four nodes in my cluster. These are the two different stacks I saw.


Core was generated by `clvmd -T30'.
Program terminated with signal 11, Segmentation fault.
#0  0x000000000041a9d2 in lvmcache_label_scan ()
Missing separate debuginfos, use: debuginfo-install lvm2-cluster-2.02.83-3.el6.x86_64
(gdb) bt
#0  0x000000000041a9d2 in lvmcache_label_scan ()
#1  0x00000000004473d3 in lv_from_lvid ()
#2  0x00000000004173c5 in lv_activation_filter ()
#3  0x0000000000414bc3 in ?? ()
#4  0x000000000041500f in do_lock_lv ()
#5  0x00000000004100c6 in do_command ()
#6  0x000000000041372b in ?? ()
#7  0x0000000000413adc in ?? ()
#8  0x00000033054077e1 in start_thread () from /lib64/libpthread.so.0
#9  0x00000033050e68ed in clone () from /lib64/libc.so.6



Program terminated with signal 11, Segmentation fault.
#0  0x000000000041a9d2 in lvmcache_label_scan (cmd=0x7f9f6c0008c0, full_scan=2) at cache/lvmcache.c:589
589             if (full_scan == 2 && !cmd->filter->use_count && !refresh_filters(cmd)) {
Missing separate debuginfos, use: debuginfo-install clusterlib-3.0.12-41.el6.x86_64 corosynclib-1.2.3-36.el6.x86_64 glibc-2.12-1.25.el6.x86_64 libgcc-4.4.5-6.el6.x86_64 libselinux-2.0.94-5.el6.x86_64 libsepol-2.0.41-3.el6.x86_64 libudev-147-2.35.el6.x86_64
(gdb) bt
#0  0x000000000041a9d2 in lvmcache_label_scan (cmd=0x7f9f6c0008c0, full_scan=2) at cache/lvmcache.c:589
#1  0x00000000004473d3 in _vg_read_by_vgid (cmd=0x7f9f6c0008c0,
    lvid_s=0x22cd403 "bIn4Wt2DKNoo9nlg5g9ro4vqgvjBgK5i5BnRbu7QvgHWV5ty69jeLPa4SWXt31o7", precommitted=0) at metadata/metadata.c:3223
#2  lv_from_lvid (cmd=0x7f9f6c0008c0, lvid_s=0x22cd403 "bIn4Wt2DKNoo9nlg5g9ro4vqgvjBgK5i5BnRbu7QvgHWV5ty69jeLPa4SWXt31o7",
    precommitted=0) at metadata/metadata.c:3262
#3  0x00000000004173c5 in lv_activation_filter (cmd=0x7f9f6c0008c0,
    lvid_s=0x22cd403 "bIn4Wt2DKNoo9nlg5g9ro4vqgvjBgK5i5BnRbu7QvgHWV5ty69jeLPa4SWXt31o7", activate_lv=0x7f9f7145ca6c)
    at activate/activate.c:1332
#4  0x0000000000414bc3 in do_activate_lv (resource=0x22cd403 "bIn4Wt2DKNoo9nlg5g9ro4vqgvjBgK5i5BnRbu7QvgHWV5ty69jeLPa4SWXt31o7",
    lock_flags=132 '\204', mode=1) at lvm-functions.c:343
#5  0x000000000041500f in do_lock_lv (command=25 '\031', lock_flags=132 '\204',
    resource=0x22cd403 "bIn4Wt2DKNoo9nlg5g9ro4vqgvjBgK5i5BnRbu7QvgHWV5ty69jeLPa4SWXt31o7") at lvm-functions.c:532
#6  0x00000000004100c6 in do_command (client=0x227aa30, msg=0x22cd3f0, msglen=85, buf=0x7f9f7145cdd0, buflen=1481, retlen=0x7f9f7145cddc)
    at clvmd-command.c:120
#7  0x0000000000413bd1 in process_local_command (arg=<value optimized out>) at clvmd.c:1677
#8  process_work_item (arg=<value optimized out>) at clvmd.c:1910
#9  lvm_thread_fn (arg=<value optimized out>) at clvmd.c:1959
#10 0x00000033200077e1 in start_thread () from /lib64/libpthread.so.0
#11 0x000000331fce68ed in clone () from /lib64/libc.so.6
Comment 3 Milan Broz 2011-08-11 09:02:10 EDT
I hope this is fixed by properly returning an error when clvmd has no more file descriptors; that change should be part of upstream 2.02.87.

(I was not able to reproduce the clvmd crash, at least with the patch applied, but the backtraces differ.)
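
Note on the file-descriptor theory: the first backtrace faults on FD_ISSET(thisfd->fd, &in), and POSIX leaves FD_SET/FD_ISSET undefined when the descriptor is negative or not less than FD_SETSIZE. The following is only a minimal sketch of "return an error instead of crashing". The helper names (watch_fd, fd_is_ready, open_checked) are hypothetical; none of this is taken from the clvmd sources or from the actual 2.02.87 change.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: add fd to the set only if it is legal for select().
 * Returns 0 on success, -EBADF if fd cannot be represented in an fd_set. */
int watch_fd(fd_set *set, int fd)
{
    assert(set != NULL);
    if (fd < 0 || fd >= FD_SETSIZE)
        return -EBADF;
    FD_SET(fd, set);
    return 0;
}

/* Hypothetical helper: bounds-checked FD_ISSET.
 * Out-of-range descriptors are reported as "not ready" instead of
 * indexing past the end of the fd_set. */
int fd_is_ready(fd_set *set, int fd)
{
    assert(set != NULL);
    if (fd < 0 || fd >= FD_SETSIZE)
        return 0;
    return FD_ISSET(fd, set) ? 1 : 0;
}

/* Hypothetical helper: open a file and hand the failure back to the caller.
 * Returns 0 and stores the descriptor on success, -errno on failure
 * (for example -EMFILE when the process is out of descriptors). A
 * descriptor too large for select() is closed and treated as -EMFILE. */
int open_checked(const char *path, int *fd_out)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -errno;
    if (fd >= FD_SETSIZE) {
        close(fd);
        return -EMFILE;
    }
    *fd_out = fd;
    return 0;
}
```

The point of the sketch is that every path that can yield an unusable descriptor reports it to the caller, so the select() loop never sees a descriptor it cannot index.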
Comment 5 Corey Marthaler 2011-09-08 18:18:18 EDT
This issue still exists in the latest rpms.

Sep  8 17:11:48 taft-04 kernel: clvmd[6236]: segfault at 10 ip 000000000041cb92 sp 00007f5714374690 error 4 in clvmd[400000+9a000]

2.6.32-193.el6.x86_64

lvm2-2.02.87-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
lvm2-libs-2.02.87-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
lvm2-cluster-2.02.87-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
udev-147-2.37.el6    BUILT: Wed Aug 10 07:48:15 CDT 2011
device-mapper-1.02.66-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-libs-1.02.66-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-event-1.02.66-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-event-libs-1.02.66-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
cmirror-2.02.87-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
Comment 7 Corey Marthaler 2011-09-13 16:05:05 EDT
FWIW, the bt in comment #5 looks to be similar to the one in the original report:

Core was generated by `clvmd -T30'.
Program terminated with signal 11, Segmentation fault.
#0  0x000000000041cb92 in lvmcache_label_scan ()
Missing separate debuginfos, use: debuginfo-install lvm2-cluster-2.02.87-2.el6.x86_64
(gdb) bt
#0  0x000000000041cb92 in lvmcache_label_scan ()
#1  0x000000000044eea3 in lv_from_lvid ()
#2  0x0000000000418b85 in lv_activation_filter ()
#3  0x0000000000416343 in ?? ()
#4  0x000000000041679f in do_lock_lv ()
#5  0x00000000004117b7 in do_command ()
#6  0x000000000041503b in ?? ()
#7  0x00000000004153fc in ?? ()
#8  0x00000039a2c077e1 in ?? ()
#9  0x00007f5714375700 in ?? ()
#10 0x0000000000000000 in ?? ()
Comment 9 Zdenek Kabelac 2011-10-11 07:19:17 EDT
Filter refreshing was not handled well when clvmd ran out of free file descriptors. This is also addressed within the memory consumption patch set.

*** This bug has been marked as a duplicate of bug 730289 ***
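
Note on the filter-refresh explanation: the later crashes fault in lvmcache_label_scan at the check `!cmd->filter->use_count`, and comment 5 reports "segfault at 10". On x86_64, reading a member at offset 0x10 through a NULL struct pointer would fault at exactly that address, which is consistent with (but does not prove) cmd->filter being NULL after a failed refresh. The sketch below only illustrates the defensive pattern. The struct layout and the function names can_scan and needs_filter_refresh are assumptions made for illustration, not code from lvm2 or from the fix tracked in bug 730289.

```c
#include <stddef.h>

/* Assumed, simplified layout for illustration only. */
struct dev_filter {
    int (*passes_filter)(struct dev_filter *f, void *dev);
    void (*destroy)(struct dev_filter *f);
    unsigned use_count;
};

struct cmd_context {
    struct dev_filter *filter;
};

/* Returns 1 if the context has a usable filter, 0 otherwise. */
int can_scan(const struct cmd_context *cmd)
{
    return cmd != NULL && cmd->filter != NULL;
}

/* Hypothetical stand-in for the condition quoted from lvmcache.c:589.
 * Returns -1 if there is no filter to inspect (caller should fail the
 * request), 1 if a refresh is wanted, 0 if it is not. */
int needs_filter_refresh(const struct cmd_context *cmd, int full_scan)
{
    if (!can_scan(cmd))
        return -1; /* report failure instead of dereferencing NULL */
    return full_scan == 2 && cmd->filter->use_count == 0;
}
```

With a check of this kind, a refresh that fails because no descriptors are left would surface as a failed lock request rather than a daemon crash.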
