Description of problem:
This may just be a dup of bug 697945.

I tried this on all the nodes in the cluster:

for i in $(seq 1 10); do for j in $(seq 1 75); do lvcreate -n b_$j -L 12M b$i & done; done

clvmd[7062]: segfault at 7fff8d42b550 ip 000000000041140f sp 00007fff7d909d80 error 4 in clvmd[4000]

Apr 20 14:04:57 taft-02 kernel: clvmd[7062]: segfault at 7fff8d42b550 ip 000000000041140f sp 00007f]
Apr 20 14:05:13 taft-02 abrt[9866]: saved core dump of pid 7062 (/usr/sbin/clvmd) to /var/spool/abr)
Apr 20 14:05:13 taft-02 abrtd: Directory 'ccpp-1303326297-7062' creation detected
Apr 20 14:05:14 taft-02 abrt[9866]: size of '/var/spool/abrt' >= 1250 MB, deleting 'ccpp-1303233863'
Apr 20 14:05:14 taft-02 abrtd: Size of '/var/spool/abrt' >= 1000 MB, deleting 'ccpp-1303233863-1487'
Apr 20 14:05:14 taft-02 abrtd: New crash /var/spool/abrt/ccpp-1303326297-7062, processing

Core was generated by `clvmd -T30'.
Program terminated with signal 11, Segmentation fault.
#0 0x000000000041140f in main_loop (local_sock=<value optimized out>, cmd_timeout=60) at clvmd.c:860
860 if (FD_ISSET(thisfd->fd, &in)) {
Missing separate debuginfos, use: debuginfo-install clusterlib-3.0.12-41.el6.x86_64 corosynclib-1.2.3-36.el6.x86_64 glibc-2.12-1.25.el6.x86_64 libgcc-4.4.5-6.el6.x86_64 libselinux-2.0.94-5.el6.x86_64 libsepol-2.0.41-3.el6.x86_64 libudev-147-2.35.el6.x86_64
(gdb) bt
#0 0x000000000041140f in main_loop (local_sock=<value optimized out>, cmd_timeout=60) at clvmd.c:860
#1 0x0000000000412f61 in main (argc=<value optimized out>, argv=0x7fff7d90aa38) at clvmd.c:596

Version-Release number of selected component (if applicable):
2.6.32-131.0.1.el6.x86_64

lvm2-2.02.83-3.el6 BUILT: Fri Mar 18 09:31:10 CDT 2011
lvm2-libs-2.02.83-3.el6 BUILT: Fri Mar 18 09:31:10 CDT 2011
lvm2-cluster-2.02.83-3.el6 BUILT: Fri Mar 18 09:31:10 CDT 2011
udev-147-2.35.el6 BUILT: Wed Mar 30 07:32:05 CDT 2011
device-mapper-1.02.62-3.el6 BUILT: Fri Mar 18 09:31:10 CDT 2011
device-mapper-libs-1.02.62-3.el6 BUILT: Fri Mar 18 09:31:10 CDT 2011
device-mapper-event-1.02.62-3.el6 BUILT: Fri Mar 18 09:31:10 CDT 2011
device-mapper-event-libs-1.02.62-3.el6 BUILT: Fri Mar 18 09:31:10 CDT 2011
cmirror-2.02.83-3.el6 BUILT: Fri Mar 18 09:31:10 CDT 2011
Created attachment 493600 [details] coredump from taft-02
This is easily reproducible. In fact I just hit it again on all four nodes in my cluster. These are the two different stacks I saw.

Core was generated by `clvmd -T30'.
Program terminated with signal 11, Segmentation fault.
#0 0x000000000041a9d2 in lvmcache_label_scan ()
Missing separate debuginfos, use: debuginfo-install lvm2-cluster-2.02.83-3.el6.x86_64
(gdb) bt
#0 0x000000000041a9d2 in lvmcache_label_scan ()
#1 0x00000000004473d3 in lv_from_lvid ()
#2 0x00000000004173c5 in lv_activation_filter ()
#3 0x0000000000414bc3 in ?? ()
#4 0x000000000041500f in do_lock_lv ()
#5 0x00000000004100c6 in do_command ()
#6 0x000000000041372b in ?? ()
#7 0x0000000000413adc in ?? ()
#8 0x00000033054077e1 in start_thread () from /lib64/libpthread.so.0
#9 0x00000033050e68ed in clone () from /lib64/libc.so.6

Program terminated with signal 11, Segmentation fault.
#0 0x000000000041a9d2 in lvmcache_label_scan (cmd=0x7f9f6c0008c0, full_scan=2) at cache/lvmcache.c:589
589 if (full_scan == 2 && !cmd->filter->use_count && !refresh_filters(cmd)) {
Missing separate debuginfos, use: debuginfo-install clusterlib-3.0.12-41.el6.x86_64 corosynclib-1.2.3-36.el6.x86_64 glibc-2.12-1.25.el6.x86_64 libgcc-4.4.5-6.el6.x86_64 libselinux-2.0.94-5.el6.x86_64 libsepol-2.0.41-3.el6.x86_64 libudev-147-2.35.el6.x86_64
(gdb) bt
#0 0x000000000041a9d2 in lvmcache_label_scan (cmd=0x7f9f6c0008c0, full_scan=2) at cache/lvmcache.c:589
#1 0x00000000004473d3 in _vg_read_by_vgid (cmd=0x7f9f6c0008c0, lvid_s=0x22cd403 "bIn4Wt2DKNoo9nlg5g9ro4vqgvjBgK5i5BnRbu7QvgHWV5ty69jeLPa4SWXt31o7", precommitted=0) at metadata/metadata.c:3223
#2 lv_from_lvid (cmd=0x7f9f6c0008c0, lvid_s=0x22cd403 "bIn4Wt2DKNoo9nlg5g9ro4vqgvjBgK5i5BnRbu7QvgHWV5ty69jeLPa4SWXt31o7", precommitted=0) at metadata/metadata.c:3262
#3 0x00000000004173c5 in lv_activation_filter (cmd=0x7f9f6c0008c0, lvid_s=0x22cd403 "bIn4Wt2DKNoo9nlg5g9ro4vqgvjBgK5i5BnRbu7QvgHWV5ty69jeLPa4SWXt31o7", activate_lv=0x7f9f7145ca6c) at activate/activate.c:1332
#4 0x0000000000414bc3 in do_activate_lv (resource=0x22cd403 "bIn4Wt2DKNoo9nlg5g9ro4vqgvjBgK5i5BnRbu7QvgHWV5ty69jeLPa4SWXt31o7", lock_flags=132 '\204', mode=1) at lvm-functions.c:343
#5 0x000000000041500f in do_lock_lv (command=25 '\031', lock_flags=132 '\204', resource=0x22cd403 "bIn4Wt2DKNoo9nlg5g9ro4vqgvjBgK5i5BnRbu7QvgHWV5ty69jeLPa4SWXt31o7") at lvm-functions.c:532
#6 0x00000000004100c6 in do_command (client=0x227aa30, msg=0x22cd3f0, msglen=85, buf=0x7f9f7145cdd0, buflen=1481, retlen=0x7f9f7145cddc) at clvmd-command.c:120
#7 0x0000000000413bd1 in process_local_command (arg=<value optimized out>) at clvmd.c:1677
#8 process_work_item (arg=<value optimized out>) at clvmd.c:1910
#9 lvm_thread_fn (arg=<value optimized out>) at clvmd.c:1959
#10 0x00000033200077e1 in start_thread () from /lib64/libpthread.so.0
#11 0x000000331fce68ed in clone () from /lib64/libc.so.6
I hope this is fixed by properly returning an error when clvmd has no more file descriptors; the fix should be part of upstream 2.02.87. (At least with the patch I was not able to reproduce the clvmd crash, but the backtraces differ.)
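For illustration only, here is a minimal sketch of the general defensive pattern described above. It is not the actual lvm2 patch: safe_fd_set() and open_or_error() are hypothetical helper names and do not come from the clvmd sources. The sketch assumes a select()-based loop and shows two things: reporting a failed open() (for example EMFILE when the descriptor limit is reached) to the caller instead of carrying on, and refusing to pass a descriptor outside 0..FD_SETSIZE-1 to FD_SET, which POSIX leaves undefined.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>

/*
 * Hypothetical helper (not from lvm2): add fd to the set only if
 * select() can represent it. FD_SET on a negative descriptor or one
 * that is >= FD_SETSIZE is undefined behaviour, so reject it here.
 * Returns 0 on success, -1 with errno set to EBADF otherwise.
 */
static int safe_fd_set(int fd, fd_set *set)
{
    if (fd < 0 || fd >= FD_SETSIZE) {
        errno = EBADF;
        return -1;
    }
    FD_SET(fd, set);
    return 0;
}

/*
 * Hypothetical helper (not from lvm2): open a path read-only and hand
 * any failure back to the caller as a negative errno value, e.g.
 * -EMFILE once the process has run out of file descriptors.
 */
static int open_or_error(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -errno;
    return fd;
}
```

A caller would check both return values and fail the current request cleanly, rather than continuing with a descriptor or structure that was never set up.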
This issue still exists in the latest rpms.

Sep 8 17:11:48 taft-04 kernel: clvmd[6236]: segfault at 10 ip 000000000041cb92 sp 00007f5714374690 error 4 in clvmd[400000+9a000]

2.6.32-193.el6.x86_64

lvm2-2.02.87-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011
lvm2-libs-2.02.87-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011
lvm2-cluster-2.02.87-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011
udev-147-2.37.el6 BUILT: Wed Aug 10 07:48:15 CDT 2011
device-mapper-1.02.66-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-libs-1.02.66-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-event-1.02.66-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-event-libs-1.02.66-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011
cmirror-2.02.87-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011
FWIW, the bt in comment #5 looks to be similar to the one in the original report:

Core was generated by `clvmd -T30'.
Program terminated with signal 11, Segmentation fault.
#0 0x000000000041cb92 in lvmcache_label_scan ()
Missing separate debuginfos, use: debuginfo-install lvm2-cluster-2.02.87-2.el6.x86_64
(gdb) bt
#0 0x000000000041cb92 in lvmcache_label_scan ()
#1 0x000000000044eea3 in lv_from_lvid ()
#2 0x0000000000418b85 in lv_activation_filter ()
#3 0x0000000000416343 in ?? ()
#4 0x000000000041679f in do_lock_lv ()
#5 0x00000000004117b7 in do_command ()
#6 0x000000000041503b in ?? ()
#7 0x00000000004153fc in ?? ()
#8 0x00000039a2c077e1 in ?? ()
#9 0x00007f5714375700 in ?? ()
#10 0x0000000000000000 in ?? ()
Filter refreshing was not handled well when clvmd ran out of free file descriptors. This is also addressed within the memory consumption patch set.

*** This bug has been marked as a duplicate of bug 730289 ***