Description of problem:
======================
Enabled bit rot on the volumes and scheduled snapshots to be created every 5 mins on each volume. The first snapshot creation failed because glusterd crashed.

Version-Release number of selected component (if applicable):
=============================================================
gluster --version
glusterfs 3.7.0alpha0 built on Apr 28 2015 01:55:23

How reproducible:
=================
1/1

Steps to Reproduce:
===================
1. Create 3 volumes: an 8x2 volume with a replica 2 hot tier attached, a 12-brick disperse volume with redundancy 4, and a 3-brick distribute volume
2. Enable USS, quota and bit rot on all volumes
3. Fuse and NFS mount all volumes
4. Initialise the scheduler on all nodes and enable it
5. Add 3 jobs to create snapshots on the 3 volumes every 5 mins

[root@rhs-arch-srv4 ~]# snap_scheduler.py list
JOB_NAME         SCHEDULE         OPERATION        VOLUME NAME
--------------------------------------------------------------------
J1_vol0          */5 * * * *      Snapshot Create  vol0
J1_vol1          */5 * * * *      Snapshot Create  vol1
J1_vol2          */5 * * * *      Snapshot Create  vol2

6. First snapshot create failed

04-30 23:30:01,156 gcron.py:67 takeSnap] DEBUG Running command 'gluster snapshot create Scheduled-J1_vol1-vol1 vol1'
[2015-04-30 23:30:01,162 gcron.py:95 doJob] DEBUG /var/run/gluster/shared_storage/snaps/lock_files/J1_vol2 last modified at Thu Apr 30 23:25:08 2015
[2015-04-30 23:30:01,162 gcron.py:97 doJob] DEBUG Processing job Scheduled-J1_vol2-vol2
[2015-04-30 23:30:01,163 gcron.py:67 takeSnap] DEBUG Running command 'gluster snapshot create Scheduled-J1_vol2-vol2 vol2'
[2015-04-30 23:30:06,827 gcron.py:74 takeSnap] DEBUG Command 'gluster snapshot create Scheduled-J1_vol1-vol1 vol1' returned '1'
[2015-04-30 23:30:06,830 gcron.py:74 takeSnap] DEBUG Command 'gluster snapshot create Scheduled-J1_vol2-vol2 vol2' returned '1'
[2015-04-30 23:30:06,832 gcron.py:74 takeSnap] DEBUG Command 'gluster snapshot create Scheduled-J1_vol0-vol0 vol0' returned '1'
[2015-04-30 23:30:06,830 gcron.py:77 takeSnap] ERROR Snapshot of vol2 failed
[2015-04-30 23:30:06,828 gcron.py:77 takeSnap] ERROR Snapshot of vol1 failed
[2015-04-30 23:30:06,833 gcron.py:77 takeSnap] ERROR Snapshot of vol0 failed
[2015-04-30 23:30:06,838 gcron.py:78 takeSnap] ERROR Command output:
[2015-04-30 23:30:06,838 gcron.py:78 takeSnap] ERROR Command output:
[2015-04-30 23:30:06,838 gcron.py:78 takeSnap] ERROR Command output:
[2015-04-30 23:30:06,839 gcron.py:79 takeSnap] ERROR snapshot create: failed: quorum is not met
[2015-04-30 23:30:06,839 gcron.py:79 takeSnap] ERROR snapshot create: failed: One or more bricks may be down.
[2015-04-30 23:30:06,839 gcron.py:79 takeSnap] ERROR snapshot create: failed: quorum is not met
[2015-04-30 23:30:06,839 gcron.py:101 doJob] ERROR Job Scheduled-J1_vol2-vol2 failed
[2015-04-30 23:30:06,839 gcron.py:101 doJob] ERROR Job Scheduled-J1_vol1-vol1 failed
[2015-04-30 23:30:06,839 gcron.py:101 doJob] ERROR Job Scheduled-J1_vol0-vol0 failed
~

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2015-04-30 17:45:22
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.0alpha0
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x32d3621dc6]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x32d363dadf]
/lib64/libc.so.6[0x33f6c326a0]
/usr/lib64/liburcu-bp.so.1(rcu_read_unlock_bp+0x16)[0x7f07aa97ad16]
/usr/lib64/glusterfs/3.7.0alpha0/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_commit+0x1c2)[0x7f07aac78392]
/usr/lib64/glusterfs/3.7.0alpha0/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_initiate_snap_phases+0x748)[0x7f07aac7c1b8]
/usr/lib64/glusterfs/3.7.0alpha0/xlator/mgmt/glusterd.so(glusterd_handle_snapshot_create+0x4c0)[0x7f07aac67b50]
/usr/lib64/glusterfs/3.7.0alpha0/xlator/mgmt/glusterd.so(glusterd_handle_snapshot_fn+0x821)[0x7f07aac73901]
/usr/lib64/glusterfs/3.7.0alpha0/xlator/mgmt/glusterd.so(glusterd_big_locked_handler+0x3f)[0x7f07aabbee5f]
/usr/lib64/libglusterfs.so.0(synctask_wrap+0x12)[0x32d3661d12]
/lib64/libc.so.6[0x33f6c438f0]

10.70.34.50:
===========
core.11333

Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id gluster/bitd -p /var/lib/glusterd'.
Program terminated with signal 7, Bus error.
#0 0x00007fbce0242c54 in gf_changelog_reborp_rpcsvc_notify () from /usr/lib64/libgfchangelog.so.0
Missing separate debuginfos, use: debuginfo-install glusterfs-3.7.0alpha0-0.17.gited96153.el6.x86_64
(gdb) bt
#0 0x00007fbce0242c54 in gf_changelog_reborp_rpcsvc_notify () from /usr/lib64/libgfchangelog.so.0
#1 0x00000032d3a09e64 in rpcsvc_notify () from /usr/lib64/libgfrpc.so.0
#2 0x00000032d3a0b7b8 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
#3 0x00007fbce14bb632 in ?? () from /usr/lib64/glusterfs/3.7.0alpha0/rpc-transport/socket.so
#4 0x00000032d367d060 in ?? () from /usr/lib64/libglusterfs.so.0
#5 0x00000033f70079d1 in start_thread () from /lib64/libpthread.so.0
#6 0x00000033f6ce89dd in clone () from /lib64/libc.so.6

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

core.23765 - tracked by BZ 1211640

(gdb) bt
#0 0x00007f07aa97ad16 in rcu_read_unlock_bp () from /usr/lib64/liburcu-bp.so.1
#1 0x00007f07aac78392 in glusterd_mgmt_v3_commit () from /usr/lib64/glusterfs/3.7.0alpha0/xlator/mgmt/glusterd.so
#2 0x00007f07aac7c1b8 in glusterd_mgmt_v3_initiate_snap_phases () from /usr/lib64/glusterfs/3.7.0alpha0/xlator/mgmt/glusterd.so
#3 0x00007f07aac67b50 in glusterd_handle_snapshot_create () from /usr/lib64/glusterfs/3.7.0alpha0/xlator/mgmt/glusterd.so
#4 0x00007f07aac73901 in glusterd_handle_snapshot_fn () from /usr/lib64/glusterfs/3.7.0alpha0/xlator/mgmt/glusterd.so
#5 0x00007f07aabbee5f in glusterd_big_locked_handler () from /usr/lib64/glusterfs/3.7.0alpha0/xlator/mgmt/glusterd.so
#6 0x00000032d3661d12 in synctask_wrap () from /usr/lib64/libglusterfs.so.0
#7 0x00000033f6c438f0 in ?? () from /lib64/libc.so.6
#8 0x0000000000000000 in ?? ()

10.70.36.2 :
===========
core.19223

Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id gluster/bitd -p /var/lib/glusterd'.
Program terminated with signal 7, Bus error.
#0 0x00007f87d54d3c54 in gf_changelog_reborp_rpcsvc_notify () from /usr/lib64/libgfchangelog.so.0
Missing separate debuginfos, use: debuginfo-install glusterfs-3.7.0alpha0-0.17.gited96153.el6.x86_64
(gdb) bt
#0 0x00007f87d54d3c54 in gf_changelog_reborp_rpcsvc_notify () from /usr/lib64/libgfchangelog.so.0
#1 0x0000003588e09e64 in rpcsvc_notify () from /usr/lib64/libgfrpc.so.0
#2 0x0000003588e0b7b8 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
#3 0x00007f87d674c632 in ?? () from /usr/lib64/glusterfs/3.7.0alpha0/rpc-transport/socket.so
#4 0x0000003588a7d060 in ?? () from /usr/lib64/libglusterfs.so.0
#5 0x0000003a968079d1 in start_thread () from /lib64/libpthread.so.0
#6 0x0000003a964e89dd in clone () from /lib64/libc.so

10.70.36.4:
==========
core.24094

Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id gluster/bitd -p /var/lib/glusterd'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000003efac21734 in gf_log_flush () from /usr/lib64/libglusterfs.so.0
Missing separate debuginfos, use: debuginfo-install glusterfs-3.7.0alpha0-0.17.gited96153.el6.x86_64
(gdb) bt
#0 0x0000003efac21734 in gf_log_flush () from /usr/lib64/libglusterfs.so.0
#1 0x0000003efac3d7ed in gf_print_trace () from /usr/lib64/libglusterfs.so.0
#2 <signal handler called>
#3 0x00007f6b6400e820 in ?? ()
#4 0x00007f6b7cb5cc0a in gf_changelog_reborp_rpcsvc_notify () from /usr/lib64/libgfchangelog.so.0
#5 0x0000003efb408425 in rpcsvc_handle_disconnect () from /usr/lib64/libgfrpc.so.0
#6 0x0000003efb409f60 in rpcsvc_notify () from /usr/lib64/libgfrpc.so.0
#7 0x0000003efb40b7b8 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
#8 0x00007f6b7ddd86a1 in ?? () from /usr/lib64/glusterfs/3.7.0alpha0/rpc-transport/socket.so
#9 0x0000003efac7d060 in ?? () from /usr/lib64/libglusterfs.so.0
#10 0x00000035324079d1 in start_thread () from /lib64/libpthread.so.0
#11 0x00000035320e89dd in clone () from /lib64/libc.so.6

Actual results:

Expected results:

Additional info:
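For triage, the scheduler log in the description can be reduced to the list of commands that exited non-zero. The following is only a rough sketch: the function name `failed_commands` and the regular expression are illustrative assumptions based on the `Command '...' returned 'N'` lines quoted above, not part of gcron.py.

```python
import re

# Lines copied from the gcron log in the description.
LOG = """\
[2015-04-30 23:30:01,163 gcron.py:67 takeSnap] DEBUG Running command 'gluster snapshot create Scheduled-J1_vol2-vol2 vol2'
[2015-04-30 23:30:06,827 gcron.py:74 takeSnap] DEBUG Command 'gluster snapshot create Scheduled-J1_vol1-vol1 vol1' returned '1'
[2015-04-30 23:30:06,830 gcron.py:74 takeSnap] DEBUG Command 'gluster snapshot create Scheduled-J1_vol2-vol2 vol2' returned '1'
[2015-04-30 23:30:06,832 gcron.py:74 takeSnap] DEBUG Command 'gluster snapshot create Scheduled-J1_vol0-vol0 vol0' returned '1'
"""

# Assumed line shape: Command '<cmd>' returned '<exit code>'
RESULT_RE = re.compile(r"Command '(?P<cmd>[^']+)' returned '(?P<rc>-?\d+)'")


def failed_commands(log_text):
    """Return (command, exit_code) pairs for every command that exited non-zero."""
    failures = []
    for match in RESULT_RE.finditer(log_text):
        exit_code = int(match.group("rc"))
        if exit_code != 0:
            failures.append((match.group("cmd"), exit_code))
    return failures


if __name__ == "__main__":
    for cmd, exit_code in failed_commands(LOG):
        print(f"exit {exit_code}: {cmd}")
```

On the lines above this reports all three `gluster snapshot create` commands with exit code 1, matching the three "Snapshot of volN failed" errors in the log.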
Proposing this bug as a blocker.
The backtrace of this crash is the same as in BZ 1211640, so marking it as a duplicate.

*** This bug has been marked as a duplicate of bug 1211640 ***
Atin,

Reopening as I needed some clarification.

Core 19223 and 23765 are related/tracked by the bug you mentioned above and another bug (bug 1207146).

However, the core on 10.70.36.4:
==========
core.24094

seems to be different and the backtrace is different than the other two cores. Can you please check that and clarify?

Reposting the backtrace for clarity.

10.70.36.4:
==========
core.24094

Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id gluster/bitd -p /var/lib/glusterd'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000003efac21734 in gf_log_flush () from /usr/lib64/libglusterfs.so.0
Missing separate debuginfos, use: debuginfo-install glusterfs-3.7.0alpha0-0.17.gited96153.el6.x86_64
(gdb) bt
#0 0x0000003efac21734 in gf_log_flush () from /usr/lib64/libglusterfs.so.0
#1 0x0000003efac3d7ed in gf_print_trace () from /usr/lib64/libglusterfs.so.0
#2 <signal handler called>
#3 0x00007f6b6400e820 in ?? ()
#4 0x00007f6b7cb5cc0a in gf_changelog_reborp_rpcsvc_notify () from /usr/lib64/libgfchangelog.so.0
#5 0x0000003efb408425 in rpcsvc_handle_disconnect () from /usr/lib64/libgfrpc.so.0
#6 0x0000003efb409f60 in rpcsvc_notify () from /usr/lib64/libgfrpc.so.0
#7 0x0000003efb40b7b8 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
#8 0x00007f6b7ddd86a1 in ?? () from /usr/lib64/glusterfs/3.7.0alpha0/rpc-transport/socket.so
#9 0x0000003efac7d060 in ?? () from /usr/lib64/libglusterfs.so.0
#10 0x00000035324079d1 in start_thread () from /lib64/libpthread.so.0
#11 0x00000035320e89dd in clone () from /lib64/libc.so.6
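One way to make the "is this the same backtrace?" question concrete is to strip addresses and library paths and compare only the resolved function names. The sketch below does that for core.24094 and core.19223 using the frames posted in this bug. The helper names `frame_functions` and `only_in` and the regular expression are illustrative assumptions, not an existing tool, and a name-level diff is only a first-pass hint since neither core has debuginfo installed.

```python
import re

# gdb "bt" frames copied from this bug report.
CORE_24094 = """\
#0 0x0000003efac21734 in gf_log_flush () from /usr/lib64/libglusterfs.so.0
#1 0x0000003efac3d7ed in gf_print_trace () from /usr/lib64/libglusterfs.so.0
#2 <signal handler called>
#3 0x00007f6b6400e820 in ?? ()
#4 0x00007f6b7cb5cc0a in gf_changelog_reborp_rpcsvc_notify () from /usr/lib64/libgfchangelog.so.0
#5 0x0000003efb408425 in rpcsvc_handle_disconnect () from /usr/lib64/libgfrpc.so.0
#6 0x0000003efb409f60 in rpcsvc_notify () from /usr/lib64/libgfrpc.so.0
#7 0x0000003efb40b7b8 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
#8 0x00007f6b7ddd86a1 in ?? () from /usr/lib64/glusterfs/3.7.0alpha0/rpc-transport/socket.so
#9 0x0000003efac7d060 in ?? () from /usr/lib64/libglusterfs.so.0
#10 0x00000035324079d1 in start_thread () from /lib64/libpthread.so.0
#11 0x00000035320e89dd in clone () from /lib64/libc.so.6
"""

CORE_19223 = """\
#0 0x00007f87d54d3c54 in gf_changelog_reborp_rpcsvc_notify () from /usr/lib64/libgfchangelog.so.0
#1 0x0000003588e09e64 in rpcsvc_notify () from /usr/lib64/libgfrpc.so.0
#2 0x0000003588e0b7b8 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
#3 0x00007f87d674c632 in ?? () from /usr/lib64/glusterfs/3.7.0alpha0/rpc-transport/socket.so
#4 0x0000003588a7d060 in ?? () from /usr/lib64/libglusterfs.so.0
#5 0x0000003a968079d1 in start_thread () from /lib64/libpthread.so.0
#6 0x0000003a964e89dd in clone () from /lib64/libc.so
"""

# Assumed frame shape: "#N 0xADDR in function () ..."; the address part is optional.
FRAME_RE = re.compile(r"#\d+\s+(?:0x[0-9a-f]+\s+in\s+)?(\S+)\s+\(\)")


def frame_functions(backtrace):
    """Return resolved function names in frame order, skipping '??' frames."""
    return [name for name in FRAME_RE.findall(backtrace) if name != "??"]


def only_in(first, second):
    """Return functions that appear in `first` but not in `second`, in order."""
    seen = set(second)
    return [name for name in first if name not in seen]


if __name__ == "__main__":
    a = frame_functions(CORE_24094)
    b = frame_functions(CORE_19223)
    print("only in core.24094:", only_in(a, b))
    print("only in core.19223:", only_in(b, a))
```

For these two cores the frames unique to core.24094 are `gf_log_flush`, `gf_print_trace` and `rpcsvc_handle_disconnect`; core.19223 has no resolved frame that is missing from core.24094. Both pass through `gf_changelog_reborp_rpcsvc_notify`, but only core.24094 reaches it via `rpcsvc_handle_disconnect`.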
Hi Seema,

core.24094 is the same as bug https://bugzilla.redhat.com/show_bug.cgi?id=1207146, so it is a bitrot crash core, which is a known issue. It is not a glusterd crash core.

The glusterd crash is addressed by bug https://bugzilla.redhat.com/show_bug.cgi?id=1211640, and the patch for that bug has already been merged.

Could you reproduce this bug again and let us know which one is crashing, glusterd or bitrot? We need more information on this.
Gaurav,

As mentioned in comment 4, the bt of core.24094 and the bt reported in BZ 1207146 look different. I also hit both the glusterd and bitd crashes, which are tracked by BZ 1211640 and BZ 1207146 respectively, but core.24094 looks different from what is reported in either of these two bugs. Could you please analyse core.24094?

Please find the sosreports below:
================================
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/snapshots/1217589/
(In reply to senaik from comment #4)
> Atin,
>
> Reopening as I needed some clarification.
>
> Core 19223 and 23765 are related/tracked by the bug you mentioned above and
> another bug (bug 1207146).
>
> However, the core on 10.70.36.4:
> ==========
> core.24094
>
> seems to be different and the backtrace is different than the other two
> cores. Can you please check that and clarify?
>
> Reposting the backtrace for clarity.
>
> 10.70.36.4:
> ==========
> core.24094
>
> Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id
> gluster/bitd -p /var/lib/glusterd'.
> Program terminated with signal 11, Segmentation fault.
> #0 0x0000003efac21734 in gf_log_flush () from /usr/lib64/libglusterfs.so.0
> Missing separate debuginfos, use: debuginfo-install
> glusterfs-3.7.0alpha0-0.17.gited96153.el6.x86_64
> (gdb) bt
> #0 0x0000003efac21734 in gf_log_flush () from /usr/lib64/libglusterfs.so.0
> #1 0x0000003efac3d7ed in gf_print_trace () from /usr/lib64/libglusterfs.so.0
> #2 <signal handler called>
> #3 0x00007f6b6400e820 in ?? ()
> #4 0x00007f6b7cb5cc0a in gf_changelog_reborp_rpcsvc_notify ()
> from /usr/lib64/libgfchangelog.so.0
> #5 0x0000003efb408425 in rpcsvc_handle_disconnect ()
> from /usr/lib64/libgfrpc.so.0
> #6 0x0000003efb409f60 in rpcsvc_notify () from /usr/lib64/libgfrpc.so.0
> #7 0x0000003efb40b7b8 in rpc_transport_notify () from
> /usr/lib64/libgfrpc.so.0
> #8 0x00007f6b7ddd86a1 in ?? ()
> from /usr/lib64/glusterfs/3.7.0alpha0/rpc-transport/socket.so
> #9 0x0000003efac7d060 in ?? () from /usr/lib64/libglusterfs.so.0
> #10 0x00000035324079d1 in start_thread () from /lib64/libpthread.so.0
> #11 0x00000035320e89dd in clone () from /lib64/libc.so.6

Seema,

I believe Gaurav has already clarified about it. Clearing the needinfo.

Thanks,
Atin
Seema,

The backtrace in bug 1207146 looks pretty similar to the one you hit. Bug 1207146 is in MODIFIED state, but I am unable to find any patch against it. Could you retest and see whether you still hit the crash?

Thanks,
Atin
Atin,

I'd like you to post the patch details in the bug and move it to ON_QA if you are sure it is fixed. I'm in the middle of another run, and it might take some time before I can get back to this.
Seema,

Unfortunately, I don't have information on the patch that fixed BZ 1207146. The bitrot team can comment on it.