Bug 1217589
| Summary: | glusterd crashed while scheduler was creating snapshots when bit rot was enabled on the volumes | |||
|---|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | senaik | |
| Component: | glusterd | Assignee: | bugs <bugs> | |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | ||
| Severity: | unspecified | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | mainline | CC: | amukherj, bugs, smohan | |
| Target Milestone: | --- | Keywords: | Reopened, Triaged | |
| Target Release: | --- | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | Bug Fix | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1224189 | Environment: | ||
| Last Closed: | 2016-06-22 05:17:47 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1186580, 1224189 | |||
Description
senaik
2015-04-30 18:16:57 UTC

Proposing this bug as a blocker.

The backtrace of this crash is the same as in BZ 1211640, so marking it as a duplicate.

*** This bug has been marked as a duplicate of bug 1211640 ***

Atin,

Reopening as I needed some clarification.

Core 19223 and 23765 are related/tracked by the bug you mentioned above and another bug (bug 1207146).

However, the core on 10.70.36.4:
==========
core.24094

seems to be different, and its backtrace is different from the other two cores. Can you please check that and clarify?

Reposting the backtrace for clarity.

10.70.36.4:
==========
core.24094

Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id gluster/bitd -p /var/lib/glusterd'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000003efac21734 in gf_log_flush () from /usr/lib64/libglusterfs.so.0
Missing separate debuginfos, use: debuginfo-install glusterfs-3.7.0alpha0-0.17.gited96153.el6.x86_64
(gdb) bt
#0 0x0000003efac21734 in gf_log_flush () from /usr/lib64/libglusterfs.so.0
#1 0x0000003efac3d7ed in gf_print_trace () from /usr/lib64/libglusterfs.so.0
#2 <signal handler called>
#3 0x00007f6b6400e820 in ?? ()
#4 0x00007f6b7cb5cc0a in gf_changelog_reborp_rpcsvc_notify () from /usr/lib64/libgfchangelog.so.0
#5 0x0000003efb408425 in rpcsvc_handle_disconnect () from /usr/lib64/libgfrpc.so.0
#6 0x0000003efb409f60 in rpcsvc_notify () from /usr/lib64/libgfrpc.so.0
#7 0x0000003efb40b7b8 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
#8 0x00007f6b7ddd86a1 in ?? () from /usr/lib64/glusterfs/3.7.0alpha0/rpc-transport/socket.so
#9 0x0000003efac7d060 in ?? () from /usr/lib64/libglusterfs.so.0
#10 0x00000035324079d1 in start_thread () from /lib64/libpthread.so.0
#11 0x00000035320e89dd in clone () from /lib64/libc.so.6

Hi Seema,

core.24094 is the same as bug https://bugzilla.redhat.com/show_bug.cgi?id=1207146, so it is a bitrot crash core, which is a known issue. This core is not a glusterd crash core.

The glusterd crash is addressed by bug https://bugzilla.redhat.com/show_bug.cgi?id=1211640, and the patch for that bug has already been merged. Could you reproduce this bug again and let us know which is crashing, glusterd or bitrot? We need more information on this.

Gaurav,

As mentioned in comment 4, the bt of core.24094 and the bt reported in BZ 1207146 look different. Also, I faced both glusterd and bitd crashes, which are tracked by BZ 1207146 and 1211640. But core.24094 looks different from what is reported in both of these bugs. Request you to please analyse core.24094.

Please find the sosreports below:
================================
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/snapshots/1217589/

(In reply to senaik from comment #4)
> Atin,
>
> Reopening as I needed some clarification.
>
> Core 19223 and 23765 are related/tracked by the bug you mentioned above and
> another bug (bug 1207146).
>
> However, the core on 10.70.36.4:
> ==========
> core.24094
>
> seems to be different and the backtrace is different than the other two
> cores. Can you please check that and clarify?
>
> Reposting the backtrace for clarity.
>
> 10.70.36.4:
> ==========
> core.24094
>
> Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id
> gluster/bitd -p /var/lib/glusterd'.
> Program terminated with signal 11, Segmentation fault.
> #0 0x0000003efac21734 in gf_log_flush () from /usr/lib64/libglusterfs.so.0
> Missing separate debuginfos, use: debuginfo-install
> glusterfs-3.7.0alpha0-0.17.gited96153.el6.x86_64
> (gdb) bt
> #0 0x0000003efac21734 in gf_log_flush () from /usr/lib64/libglusterfs.so.0
> #1 0x0000003efac3d7ed in gf_print_trace () from /usr/lib64/libglusterfs.so.0
> #2 <signal handler called>
> #3 0x00007f6b6400e820 in ?? ()
> #4 0x00007f6b7cb5cc0a in gf_changelog_reborp_rpcsvc_notify ()
> from /usr/lib64/libgfchangelog.so.0
> #5 0x0000003efb408425 in rpcsvc_handle_disconnect ()
> from /usr/lib64/libgfrpc.so.0
> #6 0x0000003efb409f60 in rpcsvc_notify () from /usr/lib64/libgfrpc.so.0
> #7 0x0000003efb40b7b8 in rpc_transport_notify () from
> /usr/lib64/libgfrpc.so.0
> #8 0x00007f6b7ddd86a1 in ?? ()
> from /usr/lib64/glusterfs/3.7.0alpha0/rpc-transport/socket.so
> #9 0x0000003efac7d060 in ?? () from /usr/lib64/libglusterfs.so.0
> #10 0x00000035324079d1 in start_thread () from /lib64/libpthread.so.0
> #11 0x00000035320e89dd in clone () from /lib64/libc.so.6

Seema,

I believe Gaurav has already clarified this. Clearing the needinfo.

Thanks,
Atin

Seema,

The backtrace of bug 1207146 looks pretty similar to the one you hit. 1207146 is in modified state, but I am unable to find any patch against it. Could you retest and see whether you are still hitting the crash?

Thanks,
Atin

Atin,

I'd like you to post the patch details in the bug and move it to ON_QA if you are sure it is fixed. I'm in the middle of another run, and it might take some time before I can get back to this.

Seema,

Unfortunately, I don't have information on the patch that fixed BZ 1207146. The bitrot team can comment on it.
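
Much of the discussion above turns on whether the backtrace of core.24094 matches the ones tracked in other bugs. One way to make that comparison less error-prone is to reduce each gdb `bt` listing to the function names below the signal handler and compare those lists. The following is a minimal sketch, not something used in this bug: the helper names (`parse_frames`, `crash_signature`, `same_signature`) are hypothetical, and the regular expression assumes symbol-less frames printed as `func ()`, as in the listing posted here. The sample text is the core.24094 backtrace copied from comment 4.

```python
import re

# Matches gdb "bt" frames of the form seen in this report, e.g.
#   #4 0x00007f6b7cb5cc0a in gf_changelog_reborp_rpcsvc_notify () from /usr/lib64/...
#   #2 <signal handler called>
# Assumption: frames have no argument list (no debuginfo), so "()" is literal.
FRAME_RE = re.compile(
    r"#(?P<num>\d+)\s+"
    r"(?:<(?P<marker>[^>]+)>"
    r"|(?:0x[0-9a-fA-F]+\s+in\s+)?(?P<func>\S+)\s*\(\)"
    r"(?:\s+from\s+(?P<lib>\S+))?)"
)


def parse_frames(bt_text):
    """Return the function name (or marker text) of every frame, in order."""
    return [m.group("marker") or m.group("func")
            for m in FRAME_RE.finditer(bt_text)]


def crash_signature(bt_text):
    """Return resolved function names below the signal handler frame.

    Frames above "<signal handler called>" belong to the crash reporter,
    and "??" frames carry no symbol, so both are dropped.
    """
    frames = parse_frames(bt_text)
    if "signal handler called" in frames:
        frames = frames[frames.index("signal handler called") + 1:]
    return [f for f in frames if f != "??"]


def same_signature(bt_a, bt_b):
    """True if two backtraces reduce to the same list of function names."""
    return crash_signature(bt_a) == crash_signature(bt_b)


# Backtrace of core.24094 as posted in comment 4.
CORE_24094 = """
#0 0x0000003efac21734 in gf_log_flush () from /usr/lib64/libglusterfs.so.0
#1 0x0000003efac3d7ed in gf_print_trace () from /usr/lib64/libglusterfs.so.0
#2 <signal handler called>
#3 0x00007f6b6400e820 in ?? ()
#4 0x00007f6b7cb5cc0a in gf_changelog_reborp_rpcsvc_notify () from /usr/lib64/libgfchangelog.so.0
#5 0x0000003efb408425 in rpcsvc_handle_disconnect () from /usr/lib64/libgfrpc.so.0
#6 0x0000003efb409f60 in rpcsvc_notify () from /usr/lib64/libgfrpc.so.0
#7 0x0000003efb40b7b8 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
#8 0x00007f6b7ddd86a1 in ?? () from /usr/lib64/glusterfs/3.7.0alpha0/rpc-transport/socket.so
#9 0x0000003efac7d060 in ?? () from /usr/lib64/libglusterfs.so.0
#10 0x00000035324079d1 in start_thread () from /lib64/libpthread.so.0
#11 0x00000035320e89dd in clone () from /lib64/libc.so.6
"""

if __name__ == "__main__":
    for name in crash_signature(CORE_24094):
        print(name)
```

To compare against another core, paste its `bt` output into a second string and call `same_signature(CORE_24094, other_bt)`. A mismatch only shows that the symbolised frames differ; it does not by itself prove a different root cause, and installing the debuginfo packages named by gdb would give a more reliable picture.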