Bug 1293594
Summary: | Segmentation fault in '_Unwind_Backtrace ()' | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Soumya Koduri <skoduri>
Component: | glusterfs | Assignee: | Yaniv Kaul <ykaul>
Status: | CLOSED WORKSFORME | QA Contact: | Sweta Anandpara <sanandpa>
Severity: | medium | Docs Contact: |
Priority: | unspecified | |
Version: | 7.0 | CC: | fweimer, jakub, law, mnewsome, mpolacek, nbalacha, skoduri
Target Milestone: | pre-dev-freeze | |
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2019-10-24 11:56:07 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1413146 | |
Attachments: | | |
Description
Soumya Koduri
2015-12-22 10:16:28 UTC
Created attachment 1108606
build-install-core
Attached the core and libraries installed. To view the core, execute the following command:

gdb -ex 'set sysroot ./' -ex 'core-file ./build/install/cores/core.10962' ./build/install/sbin/glusterfs

Thanks!

Not related to gcc-libraries. I think installing missing debuginfos to see a more detailed backtrace would be a start.

Thanks Marek. We had run into the issue while running the tests on slave machines using jenkins. Unfortunately the machine is no longer in that state. I shall try to reproduce it with debuginfos installed and get back.

I could reproduce the issue. Please find the backtrace below -

Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/'.
Program terminated with signal 11, Segmentation fault.
#0 x86_64_fallback_frame_state (context=0x7f89be09db90, fs=0x7f89be09da10) at ../../../gcc/config/i386/linux-unwind.h:47
47 if (*(unsigned char *)(pc+0) == 0x48
Missing separate debuginfos, use: debuginfo-install glusterfs-fuse-3.7.1-11.el6rhs.x86_64
(gdb) bt
#0 x86_64_fallback_frame_state (context=0x7f89be09db90, fs=0x7f89be09da10) at ../../../gcc/config/i386/linux-unwind.h:47
#1 uw_frame_state_for (context=0x7f89be09db90, fs=0x7f89be09da10) at ../../../gcc/unwind-dw2.c:1210
#2 0x00007f89c58c4119 in _Unwind_Backtrace (trace=0x7f89d1a2b7d0 <backtrace_helper>, trace_argument=0x7f89be09dcd0) at ../../../gcc/unwind.inc:290
#3 0x00007f89d1a2b966 in backtrace () from /lib64/libc.so.6
#4 0x00007f89d2fc08e6 in _gf_msg_backtrace_nomem () from /usr/lib64/libglusterfs.so.0
#5 0x00007f89d2fe04af in gf_print_trace () from /usr/lib64/libglusterfs.so.0
#6 <signal handler called>
#7 0x00007f89c4852aa0 in ?? ()
#8 0x00007f89d20aba51 in start_thread () from /lib64/libpthread.so.0
#9 0x00007f89d1a1596d in clone () from /lib64/libc.so.6
(gdb) f 1
#1 uw_frame_state_for (context=0x7f89be09db90, fs=0x7f89be09da10) at ../../../gcc/unwind-dw2.c:1210
1210 return MD_FALLBACK_FRAME_STATE_FOR (context, fs);
(gdb) f 0
#0 x86_64_fallback_frame_state (context=0x7f89be09db90, fs=0x7f89be09da10) at ../../../gcc/config/i386/linux-unwind.h:47
47 if (*(unsigned char *)(pc+0) == 0x48
(gdb) l
42 unsigned char *pc = context->ra;
43 struct sigcontext *sc;
44 long new_cfa;
45
46 /* movq __NR_rt_sigreturn, %rax ; syscall */
47 if (*(unsigned char *)(pc+0) == 0x48
48 && *(unsigned long *)(pc+1) == 0x050f0000000fc0c7)
49 {
50 struct ucontext *uc_ = context->cfa;
51 /* The void * cast is necessary to avoid an aliasing warning.
(gdb) l
52 The aliasing warning is correct, but should not be a problem
53 because it does not alias anything. */
54 sc = (struct sigcontext *) (void *) &uc_->uc_mcontext;
55 }
56 else
57 return _URC_END_OF_STACK;
58
59 new_cfa = sc->rsp;
60 fs->regs.cfa_how = CFA_REG_OFFSET;
61 /* Register 7 is rsp */
(gdb)

It looks like the contents of the *context structure are bogus.
(gdb) p/xx *context
$4 = {reg = {0x7f7ffea29ad0, 0x7f7ffea29ac8, 0x7f7ffea29ad8, 0x7f7ffea29ac0, 0x7f7ffea29ab0, 0x7f7ffea29aa8, 0x7f7ffea29ab8, 0x7f7ffea29ae0, 0x7f7ffea29a68, 0x7f7ffea29a70, 0x7f7ffea29a78, 0x7f7ffea29a80, 0x7f7ffea29a88, 0x7f7ffea29a90, 0x7f7ffea29a98, 0x7f7ffea29aa0, 0x7f7ffea29ae8, 0x0}, cfa = 0x7f7ffea29eb8, ra = 0x7f7fff1aa561, lsda = 0x0, bases = {tbase = 0x0, dbase = 0x0, func = 0x7f800fa7a69f}, flags = 0xc000000000000000, version = 0x0, args_size = 0x0, by_value = {0x0 <repeats 18 times>}}
(gdb) p/x context->ra
$5 = 0x7f7fff1aa561

But there's nothing mapped at that address:

(gdb) x/x $5
0x7f7fff1aa561: Cannot access memory at address 0x7f7fff1aa561

In the caller we have:

fde = _Unwind_Find_FDE (context->ra + _Unwind_IsSignalFrame (context) - 1,
1203 &context->bases);
1204 if (fde == NULL)
(gdb)
1205 {
1206 #ifdef MD_FALLBACK_FRAME_STATE_FOR
1207 /* Couldn't find frame unwind info for this function. Try a
1208 target-specific fallback mechanism. This will necessarily
1209 not provide a personality routine or LSDA. */
1210 return MD_FALLBACK_FRAME_STATE_FOR (context, fs);
1211 #else
1212 return _URC_END_OF_STACK;
1213 #endif
1214 }

FDE is NULL, essentially saying we couldn't find frame unwind information for the given context->ra address. So it's already suspect.

Then x86_64_fallback_frame_state does:

42 unsigned char *pc = context->ra;
43 struct sigcontext *sc;
44 long new_cfa;
45
46 /* movq __NR_rt_sigreturn, %rax ; syscall */
47 if (*(unsigned char *)(pc+0) == 0x48
48 && *(unsigned long *)(pc+1) == 0x050f0000000fc0c7)

Which is just dumb. We have no idea why we didn't find the FDE in the caller and no guarantee that *pc is a valid memory location.

https://gcc.gnu.org/ml/gcc-patches/2008-01/msg01474.html

touches on this issue.

At some level I suspect we've got something bogus in the frame chains. But x86_64_fallback_frame_state simply can't do what it's trying to do without being more careful.
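
For illustration only, here is a minimal sketch of what "being more careful" could look like on Linux: ask the kernel to read the bytes first (a write() to a pipe fails with EFAULT if the source range is not readable), and only then compare them with the rt_sigreturn trampoline. The helpers probe_readable and looks_like_rt_sigreturn are hypothetical names and are not part of libgcc; the 9-byte pattern is the same one tested at linux-unwind.h lines 47-48, spelled out in little-endian byte order. The probe is inherently racy: another thread can still unmap the page between the probe and the comparison, so this narrows the crash window rather than closing it.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not libgcc code: returns true if the kernel could
   read LEN bytes starting at ADDR at the time of the call.  Uses only
   pipe/write/close, which are async-signal-safe.  LEN must be small
   enough to fit into an empty pipe without blocking.  */
static bool
probe_readable (const void *addr, size_t len)
{
  int fds[2];
  if (pipe (fds) != 0)
    return false;

  ssize_t n;
  do
    n = write (fds[1], addr, len);   /* fails with EFAULT if unreadable */
  while (n < 0 && errno == EINTR);

  close (fds[0]);
  close (fds[1]);
  return n == (ssize_t) len;
}

/* Hypothetical guarded variant of the check at linux-unwind.h:47-48:
   "movq $__NR_rt_sigreturn, %rax ; syscall" is encoded as
   48 c7 c0 0f 00 00 00 0f 05 on x86_64.  */
static bool
looks_like_rt_sigreturn (const unsigned char *pc)
{
  static const unsigned char sig[9] =
    { 0x48, 0xc7, 0xc0, 0x0f, 0x00, 0x00, 0x00, 0x0f, 0x05 };

  if (!probe_readable (pc, sizeof sig))
    return false;                    /* treat as "not a signal frame" */
  return memcmp (pc, sig, sizeof sig) == 0;
}
```

With a guard like this, an unmapped context->ra such as 0x7f7fff1aa561 would make the fallback return _URC_END_OF_STACK instead of faulting inside the unwinder. The cost is three syscalls per probe, which is one reason this is only a sketch and not a proposed patch.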
While mincore, or some other syscall that reports EFAULT, could test whether the memory is readable and so avoid the crash in some cases, in general if other threads are doing bogus things like unmapping memory, there will always be a window in which the memory can be unmapped. I think it is more important to find out what code has the wrong unwind info that led to this, or whether the program is just unmapping memory that is still in use.

Perhaps it would be better to determine in configure (or via a configure option?) whether libgcc simply shouldn't define MD_FALLBACK_FRAME_STATE_FOR or MD_FROB_UPDATE_CONTEXT. E.g. on x86_64-linux, I think one needs glibc >= 2006-11-29, and e.g. on i386-linux a similar glibc and a >= 2006-03-31 kernel. In particular, for dropping MD_FALLBACK_FRAME_STATE_FOR we'd need to be sure that glibc and/or the kernel, whenever they define __restore_rt or similar sequences in libc or the vDSO, also provide unwind info for it, and, to drop MD_FROB_UPDATE_CONTEXT, additionally that the unwind info for it uses "zRS" CIE flags there.

Is there a way I could try to reproduce this in RHEL 7? How do I run the bug-1140162-file-snapshot-features-encrypt-opts-validation.t test?

(In reply to Soumya Koduri from comment #0)
> #5 0x00000000004098d6 in glusterfsd_print_trace (signum=11)
> at
> /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/glusterfsd/
> src/glusterfsd.c:2033
> #6 <signal handler called>
> #7 0x00007f7fff1aa561 in ?? ()
> #8 0x00007f80101c6a51 in start_thread () from ./lib64/libpthread.so.0
> #9 0x00007f800fb3093d in clone () from ./lib64/libc.so.6
> (gdb)

I think we should consider the root cause to be the crash *before* the crash handler is run. I assume that the signal delivered here is SIGSEGV as well. The address 0x00007f7fff1aa561 looks very much like a shared object address (without randomization), so the stack should be completely valid and parsed correctly by GDB.
It is likely that the code segment has been unmapped by a concurrent dlclose while some thread was still running that very code. If we could magically fix the backtracer, there would still be a crash here. You need to find that concurrent dlclose and fix that. To verify this theory, you should run “info files” after initialization, but before termination, and keep a note of the shared objects listed there. The address that subsequently faults should be in one of the DSOs that is subject to dlclose.

Regarding the crash handler itself: Nowadays, it is generally best to avoid custom crash handlers and let ABRT/systemd-coredump do the job of capturing debugging information. Writing good crash handlers is very hard, and these crash handlers tend to destroy useful information. (I realize that this bug was filed several years ago.)

It seems to me like this really needs to be reassigned back to the gluster team for further analysis on their end.

I'm really ashamed to ask about a bug that was reported 4 years ago, but before I close this, is this somehow reproducible/relevant?

No. We haven't run into this issue lately with newer builds. Thanks everyone for looking into this.

Will close the bug.
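
As a footnote to the dlclose theory above: the same check that “info files” gives in GDB can be approximated from inside the process by looking an address up in /proc/self/maps. The sketch below is Linux-specific and assumes /proc is mounted. find_mapping is a hypothetical helper and is not glusterfs code. It uses stdio, so it is meant for a diagnostic thread or test harness, not for use inside a signal handler.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns true if ADDR lies inside a region that is
   currently mapped in this process.  If PATH is non-NULL, it receives the
   name of the backing object (empty for anonymous memory).  */
static bool
find_mapping (uintptr_t addr, char *path, size_t path_len)
{
  FILE *f = fopen ("/proc/self/maps", "r");
  if (f == NULL)
    return false;

  char line[4096];
  bool found = false;
  while (!found && fgets (line, sizeof line, f) != NULL)
    {
      unsigned long start, end;
      int off = 0;
      /* Each line reads: start-end perms offset dev inode [path] */
      if (sscanf (line, "%lx-%lx %*s %*s %*s %*s %n", &start, &end, &off) < 2)
        continue;
      if (addr < start || addr >= end)
        continue;

      found = true;
      if (path != NULL && path_len > 0)
        {
          line[strcspn (line, "\n")] = '\0';   /* drop trailing newline */
          snprintf (path, path_len, "%s", off > 0 ? line + off : "");
        }
    }
  fclose (f);
  return found;
}
```

Logging the result for a code address right after the relevant translators are loaded, and again shortly before shutdown, would show whether the object containing that address disappears while threads may still be running in it.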