Bug 795741
| Summary: | Crash when tried to self heal from gluster cli: "gluster volume heal <volume_name>" | | |
| --- | --- | --- | --- |
| Product: | [Community] GlusterFS | Reporter: | Shwetha Panduranga <shwetha.h.panduranga> |
| Component: | replicate | Assignee: | Pranith Kumar K <pkarampu> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | mainline | CC: | gluster-bugs, rabhat |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.4.0 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-07-24 17:26:54 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 817967 | | |
Description
Shwetha Panduranga
2012-02-21 12:38:16 UTC
Ran glustershd under valgrind and reproduced the crash. This is the backtrace of the core:

```
Core was generated by `'.
Program terminated with signal 11, Segmentation fault.
#0  0x000000000570a4bd in nanosleep () at ../sysdeps/unix/syscall-template.S:82
82  ../sysdeps/unix/syscall-template.S: No such file or directory.
    in ../sysdeps/unix/syscall-template.S
(gdb) bt
#0  0x000000000570a4bd in nanosleep () at ../sysdeps/unix/syscall-template.S:82
#1  0x0000000004e61297 in gf_timer_proc (ctx=0x5cad040) at ../../../libglusterfs/src/timer.c:182
#2  0x0000000005701d8c in start_thread (arg=0xb0d8700) at pthread_create.c:304
#3  0x00000000059ff04d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4  0x0000000000000000 in ?? ()
(gdb) info thr
  5 Thread 13393  0x00000000059ff6a3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:82
  4 Thread 13394  do_sigwait (set=<value optimized out>, sig=0x87f5eb8) at ../nptl/sysdeps/unix/sysv/linux/../../../../../sysdeps/unix/sysv/linux/sigwait.c:65
  3 Thread 13395  0x00000000057078f7 in ?? () from /lib/x86_64-linux-gnu/libpthread.so.0
  2 Thread 13396  0x000000000570ab3b in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42
* 1 Thread 13491  0x000000000570a4bd in nanosleep () at ../sysdeps/unix/syscall-template.S:82
(gdb) t 2
[Switching to thread 2 (Thread 13396)]
#0  0x000000000570ab3b in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42
42  ../nptl/sysdeps/unix/sysv/linux/pt-raise.c: No such file or directory.
    in ../nptl/sysdeps/unix/sysv/linux/pt-raise.c
(gdb) bt
#0  0x000000000570ab3b in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42
#1  0x0000000004e5d6dc in gf_print_trace (signum=11) at ../../../libglusterfs/src/common-utils.c:437
#2  <signal handler called>
#3  0x000000000b377073 in afr_dir_exclusive_crawl (data=0x7f886d0) at ../../../../../xlators/cluster/afr/src/afr-self-heald.c:978
#4  0x0000000004e8d8da in synctask_wrap (old_task=0x7f88890) at ../../../libglusterfs/src/syncop.c:144
#5  0x000000000595e1a0 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x0000000000000000 in ?? ()
(gdb) f 3
#3  0x000000000b377073 in afr_dir_exclusive_crawl (data=0x7f886d0) at ../../../../../xlators/cluster/afr/src/afr-self-heald.c:978
978         if (shd->inprogress[child]) {
(gdb) p shd
$1 = (afr_self_heald_t *) 0x6064a08
(gdb) p *shd
$2 = {enabled = _gf_true, pending = 0x0, inprogress = 0x0, pos = 0x0, sh_times = 0x0, timer = 0x0, healed = 0x0, heal_failed = 0x0, split_brain = 0x0}
```

This is what the valgrind log says:

```
For counts of detected and suppressed errors, rerun with: -v
==13383== ERROR SUMMARY: 22 errors from 22 contexts (suppressed: 4 from 4)
==13393== Warning: client switching stacks?  SP change: 0x8ff6e48 --> 0xece0098
==13393==          to suppress, use: --max-stackframe=97423952 or greater
==13393== Thread 3:
==13393== Syscall param time(t) points to unaddressable byte(s)
==13393==    at 0x3804049A: vgPlain_amd64_linux_REDIR_FOR_vtime (m_trampoline.S:167)
==13393==  Address 0x8 is not stack'd, malloc'd or (recently) free'd
==13393==
==13393== Warning: client switching stacks?  SP change: 0x97f7e48 --> 0xf85c028
==13393==          to suppress, use: --max-stackframe=101073376 or greater
==13393== Thread 4:
==13393== Invalid read of size 4
==13393==    at 0xB377073: afr_dir_exclusive_crawl (afr-self-heald.c:978)
==13393==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==13393==
==13393== Warning: client switching stacks?  SP change: 0x8ff6e48 --> 0xfc5c028
==13393==          to suppress, use: --max-stackframe=113660384 or greater
==13393==          further instances of this message will not be shown.
```

Thanks for the steps, guys. The AFR xlator needs to maintain an inode table inside the xlator when it is running in the self-heal daemon. The code was depending on the self-heal-daemon option for this, which is wrong because that option can be reconfigured on/off at runtime. Added a new option, which cannot be reconfigured, for this purpose.

CHANGE: http://review.gluster.com/2787 (cluster/afr: Add new option to know which process it is in) merged in master by Vijay Bellur (vijay)

Bug is fixed; verified on 3.3.0qa39.
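The crash itself is a plain NULL-pointer dereference: `p *shd` shows `inprogress = 0x0`, yet line 978 reads `shd->inprogress[child]`, because the per-child state arrays were only allocated when the reconfigurable self-heal-daemon option happened to be on at init time. Below is a minimal C sketch of that failure mode and the defensive shape of the fix. All names here (`self_heald_t`, `crawl_unsafe`, `crawl_safe`, `shd_init`) are illustrative stand-ins, not GlusterFS code; the actual fix keyed the allocation to a new non-reconfigurable option instead.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative stand-in for afr_self_heald_t: per-child state arrays
 * that may never have been allocated if initialization was skipped. */
typedef struct {
    int  enabled;
    int *inprogress;   /* per-child in-progress flags; may be NULL */
} self_heald_t;

/* Buggy pattern: assumes inprogress was allocated.
 * With inprogress == NULL this is exactly the reported
 * "Invalid read of size 4 ... Address 0x0" / SIGSEGV. */
int crawl_unsafe(self_heald_t *shd, int child) {
    return shd->inprogress[child];
}

/* Defensive pattern: never dereference state that may be missing. */
int crawl_safe(self_heald_t *shd, int child) {
    if (shd->inprogress == NULL)
        return 0;               /* no crawl state, nothing in progress */
    return shd->inprogress[child];
}

/* Shape of the fix: allocate the state based on a value fixed at init
 * time (a non-reconfigurable option), not on a toggle that can change. */
self_heald_t *shd_init(int child_count) {
    self_heald_t *shd = calloc(1, sizeof(*shd));
    if (shd == NULL)
        return NULL;
    shd->inprogress = calloc(child_count, sizeof(int));
    if (shd->inprogress == NULL) {
        free(shd);
        return NULL;
    }
    return shd;
}
```

The point of the sketch is the ordering guarantee: once allocation is tied to a property that cannot be reconfigured after process start, the crawl code can rely on `inprogress` being non-NULL for the lifetime of the daemon.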