| Summary: | Segfault in io-cache |
|---|---|
| Product: | [Community] GlusterFS |
| Reporter: | Harshavardhana <fharshav> |
| Component: | io-cache |
| Assignee: | Raghavendra G <raghavendra> |
| Status: | CLOSED CURRENTRELEASE |
| Severity: | medium |
| Priority: | low |
| Version: | mainline |
| CC: | aavati, anush, cww, gluster-bugs, vijay |
| Hardware: | All |
| OS: | Linux |
| Doc Type: | Bug Fix |
Initial observation showed that this->private was NULL:

(gdb) p *table
cannot access memory 0x0

Seems like we need a NULL check. Looking at the logs, it seems that fini was triggered because glusterfs got a SIGTERM while it was waiting in "waitpid" inside "fuse_mnt_add_mount".

Can you please place the core file in /share/bugzilla/<bugid>?

I can't. This was reproduced on Storage Platform, and the machine I am using is a laptop with no internet connectivity. I tried to get as many logs as I could.

Observed a segfault while testing on "Storage Platform".
Following is the gdb backtrace.
(gdb)
#0 0x00000038bc4087a0 in pthread_mutex_destroy () from /lib64/libpthread.so.0
#1 0x00007f7483fae5d1 in fini (this=0x1ba6d60) at io-cache.c:1351
#2 0x0000000000402d1f in cleanup_and_exit (signum=<value optimized out>) at glusterfsd.c:950
#3 <signal handler called>
#4 0x00000038bc40ea2b in waitpid () from /lib64/libpthread.so.0
#5 0x00007f7483da6c11 in fuse_mnt_add_mount (fsname=0x1ba01e0 "/etc/glusterfs/test.vol", mnt=0x1ba81e0 "/nfs/test", type=0x7f7483da8d24 "fuse.glusterfs",
opts=0x7f7483da7df8 "allow_other,default_permissions,max_read=131072", progname=<value optimized out>) at ../../../../contrib/fuse-lib/mount.c:153
#6 0x00007f7483da733d in fuse_mount_sys (mnt_param=<value optimized out>, fsname=<value optimized out>, mountpoint=<value optimized out>) at ../../../../contrib/fuse-lib/mount.c:553
#7 gf_fuse_mount (mnt_param=<value optimized out>, fsname=<value optimized out>, mountpoint=<value optimized out>) at ../../../../contrib/fuse-lib/mount.c:582
#8 0x00007f7483d99fab in init (this_xl=0x1ba1520) at fuse-bridge.c:3391
#9 0x00000038bbc1408b in xlator_init (xl=0x1ba1520) at xlator.c:940
#10 0x00000038bbc14121 in xlator_init_rec (xl=<value optimized out>) at xlator.c:833
#11 xlator_tree_init (xl=<value optimized out>) at xlator.c:871
#12 0x00000000004033cc in _xlator_graph_init (xl=<value optimized out>) at glusterfsd.c:581
#13 glusterfs_graph_init (xl=<value optimized out>) at glusterfsd.c:631
#14 0x0000000000404038 in main (argc=<value optimized out>, argv=<value optimized out>) at glusterfsd.c:1344
(gdb) (gdb) (gdb) quit
Everyone, what's going on with this bug? Is it still viable?

I don't have much information other than the backtrace and the client log file. Which translator's this->private was NULL? The information here is too little to fix the bug, and it does not show whether io-cache is the culprit.

There are two things in this bug. One is that the waitpid() call comes from fuse_mnt_add_mount, which is waiting for the mount to complete; I am not sure why it waits so long here. Also, I am not sure how it can be reproduced.

If I remember correctly, "this->private" is from the fini that is called for io-cache on SIGTERM (cleanup_and_exit), right?

PATCH: http://patches.gluster.com/patch/2618 in master (Added null checks in "fini")
PATCH: http://patches.gluster.com/patch/2619 in release-2.0 (Add null pointer checks in "fini")

Verified with 2.0.10rc1
Created attachment 96: Patch fixing the described problems.
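The patches referenced in this report are described only as adding NULL checks in "fini"; their contents are not reproduced here. As an illustration of the idea, here is a minimal sketch in C of a teardown routine that tolerates being called when this->private was never set (or has already been released). The names sketch_xlator_t, sketch_table_t, sketch_init and sketch_fini are hypothetical stand-ins, not the actual io-cache types or functions, and the int return value exists only to make the behaviour observable.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the real GlusterFS structures; only the
 * fields needed to show the guard are modelled. */
typedef struct sketch_table {
    pthread_mutex_t lock;
} sketch_table_t;

typedef struct sketch_xlator {
    void *private; /* NULL until sketch_init() succeeds */
} sketch_xlator_t;

/* Allocate the private table and initialise its mutex.
 * Returns 0 on success, -1 on failure (private stays NULL). */
int sketch_init(sketch_xlator_t *this)
{
    sketch_table_t *table = calloc(1, sizeof(*table));

    if (table == NULL)
        return -1;
    if (pthread_mutex_init(&table->lock, NULL) != 0) {
        free(table);
        return -1;
    }
    this->private = table;
    return 0;
}

/* Tear down the private table.
 * Returns 1 if there was nothing to clean up, 0 if state was released. */
int sketch_fini(sketch_xlator_t *this)
{
    sketch_table_t *table;

    /* The guard: without it, pthread_mutex_destroy() below would be
     * handed an address derived from a NULL pointer. */
    if (this == NULL || this->private == NULL)
        return 1;

    table = this->private;
    pthread_mutex_destroy(&table->lock);
    free(table);
    this->private = NULL; /* a second call becomes a no-op */
    return 0;
}
```

With this shape, calling sketch_fini() on an object whose private pointer is NULL simply returns, and calling it twice after a successful sketch_init() releases the state only once. Whether the real fix also needs to keep the signal handler from running cleanup concurrently with initialisation is not something this report settles.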