Description of problem:
=======================
The snapshot daemon (snapd) crashed while trying to access a snapshot directory under the .snaps directory.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs 3.6.0.30

How reproducible:
=================
1/1

Steps to Reproduce:
===================
1. Create a 2x2 distributed-replicate volume and start it.
2. FUSE and NFS mount the volume.
3. Enable USS on the volume.
4. Create some IO:
   FUSE mount: for i in {1..10} ; do cp -rvf /etc etc.$i ; done
   NFS mount:  for i in {1..10} ; do cp -rvf /etc nfs_etc.$i ; done
5. While the IO is going on, create a few snapshots on the volume:
   for i in {1..10}; do gluster snapshot create snap"$i" vol0 ; done
6. After snapshot creation completes, from the FUSE mount, cd to .snaps:

[root@dhcp-0-97 .snaps]# ll
total 0
d---------. 0 root root 0 Jan  1  1970 snap1
d---------. 0 root root 0 Jan  1  1970 snap10
d---------. 0 root root 0 Jan  1  1970 snap2
d---------. 0 root root 0 Jan  1  1970 snap3
d---------. 0 root root 0 Jan  1  1970 snap4
d---------. 0 root root 0 Jan  1  1970 snap5
d---------. 0 root root 0 Jan  1  1970 snap6
d---------. 0 root root 0 Jan  1  1970 snap7
d---------. 0 root root 0 Jan  1  1970 snap8
d---------. 0 root root 0 Jan  1  1970 snap9

7. cd into snap1 and list the files and directories under it; this resulted in the snapd crash:

[root@dhcp-0-97 .snaps]# cd snap1
[root@dhcp-0-97 snap1]# ls
ls: cannot read symbolic link rc4.d: Transport endpoint is not connected
ls: cannot access cups: Transport endpoint is not connected
ls: cannot access cron.weekly: Transport endpoint is not connected
ls: cannot access quotatab: Transport endpoint is not connected
ls: reading directory .: File descriptor in bad state
aliases.db   cron.weekly  environment  magic       my.cnf  PackageKit  printcap  rc4.d  shells      xdg
cron.hourly  cups         gshadow-     modprobe.d  oddjob  plymouth    quotatab  rc.d   statetab.d  yum.conf
[root@dhcp-0-97 snap1]# ll
ls: cannot open directory .: Transport endpoint is not connected
[root@dhcp-0-97 snap1]# ls
ls: cannot open directory .: Transport endpoint is not connected
[root@dhcp-0-97 snap1]# ls
ls: cannot open directory .: Transport endpoint is not connected
[root@dhcp-0-97 snap1]# cd ..
bash: cd: ..: Transport endpoint is not connected
[root@dhcp-0-97 snap1]# cd ..
bash: cd: ..: Transport endpoint is not connected

gluster v status vol0
Status of volume: vol0
Gluster process                                           Port   Online  Pid
------------------------------------------------------------------------------
Brick snapshot13.lab.eng.blr.redhat.com:/rhs/brick1/b1    49152  Y       16104
Brick snapshot14.lab.eng.blr.redhat.com:/rhs/brick1/b1    49152  Y       14350
Brick snapshot15.lab.eng.blr.redhat.com:/rhs/brick1/b1    49152  Y       14379
Brick snapshot16.lab.eng.blr.redhat.com:/rhs/brick1/b1    49152  Y       14046
Snapshot Daemon on localhost                              N/A    N       16315
NFS Server on localhost                                   2049   Y       16330
Self-heal Daemon on localhost                             N/A    Y       16257
Snapshot Daemon on snapshot14.lab.eng.blr.redhat.com      49160  Y       14527
NFS Server on snapshot14.lab.eng.blr.redhat.com           2049   Y       14542
Self-heal Daemon on snapshot14.lab.eng.blr.redhat.com     N/A    Y       14480
Snapshot Daemon on snapshot16.lab.eng.blr.redhat.com      49160  Y       14227
NFS Server on snapshot16.lab.eng.blr.redhat.com           2049   Y       14242
Self-heal Daemon on snapshot16.lab.eng.blr.redhat.com     N/A    Y       14179
Snapshot Daemon on snapshot15.lab.eng.blr.redhat.com      49160  Y       14560
NFS Server on snapshot15.lab.eng.blr.redhat.com           2049   Y       14567
Self-heal Daemon on snapshot15.lab.eng.blr.redhat.com     N/A    Y       14506

Task Status of Volume vol0
------------------------------------------------------------------------------

Actual results:
===============
snapd crashes while trying to access a snapshot directory under .snaps.

Expected results:
=================
Accessing snapshots under .snaps should not result in any crash.

Additional info:
================
snapd log snippet:
~~~~~~~~~~~~~~~~~~
[2014-11-04 05:58:04.581582] I [snapview-server-mgmt.c:27:mgmt_cbk_snap] 0-mgmt: list of snapshots changed
[2014-11-04 05:58:15.695738] W [dict.c:1307:dict_get_with_ref] (-->/usr/lib64/libglusterfs.so.0(default_lookup_resume+0x12c) [0x396aa271dc] (-->/usr/lib64/glusterfs/3.6.0.30/xlator/features/snapview-server.so(svs_lookup+0x2e3) [0x7f47ce276f03] (-->/usr/lib64/libglusterfs.so.0(dict_get_str_boolean+0x1f) [0x396aa1aabf]))) 0-dict: dict OR key (entry-point) is NULL
[2014-11-04 05:58:19.481945] W [dict.c:1307:dict_get_with_ref] (-->/usr/lib64/libglusterfs.so.0(default_lookup_resume+0x12c) [0x396aa271dc] (-->/usr/lib64/glusterfs/3.6.0.30/xlator/features/snapview-server.so(svs_lookup+0x2e3) [0x7f47ce276f03] (-->/usr/lib64/libglusterfs.so.0(dict_get_str_boolean+0x1f) [0x396aa1aabf]))) 0-dict: dict OR key (entry-point) is NULL
pending frames:
frame : type(0) op(2)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2014-11-04 05:58:19
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.6.0.30
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x396aa1ff06]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x396aa3a59f]
/lib64/libc.so.6[0x343c8326a0]
/usr/lib64/libglusterfs.so.0(default_readlink+0x32)[0x396aa25a52]
/usr/lib64/libglusterfs.so.0(default_readlink_resume+0x137)[0x396aa293f7]
/usr/lib64/libglusterfs.so.0(call_resume+0x54e)[0x396aa41cde]
/usr/lib64/glusterfs/3.6.0.30/xlator/performance/io-threads.so(iot_worker+0x158)[0x7f47ce069348]
/lib64/libpthread.so.0[0x343cc079d1]
/lib64/libc.so.6(clone+0x6d)[0x343c8e89dd]
Retried the steps again and was able to hit the crash consistently.
bt of the core:
===============
Loaded symbols for /usr/lib64/glusterfs/3.6.0.30/xlator/meta.so
Reading symbols from /lib64/libnss_dns.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_dns.so.2
Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libgcc_s.so.1
Core was generated by `/usr/sbin/glusterfsd -s localhost --volfile-id snapd/vol1 -p /var/lib/glusterd/'.
Program terminated with signal 11, Segmentation fault.
#0  0x000000396aa25a52 in default_readlink (frame=0x7f215bafb0f8, this=0x1768ca0, loc=0x7f215b583f38, size=4096, xdata=0x0) at defaults.c:1921
1921            STACK_WIND_TAIL (frame, FIRST_CHILD(this),
Missing separate debuginfos, use: debuginfo-install glusterfs-3.6.0.30-1.el6rhs.x86_64
(gdb) bt
#0  0x000000396aa25a52 in default_readlink (frame=0x7f215bafb0f8, this=0x1768ca0, loc=0x7f215b583f38, size=4096, xdata=0x0) at defaults.c:1921
#1  0x000000396aa293f7 in default_readlink_resume (frame=0x7f215bafb2fc, this=0x176cc00, loc=0x7f215b583f38, size=4096, xdata=0x0) at defaults.c:1491
#2  0x000000396aa41cde in call_resume_wind (stub=0x7f215b583ef8) at call-stub.c:2322
#3  call_resume (stub=0x7f215b583ef8) at call-stub.c:2841
#4  0x00007f214e3c8348 in iot_worker (data=0x1783730) at io-threads.c:214
#5  0x000000343cc079d1 in start_thread () from /lib64/libpthread.so.0
#6  0x000000343c8e89dd in clone () from /lib64/libc.so.6
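The gdb frame points at the STACK_WIND_TAIL (frame, FIRST_CHILD(this), ...) line in default_readlink(), and the first failure seen on the mount was readlink on the rc4.d symlink. A plausible reading (not confirmed here; see the fix referenced in the next comment) is that the readlink fop reaches a translator in snapd's snapview-server graph that provides no readlink implementation of its own and has no child translator to wind to, so FIRST_CHILD(this) dereferences a NULL children pointer. The standalone C sketch below mocks the xlator structures purely to illustrate that dereference pattern; the struct layout, names, and the guard it adds are illustrative assumptions, not GlusterFS source and not the actual fix.

illustrative sketch (not GlusterFS source):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <stdio.h>

struct xlator;                        /* forward declaration */

typedef struct xlator_list {
        struct xlator      *xlator;   /* child translator */
        struct xlator_list *next;
} xlator_list_t;

typedef struct xlator {
        const char    *name;
        xlator_list_t *children;      /* NULL for a leaf translator */
} xlator_t;

/* Same shape as the real macro: first child in the translator graph. */
#define FIRST_CHILD(xl) ((xl)->children->xlator)

static void
default_readlink_sketch (xlator_t *this)
{
        if (!this->children) {
                /* the missing guard: without this check the line below
                 * dereferences a NULL pointer */
                fprintf (stderr, "%s: no child to wind readlink to\n",
                         this->name);
                return;
        }
        printf ("winding readlink to %s\n", FIRST_CHILD (this)->name);
}

int
main (void)
{
        /* hypothetical leaf translator standing in for snapview-server */
        xlator_t snapview_server = { .name = "snapview-server",
                                     .children = NULL };

        default_readlink_sketch (&snapview_server);
        return 0;
}

Compiled with gcc, the sketch prints the "no child to wind readlink to" message; dropping the guard reproduces the same kind of NULL-pointer dereference (SIGSEGV in the readlink path) that the snapd core shows.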
Fixed with https://code.engineering.redhat.com/gerrit/36755
Version: glusterfs-3.6.0.33-1

Retried the steps as mentioned in the Description and did not hit the crash. Marking the bug as 'Verified'.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-0038.html