Bug 1160138 - [USS] : Snapd crashed while trying to access the snapshots under .snaps directory
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: snapshot
Version: rhgs-3.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.0.3
Assignee: Avra Sengupta
QA Contact: senaik
URL:
Whiteboard: USS
Depends On:
Blocks: 1162462 1162694
 
Reported: 2014-11-04 06:28 UTC by senaik
Modified: 2016-09-17 12:59 UTC
CC: 7 users

Fixed In Version: glusterfs-3.6.0.33-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1162462
Environment:
Last Closed: 2015-01-15 13:41:45 UTC
Embargoed:


Attachments:


Links:
Red Hat Product Errata RHBA-2015:0038 (normal, SHIPPED_LIVE): Red Hat Storage 3.0 enhancement and bug fix update #3, last updated 2015-01-15 18:35:28 UTC

Description senaik 2014-11-04 06:28:07 UTC
Description of problem:
======================
The snapshot daemon (snapd) crashed while trying to access a snapshot directory under the .snaps directory

Version-Release number of selected component (if applicable):
============================================================
glusterfs 3.6.0.30


How reproducible:
================
1/1


Steps to Reproduce:
==================
1. Create a 2x2 distributed-replicate volume and start it (a consolidated command sketch for steps 1-3 follows the volume status output below)

2. FUSE- and NFS-mount the volume

3. Enable USS on the volume

4. Generate some IO:
Fuse mount : for i in {1..10} ; do cp -rvf /etc etc.$i ; done
NFS mount  : for i in {1..10} ; do cp -rvf /etc nfs_etc.$i ; done

5. While IO is going on, create a few snapshots on the volume:
for i in {1..10}; do gluster snapshot create snap"$i" vol0 ;done

6. After snapshot creation completes, cd into .snaps from the FUSE mount and list it:
[root@dhcp-0-97 .snaps]# ll
total 0
d---------. 0 root root 0 Jan  1  1970 snap1
d---------. 0 root root 0 Jan  1  1970 snap10
d---------. 0 root root 0 Jan  1  1970 snap2
d---------. 0 root root 0 Jan  1  1970 snap3
d---------. 0 root root 0 Jan  1  1970 snap4
d---------. 0 root root 0 Jan  1  1970 snap5
d---------. 0 root root 0 Jan  1  1970 snap6
d---------. 0 root root 0 Jan  1  1970 snap7
d---------. 0 root root 0 Jan  1  1970 snap8
d---------. 0 root root 0 Jan  1  1970 snap9

7. cd into snap1; listing the files and directories under it resulted in a snapd crash:

[root@dhcp-0-97 .snaps]# cd snap1
[root@dhcp-0-97 snap1]# ls
ls: cannot read symbolic link rc4.d: Transport endpoint is not connected
ls: cannot access cups: Transport endpoint is not connected
ls: cannot access cron.weekly: Transport endpoint is not connected
ls: cannot access quotatab: Transport endpoint is not connected
ls: reading directory .: File descriptor in bad state
aliases.db   cron.weekly  environment  magic       my.cnf  PackageKit  printcap  rc4.d  shells      xdg
cron.hourly  cups         gshadow-     modprobe.d  oddjob  plymouth    quotatab  rc.d   statetab.d  yum.conf
[root@dhcp-0-97 snap1]# ll
ls: cannot open directory .: Transport endpoint is not connected
[root@dhcp-0-97 snap1]# ls
ls: cannot open directory .: Transport endpoint is not connected
[root@dhcp-0-97 snap1]# ls
ls: cannot open directory .: Transport endpoint is not connected
[root@dhcp-0-97 snap1]# cd ..
bash: cd: ..: Transport endpoint is not connected
[root@dhcp-0-97 snap1]# cd ..
bash: cd: ..: Transport endpoint is not connected

gluster v status vol0
Status of volume: vol0
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick snapshot13.lab.eng.blr.redhat.com:/rhs/brick1/b1	49152	Y	16104
Brick snapshot14.lab.eng.blr.redhat.com:/rhs/brick1/b1	49152	Y	14350
Brick snapshot15.lab.eng.blr.redhat.com:/rhs/brick1/b1	49152	Y	14379
Brick snapshot16.lab.eng.blr.redhat.com:/rhs/brick1/b1	49152	Y	14046
Snapshot Daemon on localhost				N/A	N	16315
NFS Server on localhost					2049	Y	16330
Self-heal Daemon on localhost				N/A	Y	16257
Snapshot Daemon on snapshot14.lab.eng.blr.redhat.com	49160	Y	14527
NFS Server on snapshot14.lab.eng.blr.redhat.com		2049	Y	14542
Self-heal Daemon on snapshot14.lab.eng.blr.redhat.com	N/A	Y	14480
Snapshot Daemon on snapshot16.lab.eng.blr.redhat.com	49160	Y	14227
NFS Server on snapshot16.lab.eng.blr.redhat.com		2049	Y	14242
Self-heal Daemon on snapshot16.lab.eng.blr.redhat.com	N/A	Y	14179
Snapshot Daemon on snapshot15.lab.eng.blr.redhat.com	49160	Y	14560
NFS Server on snapshot15.lab.eng.blr.redhat.com		2049	Y	14567
Self-heal Daemon on snapshot15.lab.eng.blr.redhat.com	N/A	Y	14506
 
Task Status of Volume vol0
------------------------------------------------------------------------------
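
Note from the status output above that the Snapshot Daemon on localhost is offline (Online: N) after the crash, while the bricks stay up. For convenience, a consolidated sketch of the setup commands for steps 1-3 (host names and brick paths are taken from the status output above; the mount points are assumptions, not taken from this report):

# 2x2 distributed-replicate volume (step 1)
gluster volume create vol0 replica 2 \
    snapshot13.lab.eng.blr.redhat.com:/rhs/brick1/b1 \
    snapshot14.lab.eng.blr.redhat.com:/rhs/brick1/b1 \
    snapshot15.lab.eng.blr.redhat.com:/rhs/brick1/b1 \
    snapshot16.lab.eng.blr.redhat.com:/rhs/brick1/b1
gluster volume start vol0

# FUSE and NFS (v3) mounts (step 2); /mnt/fuse and /mnt/nfs are assumed mount points
mount -t glusterfs snapshot13.lab.eng.blr.redhat.com:/vol0 /mnt/fuse
mount -t nfs -o vers=3 snapshot13.lab.eng.blr.redhat.com:/vol0 /mnt/nfs

# Enable User Serviceable Snapshots (step 3); this starts snapd on each node
gluster volume set vol0 features.uss enable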


Actual results:
===============
snapd crashed while trying to access a snapshot directory under .snaps


Expected results:
================
Accessing snapshots under .snaps should not result in a crash


Additional info:
===============

snapd log snippet:
~~~~~~~~~~~~~~~~~~

[2014-11-04 05:58:04.581582] I [snapview-server-mgmt.c:27:mgmt_cbk_snap] 0-mgmt: list of snapshots changed
[2014-11-04 05:58:15.695738] W [dict.c:1307:dict_get_with_ref] (-->/usr/lib64/libglusterfs.so.0(default_lookup_resume+0x12c) [0x396aa271dc] (-->/usr/lib64/glusterfs/3.6.0.30/xlator/features/snapview-server.so(svs_lookup+0x2e3) [0x7f47ce276f03] (-->/usr/lib64/libglusterfs.so.0(dict_get_str_boolean+0x1f) [0x396aa1aabf]))) 0-dict: dict OR key (entry-point) is NULL
[2014-11-04 05:58:19.481945] W [dict.c:1307:dict_get_with_ref] (-->/usr/lib64/libglusterfs.so.0(default_lookup_resume+0x12c) [0x396aa271dc] (-->/usr/lib64/glusterfs/3.6.0.30/xlator/features/snapview-server.so(svs_lookup+0x2e3) [0x7f47ce276f03] (-->/usr/lib64/libglusterfs.so.0(dict_get_str_boolean+0x1f) [0x396aa1aabf]))) 0-dict: dict OR key (entry-point) is NULL
pending frames:
frame : type(0) op(2)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 
2014-11-04 05:58:19
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.6.0.30
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x396aa1ff06]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x396aa3a59f]
/lib64/libc.so.6[0x343c8326a0]
/usr/lib64/libglusterfs.so.0(default_readlink+0x32)[0x396aa25a52]
/usr/lib64/libglusterfs.so.0(default_readlink_resume+0x137)[0x396aa293f7]
/usr/lib64/libglusterfs.so.0(call_resume+0x54e)[0x396aa41cde]
/usr/lib64/glusterfs/3.6.0.30/xlator/performance/io-threads.so(iot_worker+0x158)[0x7f47ce069348]
/lib64/libpthread.so.0[0x343cc079d1]
/lib64/libc.so.6(clone+0x6d)[0x343c8e89dd]

Comment 3 senaik 2014-11-04 07:09:01 UTC
Retried the steps again; able to hit the crash consistently.

Comment 4 senaik 2014-11-05 13:03:59 UTC
bt of the core :
==============

Loaded symbols for /usr/lib64/glusterfs/3.6.0.30/xlator/meta.so
Reading symbols from /lib64/libnss_dns.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_dns.so.2
Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libgcc_s.so.1
Core was generated by `/usr/sbin/glusterfsd -s localhost --volfile-id snapd/vol1 -p /var/lib/glusterd/'.
Program terminated with signal 11, Segmentation fault.
#0  0x000000396aa25a52 in default_readlink (frame=0x7f215bafb0f8, this=0x1768ca0, loc=0x7f215b583f38, size=4096, 
    xdata=0x0) at defaults.c:1921
1921	        STACK_WIND_TAIL (frame, FIRST_CHILD(this),
Missing separate debuginfos, use: debuginfo-install glusterfs-3.6.0.30-1.el6rhs.x86_64
(gdb) bt
#0  0x000000396aa25a52 in default_readlink (frame=0x7f215bafb0f8, this=0x1768ca0, loc=0x7f215b583f38, size=4096, 
    xdata=0x0) at defaults.c:1921
#1  0x000000396aa293f7 in default_readlink_resume (frame=0x7f215bafb2fc, this=0x176cc00, loc=0x7f215b583f38, 
    size=4096, xdata=0x0) at defaults.c:1491
#2  0x000000396aa41cde in call_resume_wind (stub=0x7f215b583ef8) at call-stub.c:2322
#3  call_resume (stub=0x7f215b583ef8) at call-stub.c:2841
#4  0x00007f214e3c8348 in iot_worker (data=0x1783730) at io-threads.c:214
#5  0x000000343cc079d1 in start_thread () from /lib64/libpthread.so.0
#6  0x000000343c8e89dd in clone () from /lib64/libc.so.6
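
For reference, a backtrace like the one above can be pulled from the core roughly as follows (the core file path is an assumption; the actual location depends on the system's kernel.core_pattern setting):

# install symbols first, as gdb itself suggests above
debuginfo-install glusterfs-3.6.0.30-1.el6rhs.x86_64

# load the snapd core against the glusterfsd binary and dump the stack
gdb /usr/sbin/glusterfsd /path/to/core   # hypothetical core path
(gdb) bt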

Comment 6 Avra Sengupta 2014-11-17 07:33:51 UTC
Fixed with https://code.engineering.redhat.com/gerrit/36755

Comment 7 senaik 2014-11-20 06:00:06 UTC
Version :glusterfs-3.6.0.33-1

Retried the steps as mentioned in the Description and did not hit the crash. Marking the bug as 'Verified'.

Comment 9 errata-xmlrpc 2015-01-15 13:41:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0038.html

