Bug 1292808

Summary: [USS]: Snapd related core generated while accessing snapshot after its recreation.
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Shashank Raj <sraj>
Component: snapshot
Assignee: Bug Updates Notification Mailing List <rhs-bugs>
Status: CLOSED WONTFIX
QA Contact: storage-qa-internal <storage-qa-internal>
Severity: urgent
Priority: unspecified
Version: rhgs-3.1
CC: mzywusko, rhs-bugs
Keywords: Triaged, ZStream
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Type: Bug
Last Closed: 2018-04-16 16:04:09 UTC

Description Shashank Raj 2015-12-18 12:15:03 UTC
Description of problem:
Snapd related core generated while accessing snapshot after its recreation.

Version-Release number of selected component (if applicable):
glusterfs-3.7.5-12

How reproducible:
twice

Steps to Reproduce: 

Observed during an automation run; log snippets from the run are included in the steps where errors and failures were seen. A rough CLI sketch of the full sequence follows the list.

1. Create a volume and start it.
2. Attach a tier to the volume.
3. Mount the volume and create a file under the mount point:
   echo "Hello" > /mnt/glusterfs/file
4. Enable USS on the volume.
5. Create a snapshot (snap0) and activate it.
6. Run the following on the client and observe that it gives the below error message:

   "stat /mnt/glusterfs/.snaps >/dev/null 2>&1 && cd /mnt/glusterfs/.snaps/snap0 && ls >/dev/null && cat file" on dhcp35-15.lab.eng.blr.redhat.com: RETCODE is 0
   2015-12-18 17:15:48,838 ERROR uss_check_file_content Content of file does not match

7. Delete the snapshot.
8. Write new content to the file:
   echo "Namaskara" > /mnt/glusterfs/file
9. Create the snapshot again with the same name and activate it.
10. Try to access the file from the snapshot and observe that it fails with transport endpoint not connected:
    cat /mnt/glusterfs/.snaps/snap0/file
    cat: /mnt/glusterfs/.snaps/snap0/file: Transport endpoint is not connected
11. A core related to snapd is observed on the node.
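
For reference, a rough CLI sketch of the sequence above. The volume name, server/brick paths, and attach-tier syntax are assumptions for illustration only (exact options vary by glusterfs release), and the snapshot steps assume the bricks sit on thin-provisioned LVs; this is not the exact automation script.

gluster volume create testvol server1:/bricks/b1 server2:/bricks/b2
gluster volume start testvol
gluster volume attach-tier testvol server1:/bricks/hot1 server2:/bricks/hot2
mount -t glusterfs server1:/testvol /mnt/glusterfs
echo "Hello" > /mnt/glusterfs/file
gluster volume set testvol features.uss enable
gluster snapshot create snap0 testvol no-timestamp
gluster snapshot activate snap0
stat /mnt/glusterfs/.snaps >/dev/null 2>&1 && cd /mnt/glusterfs/.snaps/snap0 && ls >/dev/null && cat file
gluster snapshot delete snap0
echo "Namaskara" > /mnt/glusterfs/file
gluster snapshot create snap0 testvol no-timestamp
gluster snapshot activate snap0
cat /mnt/glusterfs/.snaps/snap0/file    # fails with "Transport endpoint is not connected"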

Actual results:
snapd crashed

Expected results:
Snapd should not crash.

Additional info:
The following backtrace was observed:

#0  pthread_spin_lock () at ../nptl/sysdeps/x86_64/pthread_spin_lock.S:24
#1  0x00007fb7b41f7059 in inode_ctx_get0 (inode=0x7fb7819e13d4, xlator=xlator@entry=0x7fb79c0228f0, value1=value1@entry=0x7fb7a48eeb90)
    at inode.c:2089
#2  0x00007fb7b41f70e8 in inode_needs_lookup (inode=0x7fb7819e13d4, this=0x7fb79c0228f0) at inode.c:1872
#3  0x00007fb7a6779286 in __glfs_resolve_inode (fs=fs@entry=0x7fb79c0008e0, subvol=subvol@entry=0x7fb77c024e20, object=object@entry=0x7fb79c03eb10)
    at glfs-resolve.c:997
#4  0x00007fb7a677938b in glfs_resolve_inode (fs=fs@entry=0x7fb79c0008e0, subvol=subvol@entry=0x7fb77c024e20, object=object@entry=0x7fb79c03eb10)
    at glfs-resolve.c:1023
#5  0x00007fb7a677a7d2 in pub_glfs_h_open (fs=0x7fb79c0008e0, object=object@entry=0x7fb79c03eb10, flags=flags@entry=0) at glfs-handleops.c:634
#6  0x00007fb7a698fbe5 in svs_open (frame=0x7fb7b1ce0230, this=0x7fb7a0005e80, loc=0x7fb7b178606c, flags=0, fd=0x7fb7a0021d2c, 
    xdata=<optimized out>) at snapview-server.c:1887
#7  0x00007fb7b41e4eba in default_open_resume (frame=0x7fb7b1ce002c, this=0x7fb7a0009850, loc=0x7fb7b178606c, flags=0, fd=0x7fb7a0021d2c, xdata=0x0)
    at defaults.c:1415
#8  0x00007fb7b420417d in call_resume (stub=0x7fb7b178602c) at call-stub.c:2576
#9  0x00007fb7a5ab8363 in iot_worker (data=0x7fb7a001cc70) at io-threads.c:215
#10 0x00007fb7b303cdc5 in start_thread (arg=0x7fb7a48ef700) at pthread_create.c:308
#11 0x00007fb7b29831cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Comment 2 Shashank Raj 2015-12-18 12:24:12 UTC
sosreports and core are placed at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1292808
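
For anyone picking this up, a typical way to inspect the core (the binary path and debuginfo package name below are assumptions; snapd runs as a gluster daemon process, so check the core with 'file' first and point gdb at whichever binary it reports):

file /path/to/core                      # shows which binary produced the core
debuginfo-install glusterfs-server      # pull matching debug symbols
gdb /usr/sbin/glusterfsd /path/to/core
(gdb) bt
(gdb) thread apply all bt full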

Comment 3 Shashank Raj 2016-02-04 06:17:09 UTC
This bug is reproducible every time we run the automated test and should be looked into.