Bug 1234720

Summary: glusterd: glusterd crashes while importing a USS enabled volume which is already started
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Anil Shah <ashah>
Component: glusterd
Assignee: Atin Mukherjee <amukherj>
Status: CLOSED ERRATA
QA Contact: SATHEESARAN <sasundar>
Severity: high
Priority: high
Version: rhgs-3.1
CC: amukherj, ashah, nlevinki, nsathyan, sasundar, vagarwal, vbellur
Keywords: Regression, TestBlocker
Target Release: RHGS 3.1.0
Hardware: x86_64
OS: Linux
Whiteboard: glusterd
Fixed In Version: glusterfs-3.7.1-6
Doc Type: Bug Fix
Clones: 1234819 (view as bug list)
Last Closed: 2015-07-29 05:06:45 UTC
Type: Bug
Bug Blocks: 1202842, 1223636, 1234819, 1235208    

Description Anil Shah 2015-06-23 06:39:49 UTC
Description of problem:

While performing some glusterd operations, glusterd crashed and a core dump was generated.
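
A minimal sketch of how a backtrace can be pulled from such a core (the core file path below is a placeholder; the actual location depends on the system's core_pattern/abrt configuration):

# Open the core against the glusterd binary and capture the backtrace of all threads
gdb /usr/sbin/glusterd /path/to/glusterd-core --batch -ex "thread apply all bt full" > glusterd-bt.txt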

Version-Release number of selected component (if applicable):

[root@darkknightrises ~]# rpm -qa | grep glusterfs
glusterfs-fuse-3.7.1-4.el6rhs.x86_64
glusterfs-geo-replication-3.7.1-4.el6rhs.x86_64
samba-vfs-glusterfs-4.1.17-7.el6rhs.x86_64
glusterfs-libs-3.7.1-4.el6rhs.x86_64
glusterfs-3.7.1-4.el6rhs.x86_64
glusterfs-api-3.7.1-4.el6rhs.x86_64
glusterfs-server-3.7.1-4.el6rhs.x86_64
glusterfs-debuginfo-3.7.1-4.el6rhs.x86_64
glusterfs-client-xlators-3.7.1-4.el6rhs.x86_64
glusterfs-cli-3.7.1-4.el6rhs.x86_64



How reproducible:

Intermittent 

Steps to Reproduce:

glusterd crashed while performing the following operations:

[2015-06-22 14:24:42.208618]  : peer probe 10.70.33.219 : SUCCESS
[2015-06-22 14:24:46.194791]  : peer probe 10.70.33.225 : SUCCESS
[2015-06-22 14:24:51.621476]  : peer probe 10.70.44.13 : SUCCESS
[2015-06-22 14:25:13.687880]  : v create vol0 replica 2 10.70.33.214:/rhs/brick1/b1 10.70.33.219:/rhs/brick1/b2 10.70.33.225:/rhs/brick1/b3 10.70.44.13:/rhs/brick1/b4 : SUCCESS
[2015-06-22 14:25:19.494155]  : v start vol0 : SUCCESS
[2015-06-22 14:25:25.046000]  : v set vol0 uss enable : SUCCESS
[2015-06-22 14:25:28.913599]  : v set vol0 features.show-snapshot-directory enable : SUCCESS
[2015-06-22 14:25:53.157052]  : v quota vol0 enable : SUCCESS
[2015-06-22 14:26:13.762954]  : v quota vol0 limit-usage / 10TB : SUCCESS
[2015-06-22 14:26:21.242386]  : v quota vol0 list : SUCCESS
[2015-06-22 14:26:45.082160]  : v bitrot vol0 enable : SUCCESS
[2015-06-22 14:27:17.298295]  : v status : SUCCESS
[2015-06-22 14:27:17.319767]  : v status : SUCCESS
[2015-06-22 14:28:44.033759]  : v status vol0 : SUCCESS
[2015-06-22 14:29:25.739482]  : v quota vol0 disable : SUCCESS
[2015-06-22 14:38:03.338059]  : v start vol0 force : SUCCESS
[2015-06-22 14:38:55.774307]  : v set vol0 self-healdaemon off : FAILED : option : self-healdaemon does not exist
Did you mean self-heal-daemon?
[2015-06-22 14:39:00.587216]  : v set vol0 self-heal-daemon off : SUCCESS
[2015-06-22 14:39:10.367885]  : v set vol0 data-self-heal off : SUCCESS
[2015-06-22 14:39:15.151551]  : v set vol0 metadata-self-heal off : SUCCESS
[2015-06-22 14:39:23.569285]  : v set vol0 entry-self-heal off : SUCCESS
[2015-06-22 14:39:32.601471]  : v status : SUCCESS
[2015-06-22 14:39:32.621282]  : v status : SUCCESS
[2015-06-22 14:40:07.483601]  : v status : SUCCESS
[2015-06-22 14:40:07.503480]  : v status : SUCCESS
[2015-06-22 14:40:21.673774]  : v start vol0 force : SUCCESS
[2015-06-22 14:40:25.149148]  : v status : SUCCESS
[2015-06-22 14:40:25.169665]  : v status : SUCCESS
[2015-06-22 14:40:54.665495]  : v start vol0 force : SUCCESS
[2015-06-22 14:41:12.573928]  : v status : SUCCESS
[2015-06-22 14:41:12.594420]  : v status : SUCCESS
[2015-06-22 14:41:40.795869]  : snapshot create snap1 vol0 : SUCCESS
[2015-06-22 14:42:10.712606]  : snapshot activate snap1_GMT-2015.06.22-14.41.28 : SUCCESS
[2015-06-22 14:48:08.311581]  : v heal vol0 : FAILED : Self-heal-daemon is disabled. Heal will not be triggered on volume vol0
[2015-06-22 14:48:34.031211]  : v set vol0 self-heal-daemon on : SUCCESS
[2015-06-22 14:48:37.399677]  : v heal vol0 : SUCCESS
[2015-06-22 14:51:05.757067]  : v heal testvol : FAILED : Volume testvol does not exist
[2015-06-22 14:51:09.559867]  : v heal vol0 : SUCCESS
[2015-06-22 14:53:54.476210]  : snapshot create snap2 vol0 : SUCCESS
[2015-06-22 15:37:23.141560]  : snapshot list : SUCCESS
[2015-06-22 15:37:36.581430]  : snapshot info snap2_GMT-2015.06.22-14.53.41 : SUCCESS
[2015-06-22 15:37:46.885773]  : snapshot activate snap2_GMT-2015.06.22-14.53.41 : SUCCESS
[2015-06-22 15:41:15.611563]  : snapshot create snap3 vol0 : SUCCESS
[2015-06-22 15:41:28.435069]  : snapshot activate snap3_GMT-2015.06.22-15.41.02 : SUCCESS
[2015-06-22 15:44:37.974751]  : v status : SUCCESS
[2015-06-22 15:44:37.997998]  : v status : SUCCESS
[2015-06-23 05:23:44.471152]  : v set vol0 server.allow-insecure on : SUCCESS
[2015-06-23 05:24:17.311868]  : volume set all cluster.enable-shared-storage enable : SUCCESS
[2015-06-23 05:24:18.743490]  : volume create gluster_shared_storage replica 3 10.70.33.225:/var/run/gluster/ss_brick 10.70.44.13:/var/run/gluster/ss_brick 10.70.33.214:/var/run/gluster/ss_brick : SUCCESS
[2015-06-23 05:24:28.068691]  : volume start gluster_shared_storage : SUCCESS
[2015-06-23 05:24:40.455612]  : volume set all cluster.enable-shared-storage disable : SUCCESS
[2015-06-23 05:24:49.980950]  : volume stop gluster_shared_storage : SUCCESS
[2015-06-23 05:24:50.215017]  : volume delete gluster_shared_storage : SUCCESS
[2015-06-23 05:27:17.094921]  : volume set all cluster.enable-shared-storage enable : SUCCESS
[2015-06-23 05:27:18.452125]  : volume create gluster_shared_storage replica 3 10.70.33.225:/var/run/gluster/ss_brick 10.70.44.13:/var/run/gluster/ss_brick 10.70.33.214:/var/run/gluster/ss_brick : SUCCESS
[2015-06-23 05:27:27.816957]  : volume start gluster_shared_storage : SUCCESS
[2015-06-23 05:58:51.799609]  : volume status : SUCCESS
[2015-06-23 05:58:51.820369]  : volume status : SUCCESS
[2015-06-23 05:58:51.840323]  : volume status : SUCCESS


Actual results:

glusterd crashed

Expected results:

glusterd should not crash.

Additional info:



Comment 6 SATHEESARAN 2015-06-23 09:58:23 UTC
There is a consistent test case to reproduce this issue:

1. Create a volume (of any type) and start it
2. Take a snapshot of the volume
3. Enable USS on the volume
4. Probe a peer

Observation: glusterd crashes on the newly probed peer.
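
A minimal CLI sketch of these steps (the host names, volume name, and brick path are placeholders):

# 1. Create and start a volume (any type)
gluster volume create testvol node1:/rhs/brick1/b1
gluster volume start testvol
# 2. Take a snapshot of the volume
gluster snapshot create snap1 testvol
# 3. Enable USS on the volume
gluster volume set testvol uss enable
# 4. Probe a new peer -- glusterd on the newly probed peer crashes while importing
#    the already-started, USS-enabled volume
gluster peer probe node2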

Taking a snapshot, enabling USS, and probing a peer are common operations when expanding the cluster; in those cases this crash will be evident.

This issue should be considered a blocker; it is a regression caused by the patch https://code.engineering.redhat.com/gerrit/51027, as mentioned by Atin.

All of the above reasons make this bug eligible for BLOCKER status.

Comment 7 Atin Mukherjee 2015-06-23 10:48:28 UTC
The upstream patch review.gluster.org/11364 has been posted for review.

Comment 9 Atin Mukherjee 2015-06-24 10:33:11 UTC
Downstream patch posted for review:

https://code.engineering.redhat.com/gerrit/51469

Comment 10 Atin Mukherjee 2015-06-25 03:48:18 UTC
The downstream patch has now been merged.

Comment 11 SATHEESARAN 2015-07-01 01:03:39 UTC
Tested with the RHGS 3.1 nightly build (glusterfs-3.7.1-6.el6rhs) using the steps mentioned in comment 6.

Marking this bug as VERIFIED.
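
As a rough sketch, verification on the newly probed peer amounts to confirming that glusterd survives the probe (el6 service command assumed here):

# Run on the newly probed peer after the steps from comment 6
service glusterd status     # glusterd should still be running
gluster peer status         # peers should show State: Peer in Cluster (Connected)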

Comment 12 errata-xmlrpc 2015-07-29 05:06:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html