Description of problem:
While doing some glusterd operations, glusterd crashed and a core dump was seen.

Version-Release number of selected component (if applicable):
[root@darkknightrises ~]# rpm -qa | grep glusterfs
glusterfs-fuse-3.7.1-4.el6rhs.x86_64
glusterfs-geo-replication-3.7.1-4.el6rhs.x86_64
samba-vfs-glusterfs-4.1.17-7.el6rhs.x86_64
glusterfs-libs-3.7.1-4.el6rhs.x86_64
glusterfs-3.7.1-4.el6rhs.x86_64
glusterfs-api-3.7.1-4.el6rhs.x86_64
glusterfs-server-3.7.1-4.el6rhs.x86_64
glusterfs-debuginfo-3.7.1-4.el6rhs.x86_64
glusterfs-client-xlators-3.7.1-4.el6rhs.x86_64
glusterfs-cli-3.7.1-4.el6rhs.x86_64

How reproducible:
Intermittent

Steps to Reproduce:
glusterd crashed while performing these operations (from cmd_history.log):

[2015-06-22 14:24:42.208618] : peer probe 10.70.33.219 : SUCCESS
[2015-06-22 14:24:46.194791] : peer probe 10.70.33.225 : SUCCESS
[2015-06-22 14:24:51.621476] : peer probe 10.70.44.13 : SUCCESS
[2015-06-22 14:25:13.687880] : v create vol0 replica 2 10.70.33.214:/rhs/brick1/b1 10.70.33.219:/rhs/brick1/b2 10.70.33.225:/rhs/brick1/b3 10.70.44.13:/rhs/brick1/b4 : SUCCESS
[2015-06-22 14:25:19.494155] : v start vol0 : SUCCESS
[2015-06-22 14:25:25.046000] : v set vol0 uss enable : SUCCESS
[2015-06-22 14:25:28.913599] : v set vol0 features.show-snapshot-directory enable : SUCCESS
[2015-06-22 14:25:53.157052] : v quota vol0 enable : SUCCESS
[2015-06-22 14:26:13.762954] : v quota vol0 limit-usage / 10TB : SUCCESS
[2015-06-22 14:26:21.242386] : v quota vol0 list : SUCCESS
[2015-06-22 14:26:45.082160] : v bitrot vol0 enable : SUCCESS
[2015-06-22 14:27:17.298295] : v status : SUCCESS
[2015-06-22 14:27:17.319767] : v status : SUCCESS
[2015-06-22 14:28:44.033759] : v status vol0 : SUCCESS
[2015-06-22 14:29:25.739482] : v quota vol0 disable : SUCCESS
[2015-06-22 14:38:03.338059] : v start vol0 force : SUCCESS
[2015-06-22 14:38:55.774307] : v set vol0 self-healdaemon off : FAILED : option : self-healdaemon does not exist Did you mean self-heal-daemon?
[2015-06-22 14:39:00.587216] : v set vol0 self-heal-daemon off : SUCCESS
[2015-06-22 14:39:10.367885] : v set vol0 data-self-heal off : SUCCESS
[2015-06-22 14:39:15.151551] : v set vol0 metadata-self-heal off : SUCCESS
[2015-06-22 14:39:23.569285] : v set vol0 entry-self-heal off : SUCCESS
[2015-06-22 14:39:32.601471] : v status : SUCCESS
[2015-06-22 14:39:32.621282] : v status : SUCCESS
[2015-06-22 14:40:07.483601] : v status : SUCCESS
[2015-06-22 14:40:07.503480] : v status : SUCCESS
[2015-06-22 14:40:21.673774] : v start vol0 force : SUCCESS
[2015-06-22 14:40:25.149148] : v status : SUCCESS
[2015-06-22 14:40:25.169665] : v status : SUCCESS
[2015-06-22 14:40:54.665495] : v start vol0 force : SUCCESS
[2015-06-22 14:41:12.573928] : v status : SUCCESS
[2015-06-22 14:41:12.594420] : v status : SUCCESS
[2015-06-22 14:41:40.795869] : snapshot create snap1 vol0 : SUCCESS
[2015-06-22 14:42:10.712606] : snapshot activate snap1_GMT-2015.06.22-14.41.28 : SUCCESS
[2015-06-22 14:48:08.311581] : v heal vol0 : FAILED : Self-heal-daemon is disabled.
Heal will not be triggered on volume vol0
[2015-06-22 14:48:34.031211] : v set vol0 self-heal-daemon on : SUCCESS
[2015-06-22 14:48:37.399677] : v heal vol0 : SUCCESS
[2015-06-22 14:51:05.757067] : v heal testvol : FAILED : Volume testvol does not exist
[2015-06-22 14:51:09.559867] : v heal vol0 : SUCCESS
[2015-06-22 14:53:54.476210] : snapshot create snap2 vol0 : SUCCESS
[2015-06-22 15:37:23.141560] : snapshot list : SUCCESS
[2015-06-22 15:37:36.581430] : snapshot info snap2_GMT-2015.06.22-14.53.41 : SUCCESS
[2015-06-22 15:37:46.885773] : snapshot activate snap2_GMT-2015.06.22-14.53.41 : SUCCESS
[2015-06-22 15:41:15.611563] : snapshot create snap3 vol0 : SUCCESS
[2015-06-22 15:41:28.435069] : snapshot activate snap3_GMT-2015.06.22-15.41.02 : SUCCESS
[2015-06-22 15:44:37.974751] : v status : SUCCESS
[2015-06-22 15:44:37.997998] : v status : SUCCESS
[2015-06-23 05:23:44.471152] : v set vol0 server.allow-insecure on : SUCCESS
[2015-06-23 05:24:17.311868] : volume set all cluster.enable-shared-storage enable : SUCCESS
[2015-06-23 05:24:18.743490] : volume create gluster_shared_storage replica 3 10.70.33.225:/var/run/gluster/ss_brick 10.70.44.13:/var/run/gluster/ss_brick 10.70.33.214:/var/run/gluster/ss_brick : SUCCESS
[2015-06-23 05:24:28.068691] : volume start gluster_shared_storage : SUCCESS
[2015-06-23 05:24:40.455612] : volume set all cluster.enable-shared-storage disable : SUCCESS
[2015-06-23 05:24:49.980950] : volume stop gluster_shared_storage : SUCCESS
[2015-06-23 05:24:50.215017] : volume delete gluster_shared_storage : SUCCESS
[2015-06-23 05:27:17.094921] : volume set all cluster.enable-shared-storage enable : SUCCESS
[2015-06-23 05:27:18.452125] : volume create gluster_shared_storage replica 3 10.70.33.225:/var/run/gluster/ss_brick 10.70.44.13:/var/run/gluster/ss_brick 10.70.33.214:/var/run/gluster/ss_brick : SUCCESS
[2015-06-23 05:27:27.816957] : volume start gluster_shared_storage : SUCCESS
[2015-06-23 05:58:51.799609] : volume status : SUCCESS
[2015-06-23 05:58:51.820369] : volume status : SUCCESS
[2015-06-23 05:58:51.840323] : volume status : SUCCESS

Actual results:
glusterd crashed

Expected results:
glusterd should not crash

Additional info:
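For reference, a backtrace can usually be pulled out of the core with gdb (glusterfs-debuginfo is already installed per the package list above). The core file path below is an assumption; the actual location depends on the system's kernel.core_pattern setting.

# open the core against the glusterd binary (core path is a placeholder)
gdb /usr/sbin/glusterd /path/to/core.<pid>
# backtrace of the crashing thread
(gdb) bt
# full backtrace of all threads, useful to attach to this bug
(gdb) thread apply all bt full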
There is a consistent test case to reproduce this issue:
1. Create a volume (of any type) and start it
2. Take a snapshot of the volume
3. Enable USS on the volume
4. Probe a peer

Observation: glusterd crashes on the newly probed peer. A sketch of this sequence is given below.

Taking a snapshot, enabling USS, and probing a peer is a common sequence of operations when expanding a cluster, so this crash would be evident in those cases. This is a regression introduced by the patch https://code.engineering.redhat.com/gerrit/51027, as mentioned by Atin. All of the above makes this bug eligible for BLOCKER.
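A minimal command sequence for the reproducer above might look like the following. The node names, brick paths, volume name, and peer IP are placeholders; USS is enabled here via the features.uss option (the cmd_history.log above uses the abbreviated form "uss").

# on an existing cluster node: create and start a volume (any type works)
gluster volume create vol0 replica 2 node1:/rhs/brick1/b1 node2:/rhs/brick1/b2
gluster volume start vol0

# take a snapshot of the volume
gluster snapshot create snap1 vol0

# enable USS (User Serviceable Snapshots) on the volume
gluster volume set vol0 features.uss enable

# probe a new peer -- glusterd on the newly probed node is expected to crash
gluster peer probe <new-peer-ip>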
Upstream patch review.gluster.org/11364 is posted for review
Downstream patch posted for review: https://code.engineering.redhat.com/gerrit/51469
The downstream patch is merged now.
Tested with the RHGS 3.1 nightly build glusterfs-3.7.1-6.el6rhs, following the steps mentioned in comment 6. Marking this bug as VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1495.html