Bug 1110119

Summary: [SNAPSHOT] : glusterd crash with ping_timer set to 30 (default value) while snapshot creation was in progress
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: senaik
Component: snapshot
Assignee: Bug Updates Notification Mailing List <rhs-bugs>
Status: CLOSED WORKSFORME
QA Contact: Anoop <annair>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.0
CC: rhs-bugs, smohan, sunkumar
Target Milestone: ---
Keywords: Triaged, ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: SNAPSHOT
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-04 09:02:31 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description senaik 2014-06-17 05:42:26 UTC
Description of problem:
=======================
As per BZ 1096729, we were seeing frequent disconnects between peers and bricks, which led to snapshot creation failures and IO failures while snapshot creation was in progress for multiple volumes.

The workaround provided for BZ 1096729 was to disable the ping timer (edit /etc/glusterfs/glusterd.vol, set the ping timeout to 0, and restart glusterd).
As per comment 14 in BZ 1096729, this is going in as a Known Issue for Denali (doc bug raised: BZ 1109150).
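For reference, the workaround amounts to a one-line change in the glusterd management volfile. A minimal sketch, assuming a stock /etc/glusterfs/glusterd.vol (the surrounding options vary by installation and are shown only for context; the ping-timeout line is the workaround):

volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    # 0 disables the ping timer; the default in this setup was 30 seconds
    option ping-timeout 0
end-volume

After saving the change, glusterd must be restarted for it to take effect (service glusterd restart on RHEL 6).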

We retried snapshot creation with the ping timeout set to 30 and hit similar disconnect issues, as well as a glusterd crash. After discussion with the developers, raising this bug to track the glusterd crash.


Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.6.0.15-1.el6rhs.x86_64

How reproducible:
================
1/1

Steps to Reproduce:
==================
4-node cluster
Ping timeout set to 30 (the default)
Restart glusterd

1. Create 4 volumes.
2. Fuse and NFS mount the volumes.
3. Run IO on all the volumes at the same time; for a single volume:
for i in {1..400}; do dd if=/dev/urandom of=fuse_vol0"$i" bs=10M count=1; done

4. Create snapshots on all volumes at the same time; for a single volume (a parallel sketch covering all volumes follows these steps):
for i in {1..100}; do gluster snapshot create snap$i vol0; done
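
A sketch of driving steps 3 and 4 across all four volumes concurrently; the volume names vol0..vol3 and the mount points under /mnt are illustrative assumptions, not from the original report:

# run the IO loop on every fuse mount in the background
for v in vol0 vol1 vol2 vol3; do
    ( cd /mnt/"$v" && for i in {1..400}; do dd if=/dev/urandom of=fuse_"$v"_"$i" bs=10M count=1; done ) &
done

# run the snapshot loop for every volume in the background
# (snapshot names include the volume name, since they must be unique cluster-wide)
for v in vol0 vol1 vol2 vol3; do
    ( for i in {1..100}; do gluster snapshot create snap_"$v"_"$i" "$v"; done ) &
done
wait    # block until all background loops finish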

A few snapshot create failures were seen, and glusterd crashed.

Actual results:
==============
Glusterd crash

Expected results:
================
No crash should be seen.

Additional info:
================
Uploaded the sosreports and the core file:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/snapshots/1096729/

Comment 2 Vijaikumar Mallikarjuna 2014-07-17 06:14:15 UTC
The core file attached to the bug looks corrupted. Will update the bug once the problem is recreated.
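
For what it's worth, a quick way to sanity-check a core against the binary before debugging; the core path here is illustrative:

# confirm the core is intact and was produced by glusterd
file /tmp/core.12345

# if so, pull backtraces from all threads
gdb -batch -ex 'thread apply all bt' /usr/sbin/glusterd /tmp/core.12345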

Comment 7 Sunny Kumar 2018-04-04 09:02:31 UTC
Looks like this is not a valid bug anymore. Will reopen the bug once the problem is recreated.

-Sunny