Bug 1110119 - [SNAPSHOT] : glusterd crash with ping_timer set to 30 (default value) while snapshot creation was in progress
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: snapshot
Version: rhgs-3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Bug Updates Notification Mailing List
QA Contact: Anoop
URL:
Whiteboard: SNAPSHOT
Depends On:
Blocks:
 
Reported: 2014-06-17 05:42 UTC by senaik
Modified: 2018-04-04 09:02 UTC
CC: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-04 09:02:31 UTC
Embargoed:



Description senaik 2014-06-17 05:42:26 UTC
Description of problem:
=======================
As per BZ 1096729, we were seeing frequent disconnects between peers and bricks, which led to snapshot creation failures and IO failures while snapshot creation was in progress for multiple volumes.

The workaround provided for BZ 1096729 was to disable the ping timer (edit /etc/glusterfs/glusterd.vol, set ping-timeout to 0, and restart glusterd).
As per comment 14 in BZ 1096729, this is going in as a Known Issue for Denali (doc bug raised: BZ 1109150).
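For reference, the workaround amounts to a one-line change to the management volume definition in /etc/glusterfs/glusterd.vol. The sketch below is abbreviated; the stock file carries additional options, and only the ping-timeout line is the actual change:

```
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    # 0 disables the ping timer (workaround for BZ 1096729);
    # the default shipped value is 30
    option ping-timeout 0
end-volume
```

followed by restarting the daemon with `service glusterd restart`.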

We retried snapshot creation with ping-timeout set to 30 and hit similar disconnect issues, and also a glusterd crash. After discussion with the developers, raising this bug to track the glusterd crash.


Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.6.0.15-1.el6rhs.x86_64

How reproducible:
================
1/1

Steps to Reproduce:
==================
4-node cluster
ping-timeout set to 30
Restart glusterd

1. Create 4 volumes
2. Fuse and NFS mount the volumes
3. Start IO on all the volumes at the same time:
for i in {1..400}; do dd if=/dev/urandom of=fuse_vol0"$i" bs=10M count=1; done

4. Create snapshots on all volumes at the same time:
for i in {1..100}; do gluster snapshot create snap$i vol0 ; done 
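The per-volume IO load from step 3 can be sketched as a small POSIX shell script that drives all four volumes concurrently. The mount-point paths (/mnt/vol01 .. /mnt/vol04) and the helper name write_load are illustrative assumptions, not taken from the report:

```shell
#!/bin/sh
# Write 'count' files of random data, 'bs' bytes each, into 'dir'.
write_load() {
    dir=$1; count=$2; bs=$3
    i=1
    while [ "$i" -le "$count" ]; do
        dd if=/dev/urandom of="$dir/fuse_vol0$i" bs="$bs" count=1 2>/dev/null
        i=$((i + 1))
    done
}

# Run the load on each assumed mount point in parallel, skipping any
# mount point that does not exist on this host.
for mnt in /mnt/vol01 /mnt/vol02 /mnt/vol03 /mnt/vol04; do
    [ -d "$mnt" ] && write_load "$mnt" 400 10M &
done
wait
```

With IO running, the snapshot loop from step 4 can be launched against each volume in the same backgrounded fashion to create snapshots on all volumes at once.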

A few snapshot create failures were seen, and glusterd crashed.

Actual results:
==============
Glusterd crash

Expected results:
================
There should be no crash.

Additional info:
================
Uploaded the sosreports and the core file here:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/snapshots/1096729/

Comment 2 Vijaikumar Mallikarjuna 2014-07-17 06:14:15 UTC
The core file attached to the bug looks corrupted. Will update the bug once the problem is recreated.

Comment 7 Sunny Kumar 2018-04-04 09:02:31 UTC
Looks like this is not a valid bug anymore. Will reopen the bug if the problem is recreated.

-Sunny

