Description of problem:
=======================
As per BZ 1096729, we were seeing frequent disconnects between peers and bricks, which led to snapshot creation failures and IO failures while snapshot creation was in progress for multiple volumes. The workaround provided for BZ 1096729 was to disable the ping timer (edit /etc/glusterfs/glusterd.vol, set the ping timeout to 0, and restart glusterd). As per comment 14 in BZ 1096729, this is going in as a Known Issue for Denali (doc bug raised: BZ 1109150).

We retried snapshot creation with the ping timeout set to 30 and hit similar disconnect issues, and also hit a glusterd crash. After discussion with the developers, raising this bug to track the glusterd crash.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.6.0.15-1.el6rhs.x86_64

How reproducible:
=================
1/1

Steps to Reproduce:
===================
Setup: 4-node cluster, ping timeout set to 30, glusterd restarted.
1. Create 4 volumes.
2. Fuse-mount and NFS-mount the volumes.
3. Run IO on all the volumes at the same time:
   for i in {1..400}; do dd if=/dev/urandom of=fuse_vol0"$i" bs=10M count=1; done
4. Create snapshots on all volumes at the same time:
   for i in {1..100}; do gluster snapshot create snap$i vol0 ; done

A few snapshot create failures were seen, and glusterd crashed.

Actual results:
===============
glusterd crashed.

Expected results:
=================
There should be no crash.

Additional info:
================
Uploaded the sosreports and the core file:
http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/snapshots/1096729/
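The ping-timer workaround described above can be sketched as follows. This is a non-authoritative sketch: it assumes the stock /etc/glusterfs/glusterd.vol management volfile layout with an existing `option ping-timeout` line, and a RHEL 6 style init system; verify against your own volfile before applying.

```shell
# Sketch of the BZ 1096729 workaround: disable the ping timer by setting
# ping-timeout to 0 in the glusterd management volfile, then restart glusterd.
# Assumes the volfile already contains an "option ping-timeout <N>" line
# inside the "volume management" block, e.g.:
#
#   volume management
#       type mgmt/glusterd
#       option working-directory /var/lib/glusterd
#       ...
#       option ping-timeout 30
#   end-volume

# Replace the existing ping-timeout value with 0 (keep a backup first):
cp /etc/glusterfs/glusterd.vol /etc/glusterfs/glusterd.vol.bak
sed -i 's/option ping-timeout .*/option ping-timeout 0/' /etc/glusterfs/glusterd.vol

# Restart glusterd so the new setting takes effect:
service glusterd restart
```

To retry with the 30-second timeout used in this bug, the same edit applies with `option ping-timeout 30` instead of `0`.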
The core file attached to the bug looks corrupted. Will update the bug once the problem is recreated.
Looks like this is no longer a valid bug; will reopen it once the problem is recreated. -Sunny